AI Transcription
Real-time speech-to-text transcription powered by AI. Convert meetings, voice chats, and audio content into searchable text automatically.
Overview
Jyv Desktop's AI transcription feature automatically converts speech to text in real-time. Perfect for meetings, interviews, podcasts, lectures, and any scenario where you need accurate transcripts without manual note-taking.
Key Features
- Real-Time Transcription: Live captions with <1 second delay
- 95%+ Accuracy: Industry-leading speech recognition
- 90+ Languages: Support for major world languages
- Speaker Identification: Automatically detect and label different speakers
- Punctuation & Formatting: Intelligent capitalization and punctuation
- Searchable Archive: Find past conversations instantly
- Export Options: TXT, DOCX, PDF, SRT, VTT formats
Features
Real-Time Transcription
Watch your words appear as you speak with minimal lag:
- Live Display: Floating overlay shows real-time captions
- Auto-Correction: AI refines transcripts as context becomes clear
- Confidence Scoring: See which words may need manual review
- Instant Search: Find specific topics during live conversations
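Confidence scores are most useful when they drive a review pass. The sketch below assumes each transcribed word is an object with `text` and `confidence` (0 to 1) fields. That shape, the function name `flagLowConfidence`, and the 0.8 threshold are illustrative assumptions, not Jyv Desktop's documented format.

```javascript
// Hypothetical word shape: { text: string, confidence: number between 0 and 1 }.
// Returns the words whose confidence is below the threshold, with their positions.
function flagLowConfidence(words, threshold = 0.8) {
  return words
    .map((word, index) => ({ ...word, index }))
    .filter((word) => word.confidence < threshold);
}

const sampleWords = [
  { text: 'quarterly', confidence: 0.97 },
  { text: 'revenue', confidence: 0.62 },
  { text: 'grew', confidence: 0.91 },
];

for (const word of flagLowConfidence(sampleWords)) {
  console.log(`Review word ${word.index}: "${word.text}" (${word.confidence})`);
}
```

A lower threshold flags fewer words for review; the right value depends on how much manual checking is acceptable.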
Speaker Diarization
Automatically identify and label different speakers in conversations:
{
  "transcription": {
    "speakers": [
      {
        "id": "speaker_1",
        "label": "John (You)",
        "voiceprint": "...",
        "color": "#4dabf7"
      },
      {
        "id": "speaker_2",
        "label": "Sarah",
        "voiceprint": "...",
        "color": "#51cf66"
      }
    ],
    "autoDetect": true,
    "maxSpeakers": 10
  }
}
Smart Punctuation
AI adds proper punctuation, capitalization, and formatting:
- Sentence Detection: Automatic periods, commas, question marks
- Capitalization: Proper names, sentence starts, acronyms
- Paragraph Breaks: Detect topic changes and speaker switches
- Numbers & Dates: Convert spoken numbers to digits
- Filler Word Removal: Optionally remove "um", "uh", "like"
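To make the filler-word option concrete, here is a minimal sketch of post-processing a finished sentence. It is not how Jyv Desktop implements the feature: the function name `removeFillers` and the word list are assumptions. "like" is deliberately left out of the default list because it is also an ordinary verb, and removing it safely needs context that a regular expression does not have.

```javascript
// Illustrative filler list; only unambiguous hesitation sounds are included.
// Entries must be plain words, since they are joined directly into a regular expression.
const FILLERS = ['um', 'uh', 'erm'];

// Removes standalone filler words (plus a trailing comma, if any) and tidies spacing.
function removeFillers(text, fillers = FILLERS) {
  const pattern = new RegExp(`\\b(?:${fillers.join('|')})\\b,?\\s*`, 'gi');
  return text.replace(pattern, '').replace(/\s{2,}/g, ' ').trim();
}

console.log(removeFillers('So um the deadline is uh Friday.'));
// → "So the deadline is Friday."
```

The word boundaries (`\b`) keep words such as "umbrella" intact.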
Live Captions
Display real-time subtitles on screen:
Caption Display Options:
- Floating Overlay: Transparent window anywhere on screen
- Bottom Bar: Classic subtitle-style display
- Notification Style: Toast notifications for key points
- OBS Integration: Send captions directly to streaming software
Transcript History
All transcripts are automatically saved and searchable:
- Unlimited Storage: Keep all transcripts (configurable retention)
- Full-Text Search: Find specific words or phrases instantly
- Date/Time Filtering: Browse by meeting date and time
- Tagging System: Organize transcripts with custom tags
- Favorites: Mark important conversations
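The search, date filter, and tag features above combine naturally into a single query. The sketch below filters an in-memory list and assumes a record shape of `{ id, text, tags, createdAt }`. Both the shape and the `searchTranscripts` helper are hypothetical and only illustrate how the filters compose.

```javascript
// Hypothetical record shape: { id, text, tags: string[], createdAt: ISO 8601 string }.
// Every filter is optional; a record must pass all the filters that are supplied.
function searchTranscripts(transcripts, { query = '', tag, from, to } = {}) {
  const needle = query.toLowerCase();
  return transcripts.filter((t) => {
    const created = new Date(t.createdAt);
    if (needle && !t.text.toLowerCase().includes(needle)) return false;
    if (tag && !t.tags.includes(tag)) return false;
    if (from && created < new Date(from)) return false;
    if (to && created > new Date(to)) return false;
    return true;
  });
}

const history = [
  { id: 'a1', text: 'Budget review for Q3', tags: ['finance'], createdAt: '2024-05-02T10:00:00Z' },
  { id: 'b2', text: 'Weekly standup notes', tags: ['team'], createdAt: '2024-05-09T09:30:00Z' },
];

console.log(searchTranscripts(history, { query: 'budget', tag: 'finance' }).map((t) => t.id));
```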
Setup & Configuration
Enable Transcription
Open Jyv Desktop → Settings → Features → Transcription
Toggle "Enable AI Transcription" to ON.
First-time setup downloads the AI model (~500 MB). Subsequent transcriptions start immediately.
Select Language
Choose your primary language from the dropdown:
- Auto-Detect: Automatically identify language being spoken
- Single Language: Better accuracy if you always speak the same language
- Multi-Language: Detect and switch between languages mid-conversation
Configure Speaker Detection
Enable "Identify Speakers" to distinguish between different voices.
Train the system by speaking for 10-15 seconds to create your voiceprint.
Speaker Configuration
{
  "speakerDetection": {
    "enabled": true,
    "minSpeakers": 1,
    "maxSpeakers": 10,
    "autoLabel": true,
    "trainOnFirst": true // Create voiceprint from first words
  }
}
Set Quality vs. Performance
Choose transcription quality level:
- Fast Mode: 90% accuracy, low CPU, real-time
- Balanced: 95% accuracy, moderate CPU (recommended)
- Accurate: 98% accuracy, high CPU, slight delay
Enable Live Captions (Optional)
Toggle "Show Live Captions" to display text on screen.
Customize appearance, position, and size in caption settings.
Language Support
Jyv Desktop supports 90+ languages with varying levels of accuracy:
Tier 1: 98%+ Accuracy
- English (US, UK, AU)
- Spanish (ES, LATAM)
- French
- German
- Italian
- Portuguese (BR, PT)
- Mandarin Chinese
- Japanese
Tier 2: 95%+ Accuracy
- Korean
- Russian
- Dutch
- Polish
- Turkish
- Swedish
- Arabic
- Hindi
Tier 3: 90%+ Accuracy
- Czech
- Danish
- Finnish
- Greek
- Hebrew
- Norwegian
- Thai
- Plus 70 more languages
Use Cases
Meeting Transcription
Scenario: Capture every detail from Zoom/Teams meetings
Setup:
- Enable transcription for Zoom/Teams in meeting profile
- Turn on speaker detection to identify participants
- Auto-export transcript at meeting end
- Share via email or save to note-taking app
{
  "meetingTranscription": {
    "autoStart": true,
    "speakerDetection": true,
    "removeFillerswords": true,
    "highlightActionItems": true,
    "autoSummarize": true,
    "exportFormat": ["docx", "pdf"]
  }
}
Content Creation
Scenario: Transcribe podcast interviews and YouTube videos
Setup:
- High accuracy mode for clean transcripts
- Speaker labels for host and guests
- Timestamp markers every 30 seconds
- Export as SRT/VTT for video subtitles
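As an illustration of the 30-second timestamp markers, the sketch below inserts a `[MM:SS]` line whenever the transcript crosses into a new interval. It assumes segments shaped as `{ start, text }` with `start` in seconds and sorted in time order. The shape, the marker format, and the helper names are assumptions rather than the application's actual output.

```javascript
// Formats seconds as MM:SS (minutes are not capped at 59).
function formatClock(totalSeconds) {
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return `${String(minutes).padStart(2, '0')}:${String(seconds).padStart(2, '0')}`;
}

// Hypothetical segment shape: { start: seconds, text: string }, sorted by start.
// Emits a "[MM:SS]" marker line the first time a segment falls in a new interval.
function addTimestampMarkers(segments, intervalSeconds = 30) {
  const lines = [];
  let lastBucket = -1;
  for (const segment of segments) {
    const bucket = Math.floor(segment.start / intervalSeconds);
    if (bucket !== lastBucket) {
      lines.push(`[${formatClock(bucket * intervalSeconds)}]`);
      lastBucket = bucket;
    }
    lines.push(segment.text);
  }
  return lines.join('\n');
}

console.log(addTimestampMarkers([
  { start: 0, text: 'Welcome to the show.' },
  { start: 12, text: 'Today we have a guest.' },
  { start: 31, text: 'Thanks for having me.' },
]));
```

Intervals with no speech get no marker, so long silences do not produce empty sections.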
Accessibility
Scenario: Live captions for deaf/hard-of-hearing users
Setup:
- Enable floating caption window
- Large font size for readability
- High contrast color scheme
- Fast mode for minimal delay
- Send captions to second monitor
Learning & Education
Scenario: Transcribe lectures and study sessions
Setup:
- Auto-save all transcripts with tags
- Enable keyword highlighting (key terms)
- Create summary notes automatically
- Export to Notion, OneNote, or Evernote
Export & Integration
Export Formats
Export transcripts in multiple formats:
- Plain Text (.txt): Simple, compatible everywhere
- Microsoft Word (.docx): Formatted with speakers and timestamps
- PDF: Professional, shareable format
- SRT/VTT: Video subtitle formats for editing software
- JSON: Raw data with metadata for developers
- Markdown: For documentation and note-taking apps
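For readers who want to post-process the JSON export themselves, the sketch below converts timed segments into SRT cues. The segment shape `{ start, end, speaker, text }` (times in seconds) is an assumption about the exported data. The SRT layout itself is standard: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line between cues.

```javascript
// Formats seconds as the SRT timestamp HH:MM:SS,mmm.
function toSrtTime(totalSeconds) {
  const ms = Math.round(totalSeconds * 1000);
  const pad = (value, width = 2) => String(value).padStart(width, '0');
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)},${pad(ms % 1000, 3)}`;
}

// Hypothetical segment shape: { start, end (both in seconds), speaker (optional), text }.
function toSrt(segments) {
  return segments
    .map((segment, i) => {
      const text = segment.speaker ? `${segment.speaker}: ${segment.text}` : segment.text;
      return `${i + 1}\n${toSrtTime(segment.start)} --> ${toSrtTime(segment.end)}\n${text}\n`;
    })
    .join('\n');
}

console.log(toSrt([
  { start: 0, end: 2.5, speaker: 'Sarah', text: 'Hello, everyone.' },
  { start: 2.5, end: 5, text: 'Thanks for joining.' },
]));
```

WebVTT is close to this format: the file begins with a `WEBVTT` header line and the timestamps use a period instead of a comma before the milliseconds.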
Integration with Note Apps
Automatically send transcripts to your favorite apps:
{
  "integrations": {
    "notion": {
      "enabled": true,
      "autoSend": true,
      "database": "Meeting Notes"
    },
    "onenote": {
      "enabled": true,
      "notebook": "Work Notes",
      "section": "Meetings"
    },
    "obsidian": {
      "enabled": true,
      "vault": "Personal",
      "folder": "Transcripts"
    }
  }
}
API Access
Developers can access transcripts via REST API:
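// Run as an ES module so that top-level await is available.
const headers = { 'Authorization': 'Bearer YOUR_API_KEY' };
// Fetch recent transcripts
const response = await fetch('http://localhost:8080/api/transcripts', { headers });
if (!response.ok) throw new Error(`Request failed: ${response.status}`);
const transcripts = await response.json();
// Get a specific transcript (sends the same Authorization header)
const transcript = await fetch(
  'http://localhost:8080/api/transcripts/abc123',
  { headers }
);
if (!transcript.ok) throw new Error(`Request failed: ${transcript.status}`);
const data = await transcript.json();
console.log(data.text); // Full transcript text
console.log(data.speakers); // Speaker information
console.log(data.timestamps); // Word-level timestamps
Privacy & Security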
Data Storage
- Encrypted Database: Transcripts stored in encrypted SQLite database
- Local Only: No cloud storage or external servers
- Configurable Retention: Auto-delete transcripts after a set number of days
- Manual Deletion: Delete individual transcripts anytime
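The retention rule amounts to a cutoff-date comparison. The sketch below selects the records that a 90-day policy would delete. The `{ id, createdAt }` record shape and the `findExpired` helper are hypothetical, and the actual deletion from the encrypted database is out of scope here.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical record shape: { id, createdAt: ISO 8601 string }.
// Returns the records created more than retentionDays before `now`.
function findExpired(transcripts, retentionDays, now = new Date()) {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return transcripts.filter((t) => new Date(t.createdAt).getTime() < cutoff);
}

const stored = [
  { id: 'old', createdAt: '2024-03-15T00:00:00Z' },
  { id: 'recent', createdAt: '2024-05-01T00:00:00Z' },
];

// With "now" fixed at 30 June 2024, the 90-day cutoff is 1 April 2024.
const expired = findExpired(stored, 90, new Date('2024-06-30T00:00:00Z'));
console.log(expired.map((t) => t.id)); // → [ 'old' ]
```

Passing `now` explicitly keeps the function deterministic, which makes the policy easy to test.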
Security Features
{
  "privacy": {
    "encryptTranscripts": true,
    "encryptionKey": "user-password-derived",
    "retentionDays": 90, // Auto-delete after 90 days
    "excludeApps": ["signal.exe"], // Never transcribe these apps
    "pauseDuringPrivacy": true, // Stop when screen locked
    "redactSensitive": true // Hide credit cards, SSNs, etc.
  }
}
Troubleshooting
Transcription Not Starting
Solution:
- Check Feature Status: Verify transcription is enabled in Settings → Features → Transcription
- Verify AI Model Download: Ensure AI model is fully downloaded (Settings shows "Ready" status)
- Check Microphone Input: Confirm audio is being received in Jyv Desktop's audio meter
- Restart Transcription Engine: Settings → Transcription → Restart Engine
Poor Transcription Accuracy
Cause: Background noise, poor microphone, or wrong language
Solution:
- Enable Accurate Mode for better quality
- Ensure Noise Suppression is active
- Verify correct language is selected
- Use a better microphone or improve room acoustics
- Speak clearly and at a consistent volume
Speaker Detection Not Working
Solution:
- Train your voiceprint in transcription settings
- Increase minimum speaker detection confidence
- Manually label speakers after transcription
- Ensure speakers have distinct voices (not too similar)
- Reduce background noise for better voice separation
High CPU Usage
Solution:
- Switch from Accurate to Fast or Balanced mode
- Enable GPU Acceleration if available
- Disable speaker detection if not needed
- Close other resource-intensive applications
- Reduce transcription sample rate (24kHz instead of 48kHz)