AI Transcription

Real-time speech-to-text transcription powered by AI. Convert meetings, voice chats, and audio content into searchable text automatically.

5 min read · Updated December 2025

Overview

Jyv Desktop's AI transcription feature automatically converts speech to text in real time. It suits meetings, interviews, podcasts, lectures, and any other scenario where you need accurate transcripts without manual note-taking.

Local Processing: All transcription happens on your computer using AI. Your audio never leaves your device unless you explicitly share transcripts.

Key Features

  • Real-Time Transcription: Live captions with <1 second delay
  • 95%+ Accuracy: In the recommended Balanced mode, and up to 98% in Accurate mode
  • 90+ Languages: Support for major world languages
  • Speaker Identification: Automatically detect and label different speakers
  • Punctuation & Formatting: Intelligent capitalization and punctuation
  • Searchable Archive: Find past conversations instantly
  • Export Options: TXT, DOCX, PDF, SRT, VTT formats

Features

Real-Time Transcription

Watch your words appear as you speak with minimal lag:

  • Live Display: Floating overlay shows real-time captions
  • Auto-Correction: AI refines transcripts as context becomes clear
  • Confidence Scoring: See which words may need manual review
  • Instant Search: Find specific topics during live conversations
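
Confidence scoring lends itself to building a review list. The sketch below assumes each transcribed word is an object with hypothetical `text` and `confidence` (0 to 1) fields; the field names and the 0.8 threshold are illustrative and are not a documented Jyv Desktop schema.

```javascript
// Hypothetical word shape: { text: string, confidence: number between 0 and 1 }.
// Returns the words whose confidence falls below the review threshold,
// along with their position in the transcript.
function wordsNeedingReview(words, threshold = 0.8) {
  return words
    .map((word, index) => ({ ...word, index }))
    .filter((word) => word.confidence < threshold);
}

const sampleWords = [
  { text: "Schedule", confidence: 0.97 },
  { text: "the", confidence: 0.99 },
  { text: "Kubernetes", confidence: 0.62 },
  { text: "review", confidence: 0.91 },
];

// Only "Kubernetes" (index 2) falls below the default threshold.
console.log(wordsNeedingReview(sampleWords));
```

Raising the threshold flags more words for review; lowering it keeps the list short at the cost of missing borderline errors.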

Speaker Diarization

Automatically identify and label different speakers in conversations:

Speaker Detection Example
{
  "transcription": {
    "speakers": [
      {
        "id": "speaker_1",
        "label": "John (You)",
        "voiceprint": "...",
        "color": "#4dabf7"
      },
      {
        "id": "speaker_2",
        "label": "Sarah",
        "voiceprint": "...",
        "color": "#51cf66"
      }
    ],
    "autoDetect": true,
    "maxSpeakers": 10
  }
}

Jyv Desktop learns to recognize frequent conversation partners and automatically labels them in future transcripts.
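
To illustrate how speaker labels could be applied to a transcript, here is a minimal sketch. It reuses the `id` and `label` fields from the example above; the segment shape (`speakerId`, `text`) is an assumption made for illustration.

```javascript
// Speaker list in the same shape as the example above (voiceprint and color omitted).
const knownSpeakers = [
  { id: "speaker_1", label: "John (You)" },
  { id: "speaker_2", label: "Sarah" },
];

// Hypothetical segment shape: { speakerId: string, text: string }.
function labelSegments(segments, speakerList) {
  const labels = new Map(speakerList.map((s) => [s.id, s.label]));
  return segments.map((seg) => {
    // Fall back to the raw id when a speaker has not been labeled yet.
    const label = labels.get(seg.speakerId) ?? seg.speakerId;
    return `${label}: ${seg.text}`;
  });
}

console.log(
  labelSegments(
    [
      { speakerId: "speaker_1", text: "Can you share the report?" },
      { speakerId: "speaker_2", text: "Sending it now." },
    ],
    knownSpeakers
  )
);
```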

Smart Punctuation

AI adds proper punctuation, capitalization, and formatting:

  • Sentence Detection: Automatic periods, commas, question marks
  • Capitalization: Proper names, sentence starts, acronyms
  • Paragraph Breaks: Detect topic changes and speaker switches
  • Numbers & Dates: Convert spoken numbers to digits
  • Filler Word Removal: Optionally remove "um", "uh", "like"
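
As a rough illustration of filler word removal, the sketch below strips standalone "um" and "uh" with a regular expression. It is a simplified stand-in, not the product's actual method, and it deliberately leaves "like" alone because that word is often meaningful.

```javascript
// Removes standalone "um"/"uh" (plus any comma directly before or after them),
// collapses repeated whitespace, and re-capitalizes the first character.
function removeFillers(text) {
  const cleaned = text
    .replace(/,?\s*\b(?:um+|uh+)\b,?/gi, "")
    .replace(/\s{2,}/g, " ")
    .trim();
  return cleaned.charAt(0).toUpperCase() + cleaned.slice(1);
}

console.log(removeFillers("Um, I think we should, uh, ship on Friday."));
// → "I think we should ship on Friday."
```

The word boundaries (`\b`) keep words such as "umbrella" or "hum" intact.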

Live Captions

Display real-time subtitles on screen:

Caption Display Options:

  • Floating Overlay: Transparent window anywhere on screen
  • Bottom Bar: Classic subtitle-style display
  • Notification Style: Toast notifications for key points
  • OBS Integration: Send captions directly to streaming software

Transcript History

All transcripts are automatically saved and searchable:

  • Unlimited Storage: Keep all transcripts (configurable retention)
  • Full-Text Search: Find specific words or phrases instantly
  • Date/Time Filtering: Browse by meeting date and time
  • Tagging System: Organize transcripts with custom tags
  • Favorites: Mark important conversations
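
The search, date filter, and tag features above can be thought of as a combined filter over saved transcripts. The following sketch assumes a hypothetical transcript shape (`id`, ISO `date`, `tags`, `text`); it shows the idea only and does not describe how Jyv Desktop stores or indexes data.

```javascript
// Hypothetical transcript shape: { id, date (ISO string), tags: string[], text }.
// All criteria are optional; a transcript must satisfy every criterion that is given.
function searchTranscripts(transcripts, { query = "", tag, from, to } = {}) {
  const needle = query.toLowerCase();
  return transcripts.filter((t) => {
    const when = new Date(t.date);
    if (needle && !t.text.toLowerCase().includes(needle)) return false;
    if (tag && !t.tags.includes(tag)) return false;
    if (from && when < new Date(from)) return false;
    if (to && when > new Date(to)) return false;
    return true;
  });
}

const savedTranscripts = [
  { id: "a", date: "2025-12-01T09:00:00Z", tags: ["standup"], text: "Budget review moved to Friday" },
  { id: "b", date: "2025-12-05T14:00:00Z", tags: ["planning"], text: "Quarterly budget approved" },
];

// Case-insensitive text match combined with a tag filter.
console.log(searchTranscripts(savedTranscripts, { query: "budget", tag: "planning" }));
```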

Setup & Configuration

  1. Enable Transcription

    Open Jyv Desktop → Settings → Features → Transcription

    Toggle "Enable AI Transcription" to ON.

    First-time setup downloads the AI model (~500 MB). Once the model is downloaded, transcription starts without further setup.
  2. Select Language

    Choose your primary language from the dropdown:

    • Auto-Detect: Automatically identify language being spoken
    • Single Language: Better accuracy if you always speak the same language
    • Multi-Language: Detect and switch between languages mid-conversation
  3. Configure Speaker Detection

    Enable "Identify Speakers" to distinguish between different voices.

    Train the system by speaking for 10-15 seconds to create your voiceprint.

    Speaker Configuration
    {
      "speakerDetection": {
        "enabled": true,
        "minSpeakers": 1,
        "maxSpeakers": 10,
        "autoLabel": true,
        "trainOnFirst": true        // Create voiceprint from first words
      }
    }
  4. Set Quality vs. Performance

    Choose transcription quality level:

    • Fast Mode: 90% accuracy, low CPU, real-time
    • Balanced: 95% accuracy, moderate CPU (recommended)
    • Accurate: 98% accuracy, high CPU, slight delay
  5. Enable Live Captions (Optional)

    Toggle "Show Live Captions" to display text on screen.

    Customize appearance, position, and size in caption settings.
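
The quality trade-off in step 4 can be expressed as a small lookup. The accuracy and CPU figures below come from the mode list in that step; the lowercase mode identifiers and the `pickMode` helper are hypothetical names used only for this sketch.

```javascript
// Modes ordered from lightest to heaviest CPU load.
const QUALITY_MODES = [
  { name: "fast", accuracy: 90, cpu: "low" },
  { name: "balanced", accuracy: 95, cpu: "moderate" },
  { name: "accurate", accuracy: 98, cpu: "high" },
];

// Returns the lightest mode that meets the requested accuracy (in percent),
// or null when no mode is accurate enough.
function pickMode(minAccuracy) {
  return QUALITY_MODES.find((mode) => mode.accuracy >= minAccuracy) ?? null;
}

console.log(pickMode(92)); // → { name: 'balanced', accuracy: 95, cpu: 'moderate' }
```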

Language Support

Jyv Desktop supports 90+ languages with varying levels of accuracy:

Tier 1: 98%+ Accuracy

  • English (US, UK, AU)
  • Spanish (ES, LATAM)
  • French
  • German
  • Italian
  • Portuguese (BR, PT)
  • Mandarin Chinese
  • Japanese

Tier 2: 95%+ Accuracy

  • Korean
  • Russian
  • Dutch
  • Polish
  • Turkish
  • Swedish
  • Arabic
  • Hindi

Tier 3: 90%+ Accuracy

  • Czech
  • Danish
  • Finnish
  • Greek
  • Hebrew
  • Norwegian
  • Thai
  • + 70 more languages

Multi-Language Mode: Enable auto-detection to transcribe conversations where people speak multiple languages. AI will detect language switches automatically.

Use Cases

Meeting Transcription

Scenario: Capture every detail from Zoom/Teams meetings

Setup:

  • Enable transcription for Zoom/Teams in meeting profile
  • Turn on speaker detection to identify participants
  • Auto-export transcript at meeting end
  • Share via email or save to note-taking app

Meeting Transcription Config
{
  "meetingTranscription": {
    "autoStart": true,
    "speakerDetection": true,
    "removeFillerWords": true,
    "highlightActionItems": true,
    "autoSummarize": true,
    "exportFormat": ["docx", "pdf"]
  }
}
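
Before relying on auto-export, it can help to check that every requested format is one the app exports. The sketch below validates the `exportFormat` array from the config above. The format identifiers are assumed from the export list in this guide (in particular, `"md"` for Markdown is a guess), and `unsupportedFormats` is a hypothetical helper.

```javascript
// Assumed identifiers for the export formats named in this guide.
const SUPPORTED_FORMATS = new Set(["txt", "docx", "pdf", "srt", "vtt", "json", "md"]);

// Returns the requested export formats that are not in the supported set.
function unsupportedFormats(config) {
  const requested = config.meetingTranscription?.exportFormat ?? [];
  return requested.filter((format) => !SUPPORTED_FORMATS.has(format.toLowerCase()));
}

const meetingConfig = {
  meetingTranscription: { autoStart: true, exportFormat: ["docx", "pdf", "odt"] },
};

console.log(unsupportedFormats(meetingConfig)); // → [ 'odt' ]
```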

Content Creation

Scenario: Transcribe podcast interviews and YouTube videos

Setup:

  • High accuracy mode for clean transcripts
  • Speaker labels for host and guests
  • Timestamp markers every 30 seconds
  • Export as SRT/VTT for video subtitles
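
One way to picture the 30-second timestamp markers is as a line inserted whenever a segment begins in a new interval. This is a sketch under assumptions: the segment shape (`start` in seconds, `text`) and the `[HH:MM:SS]` marker style are illustrative, not the app's documented output.

```javascript
// Formats a number of seconds as HH:MM:SS.
function formatClock(totalSeconds) {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = Math.floor(totalSeconds % 60);
  return [h, m, s].map((n) => String(n).padStart(2, "0")).join(":");
}

// Hypothetical segment shape: { start: seconds, text: string }, sorted by start time.
// Emits a marker line the first time a segment starts in a new interval.
function withMarkers(segments, intervalSeconds = 30) {
  const lines = [];
  let lastBucket = -1;
  for (const seg of segments) {
    const bucket = Math.floor(seg.start / intervalSeconds);
    if (bucket > lastBucket) {
      lines.push(`[${formatClock(bucket * intervalSeconds)}]`);
      lastBucket = bucket;
    }
    lines.push(seg.text);
  }
  return lines;
}

console.log(
  withMarkers([
    { start: 2, text: "Welcome to the show." },
    { start: 14, text: "Today's guest is a sound engineer." },
    { start: 31, text: "Let's start with microphones." },
  ]).join("\n")
);
```

Intervals with no speech produce no marker, so long silences do not fill the transcript with empty timestamps.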

Accessibility

Scenario: Live captions for deaf/hard-of-hearing users

Setup:

  • Enable floating caption window
  • Large font size for readability
  • High contrast color scheme
  • Fast mode for minimal delay
  • Send captions to second monitor

Learning & Education

Scenario: Transcribe lectures and study sessions

Setup:

  • Auto-save all transcripts with tags
  • Enable keyword highlighting (key terms)
  • Create summary notes automatically
  • Export to Notion, OneNote, or Evernote

Export & Integration

Export Formats

Export transcripts in multiple formats:

  • Plain Text (.txt): Simple, compatible everywhere
  • Microsoft Word (.docx): Formatted with speakers and timestamps
  • PDF: Professional, shareable format
  • SRT/VTT: Video subtitle formats for editing software
  • JSON: Raw data with metadata for developers
  • Markdown: For documentation and note-taking apps
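
For readers unfamiliar with subtitle files, the sketch below shows how timed segments map onto the SRT layout: a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line. The segment shape (`start`, `end`, `text`) is an assumption; the app's own exporter may differ in detail.

```javascript
// Formats seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Hypothetical segment shape: { start: seconds, end: seconds, text: string }.
// Cues are numbered from 1 and separated by a blank line.
function toSrt(segments) {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text}\n`)
    .join("\n");
}

console.log(
  toSrt([
    { start: 0, end: 2.5, text: "Welcome back." },
    { start: 2.5, end: 6, text: "Let's get started." },
  ])
);
```

WebVTT uses a similar layout but starts with a `WEBVTT` header line and separates milliseconds with a period instead of a comma.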

Integration with Note Apps

Automatically send transcripts to your favorite apps:

Integration Settings
{
  "integrations": {
    "notion": {
      "enabled": true,
      "autoSend": true,
      "database": "Meeting Notes"
    },
    "onenote": {
      "enabled": true,
      "notebook": "Work Notes",
      "section": "Meetings"
    },
    "obsidian": {
      "enabled": true,
      "vault": "Personal",
      "folder": "Transcripts"
    }
  }
}
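
To see at a glance which apps would receive transcripts without a manual step, the settings can be scanned for integrations that are both enabled and set to send automatically. This sketch reads the same structure as the config above; treating a missing `autoSend` flag as "off" is an assumption, and `autoSendTargets` is a hypothetical helper.

```javascript
// Returns the names of integrations that are enabled and set to send automatically.
// Assumption: a missing "autoSend" flag means transcripts are not sent automatically.
function autoSendTargets(config) {
  return Object.entries(config.integrations ?? {})
    .filter(([, settings]) => settings.enabled && settings.autoSend === true)
    .map(([name]) => name);
}

const exampleSettings = {
  integrations: {
    notion: { enabled: true, autoSend: true, database: "Meeting Notes" },
    onenote: { enabled: true, notebook: "Work Notes", section: "Meetings" },
    obsidian: { enabled: true, vault: "Personal", folder: "Transcripts" },
  },
};

console.log(autoSendTargets(exampleSettings)); // → [ 'notion' ]
```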

API Access

Developers can access transcripts via REST API:

API Example
// Fetch recent transcripts
const response = await fetch('http://localhost:8080/api/transcripts', {
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' }
});

const transcripts = await response.json();

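// Get a specific transcript (send the same Authorization header)
const transcriptResponse = await fetch(
  'http://localhost:8080/api/transcripts/abc123',
  { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);

const data = await transcriptResponse.json();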
console.log(data.text);       // Full transcript text
console.log(data.speakers);   // Speaker information
console.log(data.timestamps); // Word-level timestamps
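
In a longer script it may be convenient to wrap these calls in a helper that builds the request once and checks the HTTP status before parsing. The sketch below reuses the base URL and Bearer header from the example above; the function names are hypothetical, and the assumption that the API signals failures with non-2xx status codes is not confirmed by this guide.

```javascript
const TRANSCRIPT_API = "http://localhost:8080/api"; // Base URL from the example above

// Builds the URL and fetch options for a transcript request.
// Call without an id to list transcripts, or with an id to fetch one.
function buildTranscriptRequest(apiKey, id) {
  const path = id ? `/transcripts/${encodeURIComponent(id)}` : "/transcripts";
  return {
    url: `${TRANSCRIPT_API}${path}`,
    options: { headers: { Authorization: `Bearer ${apiKey}` } },
  };
}

// Fetches JSON and throws on a non-2xx response instead of parsing an error body.
async function getTranscripts(apiKey, id) {
  const { url, options } = buildTranscriptRequest(apiKey, id);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}

// Usage (requires Jyv Desktop to be running locally):
// getTranscripts("YOUR_API_KEY", "abc123").then((t) => console.log(t.text));
```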

Privacy & Security

100% Local Processing: All transcription happens on your computer. Your audio and transcripts never leave your device unless you explicitly export them or enable an integration that sends them to another app.

Data Storage

  • Encrypted Database: Transcripts stored in encrypted SQLite database
  • Local Only: No cloud storage or external servers
  • Configurable Retention: Auto-delete transcripts after a set number of days (for example, 90)
  • Manual Deletion: Delete individual transcripts anytime

Security Features

Privacy Settings
{
  "privacy": {
    "encryptTranscripts": true,
    "encryptionKey": "user-password-derived",
    "retentionDays": 90,           // Auto-delete after 90 days
    "excludeApps": ["signal.exe"], // Never transcribe these apps
    "pauseDuringPrivacy": true,    // Stop when screen locked
    "redactSensitive": true        // Hide credit cards, SSNs, etc.
  }
}
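
To give a sense of what redaction of sensitive values involves, here is a simplified pattern-based sketch. The two regular expressions (US Social Security numbers and 13 to 16 digit card numbers) are illustrative assumptions; the `redactSensitive` option in the app may use different or more thorough detection.

```javascript
// Illustrative patterns only: US SSNs (123-45-6789) and 13 to 16 digit card
// numbers written with optional single spaces or dashes between digits.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;
const CARD_PATTERN = /\b\d(?:[ -]?\d){12,15}\b/g;

// Replaces matches with placeholders; SSNs are handled first so the
// card pattern never sees them.
function redactText(text) {
  return text
    .replace(SSN_PATTERN, "[REDACTED SSN]")
    .replace(CARD_PATTERN, "[REDACTED CARD]");
}

console.log(redactText("My SSN is 123-45-6789 and the card is 4111 1111 1111 1111."));
// → "My SSN is [REDACTED SSN] and the card is [REDACTED CARD]."
```

Pattern matching like this can miss unusual formats and can flag long digit strings that are not card numbers, so it is best treated as a first pass.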

Troubleshooting

Transcription Not Starting

Solution:

  1. Check Feature Status

    Verify transcription is enabled in Settings → Features → Transcription

  2. Verify AI Model Download

    Ensure AI model is fully downloaded (Settings shows "Ready" status)

  3. Check Microphone Input

    Confirm audio is being received in Jyv Desktop's audio meter

  4. Restart Transcription Engine

    Settings → Features → Transcription → Restart Engine

Poor Transcription Accuracy

Cause: Background noise, poor microphone, or wrong language

Solution:

  • Enable Accurate Mode for better quality
  • Ensure Noise Suppression is active
  • Verify that the correct language is selected
  • Use a better microphone or improve room acoustics
  • Speak clearly and at consistent volume

Speaker Detection Not Working

Solution:

  • Train your voiceprint in transcription settings
  • Increase minimum speaker detection confidence
  • Manually label speakers after transcription
  • Ensure speakers have distinct voices (not too similar)
  • Reduce background noise for better voice separation

High CPU Usage

Solution:

  • Switch from Accurate to Fast or Balanced mode
  • Enable GPU Acceleration if available
  • Disable speaker detection if not needed
  • Close other resource-intensive applications
  • Reduce transcription sample rate (24 kHz instead of 48 kHz)

For transcription issues, check our Transcription Troubleshooting Guide or contact support.