Audio extraction.
The workflow you use for videos also works for long audio: RSS feed in, speaker-aware transcript out, and a local corpus ready for your model.
Drop in a public podcast RSS URL. Uoink can fetch new episodes, run transcription, and file them beside your video library.
yt-dlp handles the media download path where supported.
Transcription runs on your machine. Pick a smaller model for speed or a larger one for quality.
WhisperX separates speakers as labels you can rename later.
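As a rough illustration of the "RSS feed in" step, here is a minimal Python sketch that pulls audio enclosure URLs out of a podcast feed and skips episodes already seen. It is not Uoink's actual code: the function name `new_episodes`, the `seen_guids` set, and the sample feed are all hypothetical, and a real feed would be fetched over the network rather than embedded as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be downloaded from the show's RSS URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 2</title>
      <guid>ep-2</guid>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 1</title>
      <guid>ep-1</guid>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>"""


def new_episodes(feed_xml, seen_guids):
    """Return episodes whose guid is not in seen_guids, in feed order."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # no audio attached to this item
        # Fall back to the audio URL when the feed omits <guid>.
        guid = item.findtext("guid") or enclosure.get("url")
        if guid in seen_guids:
            continue
        episodes.append({
            "guid": guid,
            "title": item.findtext("title", default=""),
            "audio_url": enclosure.get("url"),
        })
    return episodes


if __name__ == "__main__":
    for episode in new_episodes(SAMPLE_FEED, seen_guids={"ep-1"}):
        print(episode["title"], episode["audio_url"])
```

Each returned `audio_url` is what a downloader would then be handed; tracking guids is one simple way to make repeated polling fetch only new episodes.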
Pull every factual claim a guest made and keep timestamp citations beside the transcript.
Map a competitor's interview circuit and search every mention of a product, market, or investor.
Build a private guest-claim corpus across a niche, then ask your model to compare positions.
Study how long-form interviewers frame questions, handle interruptions, shift topics, and deliver sponsor reads.
Search multiple shows for every mention of a company, product category, or technical term.
Turn lectures and seminar feeds into searchable notes you can cite later.
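Several of the uses above come down to searching transcript segments and keeping a timestamp beside each hit. A minimal Python sketch of that idea follows, assuming each segment is a dict with `start` (seconds), `speaker`, and `text` keys. That shape, and the helper names `find_mentions`, `format_timestamp`, and `cite`, are assumptions for illustration, not Uoink's documented corpus format.

```python
import re


def format_timestamp(seconds):
    """Render a position in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_mentions(segments, term):
    """Return segments whose text contains term as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [seg for seg in segments if pattern.search(seg["text"])]


def cite(segment, show):
    """Format one segment as a citation line with show name and timestamp."""
    stamp = format_timestamp(segment["start"])
    return f'[{show} @ {stamp}] {segment["speaker"]}: {segment["text"]}'


if __name__ == "__main__":
    # Hypothetical transcript segments for one episode.
    segments = [
        {"start": 12.0, "speaker": "Speaker 1", "text": "Welcome back to the show."},
        {"start": 3725.4, "speaker": "Speaker 2", "text": "We moved off Acme last year."},
    ]
    for hit in find_mentions(segments, "acme"):
        print(cite(hit, "Example Show"))
```

The whole-word match keeps "Acme" from matching inside longer words; terms that end in punctuation would need a looser pattern.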
Expect roughly 10 to 15 minutes of compute per hour of audio on CPU, depending on model size and machine. That is the cost of keeping raw audio and transcripts local. Uoink runs the job in the background and files the corpus when it finishes.
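To see what that range means for a backlog, the arithmetic can be sketched in a few lines of Python. The function name `estimate_cpu_minutes` is hypothetical, and the 10 and 15 minute rates are simply the rough figures quoted above, not measurements of any particular machine or model.

```python
def estimate_cpu_minutes(audio_hours, low_rate=10.0, high_rate=15.0):
    """Return (low, high) CPU minutes for a given number of audio hours.

    low_rate and high_rate are minutes of compute per hour of audio,
    taken from the rough 10 to 15 minute range quoted above.
    """
    return audio_hours * low_rate, audio_hours * high_rate


if __name__ == "__main__":
    low, high = estimate_cpu_minutes(40)  # e.g. forty one-hour episodes
    print(f"{low:.0f} to {high:.0f} minutes ({low / 60:.1f} to {high / 60:.1f} hours)")
```

A forty-hour backlog works out to 400 to 600 minutes, or roughly 6.7 to 10 hours of background compute on CPU.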
Diarization separates speakers as Speaker 1, Speaker 2, and so on. You can rename them later when you know who is speaking.
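Renaming can be pictured as a simple label substitution over transcript segments. The Python sketch below assumes each segment is a dict with a `speaker` key holding labels such as "Speaker 1"; the segment shape and the `rename_speakers` helper are illustrative assumptions, not Uoink's stored format.

```python
def rename_speakers(segments, names):
    """Return copies of segments with speaker labels replaced via names.

    Labels missing from names are left unchanged, so a partial mapping is safe.
    """
    return [
        {**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


if __name__ == "__main__":
    # Hypothetical diarized segments.
    segments = [
        {"start": 0.0, "speaker": "Speaker 1", "text": "Thanks for joining."},
        {"start": 4.2, "speaker": "Speaker 2", "text": "Glad to be here."},
    ]
    renamed = rename_speakers(segments, {"Speaker 1": "Host"})
    for seg in renamed:
        print(seg["speaker"], "-", seg["text"])
```

Because the function returns new dicts, the original generic labels stay available if a name turns out to be wrong.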
No, a GPU is not required. CPU transcription works, but larger models take longer. Apple Silicon and dedicated GPUs can speed it up when available.
No. Feed polling, audio download, transcription, and corpus writing run from your machine.