How to Transcribe a Discord Voice Channel or Stage

Discord has no native recording for ordinary voice channels, and that's the right starting point for this guide. Voice channels and Stages are conversational, multi-speaker, and routinely include people who have no idea anyone might be recording. This guide covers how to record the safe way (multi-track, with explicit consent), turn the result into clean structured notes, and stay on the right side of Discord's Community Guidelines and applicable two-party-consent laws.

What AudioToNotes returns from a Discord recording

Once you've captured the audio, AudioToNotes returns:

A 3-sentence executive summary at the top.
A decisions list and a separately delegated action-items list — useful for community-of-practice servers, learning groups, and DAO calls.
A structured outline with H2 headers grouped by topic shift.
A diarized full transcript that separates each speaker (Craig multi-track produces particularly clean diarization).
One-click exports to Markdown, Notion, Google Docs, and Anki.

A 90-minute conversation usually processes in 3–5 minutes.

Consent comes before everything

Before any tool, before any bot:

Tell the channel. Discord's Community Guidelines and wiretap laws in many countries make secretly recording a conversation a serious problem. Post in-channel that you're going to record, give people a chance to leave, and pin the notice (or add it to the channel topic) so late-joiners see it.
Two-party-consent jurisdictions (most of the EU, California, Florida, Illinois, Maryland, Massachusetts, Montana, New Hampshire, Pennsylvania, Washington — check your local rules before recording across borders) require every participant's affirmative consent, not just notification.
Don't record DMs without the other person's explicit permission. It's both a legal and a trust problem.

Communities that publish regular podcasts from Discord conversations make this easy: a pinned message saying "this server is recorded for podcast publication; muted-by-default users won't be captured" plus an emoji-reaction opt-in.
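The opt-in check can be sketched in a few lines. This is a minimal illustration, not part of any Discord or AudioToNotes API: the function name and the two input lists (who is in the voice channel, who reacted to the pinned notice) are hypothetical and would have to be gathered by hand or by your own tooling.

```python
def missing_consent(participants, opted_in):
    """Return participants who have not opted in, preserving join order."""
    opted = set(opted_in)
    return [name for name in participants if name not in opted]


# Hypothetical example data: three people in the channel, two reactions.
in_channel = ["ana", "bo", "cy"]
reacted = ["cy", "ana"]

blockers = missing_consent(in_channel, reacted)
print(blockers)  # ['bo']
```

In a two-party-consent jurisdiction, a non-empty result means recording should not start yet.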

Recording approach 1: Craig (multi-track bot)

Craig is the most-used Discord recording bot. It records each speaker on a separate track — which produces dramatically cleaner diarization than a single mixed file, and lets AudioToNotes label speakers correctly without guessing.

Craig + AudioToNotes workflow

Invite Craig to your server
Go to craig.chat → Invite → grant Connect, Speak, View Channel, and Manage Roles on the target voice channel, or scope tighter via a dedicated role.
Announce the recording
Post in the text channel that the next session will be recorded, pin the message, and tag everyone. Give people 5+ minutes to opt out.
Run /join in the voice channel
Type /join. Craig joins and starts recording. It posts a private link to whoever ran the command.
Run /stop when done
Craig ends recording. The download link expires in 7 days, or longer for paid Craig tiers.
Download the recording (FLAC or MP3)
Pick the single-file mixed download for quickest upload, or the per-speaker zip if you want maximum diarization quality.
Upload to AudioToNotes
Drop the mixed file in and speakers are detected automatically. If you downloaded the per-speaker zip instead, unzip it and upload the tracks as individual files for explicit per-track labelling.
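If you go the per-speaker route, it helps to see which tracks the zip actually contains before uploading. The sketch below uses only Python's standard library; the set of audio extensions is an assumption, and the file naming inside a Craig zip is not relied on.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed set of audio extensions; adjust to whatever format you downloaded.
AUDIO_EXTS = {".flac", ".mp3", ".ogg", ".wav"}


def list_tracks(zip_path):
    """Return the audio member names inside a per-speaker zip, sorted."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if PurePosixPath(name).suffix.lower() in AUDIO_EXTS
        )
```

Calling `list_tracks("recording.zip")` (a hypothetical file name) returns one entry per speaker track, which is also a quick way to spot a missing speaker.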

Recording approach 2: OBS or QuickTime (single recorder)

If a bot isn't an option (small private server, no admin permission to invite Craig), record the call audio on your own machine with a desktop tool:

macOS: install a loopback driver such as BlackHole, set it as the system audio output (route it through a Multi-Output Device in Audio MIDI Setup if you also need to hear the call), and use QuickTime → File → New Audio Recording with BlackHole as the input. Record only your own voice and audio that the other participants have explicitly agreed to have captured.
Windows: Voicemeeter Banana + OBS Studio's "Audio Output Capture" source.
Both platforms: OBS Studio's audio-only profile can record to a single MP3.

A single-recorder capture mixes everyone into one audio file. Diarization still works, but accuracy on rapid speaker turns is lower than with Craig's multi-track output.

Recording approach 3: Discord Stage native (channel-specific)

Some Discord Stages support native recording for the host. If the server uses Stages for AMAs, town halls, or community calls, the host may have a built-in "Record" toggle. Output is a single MP4. Upload that to AudioToNotes.

Get the file ready for AudioToNotes

Craig: download either the mixed .mp3 (quickest) or the per-speaker .flac.zip. Mixed is easier; per-speaker is best when you have 5+ speakers.
OBS / QuickTime: export as .mp3 or .wav (default is fine).
Stage native: download the MP4 from the channel's recording link.

Drop any of those into AudioToNotes. For MP4 files, the browser extracts the audio track client-side before upload to save bandwidth.
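If you would rather reduce a Stage MP4 to audio yourself before uploading, ffmpeg can do it locally. This is a sketch that assumes ffmpeg is installed and on your PATH; the file names are placeholders. The `-vn` flag tells ffmpeg to drop the video stream, and the output format follows the destination file's extension.

```python
import subprocess


def extract_audio_cmd(src, dst):
    """Build an ffmpeg command that keeps only the audio of src."""
    return ["ffmpeg", "-i", src, "-vn", dst]


def extract_audio(src, dst):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(extract_audio_cmd(src, dst), check=True)


# Placeholder file names, shown only to illustrate the resulting command.
print(" ".join(extract_audio_cmd("stage.mp4", "stage.mp3")))
# ffmpeg -i stage.mp4 -vn stage.mp3
```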

Privacy and platform policy

AudioToNotes does not join your Discord server. There is no bot account, no OAuth flow, no permissions to grant on our end. You record on Discord (or via your desktop), download the file, and upload it.
Customer audio is not used for training. Your community conversation is not surfaced in any model output.
Encrypted in transit. Files travel over TLS to processing.
You control retention. Delete the upload immediately after processing, or keep it in your account for later reprocessing.

Common pitfalls

The bot dropped mid-call. Craig occasionally disconnects on flaky Discord voice gateways. Restart with /join; the recording will be split across two files — upload them sequentially to AudioToNotes.
One speaker is missing from the transcript. They were probably muted server-side or running push-to-talk and didn't talk. Check the multi-track download — each speaker has their own file.
Background music is being transcribed as words. Stage / voice channels with background music produce false transcript content. Mute music during the parts you care about.
Server admins can't see Craig. Check the role assignment — Craig needs View Channel, Connect, Speak, and (for the slash command) Manage Roles on the target channel.

Use cases AudioToNotes handles well from Discord

DAO governance calls → structured decisions list paired with the per-speaker transcript for accountability.
Open-source SIG / WG calls → topic-grouped outline plus action items you paste into a GitHub Discussion.
Community-of-practice book club / study groups → flashcards generated for whatever was being studied.
Twitch / YouTube podcast recorded inside a Discord Stage → show notes plus a Twitter thread repurposed from each episode.

FAQ

Can AudioToNotes connect to Discord and record live? No. We don't run a recording bot. The right tool inside Discord is Craig (or Discord's own Stage recording where available) — we process the file Craig hands you.

Does Discord allow recording at all? Discord's Terms of Service and Community Guidelines forbid recording without participants' consent. Recording with explicit consent is fine, and the dominant pattern for podcast-style servers is to make consent a pinned condition of joining.

My recording is just one mixed track. Will diarization still work? Yes. Speaker-diarization models paired with Whisper-class speech recognition do reasonable speaker separation on mixed audio. Per-speaker Craig recordings are sharper, but mixed is fine for most conversations under 6 people.

Can I transcribe a 6-hour podcast recording session? Up to about 4 hours processes in one go. For longer, split the file in a free desktop editor (Audacity / Ocenaudio) and upload each segment.
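To work out where to cut, a small helper can plan the segments. This is a sketch under the assumption that the limit is a flat 4 hours per upload, as stated above; the function names are illustrative, and the cut points ignore sentence boundaries, so nudge them to a pause when you split in the editor.

```python
def plan_segments(total_seconds, max_seconds):
    """Return (start, length) pairs covering the recording in order."""
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("durations must be non-negative and limit positive")
    segments = []
    start = 0
    while start < total_seconds:
        length = min(max_seconds, total_seconds - start)
        segments.append((start, length))
        start += length
    return segments


def hms(seconds):
    """Format a second count as HH:MM:SS for use in an audio editor."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# A 6-hour session with an assumed 4-hour limit per upload.
for start, length in plan_segments(6 * 3600, 4 * 3600):
    print(hms(start), "to", hms(start + length))
# 00:00:00 to 04:00:00
# 04:00:00 to 06:00:00
```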

Direct answer: how do I transcribe a Discord voice channel or Stage?

Disclose recording to every participant first. Use Craig (most common, multi-track) or a desktop loopback recorder (OBS, QuickTime with BlackHole on macOS) to capture the audio. Download the resulting MP3 or FLAC. Drop the file into AudioToNotes — you'll get a structured summary, decisions, action items, and a diarized transcript in minutes. We never join the server.

Ready for cleaner Discord community notes? Join the AudioToNotes waitlist for early access.
