What Does Transcription Really Cost Per Hour? A Side-by-Side API Breakdown

I normalized every major speech-to-text API to per-hour costs. At 100 hours/month, the gap between cheapest and most expensive is $125.

Why per-hour pricing makes more sense than per-minute

I've spent years comparing speech-to-text APIs, and the biggest trick vendors play is showing per-minute rates. $0.006 per minute sounds trivial until you multiply it by 60. When you're budgeting for a team or product, you think in hours, not minutes. That's why I normalized every major provider to a per-hour cost, then ran the numbers at 10, 100, 400, and 1,000 hours per month.

The results were stark. At 100 hours per month, the difference between the cheapest and most expensive option is $125, and at 400 hours it grows to $557. The provider that looks cheapest at 10 hours is no longer the cheapest at 400.

Here are the exact per-hour costs for OpenAI Whisper API, Deepgram, AssemblyAI, Google Cloud Speech-to-Text, AWS Transcribe, and Privocio. I'll also explain why fixed pricing changes the entire cost curve once you cross a certain volume threshold.

The per-hour cost table

Here's what transcription actually costs per hour at each provider's standard tier. I used their published list pricing — no enterprise discounts or negotiated rates.

| Provider | Listed Rate | Per-Hour Cost | Model Notes |
|---|---|---|---|
| Privocio Go plan | Fixed $19/4 weeks | $0.05/hr (400 hrs included) | Fixed-rate, no overages |
| Privocio Pro plan | Fixed $39/4 weeks | $0.05/hr (800 hrs included) | Fixed-rate, no overages |
| OpenAI Whisper API | $0.006/min | $0.36/hr | Usage-based, no tiering |
| Deepgram Nova-2 | $0.0043/min | $0.26/hr | Premium tier, streaming |
| AssemblyAI Best | $0.0037/min | $0.22/hr | Usage-based with volume tiers |
| Google Cloud STT | $0.024/min | $1.44/hr | Standard model, non-streaming |
| AWS Transcribe | $0.024/min | $1.44/hr | Standard, no medical tier |

Bottom line: At list pricing, the spread is enormous. Google Cloud and AWS charge $1.44 per hour, roughly 30x what Privocio's Go plan works out to when all 400 included hours are used. Even the cheapest usage-based option (AssemblyAI at $0.22/hr) is more than 4x the fixed-rate cost at that volume.

How I calculated these numbers

I pulled every rate from the public pricing pages of each provider. For per-minute APIs, I multiplied the per-minute rate by 60. For fixed-rate providers like Privocio, I divided the 4-week plan price by the included hours.
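The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's tooling: the dictionary and function names are made up for this example, and the rates are simply the list prices quoted in this article.

```python
# Per-minute list rates quoted in this article (USD).
PER_MINUTE_RATES = {
    "OpenAI Whisper API": 0.006,
    "Deepgram Nova-2": 0.0043,
    "AssemblyAI Best": 0.0037,
    "Google Cloud STT": 0.024,
    "AWS Transcribe": 0.024,
}

# Fixed plans quoted in this article: (price per 4-week cycle, included hours).
FIXED_PLANS = {
    "Privocio Go": (19.0, 400),
    "Privocio Pro": (39.0, 800),
}


def per_hour_from_minute(rate_per_minute: float) -> float:
    """Convert a per-minute rate to a per-hour rate."""
    return rate_per_minute * 60


def per_hour_from_plan(price: float, included_hours: float) -> float:
    """Effective per-hour cost, assuming every included hour is used."""
    return price / included_hours


def per_hour_costs() -> dict[str, float]:
    """Per-hour cost for every provider, usage-based and fixed-rate."""
    costs = {name: per_hour_from_minute(rate)
             for name, rate in PER_MINUTE_RATES.items()}
    costs.update({name: per_hour_from_plan(price, hours)
                  for name, (price, hours) in FIXED_PLANS.items()})
    return costs


if __name__ == "__main__":
    # Print providers from cheapest to most expensive per hour.
    for name, cost in sorted(per_hour_costs().items(), key=lambda kv: kv[1]):
        print(f"{name:<20} ${cost:.4f}/hr")
```

Note that the fixed-plan figure is a best case: it only holds if you actually use all of the included hours in a cycle.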

A few notes: I used Deepgram's Nova-2 tier ($0.0043/min) because it's their most accurate general-purpose model. Their cheaper tier ($0.0025/min) would drop to $0.15/hr but with lower accuracy on noisy audio. AssemblyAI's "Best" tier is $0.0037/min; their "Nano" tier is $0.003/min ($0.18/hr) but lacks speaker diarization and PII redaction. Google Cloud and AWS both use their standard non-streaming rates.

I didn't include volume discounts because most developers start at list rates. The fixed-rate model still wins at high volume because the price stays flat up to the plan's included hours (400 on Go, 800 on Pro).

What happens at 100, 400, and 1,000 hours

Here's where the math gets real. I calculated the total monthly cost at three realistic volumes:

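| Provider | 100 hrs/mo | 400 hrs/mo | 1,000 hrs/mo |
|---|---|---|---|
| Privocio Go ($19/4wks) | $19 | $19 | $19 |
| Privocio Pro ($39/4wks) | $39 | $39 | $39 |
| OpenAI Whisper API | $36 | $144 | $360 |
| Deepgram Nova-2 | $26 | $104 | $260 |
| AssemblyAI Best | $22 | $88 | $220 |
| Google Cloud STT | $144 | $576 | $1,440 |
| AWS Transcribe | $144 | $576 | $1,440 |

Privocio's prices are per 4-week cycle, treated here as one month. The Go plan includes 400 hours and the Pro plan 800, so neither covers the full 1,000-hour column.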

At 100 hours per month, fixed-rate is already the cheapest option. Privocio Go at $19 costs less than OpenAI Whisper ($36), Deepgram ($26), and both Google Cloud and AWS ($144). AssemblyAI is slightly more expensive at $22, a gap of only $3.

At 400 hours per month, fixed-rate is dominant. Privocio Go is still $19. The cheapest usage-based option (AssemblyAI) is $88 — roughly 4.6x more expensive. OpenAI Whisper hits $144. Google Cloud and AWS are $576 each.

At 1,000 hours per month, the difference is staggering. Privocio Pro at $39 covers 800 of those hours. AssemblyAI is $220. Google Cloud and AWS are $1,440 each, roughly $1,400 more than the Pro plan.
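To rerun these totals for your own volume, a sketch along the following lines may help. It uses the rounded per-hour figures from this article, treats one 4-week billing cycle as one month (as the table does), and reports how many hours each price actually covers. The `Quote` class and function names are hypothetical, and what happens beyond a fixed plan's included hours is not modeled because this article doesn't specify it.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    cost: float           # USD for the billing period
    covered_hours: float  # hours the price actually pays for


def usage_quote(rate_per_hour: float, hours: float) -> Quote:
    """Usage-based billing: every hour is billed at the same rate."""
    return Quote(cost=rate_per_hour * hours, covered_hours=hours)


def fixed_quote(plan_price: float, included_hours: float, hours: float) -> Quote:
    """Fixed-rate billing: flat price, coverage capped at the plan allowance."""
    return Quote(cost=plan_price, covered_hours=min(hours, included_hours))


# Rounded per-hour costs as quoted in this article (USD).
USAGE_RATES = {
    "OpenAI Whisper API": 0.36,
    "Deepgram Nova-2": 0.26,
    "AssemblyAI Best": 0.22,
    "Google Cloud STT": 1.44,
    "AWS Transcribe": 1.44,
}

# (price per 4-week cycle, included hours), as quoted in this article.
FIXED_PLANS = {
    "Privocio Go": (19.0, 400),
    "Privocio Pro": (39.0, 800),
}


def monthly_table(volumes=(100, 400, 1000)) -> dict[str, list[Quote]]:
    """One list of quotes per provider, in the same order as `volumes`."""
    rows = {name: [usage_quote(rate, h) for h in volumes]
            for name, rate in USAGE_RATES.items()}
    rows.update({name: [fixed_quote(price, included, h) for h in volumes]
                 for name, (price, included) in FIXED_PLANS.items()})
    return rows


if __name__ == "__main__":
    for name, quotes in monthly_table().items():
        cells = ", ".join(f"${q.cost:,.0f} ({q.covered_hours:g} hrs covered)"
                          for q in quotes)
        print(f"{name}: {cells}")
```

Tracking `covered_hours` alongside the price makes it visible when a volume exceeds a plan's allowance, which a bare price column hides.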

Hidden costs that inflate the hourly rate

The per-hour rate isn't the whole story. I've watched teams get surprised by add-on charges that add 20-50% to their bill:

  • Per-minute rounding — Many providers round each audio file up to the nearest minute. A 45-second clip gets billed as 60 seconds. At 1,000 short files, that can add 20-30% to your total.
  • Diarization add-ons — Speaker separation often costs an extra $0.001-$0.002 per minute. On a 400-hour month, that's an additional $24-$48.
  • Streaming premiums — Real-time streaming can cost 1.5x-2x the batch rate. If you need low-latency transcription, your per-hour cost jumps significantly.

Fixed-rate plans like Privocio's include all of these — diarization, streaming, and concurrency are built into the plan. There are no rounding tricks because you're not billed per minute.
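The rounding and diarization arithmetic from the list above can be checked with a short sketch. The function names are illustrative, and the rounding rule (each file rounded up to the next whole minute) is the one described in this article, not a statement about any specific provider's billing.

```python
import math


def billed_minutes(clip_seconds: list[float]) -> int:
    """Minutes billed when each file is rounded up to the next whole minute."""
    return sum(math.ceil(seconds / 60) for seconds in clip_seconds)


def rounding_overhead(clip_seconds: list[float]) -> float:
    """Fractional increase in billed time caused by per-file rounding."""
    actual_minutes = sum(clip_seconds) / 60
    if actual_minutes == 0:
        raise ValueError("need at least one clip with non-zero duration")
    return billed_minutes(clip_seconds) / actual_minutes - 1


def diarization_addon(hours: float, addon_per_minute: float) -> float:
    """Extra monthly cost of a per-minute diarization surcharge (USD)."""
    return hours * 60 * addon_per_minute


if __name__ == "__main__":
    clips = [45.0] * 1000  # 1,000 clips of 45 seconds each
    print(f"rounding overhead: {rounding_overhead(clips):.1%}")
    print(f"diarization at $0.001/min: ${diarization_addon(400, 0.001):.2f}")
    print(f"diarization at $0.002/min: ${diarization_addon(400, 0.002):.2f}")
```

A batch made up entirely of 45-second clips is billed as 1,000 minutes for 750 minutes of audio, an overhead of one third. Real workloads mix clip lengths, so the overhead depends on how close each file runs to a whole minute.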

Fixed vs usage-based: the breakeven point

Here's the question I get most often: "At what volume does fixed-rate become cheaper?" Dividing the $19 Go plan price by each provider's per-hour cost gives:

  • vs AssemblyAI Best: Breakeven at ~86 hours per month
  • vs Deepgram Nova-2: Breakeven at ~73 hours per month
  • vs OpenAI Whisper: Breakeven at ~53 hours per month
  • vs Google Cloud / AWS: Breakeven at ~13 hours per month

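If you process more than about 86 hours per month, the Go plan is cheaper than every usage-based API in the table; above roughly 53 hours it already beats OpenAI Whisper, Google Cloud, and AWS. The savings compound: the more you transcribe, up to the plan's included hours, the bigger the gap.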
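A breakeven point is just the fixed plan price divided by the usage-based per-hour cost. Here is a minimal sketch, assuming the $19 Go plan and the rounded per-hour costs from this article; the function name is made up for illustration.

```python
def breakeven_hours(plan_price: float, rate_per_hour: float) -> float:
    """Monthly hours at which usage-based billing equals the fixed plan price."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return plan_price / rate_per_hour


# Rounded per-hour costs as quoted in this article (USD).
USAGE_RATES = {
    "AssemblyAI Best": 0.22,
    "Deepgram Nova-2": 0.26,
    "OpenAI Whisper API": 0.36,
    "Google Cloud STT / AWS Transcribe": 1.44,
}

if __name__ == "__main__":
    go_plan_price = 19.0  # USD per 4-week cycle
    for name, rate in USAGE_RATES.items():
        hours = breakeven_hours(go_plan_price, rate)
        print(f"vs {name}: breakeven at about {hours:.0f} hours")
```

For OpenAI Whisper, for example, $19 divided by $0.36 per hour is about 52.8 hours. Below that volume the usage-based bill is smaller; above it the fixed plan is.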

The other advantage: predictability. I've worked with teams who budget $200 for transcription and get a $600 bill from a usage spike. Fixed-rate means your line item is the same every month. If you're evaluating options, our pricing page breaks down exactly what each plan includes.

Frequently asked questions

How do you calculate transcription cost per hour from a per-minute rate?

Multiply the per-minute rate by 60. For example, $0.006 per minute × 60 = $0.36 per hour. Then multiply by your monthly hours to get the total.

Is fixed-price transcription really cheaper than per-minute at scale?

Yes, for teams processing more than roughly 86 hours per month, the point at which the $19 Go plan undercuts even AssemblyAI's Best tier. At 400 hours, Privocio's Go plan at $19 costs roughly 97% less than Google Cloud or AWS Transcribe ($576 each).

What's the cheapest speech-to-text API for small projects?

For very small projects under 20 hours per month, AssemblyAI's Nano tier ($0.18/hr) or OpenAI Whisper API ($0.36/hr) are reasonable. Privocio's free tier covers 3 hours per 4 weeks.

Do hidden costs really add that much to per-minute billing?

They can. I've seen per-minute rounding inflate bills by 25%, diarization add $40/month, and streaming premiums double the rate. The advertised per-minute rate is rarely the actual rate once you add production requirements.

Conclusion: choose based on your volume, not the headline rate

After running these numbers for dozens of clients, I've learned one rule: the cheapest API at 10 hours is rarely the cheapest at 400 hours. If you're processing under roughly 70 hours per month, usage-based options like AssemblyAI or Deepgram are cheaper than the $19 Go plan. Above roughly 86 hours, fixed-rate beats every usage-based option in the table, and the gap only widens as you scale.

For a predictable transcription budget, start with our pricing page and run the numbers for your actual volume. Or try the free tier to test the API before committing. For the full picture on speech-to-text pricing strategies, read our complete pricing guide.

