
Speech-to-Text and GDPR: How to Transcribe Audio Without Breaking EU Privacy Law
After deploying transcription infrastructure for two EU-based legal clients and one financial services firm, I've learned that GDPR compliance isn't a checkbox — it's an architectural decision you make before you process a single byte of audio. Get it wrong and you face fines up to €20 million or 4% of global annual turnover, whichever is higher.
In our complete guide to private speech-to-text, we covered the full privacy-first transcription landscape. This guide focuses specifically on what GDPR means for speech-to-text APIs, what you must do before processing EU residents' audio, and how to architect a compliant pipeline from day one.
When GDPR Applies to Your Audio Data
GDPR applies whenever you process personal data of EU residents — regardless of where your company is based. Audio recordings containing speech are unambiguously personal data under Article 4(1) of GDPR, because voice patterns are unique identifiers that can directly or indirectly identify a natural person.
In practice, GDPR applies to your transcription pipeline if any of the following is true:
- Your users are located in the EU when they record the audio
- You're offering goods or services to EU residents (even if your servers are in the US)
- You're monitoring behavior of EU residents
I've seen teams assume GDPR only applies if they have EU-based customers. That's wrong. If your AI agent handles a call from someone in Berlin, GDPR applies — full stop.
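As a rough triage aid, the three conditions listed above can be written as a single check. This is a minimal sketch, not a legal determination: `ProcessingContext` and `gdpr_likely_applies` are hypothetical names, and the three flags are assumptions about what your intake process can actually tell you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingContext:
    """Hypothetical description of one audio-processing scenario."""
    speaker_in_eu: bool          # the speaker is located in the EU when recorded
    offers_services_to_eu: bool  # goods or services are offered to EU residents
    monitors_eu_behaviour: bool  # behavior of EU residents is monitored


def gdpr_likely_applies(ctx: ProcessingContext) -> bool:
    """Return True if any of the three triggers is present."""
    return (
        ctx.speaker_in_eu
        or ctx.offers_services_to_eu
        or ctx.monitors_eu_behaviour
    )


# A US-hosted agent that answers a call from someone in Berlin.
berlin_call = ProcessingContext(
    speaker_in_eu=True,
    offers_services_to_eu=False,
    monitors_eu_behaviour=False,
)
# A pipeline with no EU touchpoint at all.
domestic_call = ProcessingContext(False, False, False)

print(gdpr_likely_applies(berlin_call))
print(gdpr_likely_applies(domestic_call))
```

Treat a `True` result as a prompt to involve your DPO or counsel, not as the final answer.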
Three scenarios where GDPR applies to transcription pipelines:
1. Customer service calls — Every caller is a data subject; their speech is personal data
2. Medical or legal consultations — These often contain special category data (health information, for example), which requires an Article 9 condition such as explicit consent, plus additional safeguards
3. Employee monitoring or productivity tracking — Subject to additional restrictions under EU member state laws
If your transcription pipeline can reconstruct who spoke, the data is still personal. "Fully anonymized" is a high bar that most audio doesn't meet.
Lawful Basis: Your First and Most Critical Decision
Before you process a single audio file, you must establish a lawful basis for doing so. Article 6 of GDPR lists six possible bases; for speech-to-text, four are relevant:
| Lawful Basis | When It Fits Speech-to-Text | Practical Implication |
|---|---|---|
| **Consent** | User explicitly agrees to recording + transcription | Must be freely given, specific, informed, unambiguous |
| **Contract** | Transcription is necessary to perform a contract with the data subject | User must be a party to the contract |
| **Legal obligation** | EU or member state law requires transcription (e.g., MiFID II call recording) | Must cite the specific law |
| **Legitimate interests** | Your interest outweighs the user's rights (requires LIA) | Rarely recommended for audio — high bar |
Important: If your audio contains special category data (for example, health data or biometric data used to identify someone), you also need to satisfy Article 9. That means explicit consent or another listed exception, on top of your Article 6 basis.
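One way to make the lawful-basis decision auditable is to record it per purpose and validate the record before any audio is processed. The sketch below is an assumption about how you might structure that record: `ProcessingPurpose`, `validate_purpose`, and the field names are hypothetical and not part of any library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LawfulBasis(Enum):
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal_obligation"
    LEGITIMATE_INTERESTS = "legitimate_interests"


@dataclass(frozen=True)
class ProcessingPurpose:
    """Hypothetical record of why one class of audio is transcribed."""
    name: str
    basis: LawfulBasis
    special_category: bool = False
    article9_condition: Optional[str] = None  # e.g. "explicit consent"
    legal_reference: Optional[str] = None     # the law cited for LEGAL_OBLIGATION


def validate_purpose(purpose: ProcessingPurpose) -> list[str]:
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    if purpose.special_category and not purpose.article9_condition:
        problems.append("special category data needs an Article 9 condition")
    if purpose.basis is LawfulBasis.LEGAL_OBLIGATION and not purpose.legal_reference:
        problems.append("legal obligation basis must cite the specific law")
    return problems


# Special category audio recorded with an Article 6 basis only.
telehealth = ProcessingPurpose(
    "telehealth intake calls", LawfulBasis.CONSENT, special_category=True
)
# Call recording with the law cited, as the table above requires.
trading_desk = ProcessingPurpose(
    "trading desk calls", LawfulBasis.LEGAL_OBLIGATION, legal_reference="MiFID II"
)

print(validate_purpose(telehealth))
print(validate_purpose(trading_desk))
```

Running a check like this in CI or at service start-up keeps a new audio source from going live before someone has written down its basis.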
Data Processing Agreements: What's Actually Required
If you're using a third-party speech-to-text API — AssemblyAI, Deepgram, or any other provider — you almost certainly need a Data Processing Agreement (DPA) with them.
Under GDPR, when you use a third party to process personal data on your behalf, that third party is a data processor and you must have a written DPA in place before processing begins.
A GDPR-compliant DPA must include (Article 28(3)):
- The subject matter and duration of processing
- The nature and purpose of processing
- The type of personal data and categories of data subjects
- The controller's obligations and rights
- Processor obligations: only process on documented instructions, confidentiality, security measures, sub-processor restrictions
What I've seen fail: Teams that signed their STT provider's standard Terms of Service without realizing it wasn't a valid DPA. The result was a GDPR violation discovered during a client audit.
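To catch the Terms-of-Service problem early, the Article 28(3) items listed above can be tracked as a simple checklist during vendor review. This is a minimal sketch: the item keys and `missing_dpa_items` are hypothetical names, and the `True`/`False` values are assumed to come from a human reading the agreement, not from automated parsing.

```python
# Keys mirror the five items in the list above; the names are illustrative.
REQUIRED_DPA_ITEMS = (
    "subject_matter_and_duration",
    "nature_and_purpose",
    "data_types_and_subject_categories",
    "controller_obligations_and_rights",
    "processor_obligations",
)


def missing_dpa_items(reviewed: dict[str, bool]) -> list[str]:
    """Return the required items that the reviewed agreement does not cover."""
    return [item for item in REQUIRED_DPA_ITEMS if not reviewed.get(item, False)]


# Example review of a provider's standard Terms of Service.
terms_of_service_review = {
    "nature_and_purpose": True,
    "processor_obligations": False,
}

print(missing_dpa_items(terms_of_service_review))
```

An empty result does not prove the agreement is a valid DPA, but a non-empty one is a clear signal to ask the provider for their actual DPA before sending audio.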
For self-hosted transcription (where audio never leaves your infrastructure), you don't need a DPA with a third-party STT provider — because there is no third-party processor. This is one of the most underappreciated advantages of self-hosted Whisper deployments for EU compliance.
Right to Erasure: Why Audio Is Harder Than Text
Article 17 of GDPR gives data subjects the right to have their personal data erased. For text data, this is straightforward — delete the record, done. For audio and transcripts, it's significantly more complex.
The three-layer erasure problem:
1. Raw audio — Must be deleted from all storage tiers (hot, warm, cold, backups)
2. Transcripts — Often stored in databases or vector stores; must be purged from all replicas
3. Derived data — If the transcript was used to fine-tune a model, the model's weights may contain traces of the training data
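For AI agents using RAG (Retrieval-Augmented Generation): If you stored transcripts in a vector database for context retrieval, erasing them means deleting both the stored text and its embeddings, then confirming the index no longer returns them. Some vector indexes need a rebuild or compaction before deleted entries are physically gone, so this is operationally non-trivial.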
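The erasure fan-out across storage layers can be sketched as one function that runs the deletion everywhere and returns a per-layer report you can keep as evidence. `InMemoryStore` is a stand-in I made up for illustration; in a real pipeline each layer would wrap your object store, database, or vector index client. Model weights (the third layer above) are deliberately out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Stand-in for a real storage layer; keys are (subject_id, item_id)."""
    name: str
    items: dict[tuple[str, str], bytes] = field(default_factory=dict)

    def delete_subject(self, subject_id: str) -> int:
        """Delete every item belonging to subject_id and return the count."""
        doomed = [key for key in self.items if key[0] == subject_id]
        for key in doomed:
            del self.items[key]
        return len(doomed)


def erase_subject(subject_id: str, stores: list[InMemoryStore]) -> dict[str, int]:
    """Run the deletion against every layer and report how much each removed."""
    return {store.name: store.delete_subject(subject_id) for store in stores}


audio = InMemoryStore("raw_audio", {("u1", "call-1"): b"...", ("u2", "call-2"): b"..."})
transcripts = InMemoryStore("transcripts", {("u1", "call-1"): b"hello"})
vectors = InMemoryStore("vector_index", {("u1", "chunk-0"): b"", ("u1", "chunk-1"): b""})

report = erase_subject("u1", [audio, transcripts, vectors])
print(report)
```

Keying every stored item by data subject is the design choice that makes this cheap. If your vector chunks carry no subject identifier, you have to solve that lookup problem before you can honor a request.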
My recommendation: Keep raw audio in a separate, easily deletable store. Once you've generated the transcript, consider whether you need to keep the raw audio at all. In most AI agent use cases, you don't — the transcript is what you actually use downstream.
For one financial services client, we implemented a 30-day auto-deletion of all raw audio, with transcripts anonymized at the person level. When a deletion request came in, we only had to purge the anonymized transcript — a much faster operation.
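A 30-day retention rule like the one described above comes down to a cutoff comparison. The sketch below only selects which recordings are due for deletion. The function name, the ID format, and the idea that you hold a mapping of recording IDs to timestamps are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

RAW_AUDIO_RETENTION = timedelta(days=30)


def expired_recordings(recorded_at: dict[str, datetime], now: datetime) -> list[str]:
    """Return IDs of recordings older than the retention window, oldest first."""
    cutoff = now - RAW_AUDIO_RETENTION
    expired = [rid for rid, ts in recorded_at.items() if ts < cutoff]
    return sorted(expired, key=lambda rid: recorded_at[rid])


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
recordings = {
    "call-001": datetime(2024, 2, 15, tzinfo=timezone.utc),
    "call-002": datetime(2024, 3, 20, tzinfo=timezone.utc),
    "call-003": datetime(2024, 1, 10, tzinfo=timezone.utc),
}

due = expired_recordings(recordings, now)
print(due)
```

In production you would more likely rely on your storage provider's lifecycle rules for the actual deletion and use a check like this to verify that nothing older than the window is still around.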
What about transcripts processed through LLM context windows? If a user's audio was processed through an LLM in a context window that was subsequently discarded, the data subject's erasure right is effectively fulfilled — you can't access what's no longer stored. However, if the LLM provider logged the interaction, you may need to request deletion from them as well.
Data Residency and Cross-Border Transfers
GDPR restricts transfers of personal data outside the EU to countries without an adequacy decision from the European Commission. The US doesn't have a blanket adequacy decision, but the EU-US Data Privacy Framework (DPF), for which the Commission adopted an adequacy decision in July 2023, covers transfers to US companies that have certified under it.
For speech-to-text, cross-border transfer risks come from:
- Your STT provider processing audio on servers outside the EU
- Transcript storage in a US-based cloud provider
- Logs and monitoring data containing audio fragments being stored in third-country data centers
To reduce these risks:
1. Use EU-region cloud infrastructure — AWS eu-west-1, Azure West Europe, or Google Cloud EU regions
2. Verify your STT provider's data residency guarantees in writing — verbal assurances aren't sufficient
3. Check DPF certification — if your US-based provider is DPF-certified, transfers to their US infrastructure are permitted (though the framework could be invalidated, as its predecessor, Privacy Shield, was in 2020)
The self-hosted advantage: When you run Whisper on your own EU-based infrastructure, data never leaves the region. No cross-border transfer risk, no adequacy concerns, no DPF dependency.
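Data residency is easy to break by accident, for example when a log sink defaults to a US region. A small configuration check can flag that before deployment. In this sketch the allowlist and the component names are assumptions; the region strings follow the providers' naming, but you should build the list from the regions your own DPA and contracts actually cover.

```python
# Illustrative allowlist; replace with the regions your agreements permit.
ALLOWED_REGIONS = frozenset({"eu-west-1", "eu-central-1", "westeurope", "europe-west1"})


def out_of_region(components: dict[str, str]) -> dict[str, str]:
    """Return the components whose configured region is not on the allowlist."""
    return {
        name: region
        for name, region in components.items()
        if region not in ALLOWED_REGIONS
    }


# Hypothetical pipeline configuration: component name -> deployed region.
pipeline = {
    "stt_worker": "eu-west-1",
    "transcript_db": "eu-central-1",
    "log_sink": "us-east-1",
}

violations = out_of_region(pipeline)
print(violations)
```

Note that logs and monitoring are included as a component on purpose, since they are one of the three transfer risks listed above.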
Frequently Asked Questions
Does GDPR require consent for every call recording?
Not necessarily — consent is one option, but contract performance or legal obligation are also valid bases. For customer service calls, "performance of a contract" often applies. For financial services under MiFID II, call recording is a legal obligation. Document which basis applies to each use case.
Can I use Deepgram or AssemblyAI for EU data?
You can, but you need a DPA with the provider and a valid cross-border transfer mechanism (DPF certification, standard contractual clauses, or adequacy decision). Both providers have DPF certification as of 2024, but verify this directly with them and get it in writing. Self-hosted Whisper eliminates the transfer problem entirely.
What happens if I don't have a DPA with my transcription provider?
You're in violation of Article 28, which can trigger fines up to €10 million or 2% of global turnover under Article 83(4). Beyond fines, a missing DPA is one of the first things a DPO or auditor checks.
How do I handle right to erasure for transcripts in a vector database?
You need a procedure that includes purging from the vector store, database replicas, and backups. For one client, we implemented a "soft delete + 30-day hard purge" pattern — the transcript is flagged deleted immediately, inaccessible to queries, and permanently removed from all storage within 30 days. This gives a manageable SLA while maintaining compliance.
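The "soft delete + 30-day hard purge" pattern can be sketched as follows. This is an in-memory illustration under stated assumptions: `TranscriptStore` and its method names are hypothetical, and a real implementation would also have to reach replicas, backups, and the vector index.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

HARD_PURGE_AFTER = timedelta(days=30)


@dataclass
class Transcript:
    transcript_id: str
    text: str
    deleted_at: Optional[datetime] = None  # set when the erasure request arrives


class TranscriptStore:
    def __init__(self) -> None:
        self._rows: dict[str, Transcript] = {}

    def add(self, transcript: Transcript) -> None:
        self._rows[transcript.transcript_id] = transcript

    def soft_delete(self, transcript_id: str, now: datetime) -> None:
        """Flag the transcript as deleted; queries stop returning it at once."""
        self._rows[transcript_id].deleted_at = now

    def query(self) -> list[Transcript]:
        """Return only transcripts that have not been flagged as deleted."""
        return [t for t in self._rows.values() if t.deleted_at is None]

    def hard_purge(self, now: datetime) -> int:
        """Permanently remove transcripts flagged at least 30 days ago."""
        due = [
            tid for tid, t in self._rows.items()
            if t.deleted_at is not None and now - t.deleted_at >= HARD_PURGE_AFTER
        ]
        for tid in due:
            del self._rows[tid]
        return len(due)


store = TranscriptStore()
store.add(Transcript("t-1", "caller asked about their invoice"))
requested = datetime(2024, 5, 1, tzinfo=timezone.utc)

store.soft_delete("t-1", requested)
visible_after_request = store.query()
purged_day_29 = store.hard_purge(requested + timedelta(days=29))
purged_day_30 = store.hard_purge(requested + timedelta(days=30))
print(len(visible_after_request), purged_day_29, purged_day_30)
```

The immediate flag is what makes the data inaccessible right away, and the scheduled purge is what makes the deletion real. Both halves need to be tested, not just the first.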
Conclusion: Stay Compliant From the Start
GDPR compliance for speech-to-text isn't about adding a disclaimer — it's about designing your data architecture so that EU personal data never leaves your control without proper safeguards. The teams I've worked with who did this right from the start spent about two weeks on initial setup and never worried about it again. The ones who deferred it faced retrofit projects that cost three times as much.
The core architectural choice: Use a self-hosted transcription approach and you eliminate most of your cross-border transfer risk. Use a third-party API and you need DPAs, transfer mechanisms, and ongoing oversight.
If you're building a voice-enabled AI agent for EU users, start with Privocio's self-hosted deployment option — audio never leaves your infrastructure, and you have zero transfer compliance to manage. For teams using cloud STT providers, ensure your DPA is current, your transfer mechanism is documented, and your erasure procedure is tested.
For a deeper dive into private transcription infrastructure, read our complete guide to private speech-to-text APIs.