Pseudonymization
What is pseudonymization? Learn how this technique protects personal data, how it differs from anonymization, and what the GDPR requires for implementation.
Definition
Pseudonymization is defined under Article 4(5) of the General Data Protection Regulation (GDPR) as the processing of personal data in such a manner that the data can no longer be attributed to a specific data subject without the use of additional information. The prerequisite is that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data is not attributed to an identified or identifiable natural person.
Unlike anonymization, where the attribution to a person is irreversibly removed, pseudonymization is reversible: by combining the pseudonymized data with the separately stored mapping information, the data can again be attributed to a specific individual. Pseudonymized data therefore continues to be considered personal data and remains subject to the GDPR. Article 25(1) and Article 32(1)(a) GDPR explicitly name pseudonymization as an example of an appropriate technical and organizational measure.
Simply Explained
Imagine a medical study: instead of using the real names of participants, each receives a number: Patient 4721, Patient 4722, and so on. The doctors work only with these numbers. The list that reveals which number belongs to which name is kept in a separate, locked safe.
That is how pseudonymization works: the actual identifying features (name, address, date of birth) are replaced by pseudonyms (numbers, codes, tokens). The link between pseudonym and real identity is stored in a separate, specially protected location. Anyone who sees only the pseudonymized data cannot identify the person. Only when both parts are combined does the attribution become possible.
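The medical study described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the function names, the "P-" prefix, and the sample records are assumptions made for the example, and in practice the mapping would live in a separate, access-controlled store rather than in the same process.

```python
import secrets

def pseudonymize(records, id_field="name"):
    """Replace id_field in each record with a random pseudonym.

    Returns (pseudonymized_records, mapping). The mapping plays the role of
    the "locked safe" and must be stored separately from the records.
    """
    mapping = {}   # pseudonym -> real identity
    assigned = {}  # real identity -> pseudonym, so repeat entries stay linked
    result = []
    for record in records:
        identity = record[id_field]
        if identity not in assigned:
            pseudonym = "P-" + secrets.token_hex(4)
            while pseudonym in mapping:  # regenerate on the rare collision
                pseudonym = "P-" + secrets.token_hex(4)
            assigned[identity] = pseudonym
            mapping[pseudonym] = identity
        clean = {k: v for k, v in record.items() if k != id_field}
        clean["pseudonym"] = assigned[identity]
        result.append(clean)
    return result, mapping

def reidentify(pseudonym, mapping):
    """Re-attribution is only possible for someone who holds the mapping."""
    return mapping[pseudonym]

# Hypothetical sample data for illustration only.
records = [
    {"name": "Alice Example", "blood_pressure": "120/80"},
    {"name": "Bob Example", "blood_pressure": "135/85"},
    {"name": "Alice Example", "blood_pressure": "118/79"},
]
study_data, mapping = pseudonymize(records)
```

Researchers would receive only `study_data`. Both measurements of the same participant carry the same pseudonym, so analysis across visits still works, while the names are available only through `mapping`.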
Why Does It Matter?
Pseudonymization is one of the most important data protection techniques and offers significant advantages:
- Recommended by the GDPR: Article 25(1) and Article 32(1)(a) GDPR explicitly name pseudonymization as a suitable protective measure. Supervisory authorities take the technical and organizational measures a controller has implemented into account when assessing compliance.
- Risk Reduction: Pseudonymized data is significantly less valuable to attackers because it cannot be attributed to a specific person without the mapping information. In a data breach, the risk to affected individuals is considerably lower.
- Facilitated Processing: In certain cases, pseudonymization can make processing easier to justify, for example by reducing the impact on data subjects when the controller's legitimate interest is weighed against their interests (Article 6(1)(f) GDPR).
- Research and Analysis: Pseudonymized data can be used for statistical evaluations and research purposes without revealing the identity of data subjects.
- Combination with Other Measures: Pseudonymization complements other protective measures such as encryption, access control, and data minimization to form a comprehensive protection concept.
Practical Example
A recruitment company collects application documents through an upload platform. The documents are initially reviewed by a pre-screening department before being forwarded to the responsible recruitment consultants. Management wants to ensure that the pre-screening is as unbiased as possible.
The company implements a pseudonymization procedure: application documents are automatically pseudonymized after upload. Names, photos, addresses, and dates of birth are removed and replaced by internal identifiers. The pre-screening department sees only professional qualifications, work experience, and certificates without knowing the applicants' identities.
Only when a candidate reaches the shortlist is the complete data released to the responsible consultant. The mapping table is stored separately with restricted access. The result: a fairer pre-screening process, reduced discrimination risk, and better data protection.
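The procedure in this example could look roughly like the following sketch. It assumes that applications arrive as simple dictionaries and that the set of identifying fields is known in advance; the field names, the "APP-" prefix, and the in-memory `identity_store` are hypothetical stand-ins for whatever the real platform uses.

```python
import secrets

# Assumption: these are the fields hidden from pre-screening in the example.
IDENTIFYING_FIELDS = {"name", "photo", "address", "date_of_birth"}

def split_application(application):
    """Split an application into a pre-screening view and an identity part."""
    internal_id = "APP-" + secrets.token_hex(6)
    identity = {k: v for k, v in application.items() if k in IDENTIFYING_FIELDS}
    view = {k: v for k, v in application.items() if k not in IDENTIFYING_FIELDS}
    view["id"] = internal_id  # the internal identifier replaces the identity
    return view, internal_id, identity

def release_identity(internal_id, identity_store, shortlist):
    """Return the identifying data only for shortlisted candidates."""
    if internal_id not in shortlist:
        raise PermissionError("candidate is not shortlisted")
    return identity_store[internal_id]

# Hypothetical application for illustration only.
application = {
    "name": "Alice Example",
    "address": "1 Sample Street",
    "date_of_birth": "1990-01-01",
    "qualifications": ["BSc Computer Science"],
    "experience_years": 7,
}
identity_store = {}  # stored separately, with restricted access
view, app_id, identity = split_application(application)
identity_store[app_id] = identity
```

The pre-screening department would work with `view` alone. A call to `release_identity(app_id, identity_store, shortlist)` succeeds only once `app_id` has been added to the shortlist, which mirrors the release step to the consultant.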
How SendMeSafe Implements This
SendMeSafe supports the principle of pseudonymization through various features:
- Token-Based Links: Upload links and share links use randomly generated tokens instead of identifying information. The URL contains no hints about the organization, the client, or the content.
- Minimal Identification: Uploading individuals do not need to create an account or provide personal data. The assignment of uploaded files is based on the link token, not on personal identifiers.
- Separated Data Storage: File contents are stored in S3 storage, while metadata (assignment to clients and organizations) is stored in the database. This separation makes it harder to establish connections if only one of the two systems is accessed without authorization.
- Role-Based Access Control: Within an organization, different roles with varying access rights can be assigned. Not every employee needs access to all client data.
- Encryption: AES-256 encryption of all stored files, together with pseudonymization, forms a multi-layered protection concept.
- Audit Trail: All data access is logged, making it traceable who established the link between pseudonymized and identifying data and when.
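The idea behind token-based links can be illustrated generically. The sketch below is not SendMeSafe's implementation, which this text does not describe; the base URL, the function names, and the in-memory `link_table` are assumptions chosen for the example. It relies only on Python's standard `secrets` module.

```python
import secrets

BASE_URL = "https://upload.example.com/u/"  # placeholder domain

def create_upload_link(link_table, client_id):
    """Create a link whose URL carries only a random token.

    The token-to-client assignment is kept server-side in link_table,
    so the URL itself reveals nothing about the client.
    """
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    link_table[token] = client_id
    return BASE_URL + token

def resolve_upload(link_table, url):
    """Look up which client a link belongs to; None if the token is unknown."""
    token = url.rsplit("/", 1)[-1]
    return link_table.get(token)

link_table = {}  # in a real system: a database table with access control
url = create_upload_link(link_table, "client-17")
```

Anyone who sees only `url` learns nothing about the client. The assignment can be resolved only by a system that has access to `link_table`.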
Frequently Asked Questions
What is the difference between pseudonymization and anonymization?
The crucial difference: pseudonymization is reversible; anonymization is not. With pseudonymized data, the attribution to a person can be restored when the separately stored mapping information is added. Anonymized data cannot be attributed to a person even with additional information. Therefore, pseudonymized data continues to be considered personal data under the GDPR, while anonymized data does not.
Must pseudonymized data be deleted?
Yes. Since pseudonymized data remains personal data, all GDPR requirements apply, including the principle of storage limitation and the right to erasure. A deletion policy must also cover pseudonymized data and the associated mapping information. Deleting only the mapping information is not sufficient if the pseudonymized data could be re-attributed to a person through other means.
What pseudonymization techniques exist?
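Common techniques include replacement with randomly generated codes or tokens (tokenization), cryptographic hash functions, and encryption with separate key storage. The choice of technique depends on the use case. Hash functions should be used with a secret key, because plain hashes of predictable values such as email addresses can often be reversed by hashing candidate values and comparing the results. In every case, the mapping information or key must be specially protected through technical and organizational measures.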
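As an illustration of the hash-based technique, the following sketch derives pseudonyms with HMAC-SHA256 from Python's standard library. The function name and the normalization step (trimming and lowercasing) are assumptions for this example. The secret key takes the place of the mapping table and must be stored separately from the pseudonymized data.

```python
import hashlib
import hmac
import secrets

def keyed_pseudonym(identifier, key):
    """Derive a stable pseudonym from an identifier using HMAC-SHA256.

    The same identifier and key always yield the same pseudonym, so records
    can still be linked. Without the key, candidate values cannot simply be
    hashed and compared against the pseudonyms.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()

key = secrets.token_bytes(32)  # keep in a separate key store
pseudonym = keyed_pseudonym("alice@example.com", key)
```

Unlike a token table, this approach needs no lookup storage. Re-linking an identifier to its pseudonym requires recomputing the HMAC with the key, so access to the key must be restricted just as access to a mapping table would be.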
Does pseudonymization protect against fines?
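Pseudonymization can indirectly reduce the risk of fines, but it does not rule them out. If a data breach affects only pseudonymized data and the mapping information was not compromised, the risk to affected individuals is considerably lower. Under Article 83(2) GDPR, supervisory authorities take the technical and organizational measures implemented by the controller into account when deciding on a fine and its amount, so this can lead to a lower fine. Additionally, pseudonymization is one of the measures that Article 32 GDPR names explicitly and can contribute to meeting its requirement for appropriate protective measures.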