Tokenization vs. Encryption for Data Protection Compliance
Tokenization is a branch of cryptography, but should not be confused with encryption. Encryption is used to hide strings of text based on mathematics. Tokenization replaces individual characters with a different character based on randomness. If you can reverse the encryption mathematics, you get access to the entire string of encrypted text. If you can reverse the token randomness, you get access to a single character.
In this sense, tokenization mirrors the current movement within cybersecurity towards granularity. Zero Trust is another good example – it is generally considered more secure to separately protect every individual asset than to rely on a wall around the entire data center. Similarly, tokenization individually changes every single character (rather than entire strings of text) in a way that has no mathematical reversibility.
In simple terms, tokenization cannot be cracked. Encryption can be cracked. For these reasons alone, tokenization deserves a closer examination for its potential role in data security and in ensuring compliance with data protection regulations, such as PCI DSS, GDPR and CCPA/CPRA.
Encryption is a mathematical transformation, in public-key schemes usually based on modular exponentiation, that produces a binary output which cannot be understood without a key to reverse the process. The arrival of computers made the mathematical calculations easy and comparatively fast, and the process rapidly became the de facto method for protecting sensitive data in the computer age.
Assuming a strong algorithm is used, encryption is fundamentally safe, making it nearly impossible to retrieve encrypted text without a decryption key. But this is its weakness. Any adversary with access to the decryption key can reverse the process and retrieve the text. One key gives access to everything. A successful attack against the key is a successful attack against the encryption – and this is where adversaries focus their efforts.
Symmetrically encrypted data uses the same key for encryption and decryption. That key is usually distributed via asymmetric encryption, which relies on two keys: a public key that the recipient makes known, which the sender uses to encrypt, and a private key that the recipient keeps secret and uses to decrypt. The two keys are mathematically related; in RSA, the best-known example, both are derived from the two prime factors of a very large number. Factoring that number (to obtain the private key when only the public key is known) is very hard, and well beyond the capability of modern computers in a realistic time frame.
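The link between the two keys and the factoring problem can be illustrated with a toy RSA calculation. This is only a sketch under stated assumptions: the primes are absurdly small, the function names are hypothetical, and padding and every other safeguard of real RSA are omitted. It shows that whoever can factor the public modulus can rebuild the private key.

```python
# Toy RSA with tiny primes: illustrative only, never secure at this size.
p, q = 61, 53                 # the two secret prime factors
n = p * q                     # public modulus (3233)
phi = (p - 1) * (q - 1)       # computable only if p and q are known
e = 17                        # public exponent
d = pow(e, -1, phi)           # private exponent: modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Anyone holding the public key (n, e) can encrypt."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(c, d, n)


def recover_private_exponent(modulus: int, public_exp: int) -> int:
    """Attacker's view: factor the modulus by trial division, then rebuild d."""
    f = next(i for i in range(2, modulus) if modulus % i == 0)
    return pow(public_exp, -1, (f - 1) * (modulus // f - 1))


session_key = 65              # stand-in for a symmetric key being distributed
ciphertext = encrypt(session_key)
```

With a modulus this small, trial division finds the factors instantly. With the very large moduli used in practice, the same search is infeasible on classical hardware, and that is the barrier protecting the distributed symmetric key.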
But here is another problem with encryption. The arrival of quantum computing will change this equation. While general purpose quantum computing is still many years away, focused-task quantum computing may be quite close. In October 2019, Google announced a success in its own ‘quantum supremacy’ program. With a 54-qubit quantum processor, it solved a pre-defined problem in 200 seconds. Google estimated that an existing classical supercomputer would have taken 10,000 years to do the same.
It is not inconceivable that a special purpose quantum computer designed specifically to run Shor’s algorithm will be available within a few years. Quantum power and Shor’s algorithm will be able to crack asymmetric encryption relatively easily. If that is done, the attacker will have access to the key that will provide access to everything encrypted with the same symmetric key.
Any encrypted data that has been intercepted and stored along with the asymmetrically encrypted symmetric key will be vulnerable. This raises two questions. Firstly, are current encryption practices still adequate? The answer here is clearly no. Businesses should move towards alternative methods of protecting sensitive data. This could be through quantum-proof encryption (which is already available), or tokenization (which is quantum proof by nature).
Secondly, are companies in strict compliance with data protection regulations if they use a protection methodology that they know will soon become unsafe when there is a safe alternative that can be used today?
SecurityWeek asked David Flint for his opinion on this in relation to GDPR. Flint is a commercial law consultant with Scottish law firm Inksters and a visiting professor at Creighton University School of Law, Omaha, Nebraska. “GDPR is not a fixed standard and controllers are only required to take appropriate technical steps to protect data. Thus,” he said, “one would expect MI5 to be more secure than the local takeaway restaurant. While the much-vaunted quantum computers may make much of the present encryption less secure, this is still some way off and for 99.99% of controllers and processors, it would be an unnecessary step. I do not think that requiring an undertaking to use quantum-proof keys is ‘reasonable’ given the state of technology. However, that may change – in the same way as we have moved from 64bit encryption keys to 2048, 4096 or 8192bit keys.”
In short, current encryption practices are all that can be expected by the law. Quantum computing remains primarily theory, even though we know it is coming. The current interpretation of the law suggests that quantum-proof encryption will only become necessary after quantum computing becomes a reality – by which time it will be too late to protect any encrypted data already in the hands of nation-state or criminal gang adversaries.
Tokenization involves replacing individual characters with random other characters. The random character is a token for the original character. The process is performed by a token engine.
Different vendors have different and often proprietary methods for generating the tokens. The basic process is often compared to a one-time pad. Microsoft says of the one-time pad, “As long as the encryption key (the ‘pad’) is the same length as, or longer than the message being encrypted, and is never re-used, it is mathematically impossible to decipher messages encrypted using this technique.”
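The one-time pad that the comparison rests on can be sketched in a few lines. This is a minimal illustration, not any vendor's token engine: the function names are hypothetical, and the pad is drawn from the operating system's random source, is exactly as long as the message, and must never be reused.

```python
import secrets
from typing import Tuple


def otp_encrypt(message: bytes) -> Tuple[bytes, bytes]:
    """XOR each message byte with a fresh random pad byte."""
    pad = secrets.token_bytes(len(message))   # pad is as long as the message
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return ciphertext, pad


def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    """XOR with the same pad restores the original bytes."""
    return bytes(c ^ k for c, k in zip(ciphertext, pad))


ciphertext, pad = otp_encrypt(b"4111 1111")
recovered = otp_decrypt(ciphertext, pad)
```

Without the pad, every plaintext of the same length is an equally plausible source for the ciphertext, so there is no calculation for an attacker to reverse.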
Tokenization can use a ‘pad’ that is many times longer than the original text. The process is often compared to randomly selecting characters from one or many books. “It’s like taking all the books in the world and putting them in a library,” explains David Johnson, CEO of Rixon Technology. “The random generator goes to that library and chooses one book randomly. It goes to page 37, paragraph 3, fourth character – and that becomes the token for the original character. All the engine remembers is the route to the token. It repeats the process for every single character.”
The primary advantages of this process are that it is ‘format preserving’ and can be controlled in scope. The majority of encryption – in fact all encryption other than the new format-preserving encryption (FPE) – produces encrypted text that bears no relation to the original in either size or content. It cannot be processed or analyzed without being first decrypted.
Tokenized text retains the format of the original. Since individual characters are tokenized, select characters can be left in their original form. The first few letters of an email address or the first four numbers of a bank card could be left un-tokenized. This would confirm the record without providing enough information to be of value to cybercriminals, and without violating data protection regulations. Since each record retains its original format, it can still be processed and analyzed by existing tools.
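A minimal sketch of this format-preserving, partially scoped behavior follows. It assumes a simple rule that is not taken from any vendor: digits are replaced by random digits, letters by random letters of the same case, and separators are left alone. The names tokenize_char and tokenize are hypothetical, and the sketch is one-way because it does not record the route back to the original.

```python
import secrets
import string


def tokenize_char(ch: str) -> str:
    """Replace a character with a random one of the same class."""
    if ch in string.digits:
        return secrets.choice(string.digits)
    if ch in string.ascii_lowercase:
        return secrets.choice(string.ascii_lowercase)
    if ch in string.ascii_uppercase:
        return secrets.choice(string.ascii_uppercase)
    return ch  # separators such as '-', '@' and '.' keep the format intact


def tokenize(value: str, keep_prefix: int = 0) -> str:
    """Tokenize every character except the first keep_prefix characters."""
    tail = "".join(tokenize_char(c) for c in value[keep_prefix:])
    return value[:keep_prefix] + tail


card_token = tokenize("4111-1111-1111-1111", keep_prefix=4)
email_token = tokenize("alice@example.com", keep_prefix=2)
```

The card token still looks like a card number and still begins with 4111, so existing tools that validate or sort on format can keep working on it.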
The evolution of modern tokenization
Tokenization is not a new concept. It offers huge potential for protecting data without limiting business. Nevertheless, it has been slow to gain acceptance. The primary reason is that it requires large amounts of compute power to generate all the tokens and remember the route for each. Advances in technology over the last decade are, however, beginning to change this.
Vaulted tokenization was the original and still common approach for tokenization. The vault is a large database – kept securely – that maps tokens to the corresponding clear text data. The vault can be maintained in-house or transferred to a tokenization firm.
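In its simplest form, a vault of this kind is a lookup table from token to clear text. The sketch below is an in-memory illustration only, with a hypothetical TokenVault class; a real vault would be a hardened, access-controlled database.

```python
import secrets
import string
from typing import Dict


class TokenVault:
    """Maps each random token back to the clear text it stands for."""

    def __init__(self) -> None:
        self._token_to_clear: Dict[str, str] = {}

    def tokenize(self, clear: str) -> str:
        if not any(c in string.digits for c in clear):
            raise ValueError("expected at least one digit to tokenize")
        while True:  # retry on the rare collision with an existing token
            token = "".join(
                secrets.choice(string.digits) if c in string.digits else c
                for c in clear
            )
            if token not in self._token_to_clear:
                self._token_to_clear[token] = clear
                return token

    def detokenize(self, token: str) -> str:
        """Every detokenization is a lookup against the vault."""
        return self._token_to_clear[token]


vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
original = vault.detokenize(token)
```

The dictionary holds the clear text, so whoever obtains it obtains the data, and every detokenization has to go through it.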
There are several problems with this approach. The first is the time it takes to do detokenizing lookups, which introduces latency into business processing. The second is that the vault is a single point of failure in the tokenization infrastructure. It is also a high-value target for attack, since it contains the clear text data.
The newer form of vaultless tokenization replaces the mapping database with an algorithm. This eliminates the need to maintain and protect a large mapping database, reduces the latency, and can eliminate the need to store the clear text anywhere.
This reduces compliance scope and cost, provides better security than that of vaulted models, and increases efficiency. The algorithm – or tokenization engine – remembers the route taken for each token for each character, and can consequently reverse the process without having to store the clear text.
The latest developments in tokenization are combining this vaultless approach and cloud computing to effectively provide tokenization-as-a-service. This has huge potential for tokenization within data compliance. The algorithm-based tokenization engine is maintained in the cloud, but does not need to store any clear text. But neither does the business. Sensitive data can immediately be tokenized with only the tokenized data stored locally. Since this is format preserving and can leave enough data exposed without compromising compliance, much of the necessary business processing can be achieved without ever storing any compliance-regulated data locally.
Will tokenization replace encryption?
The simple answer to this is probably no. There will always be areas where encryption makes sense – for example, archives that must be kept but are rarely used.
Tokenization’s forte is where data must conform to data protection regulations, but is still being regularly used by the business. Here it offers advantages in both usability and security above encryption.
But tokenization can also be used in conjunction with encryption. A future threat to communications will come with the arrival of basic quantum computers able to crack public key infrastructure (PKI) – and this may be only a few years away. Rather than change the entire communication infrastructure, tokenization could be used to tokenize all verbs or all nouns within a message before it is encrypted and transmitted using the existing infrastructure. Even if the message is intercepted and decrypted by quantum computing, the quantum-proof tokenization will still render the message content meaningless.
Cloud-based vaultless tokenization offers many advantages over current methods of protecting data and ensuring regulatory compliance. But it is in its infancy. It offers the potential for many new possibilities in the coming years.