The Tokenization framework masks data in a way that can later be reversed. For example, a Tokenization algorithm can be used to mask data before it is sent to an external vendor for analysis. The vendor can then identify accounts that need attention without having any access to the original, sensitive data. Once the vendor’s feedback is obtained, you can reverse the masking and take action on the appropriate accounts.
The Tokenization algorithm is designed to be used in Tokenization/Re-Identification jobs, though it can also be used in Masking.
The algorithm tokenizes values using AES-128 encryption in CBC-CTS mode, with an optional initialization vector (IV), and Base64 encoding. The results are alphanumeric strings that are longer than the original values. If the result is too long to fit in the field, the algorithm can be configured to either (a) fall back to a reversible masking algorithm, which produces a result that is the same length as the original value, or (b) fail the job.
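To get a feel for when the fallback or failure path is taken, the length check can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function names, the "fallback"/"fail" option values, and the sizing assumptions (UTF-8 input, values shorter than one AES block padded to 16 bytes, the IV stored in front of the ciphertext) are all hypothetical.

```python
import math

AES_BLOCK_BYTES = 16  # AES block size; it does not depend on the key length


def estimated_token_length(value: str, iv_length: int = 0) -> int:
    """Estimate the Base64 length of the token produced for `value`.

    Assumptions (not taken from the product): the value is UTF-8 encoded,
    inputs shorter than one AES block are padded up to 16 bytes,
    ciphertext stealing keeps longer inputs at their original byte length,
    and the IV is stored in front of the ciphertext.
    """
    plain_len = len(value.encode("utf-8"))
    cipher_len = max(plain_len, AES_BLOCK_BYTES) + iv_length
    # Padded Base64 emits 4 characters for every 3 input bytes, rounded up.
    return 4 * math.ceil(cipher_len / 3)


def choose_action(value: str, field_width: int,
                  on_overflow: str = "fallback", iv_length: int = 0) -> str:
    """Return "tokenize" or "fallback", or raise if configured to fail."""
    if estimated_token_length(value, iv_length) <= field_width:
        return "tokenize"
    if on_overflow == "fallback":
        return "fallback"  # same-length reversible masking would be used
    raise ValueError(f"token for {value!r} does not fit in {field_width} characters")


print(estimated_token_length("123-45-6789"))  # 24
print(choose_action("123-45-6789", field_width=11))  # fallback
```

Under these assumptions, an 11-character value needs at least a 24-character field to hold its token, so a column sized exactly for the original data would always take the fallback (or failure) path.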
The algorithm has the following properties:
The masked value for each input is consistent as long as the same algorithm is used and the initialization vector length is 0. Changing the algorithm's key or using an initialization vector length greater than 0 will result in different masked values.
As long as at least one maskable character is present in the input, the masked value will never match the input.
The algorithm used to mask a value can change depending on the length of the input.
The algorithm only works on string data types. Numbers can be masked if the column data type is a String type, such as VARCHAR or TEXT.
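The first two properties above can be spot-checked on the output of a job. The following is a minimal sketch, assuming you have exported (original, masked) pairs yourself. The function name is hypothetical and the token strings are placeholders invented for illustration, not real algorithm output.

```python
from collections import defaultdict


def check_token_properties(pairs):
    """Return a list of problems found in (original, masked) pairs.

    The consistency check is only meaningful when the initialization
    vector length is 0. The "unchanged value" check can flag inputs that
    contain no maskable characters, which are allowed to stay as they are.
    """
    problems = []
    seen = defaultdict(set)  # original value -> set of masked values observed
    for original, masked in pairs:
        seen[original].add(masked)
        if masked == original:
            problems.append(f"unchanged value: {original!r}")
    for original, tokens in seen.items():
        if len(tokens) > 1:
            problems.append(f"inconsistent tokens for {original!r}")
    return problems


# Placeholder tokens, invented for illustration only.
sample = [
    ("123-45-6789", "TOKEN_A"),
    ("123-45-6789", "TOKEN_A"),
    ("111-11-1111", "TOKEN_B"),
]
print(check_token_properties(sample))  # []
```

An empty list means every repeated input mapped to a single masked value and no value passed through unchanged.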
This new algorithm framework was introduced in version 220.127.116.11 to replace the existing Tokenization algorithm and adds the ability to select a fallback algorithm. Below is example data showing before and after Tokenization:
1,Erasmus,245 Park Ave,123-45-6789
2,Salathiel,245 park ave,123-45-6789
3,Salathiel,1003 Stant Drive,111-11-1111