Column-Level Encryption in MySQL
In a post written earlier this year – Percona Server for MySQL Encryption Options and Choices – I discussed some of the options around encryption in MySQL. Being such a complex topic, that post was meant to clarify and highlight various aspects of “encryption” at different levels. I recently had this topic come up again, but specifically around column-level encryption and various options so I wanted to touch on this in more detail.As of the current release of Percona Server for MySQL, there is no built-in way to define a single column as encrypted. Ideally, there could be some metadata passed in a create statement and this would just automatically happen, such as this:CREATE TABLE pii_data (user_id int unsigned PRIMARY KEY,super_secret varchar(255) ENCRYPTED,…… ) ENGINE=InnoDBUnfortunately, this option isn’t available and we need to do some data manipulation at or prior to read/write time.Built-in MySQL encryption functionsOne of the most common approaches is to use the built-in MySQL encryption functions described here: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html. The standard flows look roughly like this: INSERT INTO mytable (id, secret_data) VALUES (1, TO_BASE64(AES_ENCRYPT(“ultra-secret-data”, “my-key”)));…SELECT AES_DECRYPT(FROM_BASE64(secret_data), “my-key”) as plaintext FROM mytable; This works perfectly fine and the data will be stored encrypted for that column. If an unauthorized person were to gain access to the running table, they would be unable to read that column without the key. The biggest concern with this approach is that the plaintext key and plaintext data are BOTH specified in the query. This leads to potential leaks in log files (slow query, binary log, etc) as well as potential sniffing over non-secure networks (i.e. connections not using SSL).Also, key storage can become more cumbersome. If you plan to share the same key for the entire table, key rotation and management can become non-trivial. General best practices recommend that you rotate keys at least once a year. On large tables, this could be a massive undertaking. Let’s look at the envelope encryption pattern as an alternative.Envelope encryptionIn contrast to using built-in encryption, envelope encryption uses the concept of individual data keys. While there are many ways to approach this, my personal experience uses the AWS KMS service. In KMS, you can create what is called a “Customer Master Key” (CMK). This is great for encrypting small strings like passwords but is limited to encrypting strings up to 4KB.A more flexible approach is to use individual data keys, encrypt the data locally in the application, and then store the encrypted data along with the encrypted data key. This can be done at various levels of granularity – from the per-row level to the table level, to the database level. Here is the general process for envelope encryption:plaintextData = “super secret content”datakey = kms.generate_data_key(“alias/mymasterkey”)encryptedData = crypto.encrypt(plaintextData, datakey.plaintext)storableKey = datakey.ciphertext INSERT INTO mytable (id, secretData, key) VALUES (NULL, encryptedData, storableKey);When you need to decrypt the data, you first decrypt the key and then use that to decrypt the data:SELECT secretData, key FROM mytable;datakey = kms.decrypt(key)decryptedData = crypto.decrypt(secretData, datakey)Because both the data key and the data are encrypted, both are safe to store together. This is one of the main benefits of this method. You can have hundreds to millions of data keys independent of one another but protected by a single master key. This allows you to deactivate all of the individual keys by disabling one single master key. It also simplifies key rotation – you can simply rotate the master key and start using the new key to generate new data keys. Note that this doesn’t re-encrypt the data, but it does allow you to follow the best practice of periodically rotating keys. As long as you don’t delete any of the old keys, KMS can determine which key to use for decryption from the key itself and automatically decrypt your data (i.e. you don’t have to specify which key encrypted the data). I’ve included a link to a sample script in Python that shows this process in more detail. While there isn’t actually an active database in the demo, the code prints out the INSERT and SELECT statements that could be run against a live server: https://github.com/mbenshoof/kms-envelope-demoChallenges with column-level encryptionSearchingIt should go without saying that introducing column-level encryption isn’t a trivial task. One of the biggest challenges is reviewing how encrypted data is retrieved. For example, if I store social security numbers encrypted individually, then how do I search for the user with an SSN of 123-45-6789? If I use a shared key for the entire table/schema, then it is possible with proper indexing. I just have to pass the encrypted value to the where clause and if it exists, it should be found.However, on a per-row model where each row uses a unique key, this is no longer possible. As I don’t know the key the value was encrypted with, I can’t search for the encrypted value in the table. In cases like this, you might consider a one-way hash field that could be searched against. For example, you could store the SHA256 hash of an SSN as an additional column for searching but then decrypt any other sensitive information.CPU and space overheadThe other challenge is adding additional write/read overhead to handle encryption and decryption. While this may or may not be an issue depending on the use case, the extra CPU needed either on the application side or MySQL side could come into play. You will need to consider the extra processing required and factor that in when planning the size of your resources.Additionally, depending on the encryption libraries used, there can be additional space needed to store encrypted values. In some cases, the encrypted value (when stored in base64 specifically) may end up requiring a higher storage footprint. The space could be compounded if using an index on an additional hash column. For small values (like SSN), the hash may be much larger than the actual data. This can result in a much higher storage footprint when applied to millions of records. Wrapping upEncryption and security are very important and complicated topics. When considering column-level encryption in MySQL, you definitely have some options. The easiest way would be to just leverage the built-in encryption functions in MySQL. However, you can take things a step further and handle all encryption and decryption in your application. As is always the case with complicated topics like this, the choice and approach depend entirely on your requirements and use case. Just know that with the flexibility of MySQL, there is most likely a design and approach that works for you!