Ever since I read about a pretty fundamental flaw with the aes gem, I have been bugged by the questions, "So, what's the right way to do it? And, if we should encrypt all the things, why is it so hard?" This post tries to walk you through what's going on, so that you can feel more confident about adding encryption to your applications.
Things are better these days
There have been some very good efforts to increase the usability of cryptography recently. It used to be the case that you needed to compose several primitives together, each with their own parameters, to encrypt a message. If one of them was improperly used, bad people could easily make bad things happen.
Bernstein et al's NaCl (pronounced "salt") secretbox API, which is growing nicely via the Sodium project, is a massive step forward. Python's cryptography package and the Fernet format spec make it especially easy to send messages securely over the wire.
Q: How do I encrypt something with a password?
I think this is the question that led to the bug. Given that computer users are trained to enter passwords, it's not a big step for someone to write code like this and expect it to work:
    require 'aes'

    message = "Super secret message"
    key = "password"
    encrypted = AES.encrypt(message, key)
The problem here, irrespective of Ruby's silent error handling of #hex that the article discusses, is that a password is not a key.
Deriving a key from a password
We can use a password to derive a key, which is what PBKDF2, bcrypt, scrypt and other key derivation functions do. That key is what can be used to encrypt a message. Authentication can then be added to the encrypted message with an HMAC, which pairs a key with a hash function. We will use PBKDF2. With Python's cryptography, the whole key derivation process looks like this:
    import os

    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives.hashes import SHA256
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

    password = b"password"
    salt = os.urandom(16)  # bytes, i.e. 128 bits (minimum as per NIST SP 800-132)
    n_iter = 100*1000

    kdf = PBKDF2HMAC(
        algorithm=SHA256(),
        length=32,  # bytes, i.e. 256 bits
        salt=salt,
        iterations=n_iter,
        backend=default_backend()
    )
    key = kdf.derive(password)
I am not a cryptographer, but here are a few remarks about some of the code in this snippet:
- salt needs to be randomly generated to be useful
- iterations should be very high. To some extent, increasing the iteration count can offset a weak password.
- You will need to transmit salt and iterations along with your message to enable the recipient to derive the same key later on
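To make the last remark concrete, here is a minimal sketch of both sides of the exchange. It assumes the standard library's hashlib.pbkdf2_hmac as a dependency-free stand-in for PBKDF2HMAC (it is the same PBKDF2-HMAC-SHA256 construction); the variable names are mine, not part of any library.

```python
import hashlib
import os

password = b"password"
n_iter = 100 * 1000

# Sender side: choose a random salt and derive a 32-byte key.
salt = os.urandom(16)
sender_key = hashlib.pbkdf2_hmac("sha256", password, salt, n_iter, dklen=32)

# Recipient side: the same password, plus the salt and iteration count
# received alongside the message, gives back the same key.
recipient_key = hashlib.pbkdf2_hmac("sha256", password, salt, n_iter, dklen=32)

# A different salt yields an unrelated key, even for the same password.
other_key = hashlib.pbkdf2_hmac("sha256", password, os.urandom(16), n_iter, dklen=32)
```

Neither the salt nor the iteration count is secret, so they can travel in the clear next to the message. The password is the only thing that has to stay private.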
How to encrypt a message with a key
Now that we have a usable key, how do we use it to actually encrypt something?
Here I present two options. One is almost certainly how you should do it; the second is an implementation of a scheme proposed on the Security Stack Exchange that seemed quite interesting. I'm including the second to demonstrate that it's quite difficult to build a strong fortification from the crypto jenga-like primitives that higher-level developers have struggled with for decades.
Example 1: Using Fernet
Given the key that we have already generated, a few more imports allow us to generate URL-safe hidden messages that can be sent over the wire with some level of confidence that the contents will remain hidden:
    from base64 import urlsafe_b64encode as base64
    from cryptography.fernet import Fernet

    # [include snippet above]

    encryptor = Fernet(base64(key))
    hidden_message_token = encryptor.encrypt(b"yolo")
hidden_message_token is a base64-encoded message that complies with the Fernet spec. It includes a timestamp, the ciphertext and some other metadata that enables the token to be authenticated.
To allow someone else to decrypt hidden_message_token from the password, they need to be able to generate key. That means you will also need to send salt and n_iter. Once key is generated, they create another encryptor, then all they will need is the following:
    message = encryptor.decrypt(hidden_message_token)
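Put together, the whole round trip might look something like the sketch below. The helper fernet_from_password is a hypothetical name of my own, not part of the cryptography package, and it assumes the sender and recipient already share the password through some other channel.

```python
import os
from base64 import urlsafe_b64encode

from cryptography.fernet import Fernet, InvalidToken
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


def fernet_from_password(password, salt, n_iter):
    """Hypothetical helper: derive a key with PBKDF2 and wrap it in Fernet."""
    kdf = PBKDF2HMAC(
        algorithm=SHA256(),
        length=32,
        salt=salt,
        iterations=n_iter,
        backend=default_backend(),
    )
    return Fernet(urlsafe_b64encode(kdf.derive(password)))


# Sender: pick a salt and iteration count, then encrypt.
salt = os.urandom(16)
n_iter = 100 * 1000
token = fernet_from_password(b"password", salt, n_iter).encrypt(b"yolo")

# What travels over the wire: salt, n_iter and token (none of them secret).
# Recipient: rebuild the same key from the shared password, then decrypt.
message = fernet_from_password(b"password", salt, n_iter).decrypt(token)

# A wrong password derives a different key, so the token fails authentication.
try:
    fernet_from_password(b"wrong password", salt, n_iter).decrypt(token)
    wrong_password_rejected = False
except InvalidToken:
    wrong_password_rejected = True
```

Because Fernet tokens are authenticated, a wrong password shows up as an InvalidToken error rather than as silently corrupted plaintext.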
Example 2: DIY crypto
Somewhere you learn that Fernet uses the following under the hood:
All encryption ... is done with AES 128 in CBC mode.
What? Surely not! Surely that's not what happens. And, in a sense, you would be right. libsodium, for instance, offers AES-256-GCM. And to make things a tad more complex, ChaCha20-Poly1305 would probably be the first choice.
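For comparison, this is roughly what ChaCha20-Poly1305 looks like through the cryptography package's AEAD interface. Treat it as a sketch under my own assumptions (a randomly generated key rather than a password-derived one, and no associated data), not as what Fernet or libsodium do internally.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

key = ChaCha20Poly1305.generate_key()  # 32 random bytes
aead = ChaCha20Poly1305(key)

nonce = os.urandom(12)  # 96 bits; must never repeat for the same key
message = b"yolo"
ciphertext = aead.encrypt(nonce, message, None)  # None: no associated data

# The ciphertext carries a 16-byte authentication tag after the encrypted bytes.
recovered = aead.decrypt(nonce, ciphertext, None)

# Flipping a single bit makes decryption fail rather than return garbage.
tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
try:
    aead.decrypt(nonce, tampered, None)
    tamper_detected = False
except InvalidTag:
    tamper_detected = True
```

The nonce is not secret, but it has to be sent along with the ciphertext, much like the salt in the key derivation step.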
So anyway, you decide to take a look online for how to encrypt messages with a password and you find a promising answer. Here is an implementation of the setup and encryption phases of that answer.
    import os

    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives import hashes, padding
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

    # bytes, not str: the KDF and the ciphers operate on bytes
    message = b"Papa Bear, this is Little Bear. The Monkey has the Green Banana."
    password = b"yolo"

    ## SETUP STEP
    #
    crypto_backend = default_backend()

    k1 = os.urandom(16)  # 128 bits
    iv = os.urandom(16)  # 128 bits
    salt = os.urandom(16)  # 128 bits, minimum NIST SP 800-132
    n_iters = 100*1000

    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA256(),
        length=32,
        salt=salt,
        iterations=n_iters,
        backend=crypto_backend
    )
    intermediate_key = kdf.derive(password)
    k2 = intermediate_key[:16]
    k3 = intermediate_key[16:]

    # k1 is exactly one AES block (16 bytes), so no padding is needed here
    cipher = Cipher(algorithms.AES(k2), modes.CBC(iv), backend=crypto_backend)
    encryptor = cipher.encryptor()
    ct = encryptor.update(k1) + encryptor.finalize()

    # At this stage, we can store ct, k3, salt and iv
    # for later use to encrypt and decrypt data

    ## ENCRYPTION STEP
    #
    # assume that we have loaded ct, k3, salt and iv
    # from storage, we need to regenerate k2 and k3
    # from the password, then recover k1 from ct
    # (a PBKDF2HMAC instance can only derive once, so build a new one)
    kdf = PBKDF2HMAC(
        algorithm=hashes.SHA256(),
        length=32,
        salt=salt,
        iterations=n_iters,
        backend=crypto_backend
    )
    intermediate_key_2 = kdf.derive(password)
    k2_2 = intermediate_key_2[:16]
    k3_2 = intermediate_key_2[16:]
    assert k3 == k3_2  # means that we know the password is ok

    cipher = Cipher(algorithms.AES(k2_2), modes.CBC(iv), backend=crypto_backend)
    decryptor = cipher.decryptor()
    k1_2 = decryptor.update(ct) + decryptor.finalize()
    assert k1 == k1_2  # just checking

    # NOTE: iv is reused here for simplicity; a fresh IV per message would be safer
    file_cipher = Cipher(algorithms.AES(k1_2), modes.CBC(iv), backend=crypto_backend)

    # CBC only accepts whole 16-byte blocks, so pad the message (PKCS7)
    # before encrypting, and make sure you finalize both steps
    padder = padding.PKCS7(128).padder()
    padded_message = padder.update(message) + padder.finalize()
    file_encryptor = file_cipher.encryptor()
    hidden_message = file_encryptor.update(padded_message) + file_encryptor.finalize()
There's quite a lot going on here. I am sure that I will receive an email about how terrible this code is. That's probably why it's best to stick with the higher-level APIs.