Avoid DIY Encryption by Learning About Encryption

Ever since I read about a pretty fundamental flaw with the aes gem, I have been bugged with the questions, "So, what's the right way to do it? And, if we should encrypt all the things, why is it so hard?". This post tries to walk you through what's going on, so that you can feel more confident about adding encryption to your applications.

Things are better these days

There have been some very good efforts to increase the usability of cryptography recently. It used to be the case that you needed to compose several primitives together, each with their own parameters, to encrypt a message. If one of them was improperly used, bad people could easily make bad things happen.

Bernstein et al's NaCl (pronounced "salt") secretbox API, which is growing nicely via the Sodium project, is a massive step forward. Python's cryptography package and the Fernet format spec make it especially easy to send messages securely over the wire.

Q: How do I encrypt something with a password?

I think this is the question that led to the bug. Given that computer users are trained to enter passwords, it's not a big step for someone to write code like this and expect it to work:

require 'aes'

message = "Super secret message"  
key = "password"

encrypted = AES.encrypt(message, key)  

The problem here—irrespective of Ruby's silent error handling of #hex that the article discusses—is that a password is not a key.

Deriving a key from a password

We can use a password to derive a key, which is what PBKDF2, bcrypt, scrypt and other key derivation functions do. That derived key, not the password itself, is what gets used to encrypt a message. PBKDF2 is built on HMAC and an underlying hash function. We will use SHA256.

Using the cryptography package, the whole key derivation process looks like this:

import os  
from cryptography.hazmat.backends import default_backend  
from cryptography.hazmat.primitives.hashes import SHA256  
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

password = b"password"  
salt = os.urandom(16) # bytes, i.e. 128 bits (minimum as per NIST SP 800-132)  
n_iter = 100*1000

kdf = PBKDF2HMAC(  
  algorithm=SHA256(),
  length=32, # bytes, i.e. 256 bits 
  salt=salt,
  iterations=n_iter,
  backend=default_backend()
)

key = kdf.derive(password)  

I am not a cryptographer, but here are a few remarks about some of the code in this snippet:

  • salt needs to be randomly generated to be useful
  • n_iter/iterations should be very high. To some extent, increasing the iteration count can offset a weak password.
  • You will need to transmit salt and iterations along with your message to enable the recipient to derive the same key later on
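
The last point can be sketched with nothing but the standard library. This is a minimal illustration, not a wire format from any spec: the 20-byte header layout and the helper names derive_key, pack_header and unpack_header are assumptions made up for this example, and hashlib.pbkdf2_hmac stands in for the PBKDF2HMAC class used above.

```python
import hashlib
import os
import struct


def derive_key(password: bytes, salt: bytes, iterations: int) -> bytes:
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)


def pack_header(salt: bytes, iterations: int) -> bytes:
    """Hypothetical header: 4-byte big-endian iteration count, then the 16-byte salt."""
    return struct.pack(">I", iterations) + salt


def unpack_header(header: bytes):
    """Inverse of pack_header: returns (salt, iterations)."""
    (iterations,) = struct.unpack(">I", header[:4])
    return header[4:20], iterations


# Sender side: random salt, high iteration count, derive the key.
salt = os.urandom(16)
n_iter = 100 * 1000
sender_key = derive_key(b"password", salt, n_iter)
header = pack_header(salt, n_iter)  # not secret, travels alongside the ciphertext

# Recipient side: read salt and iterations back, derive the same key.
salt_rx, n_iter_rx = unpack_header(header)
recipient_key = derive_key(b"password", salt_rx, n_iter_rx)
assert recipient_key == sender_key
```

Neither the salt nor the iteration count needs to be kept secret. Only the password does.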

How to encrypt a message with a key

Now that we have a usable key, how do we use it to actually encrypt something?

Here I present two options. The first is almost certainly how you should do it. The second is an implementation of a scheme proposed on Security Stack Exchange that seemed quite interesting. I'm including the second to demonstrate that it's quite difficult to build a strong fortification from the crypto jenga-like primitives that higher-level developers have struggled with for decades.

Example 1: Using cryptography.fernet

Given the key that we have already generated, a few more imports allow us to generate URL-safe hidden messages that can be sent over the wire with some level of confidence that the contents will remain hidden:

from base64 import urlsafe_b64encode as base64  
from cryptography.fernet import Fernet

# [include snippet above]

encryptor = Fernet(base64(key))  
hidden_message_token = encryptor.encrypt(b"yolo")  

hidden_message_token is a base64-encoded message that complies with the Fernet spec. It includes a timestamp, the ciphertext and some other metadata that enables the token to be authenticated.

To allow someone else to decrypt hidden_message_token from the password, they need to be able to generate key. That means you will also need to send salt and iterations with hidden_message_token. Once they have derived key and created their own Fernet instance from it, all they need is the following:

message = encryptor.decrypt(hidden_message_token)  
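
Putting both sides together, the round trip might look like the sketch below. The helper name fernet_from_password is made up for this example, and it assumes the recipient has already received salt and the iteration count by some means. A wrong password produces a different key, so Fernet's authentication check fails and decrypt raises InvalidToken.

```python
import os
from base64 import urlsafe_b64encode

from cryptography.fernet import Fernet, InvalidToken
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.hashes import SHA256
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


def fernet_from_password(password: bytes, salt: bytes, iterations: int) -> Fernet:
    """Hypothetical helper: derive a 32-byte key and wrap it in a Fernet instance."""
    kdf = PBKDF2HMAC(
        algorithm=SHA256(),
        length=32,
        salt=salt,
        iterations=iterations,
        backend=default_backend(),
    )
    return Fernet(urlsafe_b64encode(kdf.derive(password)))


# Sender side
salt = os.urandom(16)
n_iter = 100 * 1000
token = fernet_from_password(b"password", salt, n_iter).encrypt(b"yolo")

# Recipient side: same password, same salt, same iteration count
message = fernet_from_password(b"password", salt, n_iter).decrypt(token)
assert message == b"yolo"

# A wrong password yields a different key, so authentication fails
try:
    fernet_from_password(b"wrong", salt, n_iter).decrypt(token)
    wrong_password_rejected = False
except InvalidToken:
    wrong_password_rejected = True
```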

Example 2: DIY crypto

Somewhere you learn that Fernet uses the following under the hood:

All encryption ... is done with AES 128 in CBC mode.

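What? Surely not! Surely that's not what happens. It is what Fernet does, but in a sense you would be right to expect something else, because other libraries make different choices. NaCl's secretbox uses XSalsa20-Poly1305, libsodium also offers AES 256 GCM, and to make things a tad more complex, ChaCha20-Poly1305 would probably be the first choice.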

So anyway, you decide to take a look online for how to encrypt messages with a password and you find a promising answer. Here is an implementation of the setup and encryption phases of that answer.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.backends import default_backend

# both values must be bytes, not str
# the message is exactly 64 bytes, a multiple of the 16-byte AES block size
message = b"Papa Bear, this is Little Bear. The Monkey has the Green Banana."
password = b"yolo"


## SETUP STEP
# 

crypto_backend = default_backend()  
k1 = os.urandom(16) # 128 bits  
iv = os.urandom(16) # 128 bits  
salt = os.urandom(16) # 128 bits, minimum NIST SP 800-132  
n_iters = 100*1000


kdf = PBKDF2HMAC(  
  algorithm=hashes.SHA256(),
  length=32,
  salt=salt,
  iterations=n_iters,
  backend=crypto_backend
)

intermediate_key = kdf.derive(password)  
k2 = intermediate_key[:16]  
k3 = intermediate_key[16:]

cipher = Cipher(algorithms.AES(k2), modes.CBC(iv), backend=crypto_backend)  
encryptor = cipher.encryptor()  
ct = encryptor.update(k1) + encryptor.finalize()

# At this stage, we can store ct, k3, salt and iv
# for later use to encrypt and decrypt data


## ENCRYPTION STEP

# assume that we have loaded ct, k3, salt and iv
# from storage, we need to regenerate k1 and k2
# from the password

kdf = PBKDF2HMAC(  
  algorithm=hashes.SHA256(),
  length=32,
  salt=salt,
  iterations=n_iters,
  backend=crypto_backend
)

intermediate_key_2 = kdf.derive(password)
k2_2 = intermediate_key_2[:16]  # slice the freshly derived key, not the one from setup
k3_2 = intermediate_key_2[16:]

assert k3 == k3_2 # means that we know the password is ok

cipher = Cipher(algorithms.AES(k2_2), modes.CBC(iv), backend=crypto_backend)  
decryptor = cipher.decryptor()  
k1_2 = decryptor.update(ct) + decryptor.finalize()

assert k1 == k1_2 # just checking

file_cipher = Cipher(algorithms.AES(k1_2), modes.CBC(iv), backend=crypto_backend)

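# to encrypt, use file_cipher and make sure you finalize. CBC mode does not
# pad for you: finalize() raises an error unless the input length is a
# multiple of 16 bytes (this message is exactly 64), so other lengths need
# explicit padding, e.g. cryptography.hazmat.primitives.padding.PKCS7

file_encryptor = file_cipher.encryptor()
hidden_message = file_encryptor.update(message) + file_encryptor.finalize()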
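
One detail in the password check above is worth a small sketch. The assert compares k3 with ==, which is fine for a demo, but comparisons of secret values are usually done in constant time. The version below uses only the standard library; split_key and password_ok are hypothetical helpers, and hashlib.pbkdf2_hmac stands in for PBKDF2HMAC.

```python
import hashlib
import hmac
import os


def split_key(password: bytes, salt: bytes, iterations: int):
    """Derive 32 bytes and split them into (k2, k3), as in the scheme above."""
    intermediate = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
    return intermediate[:16], intermediate[16:]


# Setup: k3 is stored, k2 is never stored
salt = os.urandom(16)
n_iters = 100 * 1000
_, stored_k3 = split_key(b"yolo", salt, n_iters)


def password_ok(candidate: bytes) -> bool:
    """Re-derive k3 from the candidate password and compare in constant time."""
    _, candidate_k3 = split_key(candidate, salt, n_iters)
    return hmac.compare_digest(candidate_k3, stored_k3)


assert password_ok(b"yolo")
assert not password_ok(b"swag")
```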

There's quite a lot going on here. I am sure that I will receive an email about how terrible this code is. That's probably why it's best to stick with the higher-level APIs.