Anyone who has spent a fair bit of time dealing with passwords and password security knows the pain of trying to establish a good password policy. That is, one which:

  • results in resistance to online and offline password-guessing attacks
  • won’t cause your users to rebel and burn down the office
  • maximizes the chances that your users will choose and remember a strong password

There is a great deal of guidance on this, including the NIST Electronic Authentication Guideline, which is commonly cited as the source for the “8 chars, mixed-case, digits, and special chars” default password policy that so many organizations establish.

But what about when you switch to passphrases? How do you set a passphrase policy that gives you same-or-better security as a good password policy?

When organizations start to consider switching from pass-words to pass-phrases—that is, authentication secrets made up of a series of words rather than simply characters—they end up either needlessly resisting passphrases or coming up with truly mind-boggling policies around them. It seems that many organizations have never ingested the reasoning behind common password guidance.

What I hope to do here is make some small progress toward fixing that situation by:

(This is long: if you want, you can skip to the recommendation.)

The principles behind password policies

The idea behind having a strong password policy is to make it infeasible for an attacker to guess a user’s password while that password is still valid. We don’t generally need a very strong password policy to prevent online attacks—where the attacker guesses against a live system—because we use account lockouts or other controls to make such things infeasible.

But if the attacker should get a copy of your password database, they can conduct an offline attack, and here’s where password policies start to be relevant.

An attacker’s chances of successfully recovering a user’s password while it still has value basically boil down to how many guesses an attacker must make (on average) before they are successful, how fast they can guess, and the password rotation/reuse policies. Essentially, if an attacker can guess the password before the user is forced to change it, the attacker wins.

If your password database stores the passwords in plaintext, the attacker needs 0 guesses, so that’s extraordinarily dumb. That is why salted hashes of passwords are stored instead.

If an attacker has to guess a password, compute its digest from a hash function, and then compare the results to the stolen database, then:

  1. How fast they can guess is a function of their available computing power vs. the computational complexity of the hash function
  2. How many guesses they have to make is a function of password strength

It’s the second item we try to control by having and enforcing password policies; but note that these two things are interrelated. If we increase computational complexity enough, we can have weaker passwords and still be OK.

How do we measure password strength?

The short answer is “bits of entropy”. If a password has 8 bits of entropy, an attacker must make up to 2^8 = 256 guesses to find it (about half that on average). That means that increasing strength by 1 bit doubles the work for an attacker.

The entropy H of a password is a function of its length (the number of symbols used, L) and its complexity (the size of the pool from which symbols are chosen, N). Specifically:

H = L × log₂(N)

This isn’t the same as “the number of possible passwords”, also known as the “search space”. The total search space S is given by:

S = N^L

If a password consists of all lower-case letters, then N=26 (there are 26 letters in the alphabet), whereas mixed-case doubles that to N=52, adding digits means N=62, and so on.

However, you can see that increasing length has a greater effect than increasing complexity. In fact, here’s a chart:

Chart: Changes in L vs. Changes in N

You can see that requiring longer passwords increases entropy linearly, while increasing complexity only works logarithmically.
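
The two formulas above are easy to check numerically. The following is a minimal sketch; the function names `entropy_bits` and `search_space` are made up for illustration, and the numbers only hold for randomly generated passwords.

```python
import math

def entropy_bits(length: int, pool_size: int) -> float:
    """H = L * log2(N) for a randomly generated secret."""
    return length * math.log2(pool_size)

def search_space(length: int, pool_size: int) -> int:
    """S = N ** L, the number of possible secrets."""
    return pool_size ** length

if __name__ == "__main__":
    # Doubling L doubles the entropy; doubling N adds only L bits.
    print(entropy_bits(8, 26), entropy_bits(16, 26), entropy_bits(8, 52))
```

Going from 8 to 16 lower-case letters doubles the entropy, while going from lower-case to mixed-case at 8 characters adds just 8 bits.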

Those silly humans confound everything

Of course, the calculation above only really works if the passwords are randomly-generated. Why? Because people really suck at being unpredictable; we’re not wired for it.

The N-value (number of possible symbols) is effectively reduced when people are picking passwords. Every possible combination of 10 lower-case letters, for example, gives 26^10 (about 141 trillion) candidates; but when people are choosing passwords, they’re not going to choose uniformly from that set.

People are likely to pick words from the dictionary, or initialisms of sentences, as the usual pieces of common advice suggest—and that means there are a lot of combinations that can be excluded from an attack. And that significantly reduces the effective size of N.

When people pick passwords, attackers can use a dictionary of likely candidates (thus the term “dictionary attack”). And due to various data breaches over the years, we have some really nice data on what kinds of passwords people actually pick. People have used these to generate large password lists that are likely to recover the bulk of human-selected passwords.

The largest such password list I could find with a simple search is the UNIQPASS v15 list, which contains 243,779,397 entries. With a constrained list like that, your L value is one, and your N value is the dictionary size (243,779,397 in this case). So if your password happens to be on the UNIQPASS list, your effective password strength is ~27.9 bits, which means at most about 244 million guesses (half that on average).

On an old laptop (1 CPU core!) I was able to get hashcat to make 33 million guesses per second for MD5-hashed passwords. That means I should be able to run the whole list against each password in your database in about 7 seconds.

For comparison, a randomly selected 10-character password with mixed case and digits (N=62) has 59.5 bits of entropy, which is about 839 quadrillion possible passwords. My old laptop would need roughly 800 years to try them all.

In 2013, you could use Amazon Web Services to build a rig capable of 2.3 billion MD5 hashes per second on a single 2-GPU instance with relatively little money or effort. That rig can crunch our UNIQPASS list in less than a second. And a custom-built rig in 2012 (something a professional attacker is likely to do!) could do the whole thing in 1.3 milliseconds, at 180 billion MD5 hashes per second.
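
The arithmetic behind these timings is just candidates divided by guess rate. Here is a small sketch using the MD5 rates quoted above; the function name and the dictionary labels are illustrative only.

```python
UNIQPASS_ENTRIES = 243_779_397

# MD5 guess rates quoted in the text, in hashes per second.
MD5_RATES = {
    "old laptop, 1 core": 33e6,
    "2-GPU cloud instance (2013)": 2.3e9,
    "custom-built rig (2012)": 180e9,
}

def seconds_to_exhaust(candidates: int, guesses_per_second: float) -> float:
    """Time needed to try every candidate once at a fixed guess rate."""
    return candidates / guesses_per_second

if __name__ == "__main__":
    for name, rate in MD5_RATES.items():
        print(f"{name}: {seconds_to_exhaust(UNIQPASS_ENTRIES, rate):.4f} s")
```

This is the time to try the entire list; a password that is on the list would be found, on average, in about half that time.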

And this is why your hash function matters

That same custom-built rig was also turned loose against other hashes, so it’s a great reference point for why choice of hash matters. Merely switching from MD5 to SHA1 ups the UNIQPASS list processing time to about 3.7 milliseconds, nearly 3 times slower. Using LM hashes (a legacy Windows hash), the time would increase to about 11.7 milliseconds, 9 times slower than MD5.

Now those are still really short times, but notice how simply changing the algorithm made things up to 9 times slower for the attacker. That’s the same as increasing password strength by a bit more than 3 bits.

On my own test system, password-specific adaptive hashes like bcrypt were 170 times slower than MD5; that’s like adding 7.4 bits of entropy to your password strength.
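
The conversion from “times slower” to “bits of entropy” is a base-2 logarithm, since each extra bit doubles the attacker’s work. A one-function sketch (the name `equivalent_bits` is made up here):

```python
import math

def equivalent_bits(slowdown_factor: float) -> float:
    """Bits of password strength that a given hashing slowdown is worth."""
    return math.log2(slowdown_factor)

if __name__ == "__main__":
    for factor in (3, 9, 170):
        print(f"{factor}x slower is worth about {equivalent_bits(factor):.1f} bits")
```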

Using slow, adaptive hashes only has one downside: it also takes longer for a legitimate user to authenticate. If we make it so an attacker with a nice hashing rig spends a full second per guess, then a user on a busy server might take a minute or two for login. That’s obviously unacceptable for most use cases.

The best policy, then, is to use an adaptive hash that’s tuned to be as slow as your users will tolerate. Keep in mind this is a moving target: both your hardware and your attackers’ hardware get faster every year.

As a result, it’s important to use a properly-tuned adaptive password hash, and be sure to include a salt when hashing. One of the following is likely adequate:

You can also use many rounds of SHA-512 following a pattern like:

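import hashlib

def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    # Feed each digest back in, together with the salt, for every round
    digest = password
    for _ in range(rounds):
        digest = hashlib.sha512(salt + digest).digest()
    return digest  # save this alongside the salt and the round count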

There is no concrete and well-researched recommendation for the appropriate number of rounds; but testing this in your environment should be relatively straightforward.
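
One way to run that test is to time a fixed number of rounds and scale the result to the delay you are willing to accept. This is a rough sketch under stated assumptions: `calibrate_rounds`, the 20,000-round probe, and the target delay are all illustrative choices, not recommendations, and the measurement should be taken on the production hardware under realistic load.

```python
import hashlib
import os
import time

def stretch(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Iterated salted SHA-512, following the pattern described above."""
    digest = password
    for _ in range(rounds):
        digest = hashlib.sha512(salt + digest).digest()
    return digest

def calibrate_rounds(target_seconds: float, probe_rounds: int = 20_000) -> int:
    """Estimate how many rounds take roughly target_seconds on this machine."""
    salt = os.urandom(16)
    start = time.perf_counter()
    stretch(b"calibration input", salt, probe_rounds)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return max(1, round(probe_rounds * target_seconds / elapsed))

if __name__ == "__main__":
    # Hypothetical target: a quarter of a second per login attempt.
    print(calibrate_rounds(0.25))
```

Because hardware keeps getting faster, the calibration would need to be repeated periodically, and the round count stored with each hash so old entries can still be verified.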

So how does this shape password policy?

Password policies should be set such that they increase the chances of a user picking a strong password, keeping in mind that users are likely to choose passwords that are significantly less strong than random selection given the same length and complexity policy.

Ok, great… how does this apply to passphrases?

A passphrase—that is a password consisting of several words rather than several characters—gets its strength two ways: having a large N against passphrase attacks, and a large L against password attacks.

The usual caveats about users picking passwords apply. It’s difficult to estimate how much weaker a user-chosen passphrase is than a random one: there’s really not enough data to make good guesses about this. So we’ll be comparing the strengths of random passwords and random passphrases—just remember the discussion above when considering user-chosen passphrases.

Passphrases are relatively easy to make strong, and a passphrase of a given strength is much easier for a user to remember than an equivalent password; which means users are less likely to record their passphrase somewhere an attacker could discover.

So what’s the strength of a passphrase?

As with passwords, this is a function of length L and complexity N; but for passphrases each symbol is a word. Therefore a passphrase of 5 words has L=5, even though it may be more than 20 characters long.

Determining N is a bit trickier. How big is the set of possible words? It’s not an easy question, and it’s tempting to use things like the Oxford English Dictionary’s list of words in use, which would put N≅171,476. But of course, people don’t have vocabularies that large.

The long and the short of it is that the 3000 most common words make up 95% of what people read and write on a daily basis. So N=3000 seems a reasonable floor.

What does a 5-word passphrase made up of words from that common set give us in strength? With L=5 and N=3000, H ≅ 57.8 bits of entropy, which is right in between random passwords of 9 and 10 characters with N=62 (mixed case and numbers).

If you instead generate from the diceware corpus, then N=7776, which means a 5-word diceware passphrase is ≅64.6 bits of entropy (between a 10 and 11 char password with N=62).
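
The same entropy formula applies with words as the symbols. The sketch below computes the strength of a random passphrase and the length of a random N=62 password with the same entropy; both function names are made up for illustration.

```python
import math

def passphrase_bits(words: int, wordlist_size: int) -> float:
    """Entropy of a passphrase whose words are drawn at random from a list."""
    return words * math.log2(wordlist_size)

def equivalent_password_length(bits: float, pool_size: int = 62) -> float:
    """Length of a random password over pool_size symbols with the same entropy."""
    return bits / math.log2(pool_size)

if __name__ == "__main__":
    for label, size in (("3000 common words", 3000), ("Diceware", 7776)):
        bits = passphrase_bits(5, size)
        print(f"{label}: {bits:.1f} bits, "
              f"like a {equivalent_password_length(bits):.1f}-char password")
```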

Of course, these are minimums. An attacker would also have to deal with variations in composition. The phrase “correct horse battery staple xkcd” could be entered, for example:

  • as written (correct horse battery staple xkcd)
  • without spaces (correcthorsebatterystaplexkcd)
  • with separators (correct-horse-battery-staple-xkcd)

And that’s without mixing in variations in case or mixing techniques (because let’s be honest, that isn’t likely to happen with the average user). If an attacker considers just those three variants, that triples their work, which effectively adds ~1.6 bits of entropy.

But what about if the attacker doesn’t know it’s a passphrase?

Passphrases can be treated as passwords, significantly lowering the N. If we assume one of the composition rules above (all-lower, optionally separated by space or dash), N=28. But passphrases make pretty long passwords.

How long? We can estimate this using a corpus. Fortunately, there’s a list of common words found in movie and TV subtitles; I used the English list as of 2012 (en-2012.zip).

If we use the 1,000 most common words, the average word length is 4.8; with the 3,000 most common words, the average word length is about 5.6 characters; for the top 10,000 it’s about 6.2 characters. Let’s use the shortest of those, because password policy is all about minimum strength.

A 5-word passphrase should average at least L = 4.8 × 5 = 24 characters. With N=28, that’s ≅ 115.3 bits of entropy.

If we apply this same approach to the larger Diceware wordlist, we get an average length of 4.2 characters, so L = 4.2 × 5 = 21, which puts a Diceware 5-word passphrase at about 101 bits of entropy.

If we use a passphrase system that forces only lower-case letters with no separators (N=26), a 24-character passphrase loses about 2.6 bits of entropy.

In other words, an attacker is much better off using a dictionary and treating a passphrase as a passphrase, because password attacks against such long passwords are harder.
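
To see the gap between the two attack models, the character-level estimate can be computed next to the word-level one. This sketch assumes, as the text does, a symbol pool of 28 (lower-case letters plus space and dash) and the average word lengths measured above; the function names are illustrative.

```python
import math

def as_password_bits(avg_word_length: float, words: int, pool_size: int = 28) -> float:
    """Entropy if the attacker brute-forces the passphrase character by character."""
    return avg_word_length * words * math.log2(pool_size)

def as_passphrase_bits(words: int, wordlist_size: int) -> float:
    """Entropy if the attacker guesses whole words from a known word list."""
    return words * math.log2(wordlist_size)

if __name__ == "__main__":
    print("character attack, 4.8-letter words:", round(as_password_bits(4.8, 5), 1))
    print("character attack, 4.2-letter words:", round(as_password_bits(4.2, 5), 1))
    print("word attack, 3000-word list:", round(as_passphrase_bits(5, 3000), 1))
    print("word attack, Diceware list:", round(as_passphrase_bits(5, 7776), 1))
```

The word-level figure is always the smaller one here, so it is the number that should drive policy.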

Passphrase policy summary

A passphrase policy that requires 5 words and a minimum of 21 characters should result in:

  • equivalent strength to a 10-character password with mixed case and numbers when the attacker knows passphrases are in use

  • roughly equivalent strength to a 17-character password with mixed case and numbers if the attacker treats it as a password
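
The two requirements in this policy are simple to check in code. The following is a minimal sketch, not a complete validator. It assumes words are separated by whitespace, dashes, underscores, or dots, and that the 21-character minimum counts letters only (separators excluded); both are interpretations rather than something the policy spells out. A passphrase typed with no separators would need a dictionary-based word splitter, which is out of scope here.

```python
import re

MIN_WORDS = 5
MIN_LETTERS = 21  # assumption: separators do not count toward the minimum

def meets_passphrase_policy(passphrase: str) -> bool:
    """Check the 5-word, 21-character minimum described above."""
    words = [w for w in re.split(r"[\s\-_.]+", passphrase.strip()) if w]
    letters = sum(len(w) for w in words)
    return len(words) >= MIN_WORDS and letters >= MIN_LETTERS

if __name__ == "__main__":
    print(meets_passphrase_policy("correct horse battery staple xkcd"))
    print(meets_passphrase_policy("a b c d e"))
```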

You should use the same hash function configuration and rotation policy for a passphrase as you would for passwords of the same strength.

Additional strength can be had by making one or more of the following recommendations:

  • Randomly choose the words and word order (best!)

  • Avoid using pre-existing phrases; pick a series of words individually

  • Use at least one word that doesn’t exist in your native language (foreign words)

  • Avoid using words commonly associated with you, your job, or your organization

  • Use a special character or number to separate words

These are, however, more difficult to enforce through technical controls.
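
Random selection, the first recommendation, is the exception: it is easy to automate by generating the passphrase for the user. Below is a small sketch using Python’s `secrets` module. The `DEMO_WORDS` list is a tiny hypothetical stand-in; a real deployment would load a full word list such as the 7,776-word Diceware list, since the entropy depends entirely on the list size.

```python
import secrets

# Hypothetical toy list for demonstration only; far too small for real use.
DEMO_WORDS = ["correct", "horse", "battery", "staple",
              "xkcd", "orbit", "lantern", "quartz"]

def generate_passphrase(wordlist: list[str], words: int = 5, separator: str = " ") -> str:
    """Pick each word independently with a cryptographically secure RNG."""
    if len(set(wordlist)) < 2:
        raise ValueError("word list needs at least two distinct words")
    return separator.join(secrets.choice(wordlist) for _ in range(words))

if __name__ == "__main__":
    print(generate_passphrase(DEMO_WORDS))
    print(generate_passphrase(DEMO_WORDS, separator="-"))
```

Words are chosen with replacement, so repeats are possible; that matches the L × log₂(N) estimate used throughout.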