Hacker News
Entropy isn't sufficient to measure password strength (benwr.net)
34 points by benwr on Jan 17, 2022 | 120 comments


> Because choosing good passwords is about memorableness as well as sheer strength

That's not been true ever since the development of good password managers. There are fewer than 10 passwords I remember. One of them is my password manager's master passphrase (5 misspelled-and-with-random-punctuation words). The others include stuff like my work and home laptop/disk passwords, which I can't autofill, my 3 important banking passwords which I do not even entrust to my password manager, and my AppleID password because iOS is annoying enough at asking for that that I'm using one I can remember.

The other ~600 entries in my password manager are 25 random characters (or whatever the upper limit of password length is for sites/services that are 'doin it wrong').
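For a rough sense of what "25 random characters" buys, here is a minimal sketch. It assumes each character is drawn uniformly from the 94 printable ASCII symbols using Python's `secrets` module; the alphabet and the function names are illustrative choices, not what any particular password manager does.

```python
import math
import secrets
import string

# Assumed alphabet: 52 letters + 10 digits + 32 punctuation marks = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_password(length: int = 25) -> str:
    """Draw each character independently and uniformly from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def entropy_bits(length: int = 25) -> float:
    """Entropy of the generating distribution: length * log2(alphabet size)."""
    return length * math.log2(len(ALPHABET))

print(random_password())
print(f"{entropy_bits():.1f} bits")
```

Under these assumptions the generator has a little over 160 bits of entropy, and because the distribution is uniform, its Shannon entropy and min-entropy coincide.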


One could argue that you still need to remember your master password, and since it gives access to all your other passwords, it's all the more important to make it extremely strong. Therefore the randomness/memorability trade-off is still very important.


Yes, but it’s not too hard to make one ridiculously long/complicated master password that is also memorable. It might take you a while to remember it — just keep it written down on paper somewhere private & safe and refer to it as needed. If you’re not being targeted then you’ll probably be fine.


It doesn't need to be complicated. Just long.

ie

theuniverseis99%emptyspaceatleastthatswhatiwastaughtbymr.cattoningrade6

easy to remember without paper and uncrackable. Pair it with a yubikey and that's your bitwarden master


aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa is an equally long password, but much less safe than your example password because my password has far less entropy.


> That's not been true ever since the development of good password managers.

A lot of people do not trust password managers; case in point, the recent LastPass scare.

You want passwords to your key accounts to be 1) memorable 2) strong 3) only in your head. For these, I think the article is fairly relevant.


> A lot of people do not trust password managers; case in point, the recent LastPass scare.

That's no excuse. KeePass allows having the database file locally, where it's your duty to manage it.

It might be less convenient, maybe. But I don't see a valid excuse for people not to start using a password manager, least of all for less tech-savvy people.


It’s completely valid to distrust password managers. No software is free from bugs, or accidentally exposing your passwords. It might take a lot of work, but it’s certainly possible.

There’s also the possibility of mismanaging your password database and losing all of your data.


> It’s completely valid to distrust password managers.

Is it, really? And at the same time to trust one's memory? For memorizing hundreds of long passwords? Don't think so...


Distrusting password managers does not implicitly trust your own memory.

There are alternatives to programmatic password managers or human memory, e.g. a paper notebook and a safe. That's not an approach I would personally take, but I imagine it's a reasonable option for someone who sees the potential danger of trusting any program for sensitive data.


> There’s also the possibility of mismanaging your password database and losing all of your data.

The alternatives are same password everywhere or keeping a paper around with the passwords written in plain text. Both are equally disastrous (unless you work at home and don't ever get robbed)


I would say having it written is more secure. If you hide it in a good spot no robber is going to find it, and even if they do you will almost certainly know you’ve been robbed, so you can change the passwords. Not only that, but I bet 99% of robbers wouldn’t have an effective plan to sell or use the passwords. A digital theft, on the other hand, has a significant chance of going undetected for a while, and the thieves certainly know exactly what to do with the passwords.


Or finding a good mix between entropy and memorability so you can keep lots of strong passwords in your head, like the featured article discusses.


This doesn't work. If you have 100 different accounts, there is no way you can memorize around 5000 bits of entropy in a reasonable amount of time.


You don't need 50 unique bits of entropy for every one of those accounts. Memorize 10 25-bit passwords, for each account combine two of them, now you have 100 unique 50-bit passwords and you only need to remember 250 bits (technically 257 because you need to remember which combo goes with which), roughly the entropy of a long sentence. It might not be secure if someone has already hacked enough of your accounts to work out your pattern, but if you have dozens of accounts with different logins simultaneously compromised, that's on you.
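The arithmetic in this comment and its parent can be checked with a short sketch. Reading "combine two of them" as ordered pairs with repetition allowed (so 10 × 10 = 100 combinations) is an assumption; the variable names are illustrative.

```python
import math
from itertools import product

# Parent comment: 100 accounts, each with an independent 50-bit password.
ACCOUNTS = 100
BITS_PER_PASSWORD = 50
independent_bits = ACCOUNTS * BITS_PER_PASSWORD  # 5000 bits to memorize

# This comment: memorize 10 base passwords of 25 bits each.
BASE_COUNT = 10
BASE_BITS = 25
memorized_bits = BASE_COUNT * BASE_BITS  # 250 bits to memorize

# Ordered pairs of base passwords (assumed reading of "combine two of them").
pairs = list(product(range(BASE_COUNT), repeat=2))
bits_per_combined = 2 * BASE_BITS          # each combined password: 50 bits
bits_to_name_pair = math.log2(len(pairs))  # cost of naming one pair

print(independent_bits, memorized_bits, len(pairs))
print(bits_per_combined, memorized_bits + math.ceil(bits_to_name_pair))
```

The 50-bit figure per combined password holds only while none of the ten base passwords has leaked; once one is known, every password built from it is down to the 25 bits of its other half.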

I've got maybe 10 accounts that I really care about keeping secure - things like my bank and such where if someone got a hold of my account it would be a tough mess to sort out. Each of them has a unique password. But for most services I have login credentials for, I am not actually giving them any sensitive information. While I now use a password manager for these, before I just had a simple system for altering an otherwise standard set of passwords. It's not too hard to remember redd1t[standardsecurepassword], h@ckernews[standardsecurepassword], p0rnhub[standardsecurepassword], etc but as far as some random attack script is concerned these are all extremely unique and secure. If a human were specifically looking at it they could easily figure out the pattern and make some smart guesses, but even then I already give different emails to different accounts so I can tell who is selling my email addresses to spammers, and I had a few different secure passwords that I'd rotate, so only a tiny fraction would actually be in jeopardy. And again, there's nothing of value to be gained by hacking into these accounts. Overall I had maybe 15 genuinely unique passwords to remember, hardly a herculean feat. Now with the password manager, I still don't use it for my sensitive accounts, so I have like 8 passwords to remember; a relatively minor improvement.


Keeping a local database file secret is a pretty difficult task. You introduce a wider attack surface vs. a memory-based password.


Keeping it secret isn't as critical as you make it sound.

KeePass is open source. You can review the crypto, or pay somebody competent to review the crypto, or trust that the project or some 3rd party has done so.

I wouldn't go out of my way to publish my KeePass file publicly, but any attacker who can break the 256 bit AES encryption, or brute-force/dictionary-attack its key that's using Argon2 KDF with enough rounds to take 1 second per key transform on my laptop, is well into the "I stand no chance against state level actors specifically targeting me" category, and I'll just assume I've lost to them already. In the immortal words of James Mickens: "If your adversary is the Mossad, YOU'RE GONNA DIE AND THERE'S NOTHING THAT YOU CAN DO ABOUT IT." If ASIO/CSIS/GCHQ/GCSB/NSA want access to my accounts, it's unlikely having passwords that are only in my memory is going to make much difference to my personal outcome. If a driveby teenaged script kiddie hits a zero day on one of my devices and pops my KeePass file, I'm not even sure I'd bother changing the passwords.
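To put a number on "1 second per key transform", here is a back-of-the-envelope sketch. The master passphrase is assumed, purely for illustration, to be five words drawn uniformly from a 7776-word list; the comment does not say what the real one looks like, and an attacker's hardware may well run the KDF faster than a laptop, which is why `seconds_per_guess` and `parallel` are parameters.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_exhaust(keyspace: int, seconds_per_guess: float, parallel: int = 1) -> float:
    """Time to try every candidate; the expected time to a hit is half of this."""
    return keyspace * seconds_per_guess / parallel / SECONDS_PER_YEAR

# Assumed keyspace: 5 words from a 7776-word list.
keyspace = 7776 ** 5

print(f"{math.log2(keyspace):.1f} bits")
print(f"{years_to_exhaust(keyspace, 1.0):.2e} years on one machine")
print(f"{years_to_exhaust(keyspace, 1.0, parallel=1_000_000):.2e} years on a million machines")
```

The slow KDF multiplies the cost of every guess, so the passphrase only has to resist an attacker who can afford that multiplied cost.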

I'm happy enough storing the KeePass file on my (encrypted) laptop hard drives. I'm OK with using iCloud to sync it to my phone. I'm fine with it being part of my regular TimeMachine backups to a pair of external usb (encrypted) drives, and for a copy of that usb drive backup to be synced to an encrypted S3 bucket.


The problem with password managers was they were a commercial venture - not that commercial is inherently in the general case worse, but:

1. Closed source, so you cannot audit a critical piece of security infrastructure.

2. Perverse incentives: they want to make money, so they are naturally going to encourage new versions over old and deprecate support for old programs.

2a. If your company of choice is not doing great business, they have an active incentive to sell your data (including bank passwords) on the black market.

3. A need to keep "up to date", i.e. jam whatever is hot into the app to up the selling appeal. You want your security to be very boring; having a bunch of new features mixed into every release is a recipe for insecurity and disaster.

4. Cloud access. This leads on from the last point, but as soon as you store your stuff on a third-party server, even encrypted, your potential leaks go from your computer to every device between you and the remote, and then some (all third-party integrations). As a side effect, said companies must start (complex) security auditing practices, with all the fun and failure points that brings...

Now, even on the open source side:

1. As soon as you have to update your password manager, you might as well throw away all passwords and start over:

a) Can you really trust that no source was breached during the update?

b) How do you know it is even a legitimate update? Better not have put your password for updating things in your password manager...

c) It's open source, great, so you can audit it but... will you?

d) Or will you just trust it, and because some guy who wasn't getting paid and is trying to get through school and hold a part-time job missed a critical bug, you end up with all your passwords compromised anyway?

2. Deserves status as its own point: open source is auditable but not necessarily trustworthy, not without a lot of active oversight.

As such, one can conclude that such programs are mostly colossal wastes of time, if not actively endangering security.

Even as a 'better than nothing', they are a bad idea: to the layfolk who don't know any better, it's just another potential bad practice getting drilled into them.

I would argue that writing down passwords on paper is usually a better practice than using a password manager, at least that can be locked up in your home (and if you can get into my home I have other bigger worries).

Instead we should focus on giving back some responsibility to the user - most sites don't need passwords, if you are using a password manager for those sites you should presume that password is low security.

It would be better if we could codify the importance of a password somehow.


I mostly agree, but I do find myself choosing a new FDE and login passphrases about once a year, and I wish that I could choose these using something like Diceware, but memorable enough that I wouldn't need to write them down at all. Thinking about how I might do that is what ultimately led to this post.


Login, like for your local computer? Why rotate those?


More often for e.g. work computers.


I'm curious if you have a rotation/audit practice for those? With 600 odd passwords, I'm not even sure how I would keep track of access to the items being protected.


Rotating/expiring random 25 char passwords is unnecessary.

One big advantage of a password manager is you _can_ audit accounts/passwords. I do a once a year sweep of my personal ones in KeePass, and use it as an opportunity to close accounts on services I no longer use. (Not that I believe any 3rd party service can be trusted to actually delete your data when you close your account, but spending 5 minutes updating your profile with junk data before deleting it improves your chances of not ending up on spam lists or automated credential stuffing attacks when that service gets popped.)

For the work shared passwords we use 1Password. While I prefer their old standalone app over their new cloud thing, it does two very useful things: 1) it integrates with HIBP's password checking service, so it warns you when you have a password that's been published in a dump, and 2) it provides an audit trail of which credentials each team member has ever accessed, so you can revoke only what's needed instead of rolling all shared passwords every time a staff member leaves.
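For context on the HIBP check: the public Pwned Passwords range API works on SHA-1 prefixes, so the password itself never leaves the machine. The sketch below shows only the local half (splitting the hash, and parsing a response of `SUFFIX:COUNT` lines). It makes no network call, the function names are hypothetical, and it says nothing about how 1Password implements its integration.

```python
import hashlib

def hibp_prefix_suffix(password: str) -> tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def count_in_range_response(suffix: str, response_text: str) -> int:
    """Look for our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in response_text.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0

prefix, suffix = hibp_prefix_suffix("password")
print(prefix)  # only this part would be sent to the service
```

Only the 5-character prefix would be sent, to `https://api.pwnedpasswords.com/range/<prefix>`; the match against the returned suffixes happens locally.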


That is auditing your access to a password. It is not auditing the use of said password at the service. Consider, social hacking to reset one of your passwords cannot be determined by inspecting your manager.

And rotation is still needed. It's less likely that your password is busted, I agree. But you still need to rotate, if only to make sure it was never intercepted.


> 5 misspelled-and-with-random-punctuation words

Why misspell and add random punctuation?


So that they can't be found in a dictionary

Granted, 5 words chosen truly randomly from an English dictionary is already insanely strong, but why not make it slightly stronger?
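A minimal sketch of "5 words chosen truly randomly", assuming a uniform draw with Python's `secrets` module. The eight-word list is a stand-in for demonstration only; the 7776-word figure is the size of a standard Diceware list.

```python
import math
import secrets

def passphrase(wordlist: list[str], n_words: int = 5, sep: str = " ") -> str:
    """Pick n_words independently and uniformly from wordlist."""
    return sep.join(secrets.choice(wordlist) for _ in range(n_words))

def passphrase_bits(wordlist_size: int, n_words: int = 5) -> float:
    """Entropy of the generator: n_words * log2(wordlist size)."""
    return n_words * math.log2(wordlist_size)

# Tiny demo list (assumption); a real list would have thousands of words.
DEMO_WORDS = ["correct", "horse", "battery", "staple",
              "anchor", "velvet", "quartz", "meadow"]

print(passphrase(DEMO_WORDS))
print(f"{passphrase_bits(7776):.1f} bits for 5 words from a 7776-word list")
```

The entropy figure describes the generator, not the resulting string: it holds only if the words really are drawn at random, and hand-applied misspellings or punctuation add an amount that is hard to quantify.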


Most likely to make dictionary attacks against the password(s) ineffective.


Also makes them more resistant to people looking over your shoulder.


I recommend changing the keyboard layout silently when someone is looking over your shoulder.


Maybe I just don't have trendy-enough coworkers or friends...but I know of no one who actually analyzes password strength in terms of Shannon entropy. Cripes, the very first sentence of the Wikipedia page for Shannon entropy tells us that it's an average.

Simple analogy - if the goal was to protect your house from a 9-foot-deep flood, would a dike with an average height of 10 feet do the job?


I've done a fair bit of research into this, and as far as I can tell, the entire internet does this thing you've never seen. For example, https://en.wikipedia.org/wiki/Password_strength#Entropy_as_a... implies the use of Shannon entropy.


[sigh...] +1, though you're making me feel d*mn old.

I won't tell you what decade it was, when I found that some "bright" user had picked his/her own office phone # (10 digits, 2 hyphens) to use as a "high security" password.

My own mental model - with a decent compression algorithm, and compression dictionary pre-loaded with popular passwords and personal information, how many bits would the specific password in question compress to? That also catches the clever folks who pick stuff like "abcdabcdabcdabcd" or "3.1415926535".
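This mental model can be approximated with `zlib`, which accepts a preset dictionary (`zdict`). The sketch below is an illustration, not a strength meter: the preset contents are made up, a real one would hold a large list of leaked passwords plus personal details, and the sizes include a constant amount of zlib framing, so only differences between passwords are meaningful.

```python
import zlib

# Hypothetical preset dictionary: popular passwords and personal information.
PRESET = b"password123 qwerty letmein 555-123-4567 3.1415926535"

def compressed_size(password: str, preset: bytes = PRESET) -> int:
    """Bytes zlib needs for the password when PRESET is already known."""
    comp = zlib.compressobj(level=9, zdict=preset)
    return len(comp.compress(password.encode("utf-8")) + comp.flush())

for candidate in ["password123", "q7#Zp!x2Lk@", "abcdabcdabcdabcd", "k3Vz9QwT1xLp0RmY"]:
    print(candidate, compressed_size(candidate))
```

A password that appears in the preset, or that repeats itself, compresses to a few bytes, while a same-length string with no structure does not.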


Yep one of those cases, where an ensemble average is not at all relevant for describing the situation.


When will we stop using passwords?! They are an elementary school kid “secret club” game taken way, way too far. They are totally broken. Nobody can come up with and remember good passwords. Nobody can store passwords securely. 100% busted.

Instead of continuing to debate what makes a good password, we need to put our energy into better techniques altogether! No more shared secrets! Let’s talk about one-time codes, asymmetric key cryptography, hardware tokens, anything but passwords!!


In one of my current web-based projects I decided to experiment with magic links sent via email. They are pretty convenient (and secure enough) but turns out there's a problem with mobile email clients: they tend to open links in isolated embedded browsers and then forget the cookies. For most non-technical people this is a show stopper unfortunately.

I then went with one-time 6-digit sign in codes that are emailed to the user. These are secure enough if done right, but now I'm wondering if they will feel secure to the users.

P.S. I might change it to a one-time alphanumeric code, which should feel more secure.
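For comparison, a minimal sketch of both kinds of one-time code, assuming uniform generation with Python's `secrets` module. The 36-symbol alphabet (uppercase letters plus digits) and the function names are assumptions, not the commenter's implementation.

```python
import math
import secrets
import string

def numeric_code(digits: int = 6) -> str:
    """Uniform code in [0, 10**digits), zero-padded to a fixed width."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"

# Assumed alphabet for the alphanumeric variant: 26 letters + 10 digits.
ALNUM = string.ascii_uppercase + string.digits

def alphanumeric_code(length: int = 6) -> str:
    """Each symbol drawn independently and uniformly from ALNUM."""
    return "".join(secrets.choice(ALNUM) for _ in range(length))

print(numeric_code(), f"{math.log2(10 ** 6):.1f} bits")
print(alphanumeric_code(), f"{6 * math.log2(len(ALNUM)):.1f} bits")
```

At the same length, the alphanumeric code carries roughly 31 bits against roughly 20 for digits, at the cost of being slightly harder to type on a phone.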


You know that the opened token is linked to the initial client session. You can unblock that session and the user can proceed in the non-isolated browser. You have this workflow with codes anyway: the user must open an email then go back to the browser and type in the code. With the link you will save on the typing.


Interesting! (I searched, even asked on SO, couldn't find any solution for this). So to elaborate: I first store a nonce in cookies as a login session token. Once the code is validated somehow, I unblock it on the backend, and on a first chance also set my main JWT cookie if not yet set. Excellent, that solves it!
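The flow described here could look roughly like the sketch below. It uses an in-memory dict as a stand-in for server-side storage, and all names are hypothetical; expiry, rate limiting and the actual cookie handling are left out.

```python
import secrets

# Stand-in for a server-side store; a real service would use a database with expiry.
pending: dict[str, dict] = {}  # session nonce -> {"link_token": str, "approved": bool}

def start_login() -> tuple[str, str]:
    """Called when the user submits their email in the original browser."""
    session_nonce = secrets.token_urlsafe(32)  # stored as a cookie in that browser
    link_token = secrets.token_urlsafe(32)     # embedded in the emailed link
    pending[session_nonce] = {"link_token": link_token, "approved": False}
    return session_nonce, link_token

def open_link(link_token: str) -> bool:
    """Called when the emailed link is opened, in any browser."""
    for entry in pending.values():
        if secrets.compare_digest(entry["link_token"].encode(), link_token.encode()):
            entry["approved"] = True
            return True
    return False

def poll(session_nonce: str) -> bool:
    """Called by the original browser; True means it may now get the main auth cookie."""
    entry = pending.get(session_nonce)
    if entry is not None and entry["approved"]:
        del pending[session_nonce]  # the approval can be consumed only once
        return True
    return False
```

One implication worth thinking through: whoever opens the link approves whichever browser started the login, so the confirmation page should probably say what is being approved.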

P.S. Unless there are some security implications - need to think about it a bit more. Thanks!


I find this way less convenient because my password manager automatically fills in my username and password. So I can log in with 1 click. With "magic links" I need to enter my email (which may be autocompleted, but it is much less reliable) and then wait for the email to show up. (Assuming I have my email available.) Also, email is never going to be reliably "instant": anti-spam techniques include bouncing an email and waiting for a retry, which is going to frustrate users and slow down login.

Additionally my security is now tied to the email I use, which may be undesirable.

So I see why this exists, but please consider also supporting username+password at least until something else browser-integrated comes along.


I know but it comes at a price of some users who don't use a password manager setting silly weak passwords.

In one of my mobile apps that manages KeyChain user/passwords correctly, I still see a lot of password reset requests. I can't even think of a reason why people would ignore autofill so often. I haven't checked, but I wouldn't be surprised if there were still a lot of "password123"s in the DB as a result.

So neither are passwords a good option, it seems.


Don't let the user set the password, just assign them something random. If you let users pick their own passwords some of them are guaranteed to pick insecure ones (i.e. anything which isn't random and unique to that site).

Though frankly we should be able to do far better than one-off shared secrets for each account. WebAuthn, for example, with the browser as the authenticator, protected by either a client-side master password or biometrics. That would be at least as good as a password stored in a password manager, with the advantage that the user doesn't need to store (and sync) unique passwords for every site. To log in from a new device just enroll a second authenticator.


My experience with password managers is that it works that well on about 10% of websites/apps, and I have to resort to copy and paste from the password manager everywhere else. It's not that great


10% is pretty low though. In my case Safari does it right in maybe 80% of cases. However the ones (websites and apps) that do it wrong can be very annoying.


That seems incredibly low. Using Firefox's built-in password manager I definitely get >90% of sites. The only site that I use frequently where it doesn't work is my bank because the "card number" isn't recognized as the username.

But even copy-paste isn't too difficult. Roughly as much clicking as the magic-link solution in my experience.


Having to check my e-mail for each login is a major annoyance. Perhaps something like SQRL[0] may help.

[0]: https://sqrl.grc.com/pages/what_is_sqrl/


How is this better than OAuth (assuming you use a provider who doesn't have or doesn't share your real name)?


More annoying than passwords?


Of course. I can type a password from memory, or it can be auto-filled by the browser/password-mgr. No interruptions before signing in. Having to open email inbox means switching tabs/context.


You never forget a password and/or your password manager generates passwords and auto fills them with no hiccup for every website and app out there? Also changing passwords when a site policy requires it or when (not if) a breach happens? Because that is NOT my experience.


Yes, occasionally there is a hiccup with the password system. But email link or code is disruptive and irritating 100% of the time.


My password manager fills them in for me. E-mail means switching tabs and remembering which account I used to sign up for the service to know where to check.


Sometimes the magic links or codes expire in X minutes. That helps them feel secure.

But like password resets, you're hosed if your email is hacked (unless you have 2FA).


Not only expiration, you also limit the number of attempts, the IP address, you verify an additional nonce token generated for the specific request, etc.
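A minimal sketch of the expiry and attempt-limit checks mentioned here. The limit of 5 attempts and the `PendingCode` record are assumptions; IP checks and the per-request nonce are left out.

```python
import hmac
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class PendingCode:
    code: str
    expires_at: float       # Unix timestamp after which the code is dead
    attempts_left: int = 5  # assumed limit on guesses

def verify(pending: PendingCode, submitted: str, now: Optional[float] = None) -> bool:
    """Accept the code at most once, before expiry, within the attempt budget."""
    now = time.time() if now is None else now
    if now > pending.expires_at or pending.attempts_left <= 0:
        return False
    pending.attempts_left -= 1
    # Constant-time comparison so timing does not leak matching prefixes.
    ok = hmac.compare_digest(pending.code.encode(), submitted.encode())
    if ok:
        pending.attempts_left = 0  # a successful code cannot be replayed
    return ok
```

With five attempts against a uniformly random 6-digit code, a blind guesser succeeds with probability 5 in 1,000,000 per issued code, which is why the attempt limit matters more than the code length.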

The security of your email is typically taken care of by a more sophisticated system like GMail, that will do captcha, they remember your geographic region, your habits, etc.

Given the above, I'd say alphanumeric one-time codes are better in terms of entropy and feel. They look like passwords but you don't need to remember them.


So, no worse than passwords at all


I don't use a password manager and my email account is by far one of my most secure accounts so I actually love signing in to things like this. It beats having to remember another battery-horse-stapler type password. For more paranoid users you could add a 2FA option.


> one-time 6-digit sign in codes that are emailed to the user

> I'm wondering if they will feel secure to the users

I don't know about secure, but most users will feel extremely irritated for sure.


> Nobody can store passwords securely. 100% busted. [...] Let’s talk about

> one-time codes

One-time codes rely on a password: either it is stored in your 2FA app, or they rely on your email password, or they rely on you storing a password somewhere else. OTPs rely on stored secrets.
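The point that authenticator-app codes are derived from a stored secret is visible in the standard algorithms themselves. Below is a compact sketch of HOTP (RFC 4226) and TOTP (RFC 6238) with HMAC-SHA1; the secret is the ASCII test key from the RFCs, not anything a real app would use.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP: HMAC-SHA1 over the 8-byte counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP: HOTP with the counter set to the number of elapsed time steps."""
    at = time.time() if at is None else at
    return hotp(secret, int(at // step), digits)

SECRET = b"12345678901234567890"  # test key from the RFCs
print(totp(SECRET))
```

Both the client and the server hold `SECRET`, so this is still a shared secret; it is just one that the user never types or sees.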

You can make these secrets be much larger than the humble password and call them "private keys" :

> asymmetric key cryptography, hardware tokens

If these are not protected by a passphrase they can be stolen. Which seems like a variation of "nobody can store passwords securely". To mitigate the effects of them being stolen, you need to protect them with a password.

I sympathize with your desire, but it's not that easy, although I do think that we can reduce password usage.

But fundamentally a password is a trust anchor in your brain. I have yet to find a way around this limitation.


I don't actually think passwords are broken at all - I mostly used to see this from people pushing biometrics or hardware tokens (people selling stuff).

How we use them, is very much broken. What is the point of a password that a bit of social engineering can bypass? Why are passwords required to get info on my ice cream rewards? Shouldn't I just get a coupon instead?

You should only use passwords that mean something, and they should not be resettable; otherwise you have something closer to a one-time token with a replay attack. Forget the password? Tough luck. Either it should not have needed one, or it should have some tangible effect which causes the user to care a great deal about forgetting it or getting it stolen.

We have engineered a state where we can't remember passwords because we are actively encouraged to ignore them. Passwords are fine; how and when we use them is not.


“ fundamentally a password is a trust anchor in your brain.”

In other words, something you have (until you forget it). But also something you have to give to someone else after which all security bets are off.

A private key is also something you have (until you lose it). It is not something you ever have to give to anyone else. If you protect it with a password you don’t have to give that password to anyone else.

Big difference!


I feel like this whole argument is saying, "these other solutions have problems that somewhat resemble problems that passwords have, so just keep using passwords." Sorry I'm not convinced.


I'm curious how you think these other items work. They ultimately boil down to a shared secret that is beyond what you can remember. Which... isn't the best thing, necessarily.

Consider, if I leave my hardware token at home when I go on vacation, I'm basically locked out of all of my accounts. This is fine, as I typically plan for this to be the case. But it is an attack vector. I can't even audit my protected assets while away.


The point of public key cryptography is that there isn't a shared secret.


A shared secret is not an attack vector though, a reused secret is. If you reuse an asymmetric key, it will identify you across the world. But if you don't reuse an asymmetric key, then the point of asymmetric cryptography is moot and is no better than a password manager.


Using something like the Hierarchical Deterministic approach used for modern cryptocurrency wallets ("HD wallets") you can reuse a single master asymmetric key for any number of logins without linking those logins together.

In this scheme there is a single master private key which you protect in whatever way seems best and never share with anyone. From this master private key you can derive any number of subordinate private keys, each with its own public key. You share one of those public keys with each service, along with the derivation path, and authenticate using the corresponding private key. Only the unchanging master private key needs to be stored, so unlike a password manager there is no need to make new backups or sync a password database across multiple devices when you set up a new account. Best of all, without either the master private key or the corresponding master public key there is no (known) way to show that any two subordinate keys were derived from the same master key—they appear unrelated.
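A deliberately simplified sketch of the "one master secret, many unrelated-looking keys" idea. This is not BIP32: real HD wallets use elliptic-curve arithmetic, which is what lets child public keys be derived from a master public key. Here an HMAC merely turns a master secret and a per-site path into a per-site seed, which would then be fed to a key-pair generator; the function name and path format are hypothetical.

```python
import hashlib
import hmac

def derive_child_seed(master_secret: bytes, path: str) -> bytes:
    """Deterministically derive a 32-byte per-site seed from the master secret.

    Simplified stand-in for HD derivation (an assumption, not BIP32).
    """
    return hmac.new(master_secret, path.encode("utf-8"), hashlib.sha256).digest()

master = b"example master secret, never shared"  # illustrative value only
seed_a = derive_child_seed(master, "login/example.com/0")
seed_b = derive_child_seed(master, "login/example.org/0")
print(seed_a.hex())
print(seed_b.hex())
```

Only `master` has to be backed up: each seed can be recomputed from the path, and without the master secret the two outputs cannot be linked to each other by any known means.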


Identifying yourself is the whole point of logging in to a website. Yes, if you want to maintain a distinct digital identity for each website you’ll need to use a different key pair for each website, which by the way is actually feasible.

With our current email-address-and-password scheme that is so difficult to do it might as well be impossible. You’d need a distinct email account for each distinct digital identity that you want to maintain.


If a web site wants your email address or phone number, it will require them, asymmetric cryptography won't help you. On the other hand, this very site uses password authentication, but doesn't require an email address.


While technically correct, a private key will usually need to be encrypted to provide adequate security.

From the user point of view, they will still need to remember the password to unlock this private key.


Further to this point, the shared item is moved to a public key and the infrastructure to facilitate communication. Such that it is not a panacea. Getting away from trust in the system is...

Likely you will envision a system to register your key. And then you have to have a bootstrap to authenticate to this system. Probably a password.


Right, if it’s not a panacea then we should definitely not pursue it.


Not my point. Just not clear that the alternatives actually are better. Again, I use hardware tokens. Not seeing my family join me on that anytime soon.

List of problems with every approach always falls back to, "what happens if you lose it?"

And the resolution to that is always outside of the technical chain.


But from a security point of view this is a massive improvement! That password is never shared with anyone else. Someone has to first get ahold of your private key before they can start brute forcing that password in order to steal credentials. This is not a trivial thing! Plus users only need that one password, not a unique one for every website/app.


More generically than "shared secret" which is one implementation, the idea is shared trust.

We both (client and server) trust some common background info. It can't be hand-waved away because that trust must exist or be established.

Shared secrets (passwords) are close to an optimal solution when considering all possible criteria. Various forms of PAKEs can be better sometimes, but not very popular. Other solutions address different threat models, often with more significant tradeoffs than a shared secret.


How is any of that worse than passwords? I only see improvements. Still not perfect, but big improvements.


My question is how is it any better? A weak link is the trust chain, in both.

As for problems, an easy one to consider is access. It's easy to forget to take a key with you, or to lose it in a fire/disaster.

You can also be compelled to turn over a hardware token. Or a digital file.


Hopefully never. Or at least not in the near future. Almost every other method is going to have privacy implications because they will rely on something you have or something you are. You can't compel passwords in a similar way that you can compel people to give physical tokens or fingerprints or retina scans.


Fingerprints and retinal scans are not something I proposed. Your point is valid for those. Your point is not valid for hardware tokens vs. passwords. Especially when you consider that passwords used for authentication have to be written down somewhere and shared with a third party for them to work at all.


I agree and go even further that "something you have" is equivalent to a password. Even though "something you have" tends to be a public-secret key pair.


PKI and physical tokens, preferably not involving plugging into any ports (NFC devices) have been my suggestion for most of a decade now.

Passwords were adopted when computing was something that occurred at a specific facility and the goal was to keep the people, largely the users one already knew of, out of one-another's accounts and data.

The persistence of passwords in a world of global access and billions of devices is ... ludicrous.

And the failure of both enterprises and governments to identify better standards and practices is criminal.


There's some nice research on a framework to assess different authentication schemes, e.g. security, usability, etc.

Turns out passwords aren't great at most things but the alternatives often have big downsides.

https://www.microsoft.com/en-us/research/wp-content/uploads/...


The US Government thought they were bad, and got rid of them. In 2004 (Thanks George W. Bush!)


Kolmogorov complexity/entropy is more suitable for this purpose, under the implicit assumption that password crackers don't have tailored prior knowledge and are just enumerating "simple" sequences. It only agrees with Shannon entropy on long ergodic sequences. The author basically constructed an example where the two notions don't agree.


How would you estimate the Kolmogorov complexity for the author's example?


Kolmogorov complexity is only unambiguously defined asymptotically, and "asymptotics is merely a heuristic". It is also uncomputable. So, to use entropy arguments for passwords, the only correct way I could think of is to generate long and (elementwise) random passwords.


You asserted that Kolmogorov complexity will disagree with Shannon entropy in this example, so how do you know what the Kolmogorov complexity of this example is?


It is a random variable in this setting, as it is a function of the randomly generated password. Given a deterministic sequence, you find the definition of its Kolmogorov complexity in textbooks/Wikipedia/etc. By saying the Kolmogorov complexity will disagree with Shannon entropy, I meant the former, which is a random variable here, does not converge to the latter, contrary to the standard asymptotic setting which probably gives people the idea of using entropy to characterize passwords (I don't know, I don't work in security).

The point of my original post is that the asymptotics break down here, and this phenomenon is not poorly understood, at least in some other communities. It is not meant to provide an alternative that is always well-defined and useful, although as I said in the grandparent comment, there is the useful implication that you can stay safe by sticking to the asymptotic regime.


It's easy to calculate an upper bound on the Kolmogorov complexity of a given value (you just have to exhibit a Turing machine that computes it), but very hard to prove a lower bound (you would have to prove something about all possible Turing machines).
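A toy illustration of the upper-bound half, using Python expressions in place of Turing machines (an assumption made for readability; the bound is then relative to Python as the description language). Exhibiting any expression that evaluates to the target string bounds its description length from above.

```python
def description_length_upper_bound(program: str, target: str) -> int:
    """Return len(program) if the Python expression `program` evaluates to `target`."""
    # Builtins are stripped so only literal expressions like 'a'*71 are evaluated.
    if eval(program, {"__builtins__": {}}) != target:
        raise ValueError("program does not produce the target")
    return len(program)

repetitive = "a" * 71
print(description_length_upper_bound("'a'*71", repetitive))          # short program
print(description_length_upper_bound(repr(repetitive), repetitive))  # literal fallback
```

The literal `repr` always works, so no string needs much more than its own length. What cannot be shown this way is that no shorter program exists, which is the hard lower-bound direction.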


By giving the password to a good compressor (and then computing the Shannon entropy of the result)? Yet I'm not sure I know a good compressor for short strings... Perhaps something like gpt2tc tailored to passwords instead of English text.


The implicit assumption however isn't good. Password crackers regularly make use of prior knowledge. A password that consists of a Shakespearian Sonnet for example has very high complexity but makes for a bad password.


Kolmogorov complexity kinda does account for "prior knowledge" (that's why it's not computable). A Shakespearian sonnet will have low Kolmogorov complexity (there's redundancy).


There's a bit of a logical flaw here in that the argument is made against average entropy of a set of passwords, rather than individual entropy of each chosen password.

This is an argument I can't find anyone making: an aggregate average entropy of the set of all passwords you use is fine for password security, rather than the entropy of each individual password.

As far as I can tell this seems to be a (possibly intentional?) misunderstanding on the author's part.


Entropy and min-entropy are properties of distributions, not of individual samples from those distributions. So there's no meaning to "the entropy of each chosen password".


Despite that slight misuse of terminology, the point stands: the article talks about estimating the entropy of a distribution used for generating a password, but the important thing is the “distribution” an attacker is using for guessing the password.

A single password should instead be treated as a sample from a (plausible) attacker’s distribution, and the complexity of that password can be used to estimate the size of the sample space required for that plausible attacker (as in, how many guesses/how much work they’ll have to do). This is, AIUI, the approach used by libraries like https://zxcvbn-ts.github.io/zxcvbn/

The entropy of a distribution for generating passwords matters when generating them in bulk, such as OTPs or implementing a password manager. This doesn’t seem to be the situation being discussed in the article, which is more about rating a user-provided password.


Zxcvbn is also a good idea, but it's a complementary approach. The user or password manager should generate secure passwords (using a high-min-entropy distribution), and the website or application should check that they're secure (using zxcvbn or similar).

Of these two approaches, a high-entropy generation method gives more confidence. It gives a mathematical strength "guarantee": if you design and follow the method correctly, then an attacker, whether or not they know the generation method, is mathematically unlikely to guess your password quickly no matter what order they guess in. "Guarantee" is in quotes because of course the attacker could get very lucky or the user could get unlucky (eg generate a uniformly random 8-character string and it happens to be "password"), and also if there's eg an implementation flaw then your guarantee isn't worth the pixels it's printed on.

By contrast, zxcvbn has no guarantee, because it doesn't use a huge curated dictionary and generation mechanism that the attacker is likely to use. So in addition to missing well-known passwords like "correct horse battery staple", it will miss bad passwords related to current events.


A single password represents a distribution of possible bit values for each byte within it. The password itself is a distribution of characters used within the password.

In fact, the author's article makes this very point, which is why I pointed out the logical flaw in the thinking.

I'll reduce N to 6 for simplifying the author's absurd example but it can expand to any N.

If we take the argument to hold that you roll a random die of N length (6 in our case) and the upperbound represents one strong password, while all other values equate to the word "password", the flaw is in how this logic is applied.

Imagine this is our set of possible values:

password, password, password, password, password, hj5^@l2jl9GGk;Clkm(0]

It makes little difference if you look at this as either the bytes involved in the entire set, or the average of all passwords within the set, it's going to come out looking like you are secure.

This means what they're attacking is all permutations of the following set of characters:

a, C, d, h, j, k, l, m, o, p, r, s, w, G, 0, 2, 5, 9, ;, ], @, ^, (

What an attacker must know though, is the character set used within, as well as the length. This is the logical flaw the author made in their analysis. For an attacker, the entropy of an individual string is taken as possible character permutations required to discover the true password and NOT permutations of the entire strings themselves.

If you look at the values for each string presented in our set, what an attacker has to attack is:

a, d, o, p, r, s, w

C, h, j, k, l, m, G, 0, 2, 5, 9, ;, ], @, ^, (

But in order to attack these, they need to try the full set:

a-z

a-zA-Z0-9;:[]!@#$%^&*(){}

One of these will be VASTLY easier to break.


I don't understand your argument at all. Why does an attacker need to try a full set of characters? Real attackers try from dictionaries or password generation methods (eg dictionary + numbers, dictionary + dictionary + number + symbol, etc), and "password" is one of the first passwords they'll try. They do this because they don't know exactly how you generated the password, but due to password leaks, they do have a pretty good idea of how most people generate passwords.

In principle, you could estimate a password's strength by the order in which a cracker would be expected to guess it. But that's a pain, depends on the password cracker being used, and can change at any time. Also, it's not "entropy", which is a well-defined mathematical concept and is what the linked article is about.

Entropy is supposed to be a bound that even if the attacker knows your generation method, they won't be able to do better than brute-force search. For this, the author is correct that min-entropy or a similarly conservative measure is the right one; though for the most common (uniform) generation methods this is the same as Shannon entropy.

Entropy of the set of characters used in your password (well, sets don't have entropy, but let's say of the uniform distribution on that set) isn't the same as entropy of password generation mechanism, because the attacker might have more information. For example, if he knows (or correctly guesses) that your password is a dictionary word, then this is super helpful information that isn't captured in the entropy of the bytes.


> I don't understand your argument at all. Why does an attacker need to try a full set of characters? Real attackers try from dictionaries or password generation methods (eg dictionary + numbers, dictionary + dictionary + number + symbol, etc), and "password" is one of the first passwords they'll try. They do this because they don't know exactly how you generated the password, but due to password leaks, they do have a pretty good idea of how most people generate passwords.

I'm well aware. How does this help the attacker attacking the higher-entropy string I outlined?

How difficult is it for an attacker to attack a password consisting of four lower case english dictionary words?

If you run some of these permutations through John, you'll see how long it takes just to generate even quick broken hashes like MD5 versus using something that is a long string of essentially type-able byte data.

> Entropy is supposed to be a bound that even if the attacker knows your generation method, they won't be able to do better than brute-force search. For this, the author is correct that min-entropy or a similarly conservative measure is the right one; though for the most common (uniform) generation methods this is the same as Shannon entropy.

I'm not sure who has dictated that this is supposed to be how entropy is used for password management. Do you have any references here? Because otherwise it looks like it's still the author and yourself assigning a set of rules to something that doesn't actually apply in the real world and doesn't represent how things are used in practice.

My entire point is that the author has taken an incredibly narrow definition of what entropy must be applied to (only to the distribution of the overall set of characters used in the example) and how it must be used in this circumstance, and argued against that.

Where it falls down is this: The entire purpose of using entropy as a measure of difficulty of cracking a password is precisely the character set approach. If you were to type "password" into any system employing a Shannon entropy analysis on the set of characters required to generate that password, you would at worst have to generate 26^8 combinations. Dictionary attacks are good because they reduce that from around 208 billion to about half a million. 208 billion is not a high enough number, and these systems will tell you it's weak. Smarter ones will probably alert you that it's a dictionary word as well.

If the issue is that people are "misusing" the term entropy for passwords here, that's fine but that's a different article (and I'd still disagree).
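
For reference, the arithmetic behind those two numbers, as a small sketch. The half-million dictionary size is the figure quoted above, taken here as an assumption rather than a property of any specific word list.

```python
import math

charset_space = 26 ** 8        # every 8-letter lowercase string
dictionary_space = 500_000     # assumed dictionary size from the comment

print(charset_space)                            # 208827064576
print(round(math.log2(charset_space), 1))       # 37.6 bits
print(round(math.log2(dictionary_space), 1))    # 18.9 bits
```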


> I'm well aware. How does this help the attacker attacking the higher-entropy string I outlined?

Well, suppose the attacker is aware of your password generation method (e.g. it's in an open-source password generator, or you wrote down your method and someone stole the description). You have specified the generator as { 5/6 "password", 1/6 "hj5^@l2jl9GGk;Clkm(0]" }. In this case, the attacker will guess the password pretty quickly -- on the second guess at worst -- even in the 1/6th case that it is "hj5^@l2jl9GGk;Clkm(0]".

This is because the string "hj5^@l2jl9GGk;Clkm(0]" doesn't intrinsically have entropy. The generation method is what has entropy -- but in this example, not very much entropy, which is why you got hacked.

> How difficult is it for an attacker to attack a password consisting of four lower case english dictionary words?

It depends on the dictionary and the cost to guess a password. If you choose from, say, the 3000 most common dictionary words, then it will take the attacker 3000^4 = 81 trillion guesses to guess 4 of them. If the application has appropriately used salt and strengthening, such that it takes eg 10 core-ms to check a guess (with a function like argon2 that's annoying to run on a GPU), and the attacker throws 1000 cores at the problem, then this will take about 81e12 * 10e-3 / 1000 / 86400 / 365 = 25 years to exhaust the entire space, or half that on average.

Of course, the attacker could use more than 1000 cores, so this difficulty is surmountable, but it is pretty expensive to break. If your account is high-value, then 5 or 6 words would be a better choice. Also, if the service doesn't strengthen the password, and the attacker can acquire the hash, then 4 words is definitely not enough.
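
The estimate above can be reproduced with a few lines. This is just a sketch of the same arithmetic; the function name and parameters are made up for illustration, and the inputs (3000 words, 10 core-ms per guess, 1000 cores) are the assumptions stated above.

```python
SECONDS_PER_YEAR = 86400 * 365


def years_to_exhaust(dictionary_size: int, num_words: int,
                     seconds_per_guess: float, cores: int) -> float:
    """Years needed to try every passphrase of num_words words."""
    guesses = dictionary_size ** num_words
    return guesses * seconds_per_guess / cores / SECONDS_PER_YEAR


print(round(years_to_exhaust(3000, 4, 10e-3, 1000), 1))  # 25.7
# Each extra word multiplies the cost by the dictionary size:
print(round(years_to_exhaust(3000, 5, 10e-3, 1000)))
```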

> I'm not sure who has dictated that this is supposed to be how entropy is used for password management.

I'm not sure what you mean by "supposed to be used" or "dictated". You don't have to use entropy to analyze password management, but it does make for a good analysis. The theory has been around for decades. See eg https://diceware.dmuth.org.

Theorem: if you sample a fresh secret (e.g. a password) from a distribution D of min-entropy x bits, and if an attacker then tries to guess it based on no other information (i.e. they might know D but they didn't like, already phish the secret), then in N guesses they will succeed with probability at most N/2^x.

Proof: By definition, the probability that any one guess is correct is at most 1/2^x, so the overall probability is at most N/2^x by the union bound. Easy peasy.

Note that this theorem does not hold if min-entropy is replaced by Shannon entropy, which is usually what people mean when they say "entropy" without qualifications. Note also that it makes no assumptions about character sets. The character set would only be relevant if each character were chosen iid, or if the attacker decides to attack the password as if this were so.
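
To make the difference concrete, here is a small sketch that computes both entropies for a distribution given as a list of probabilities, and checks the N/2^x bound against an attacker who guesses in decreasing order of probability. The `skewed` distribution is a hypothetical example chosen for illustration, not the one from the article.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def min_entropy(probs):
    """Min-entropy in bits: determined by the single most likely outcome."""
    return -math.log2(max(probs))


def best_success_probability(probs, n_guesses):
    """Success chance of an attacker who knows the distribution and
    guesses the most likely values first."""
    return sum(sorted(probs, reverse=True)[:n_guesses])


# The generator discussed above: 5/6 "password", 1/6 the long string.
thread_generator = [5 / 6, 1 / 6]
print(round(shannon_entropy(thread_generator), 2))  # 0.65
print(round(min_entropy(thread_generator), 2))      # 0.26

# Hypothetical: one password with probability 1/2, 1024 others share the rest.
skewed = [0.5] + [0.5 / 1024] * 1024
print(round(shannon_entropy(skewed), 3))  # 6.0
print(round(min_entropy(skewed), 3))      # 1.0

# The theorem's bound holds for min-entropy x at every N.
x = min_entropy(skewed)
assert all(best_success_probability(skewed, n) <= n / 2 ** x + 1e-12
           for n in range(1, 50))
```

In the skewed case the first guess already succeeds half the time, which the 1-bit min-entropy predicts and the 6-bit Shannon entropy hides.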


Individual passwords don’t have entropy.


You can calculate Shannon entropy of an individual password, as the number of bits per char.

A password of "aaaaaaa" will have a much lower entropy than "axipeY7".

What am I missing?


Replying to self...

I guess it's not a useful measure of password strength, even if possible. Any password that doesn't repeat any letters will have identical entropy by this measure.

So 123456789 will be the same Shannon entropy as Ar4e$hUa^
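
A quick sketch of that per-character measure, treating the character frequencies within one string as the distribution (which is the assumption being questioned in this subthread):

```python
import math
from collections import Counter


def per_char_entropy(s: str) -> float:
    """Shannon entropy in bits per character of the string's own
    character-frequency distribution."""
    n = len(s)
    return sum(c / n * math.log2(n / c) for c in Counter(s).values())


print(per_char_entropy("aaaaaaa"))                # 0.0
print(round(per_char_entropy("axipeY7"), 2))      # 2.81
print(round(per_char_entropy("123456789"), 2))    # 3.17
print(round(per_char_entropy("Ar4e$hUa^"), 2))    # 3.17
```

Any string with no repeated characters scores log2(length), regardless of how guessable it is.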


Entropy is really a measure of password length.


That's basically right for passwords.

Of course, if we impose password complexity requirements (e.g. must have a digit or an uppercase letter), it actually reduces the entropy in the password!


Entropy is a measure of the potential state space. So password length matters a lot but so does the size of the character set.


The real question here is if there are any actually used password strategies where this distinction matters? In practice, no one would ever use the type of password strategy described.


This is a fair question; I've been thinking about "weird" password choice strategies recently, for which it can matter. For example, if you want your password to be an English sentence, choosing sentences based on random parse trees will produce duplicated sentences with ambiguous parses.


It’s important to remember that attackers get no information on how close they are (assuming good hashing practices). It is unknowable to them if you went with the correcthorsebatterystaple approach or placed your cat on the keyboard for a few minutes. Given that, a simpler alphabet with longer strings > more complexity with shorter strings.


Cool example. An attacker will take 2^234 guesses on average to guess the password, but that's an average of 19 1's and one enormous number. So the attacker will usually guess the answer quickly. It's kind of like the St. Petersburg paradox in that the expectation value doesn't reflect typical behavior.
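
A toy version of the same effect, with made-up numbers (95% of the mass on one password, the rest spread uniformly over 2^20 others; these are not the article's parameters). The attacker tries the common password first, then the rare ones in any order.

```python
def expected_guesses(p_common: float, n_rare: int) -> float:
    """Mean number of guesses when the common password is tried first.

    A rare password sits at a uniformly random position among guesses
    2 .. n_rare + 1, so it is found after 1 + (n_rare + 1) / 2 guesses
    on average.
    """
    return p_common * 1 + (1 - p_common) * (1 + (n_rare + 1) / 2)


mean = expected_guesses(0.95, 2 ** 20)
median = 1  # the first guess succeeds more than half the time
print(round(mean, 3), median)  # 26215.425 1
```

The mean is in the tens of thousands while the typical attack ends on the first guess.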

Seems like this might be a use case for "dispersion" (the second moment of entropy) [1].

[1] https://math.stackexchange.com/questions/1626522/higher-mome...


The argument feels like a straw man.

He seems to be saying, if your password selection strategy skews towards really weak passwords, and you measure the Shannon entropy of the distribution, it won't reveal that this is a bad strategy.

I don't know anyone who would actually do this and declare a win "because Shannon".

At best, it's mildly interesting that Shannon entropy on its own isn't going to give you a useful answer if you have a weak strategy.


I thought it's about the entropy of the chosen password, not the entropy of the set of passwords you could have chosen.


Entropy of a single password isn't actually a well-defined concept; entropy is always about a distribution. "Entropy calculators" that look at your password and tell you "its entropy" are making assumptions about how you chose the password.

We care about the distribution from which you drew the password, because that lets us analyze how difficult it would be for an attacker who knew your password selection process to brute-force the password. Just knowing the password itself isn't enough information to determine that (though of course you can judge how hard it would be for an attacker once you know their brute forcing strategy).


I typically use a phrase from my life e.g.

MathsDegree@StamfordWasABigWin [1] RanThroughAPlateGlassDoorWhenTen [2]

with some esoteric obfuscation rules.

1. I don't have a maths degree from Stamford. 2. Did happen, not one of my passwords.


Hasn't this problem been solved for decades by diceware?

Use words as your characters with a dictionary of a few thousand words. Assume an attacker knows the dictionary. Make passwords that are too long to brute force (40+ characters). Use enough words that a dictionary attack is also infeasible (4+). Add a salt if you're feeling extra spicy.

Entropy is sufficient if you use the right language model.


This is a good place to advertise https://phrase.shop - a webapp I wrote that makes secure yet memorable passphrases.

It makes entropy requirements explicit, and you can even roll your own dice to supply the required entropy to generate your passphrase.

Try it, it's fun!


It seems like it is still sufficient for passwords that are generated in a normal way.



Instructions unclear, password on all sites is now "correct horse battery staple".

Inspired by this, there's a package https://github.com/dropbox/zxcvbn to estimate entropy and give suggestions.


Fundamentally, is there any flaw with this method? Or a reason why it isn't better than the general password approach?


You can only remember a limited number of passwords regardless of whether it's a sequence of words or a sequence of random characters. The main flaw in all these schemes is that you have to remember them. The only viable option is to use a password manager.


It's vulnerable to the dictionary-based attacks that are very common.


Diceware is designed to make passwords that resist dictionary attacks. Estimates of diceware entropy begin with the assumption that an attacker has the dictionary. With a dictionary of 6^5 entries, an N-word passphrase would take (6^5)^N guesses to exhaust (assuming the words are randomly chosen). (6^5)^4 ≈ 2^52.
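
A minimal sketch of that calculation, plus uniform word selection using Python's `secrets` module. The word list here is a placeholder (`word0`, `word1`, ...), not the real diceware list, and the function names are made up for illustration.

```python
import math
import secrets

WORDLIST_SIZE = 6 ** 5  # 7776 entries, one per outcome of five dice


def diceware_bits(num_words: int, wordlist_size: int = WORDLIST_SIZE) -> float:
    """Entropy in bits of num_words words chosen uniformly and independently."""
    return num_words * math.log2(wordlist_size)


def generate_passphrase(wordlist, num_words: int) -> str:
    """Pick words uniformly at random with a cryptographic RNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))


placeholder_words = [f"word{i}" for i in range(WORDLIST_SIZE)]  # stand-in list

print(round(diceware_bits(4), 1))  # 51.7
print(generate_passphrase(placeholder_words, 4))
```

The bit count only holds if the words really are drawn uniformly; picking words you like from the list does not qualify.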


That is a sadly too often repeated lie. If you know otherwise please explain/link how the attack works, how can you guess the 4 words? Effectively, that would mean requiring much less than 2^44 attempts as xkcd explains.



