That's a great question, and I wish I was good enough at math to give you a sophisticated answer. But my thinking is that the entropy you might gain by allowing duplicates is negated by the huge set of weak/guessable passwords you allow. For example, the password "aaaaA1!" is probably more likely to be guessed or used by others than "agkxA1!". (I just checked on haveibeenpwned.com, and the former has been seen 12 times, while the latter has been seen 0 times. Not very scientific I know)
Though this isn't set in stone if someone wants to formally correct me.
If you use passwords of a small enough size that this would really be a problem (like four digit PINs or your "aaaaA1!" example) then your password isn't delivering adequate security against brute force and so you've definitely lost.
If you use passwords that are big enough to make brute force impractical anyway then this "feature" will never make any real difference and is just a waste of time at best, and since it adds complexity it's another place to hide bugs.
Having a PwnedPasswords check (not this silly "repeating characters" test) makes sense if you allow users to enter passwords. Whereas if you generate passwords of decent length and at random then they're random so there's no purpose in checking them.
>If you use passwords of a small enough size that this would really be a problem (like four digit PINs or your "aaaaA1!" example) then your password isn't delivering adequate security against brute force and so you've definitely lost.
I can't force users to use reasonably long passwords. In the case that they don't (and some certainly won't), it's preferable that their password is still as good as it can be given the length.
When it comes to web logins, brute force is probably the least effective method of attack. Social engineering, dictionary attacks, rainbow tables, password lists, guessing and so on are all much bigger concerns. It's my thinking that most of those threats are better addressed by sacrificing a negligible amount of entropy and removing a large subset of weak passwords.
Moreover, this reduction in entropy only applies when the attacker knows the algorithm used to derive the password, making it even less relevant for the average case. (That's not an appeal to security through obscurity; just an observation).
>Whereas if you generate passwords of decent length and at random then they're random so there's no purpose in checking them.
Random != secure. "aaaaaaaaaa" could be the result of a random function. The goal is to create passwords that are both difficult to brute force, and difficult to guess, while making no assumptions about the length.
Entropy has diminishing returns; if it takes 10 million years to crack a password, another 5 million doesn't increase security. However I would re-consider if someone could provide a concrete example of how the small loss in entropy could lead to a practical vulnerability.
> I can't force users to use reasonably long passwords.
But you can, apparently, force them to use passwords that meet whatever other weird criteria you choose. So this is a statement of policy. You've decided by policy to allow "agkxA1"† but not "V+0mCx&3LmgyC" (because the latter has a "duplicate letter C") and that's crazy.
> Social engineering, dictionary attacks, rainbow tables, password lists, guessing and so on are all much bigger concerns.
Since you enumerated them let's run down the list and actually examine what they are and how your "no duplicate characters" rule addresses them or doesn't.
Social engineering: bad guys persuade the user to give them the secret. Only brick wall UX works well here, so that's WebAuthn / U2F and similar. Your rule makes no difference.
Dictionary attacks: bad guys have a list of passwords to try. Some of those passwords have "duplicate characters" and so are eliminated by your rule. But others do not.
Rainbow tables: it's weird to call out Rainbow tables specifically and suggests you're cargo culting. The Rainbow table is just a very specific optimisation of a time-space tradeoff attack on password hashing (the main thing salt mitigates). Your rule actually makes things marginally easier for attackers in this scenario, again if passwords are short you lose, if they're long enough you don't gain anything.
Password lists: You already listed a dictionary attack. A password list is just a dictionary.
Guessing: Guessing is a brute force attack, the same thing you said was "least effective".
Needlessly making your software more complicated is the way it can lead to practical vulnerability, nobody is going to be able to explain this to your satisfaction and so all I can do is recommend that people avoid it.
Your "add characters from different classes into a pool, then shuffle it to make the password" approach is just slightly worse than actual random long passwords, but the added complexity to make it slightly worse is the problem.
† Actually this isn't allowed, the algorithm actually used insists upon putting at least one character of each class into the password before any others are added. But this far too short password does obey the "No duplicate characters" requirement.
>But you can, apparently, force them to use passwords that meet whatever other weird criteria you choose.
No one is forced to use the random password generator, and even if I set a minimum limit they could just chop it up to their liking. All I can do is guide users towards a criteria I think provides the most effective security.
>You've decided by policy to allow "agkxA1"† but not "V+0mCx&3LmgyC" (because the latter has a "duplicate letter C") and that's crazy.
I didn't suggest that all passwords that contain duplicate characters are weak. But if we were to simply use a random string of characters without discrimination, then we would have to allow passwords like "aaaaaaa" and "abcdefg123", which is unacceptable in my opinion. If you agree that there should be some sort of policy that guarantees certain properties, then I'm confused by your position, as that would contradict the main point of your criticism centered around code complexity.
The list of attack vectors was not to suggest that removing duplicate characters was a solution to all of them (I disagree with your assessment but we'll leave that alone for now). I was merely highlighting the fact that brute force attacks are one of the least important factors in securing web-based accounts.
If you are able to point out a concrete example of a vulnerability introduced by disallowing duplicate characters (that includes bugs in the code caused by the added complexity) I'm all ears/eyes. If not, I'm going to call an end to this debate for now. I do appreciate your input though and it's definitely given me something to think about.
Increasing entropy mitigates brute force attacks but not necessarily rainbow table attacks, hence the distinction. If every user had a unique password, rainbow tables would be rendered useless. This is why it's important to reduce the likelihood that a randomly generated password is comprised of a common pattern.
> Increasing entropy mitigates brute force attacks but not necessarily rainbow table attacks
No. All time-space trade-offs need to expend the entire attack effort once (and usually it's considerably more). Their advantages are that you can do this in advance (timeliness) and that you can re-use the product (which is what salt mitigates).
You can very simply increase entropy until the attack effort cannot be deployed at all, regardless of when. For example the 'pass' tool lots of people have mentioned in this thread defaults to 24 character random passwords, far more than 128 bits of entropy. As a result it simply isn't possible to deploy the attack effort even once, you can neither do a Brute Force attack nor build the Rainbow Table to find these passwords.
The way the Rainbow Table got famous is its application to the LM Hash, an old (but already terrible when it shipped) Microsoft password hash. LM Hash uses two 56-bit values to represent a password, naively this looks like 112-bits of entropy but you can attack each independently so it isn't.
Rainbow Tables took attacking this from something that could in theory work on any password but you'd get bored waiting for your "crack" program to finish if it wasn't trivial to a few minutes on a fast laptop for every possible password. Because it's a time-space tradeoff, so somebody put all the effort in once and then you can re-use it.
Yes. For example, if a rainbow table contains a match for "a" repeating 500 times, then that password's entropy is a non-factor. Therefore entropy in of itself does not necessarily mitigate rainbow table attacks. Entropy does necessarily mitigate naive brute force attacks. We can use this same logic for more reasonably sized passwords that are comprised of known patterns but would otherwise be difficult to brute force.
Obviously you could use extremely long passwords and be fairly certain that you won't end up with something that would be found in a rainbow table or dictionary. You could also just use passwords that are long enough that they can't be brute forced and remove common patterns from the set of possible passwords. Both of these solutions are valid, and both have theoretical weaknesses that are irrelevant in practice.
You've spent a significant amount of time trying to convince me that my random password generator is flawed in some way. Why don't you just demonstrate how instead of continuously trying to argue tangentially related theoretical points? I would hope that such an effort would be motivated by a practical concern and not merely the desire to bikeshed out of boredom.
> Yes. For example, if a rainbow table contains a match for "a" repeating 500 times, then that password's entropy is a non-factor.
This muddle reflects your overall intuitive mistake.
When you roll two fair dice the "snake eyes" (two ones) outcome has only one chance in thirty six. But if you've just rolled snake eyes, the probabily it was snake eyes isn't one in thirty six it's 1.0 exactly. We do not say, as a result, that you were "certain" to roll snake eyes, you weren't, but it has happened anyway and so now it won't change. Entropy is not a measure of what did happen it's a measure of the unknown. Password generation entropy doesn't result in passwords which change constantly, it just means you can't know what the password might be within that parameter.
Constructing a rainbow table where the choice function just cycles through some number of arbitrary guesses - and then saying if those guesses are right the rainbow worked, therefore rainbow tables were a success skips the step where it's overwhelmingly likely that none match, it fails and was a total waste of resources. Nobody actually does this because it's pointless.
When we tell humans to pick randomly they behave like you, and your algorithm - trying so hard to pick a "non-obvious" answer and thus inadvertently they're predictable. Your algorithm is, as a result, needlessly predictable compared to sane "random password" code and yet also significantly more complicated.
Look at the number line. See the first number you think isn't interesting? Well the fact it's the first non-interesting number is pretty interesting, isn't it? So now maybe that's not the first one that isn't interesting after all. Next one, same argument. Likewise your removal of "common patterns" just means the set of possible passwords is made smaller, and leaves remaining passwords with the "common pattern" of not having those "common patterns". It's futile, stop it.
If you want to put your intuition about what "random" ought to be like, put it into a playlist shuffle codebase or something, where humans will appreciate how properly "random" it feels to always get a nice mix of things. In security software this is a mistake, no matter how sure you are that it's common sense to do it your way.
You missed my point again, and your tangent on randomness is unnecessary; I have no misconceptions of how randomness works, and it is precisely that understanding that has lead me to these decisions - a pure random password generator produces every word in the English language. That's not a good thing!
The point remains that if your password generator produces passwords such as "aaaaaaa" then it is a bad algorithm, end of discussion. It doesn't matter if the passwords it produces are completely random. That's not the goal and is completely irrelevant. The goal is to produce unique, unpredictable passwords that utilize randomness. We're not producing fixed-length keys. In the average case for web-based logins, uniqueness is far more important than entropy.
So, I'll say it once more. If you can demonstrate how passwords generated by the algorithm I wrote are predictable or otherwise insecure in a real-world setting as you claim, then do it. If you cannot, then any further responses are in vain.
> The point remains that if your password generator produces passwords such as "aaaaaaa" then it is a bad algorithm, end of discussion.
The correct algorithm does this, and you've managed to convince yourself that this is bad and it's time to invoke lots more complicated code. This is the end of the discussion in the sense that you've departed from reality so severely that you may be unable to recover.
For those who can't read shell that's saying to run the random data from the OS kernel through a pipeline which removes everything except $characters and then consume $length bytes of the result as the new password.
This absolutely can give you "aaaaaaa" as a password if you've unaccountably chosen to use such short passwords - but no more easily than it might choose "fuckwit" or "X3_$mwK" or any other sequence of permitted symbols.
But enough about how to do this correctly, you're very focused on your bizarre way to pick "unique, unpredictable passwords that utilize randomness", so let's look at that again:
The tight inner loop picks characters randomly using code from Sodium. Unfortunately it discards characters from the candidate list once it has picked them, and then it has a further rule which may ignore this (already discarded) character and go round again.
For the short case (size = 4) this means it can produce only about 5.2 million different passwords, whereas a better (simpler) solution gives 78 million different passwords, you've made it more than a decimal order of magnitude worse.
> For example, if a rainbow table contains a match for "a" repeating 500 times, then that password's entropy is a non-factor.
But it doesn't. You seem to be taking it as gospel that this sort of password is going to show up in the early in the list of passwords people are trying. But I don't buy it. There is no need to reject a long password that contains some repeated characters.
Why doesn't it? If that password became public knowledge, then it certainly does exist in lists and tables. Its high entropy is only protective as long as it remains secret. This is why it's important to avoid common patterns, even if those patterns are a result of a random number generator.
>Every password that becomes public knowledge ends up in credential stuffing lists, whether it matches your password policy or not.
That's right. And we don't want to produce passwords that are likely to be on those lists. A simple policy greatly reduces the chances of that happening. After a certain number of zeros, entropy is no longer a concern.
>"Common patterns" and "passwords that contain repeated characters" are not even remotely the same thing.
Though this isn't set in stone if someone wants to formally correct me.
And libsodium is indeed a pleasure to work with.