To me this is completely unrelated to the quality of the PRNG, because security is explicitly a non-goal of the design. A general-purpose non-cryptographically secure PRNG is evaluated primarily on speed and uniformity of output. Any other qualities can certainly be interesting, but they're orthogonal to (how I would evaluate) quality.
Right: put differently, why would you bother to select among the insecure RNGs an RNG whose "seed" was "harder" to recover? What beneficial property would that provide your system?
CSPRNGs have all of the desirable properties for the output.
All else being equal, I don't think it is possible for a trivially reversible generator to have better statistical properties than a generator whose output behaves more like a CSPRNG.
It can definitely be good enough and/or faster, though.
Right, I think defaulting to a CSPRNG is a pretty sane decision, and you'd know if you had need of a non-CSPRNG RNG. But what does that say about the choice between PCG and xoshiro?
Defaulting to a CSPRNG pre-seeded with system randomness is not a bad choice per se (especially given that many users don't know they need one), but current ones are much slower than the RNGs we are discussing.
If you're going to provide a non-CS one for general simulation purposes, though, you probably want the one that is as close to indistinguishable from random data as you can get without compromising performance.
Some people will have more than enough with a traditional LCG (MC isn't even using RNGs anymore), but others may be using more of the output in semantically relevant ways where it won't work.
If Xoshiro's state can be trivially recovered from a short span of the output, there is a local bias right there that PractRand lets through but that your application could accidentally uncover.
The choice is: Are the performance gains enough to justify that risk?
Why does it matter if the state can be trivially recovered? What does that have to do with the applications in which these generators are actually used? If the word "risk" applies to your situation, you can't use either xoshiro or PCG.
This is too deep to reply to, but if a bit is dependent on the value of a bit a couple of bytes back, then it is not acting randomly.
It's not about security.
I hope you can agree that if every time there is a treasure chest to the left of a door, a pink rabbit spawns on the top left of the room, that's not acting very random-like.
I'm not taking a position on the perceived added value of PCG over Xoshiro.
The property you're talking about (next bit unpredictability) is important for a CSPRNG, but it doesn't matter at all for a PRNG. A PRNG just needs to be fast and have a uniform output. LCGs, for instance, do not have next bit unpredictability and are a perfectly fine class of PRNG.
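To make that concrete, here's a minimal sketch (illustrative only; the constants are Knuth's well-known MMIX LCG parameters and the seed is arbitrary) of a full-state LCG where a single observed output determines every later one:

#include <stdio.h>
#include <stdint.h>

/* Illustrative 64-bit LCG (Knuth MMIX constants). The whole state is the
   output, so there is zero next-bit unpredictability. */
static uint64_t state = 42;  /* arbitrary seed */

static uint64_t lcg_next(void) {
    state = state * 6364136223846793005ULL + 1442695040888963407ULL;
    return state;
}

int main(void) {
    uint64_t observed = lcg_next();
    /* Anyone who has seen 'observed' can compute the next output exactly. */
    uint64_t predicted = observed * 6364136223846793005ULL + 1442695040888963407ULL;
    uint64_t actual = lcg_next();
    printf("predicted %016llx, actual %016llx\n",
           (unsigned long long)predicted, (unsigned long long)actual);
    return 0;
}

Despite being trivially predictable, a full-period generator like this still visits every 64-bit value exactly once per period, which is the kind of uniformity a non-cryptographic PRNG is judged on.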
The paper that triggered this thread, the one "breaking" PCG, sees it as potentially in the same class of issues as using RANDU.
> our results […] do mean that [PCG]'s output has detectable properties. Whether these properties may affect the result of Monte-Carlo numerical simulations is another matter entirely.
Again, this is about PCG, which required a deliberate breaking effort.
The short version of Xorshift as originally presented by Marsaglia, which outputs its whole state, is bound to have behaviors like my room-generation example emerge fairly easily, particularly with low Hamming-weight states (see the sketch below).
I doubt Xoshiro's output is that bad, but if it is presented as trivial to recover versus PCG, that to me indicates potential issues when using the output for simulation.
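As a rough sketch of the low Hamming-weight point (this is plain Marsaglia xorshift64 with the 13/7/17 shift triple, deliberately seeded with the pathological state 1, not Xoshiro itself):

#include <stdio.h>
#include <stdint.h>

/* Marsaglia-style xorshift64: the output IS the state. */
static uint64_t xorshift64(uint64_t *s) {
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Portable popcount, to avoid compiler-specific builtins. */
static int popcount64(uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

int main(void) {
    uint64_t state = 1;  /* deliberately low Hamming-weight seed */
    for (int i = 0; i < 8; i++) {
        uint64_t out = xorshift64(&state);
        printf("output %d: %016llx (%2d bits set)\n",
               i, (unsigned long long)out, popcount64(out));
    }
    return 0;
}

The early outputs have far fewer set bits than a random 64-bit value would, so anything derived from them (room layouts, spawn decisions, and the like) looks visibly non-random until the state mixes.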
Why not just read 64 bits off /dev/urandom and be done with it? All this additional complexity doesn't actually buy any "extra" randomness over this approach, and I'm skeptical that it improves speed either.
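For reference, that approach is only a few lines of C (a sketch assuming a system that exposes /dev/urandom, with minimal error handling):

#include <stdio.h>
#include <stdint.h>

/* Sketch: pull 64 random bits straight from the kernel's CSPRNG. */
int main(void) {
    uint64_t bits;
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL || fread(&bits, sizeof bits, 1, f) != 1) {
        perror("/dev/urandom");
        return 1;
    }
    fclose(f);
    printf("%016llx\n", (unsigned long long)bits);
    return 0;
}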
The problem is, there are around 2^62 double-precision numbers between 0 and 1, but they're not uniformly spaced: there are many, many more between 0 and 0.5 (nearly 2^62) than there are between 0.5 and 1 (around 2^52), for instance.
So, if you want a uniform variate, but you want every number in the range to be possible to generate, it's tricky. Each individual small number needs to be much less likely than each individual large number.
If you just draw from the 2^62 space randomly, you almost certainly get a very small number.
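Those counts are easy to sanity-check: for positive IEEE-754 doubles, increasing values have increasing 64-bit bit patterns, so the number of representable doubles in an interval is just the difference of the endpoints' bit patterns. A quick sketch (the helper name doubles_in is mine; assumes the usual binary64 layout):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>   /* log2; link with -lm */

/* For positive IEEE-754 doubles, consecutive values have consecutive
   64-bit patterns, so the count of doubles in [lo, hi) is simply the
   difference of those patterns. */
static uint64_t doubles_in(double lo, double hi) {
    uint64_t a, b;
    memcpy(&a, &lo, sizeof a);
    memcpy(&b, &hi, sizeof b);
    return b - a;
}

int main(void) {
    printf("doubles in [0.5, 1.0): 2^%.1f\n", log2((double)doubles_in(0.5, 1.0)));
    printf("doubles in [0.0, 0.5): 2^%.1f\n", log2((double)doubles_in(0.0, 0.5)));
    printf("doubles in [0.0, 1.0): 2^%.1f\n", log2((double)doubles_in(0.0, 1.0)));
    return 0;
}

On a typical system this prints roughly 2^52.0, 2^62.0, and 2^62.0, matching the counts above.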
Seems to me that the simplest solution would be to repeatedly divide the range of numbers into two halves, then randomly select either one until the range converges onto a single value. In C this might look something like this:
#include <stdbool.h>

/* Repeatedly bisect [low, high] and let a random bit pick a half until
   the midpoint stops changing. */
double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
        return random_real(high, low, random_bit);
    double halfway, previous = low;
    while (true) {
        halfway = low + (high - low) / 2;
        if (halfway == previous)
            break;
        if (random_bit() & 1)
            low = halfway;
        else
            high = halfway;
        previous = halfway;
    }
    return halfway;
}
That should theoretically produce a uniformly-distributed value. (Although perhaps I've missed some finer point?)
So you have two doubles, halfway and previous, and a loop that depends on if (halfway == previous) to terminate, where halfway is the result of a floating-point calculation. Are you sure that's going to work? I suspect it will frequently fail to terminate when you expect it to.
Secondly, why does this generate a uniform random number? It's not clear to me at all. It seems it would suffer from the exact problem the GP is talking about here: that certain ranges of numbers would have a much higher probability than others on a weighted basis.
> Secondly, why does this generate a uniform random number?
Each interval of equal size occurs with equal likelihood at each step.
Consider that you want to generate a random number between 0 and 1024 (excl.). The midpoint would be 512, thus you choose randomly whether the lower interval [0, 512) or the higher interval [512, 1024) is selected. In each step, the range size is independent of the concrete numbers chosen so far, i.e. after k steps it is exactly 2^(-k) * (high - low), and in each step each half has equal probability. Thus, if the algorithm terminates, the selected number was in fact uniformly sampled.
I would also presume it must terminate. Assume that the two endpoints are one ulp apart. The midpoint then rounds to one of the two endpoints; there is no randomness involved in that (barring FPU flags, but they don't count). In the next step, the algorithm either terminates or sets the endpoints equal, which also fixes the midpoint. Thus the procedure always returns the desired result.
The issues that the GP is grappling with are largely due to the fact that they are trying to "construct" real numbers from a stream of bits. That is always going to lead to bias issues. On the other hand, with this particular algorithm (assuming a truly random source) the resulting number should be more or less completely uniform. It works because we are partitioning the search space itself in such a way that all numbers are as likely as any other. In fact, that the algorithm terminates rather predictably essentially proves just that. After one million invocations, for example, the average number of iterations was something like 57 (with the minimum being 55 and the maximum outlier 74). Which is to say you could pick any number whatsoever and expect to see it no more than once per ~2^57 invocations.
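If anyone wants to reproduce that kind of measurement, one way (a sketch; the range, variable names, and bit source are my own choices) is to wrap the bit source in a counter and tally how many bits each call consumes:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

/* random_real() as posted above. */
double random_real(double low, double high, int (*random_bit)(void));

static uint64_t bits_used;

/* Bit source that counts how many bits it hands out. */
static int counting_bit(void) {
    bits_used++;
    return rand() & 1;
}

int main(void) {
    srand(time(NULL));
    const int N = 1000000;
    uint64_t total = 0, min = UINT64_MAX, max = 0;
    for (int i = 0; i < N; i++) {
        bits_used = 0;
        random_real(0.0, 1.0, counting_bit);
        total += bits_used;
        if (bits_used < min) min = bits_used;
        if (bits_used > max) max = bits_used;
    }
    printf("avg %.1f bits/call, min %llu, max %llu\n",
           (double)total / N, (unsigned long long)min, (unsigned long long)max);
    return 0;
}

The exact counts depend on the interval you pass in, but they should land in the same ballpark as the figures above.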
I was curious about this. On the one hand, comparing doubles with == is rarely a good idea but, on the other hand, your explanation seems valid.
After some testing I discovered a problem, though not with the comparison. The problem is with calculating the halfway value. There are pairs of doubles whose difference cannot be represented as a double:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <float.h>

double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
        return random_real(high, low, random_bit);
    double halfway, previous = low;
    while (1) {
        halfway = low + (high - low) / 2;
        if (halfway == previous)
            break;
        if (random_bit() & 1)
            low = halfway;
        else
            high = halfway;
        previous = halfway;
    }
    return halfway;
}

int main(int argc, char *argv[]) {
    srand(time(NULL));
    for (int i = 0; i < 1000000; i++) {
        double r = random_real(-DBL_MAX, DBL_MAX, rand);
        printf("%f\n", r);
    }
}
Actually, the problem is that the algorithm ends up calculating DBL_MAX + DBL_MAX (as high - low), which of course exceeds the maximum value of a double-precision number (by definition). That isn't a very realistic use-case either, but in any case you could just clamp the inputs like so:
double random_real(double low, double high, int (*random_bit)(void)) {
    if (high < low)
        return random_real(high, low, random_bit);
    /* Clamp the endpoints so that high - low cannot overflow. */
    const double max = DBL_MAX / 2;
    if (high > max)
        high = max;
    const double min = -max;
    if (low < min)
        low = min;
    double halfway, previous = low;
    while (1) {
        halfway = low + (high - low) / 2;
        if (halfway == previous)
            break;
        if (random_bit() & 1)
            low = halfway;
        else
            high = halfway;
        previous = halfway;
    }
    return halfway;
}
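A quick smoke test of the clamped version (reusing the harness from above, plus an isfinite() check) to confirm it now terminates and stays finite over the full input range:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <float.h>
#include <math.h>

/* random_real() as defined above (clamped version). */
double random_real(double low, double high, int (*random_bit)(void));

int main(void) {
    srand(time(NULL));
    for (int i = 0; i < 1000000; i++) {
        double r = random_real(-DBL_MAX, DBL_MAX, rand);
        if (!isfinite(r)) {
            printf("non-finite result: %f\n", r);
            return 1;
        }
    }
    printf("all results finite\n");
    return 0;
}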
I am hiring for a new Application Security team in Austin, focused on making the highest-privilege applications on the non-AWS side of the company the most secure on the planet.
This team will be joining a 9-month old effort to collaborate with developers of key apps on security assessment, architecture improvement, design and code review, and automation of the security process.
The pros of our team are technical excellence, a culture of sustainable work (we are working hard here, but strictly 9-5), the opportunity to have a significant influence on the security posture of the company as a whole, the chance to hack on applications operating at a global scale, and low (1x/month) oncall expectations.
The cons of our team are moderate process debt (arising from our newness and some unexpected demand) and higher-than-normal ambiguity in tasks (we hold too many task definitions/bars in our heads and haven't written them down yet).
Please apply to these roles through the links below:
ICOs were killed by Solidity and the Ethereum ecosystem more generally being insufficiently expressive to create anything of value other than pyramid schemes (insofar as those have value).
This is the exact sort of thing that allows people to think that things like Telegram are acceptable equivalents to Signal instead of disastrously poor imitators. It's a shame the discourse around secure messengers has become so polluted.
In the paper they were still able to cover 100% of US numbers for Signal and discover all of its users, but less than 0.02% for Telegram and only 908 of its users, due to simple rate limits. How exactly is Signal better at this? On top of that, the paper purposely chose unrealistic threat models and assumptions about privacy, as if letting other people know your phone number were somehow acceptable for privacy in the first place (it isn't and never was).
Rate limits are trivial to skirt with rented botnets. Some "Free VPN" apps allow inbound traffic from paying clients to be redirected out to the internet.
How is user discovery of Telegram at 0.02% worse than Signal at 100%? It isn't like they could possibly get it any higher, and Telegram's couldn't get much lower. People who know what they are talking about have been critical of Signal's use of phone numbers since the start, but Signal has always brushed it off as irrelevant.
Ironically, Nietzsche is considered (by some) to be one of the fathers of postmodern thought. His criticism of the objectivity of science in "On Truth and Lies in a Nonmoral Sense", his deconstruction of the Western concept of self in "The Anti-Christ", and to some extent his criticism of 19th-century historiography in "On the Uses and Disadvantages of History for Life" and other books are touchstones which presage a lot of postmodern discussion of these topics.