Yet any particular sequence of a given length (e.g. of bits) has an equal probability of occurring in a random bitstring: 111111 is just as likely as 100110 or 010101. Asked "which one is random?" most people would pick the middle one (even people who know better, like me).
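As a minimal sketch of that claim (plain Python, the trial count and seed are arbitrary choices of mine): draw a lot of random 6-bit strings and count how often each of those three patterns turns up. All three land near 1/64 of the trials.

```python
import random
from collections import Counter

random.seed(0)

TARGETS = ["111111", "100110", "010101"]
TRIALS = 1_000_000

# Draw a fresh 6-bit string each trial and tally how often each target appears.
counts = Counter()
for _ in range(TRIALS):
    s = "".join(random.choice("01") for _ in range(6))
    if s in TARGETS:
        counts[s] += 1

expected = TRIALS / 2**6  # each specific 6-bit pattern has probability 1/64
for t in TARGETS:
    print(f"{t}: observed {counts[t]}, expected ~{expected:.0f}")
```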
Personally I discount the likes of 111111 and 010101 because I know there are artificial processes which produce those sequences. Yet if you were training a machine to recognize "random" data, representative training data would have to include one sample of each of the 2^6 possible sequences.
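A toy version of that "representative data" point, just enumerating every 6-bit sequence (illustrative only, nothing here is a real training pipeline):

```python
from itertools import product

# Every possible 6-bit sequence; a representative training set would
# contain all 64 of them, 111111 and 010101 included.
all_sequences = ["".join(bits) for bits in product("01", repeat=6)]
print(len(all_sequences))          # 64 == 2**6
print(all_sequences[:3], "...", all_sequences[-1])
```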
There is another category of random / not random to be considered: self-similar data, where similar patterns appear nearby or at different scales. Taken in total, all sequences in this "universe" may nonetheless be randomly distributed.
What's missing is a taxonomy / review of the sequences we generate disproportionately often and of which phenomena are affected by them. Self-similar data always deserves a second look, although the cause can be a natural self-organizing principle (e.g. literal snowflakes).