What is the longest known sequence that repeats in Pi? (homelab)

0points · on Aug 28, 2024

On a side note (it's mentioned in the article), it has been a while since I last visited OEIS (Online Encyclopedia of Integer Sequences), which has been of use over the years.

Very much appreciate the amazing effort that is OEIS!

I thought I should mention it to raise awareness: https://oeis.org/

The sequence in the article: https://oeis.org/A197123

zimpenfish · on Aug 28, 2024

There's a Neil Sloane (OEIS creator/curator) playlist from Numberphile which, unsurprisingly, often feature OEIS sequences.

https://www.youtube.com/playlist?list=PLt5AfwLFPxWJXQqPe_llz...

TacticalCoder · on Aug 29, 2024

OEIS is amazing and in the same "spirit" I also love factordb. Another "Web 1.0" look which also exists since immemorial times.

It contains, well, factorization of many numbers.

For example here's RSA-250:

https://factordb.com/index.php?query=21403246502407449612644...

I hope sites like these continue to exists for a very long time.

I'd link to one or two other useful oldschool sites that I do still use but I'm not sure they could handle the /. (well, HN) effect.

sponaugle · on Aug 28, 2024

OEIS is amazing to say the least. It is such an interest collection of sequences as well as a great collection of editors that are willing to keep it all in order! Certainly worth a mention and a contribution.

kstrauser · on Aug 28, 2024

I wonder if you could do something with Bloom filters similar to:

  filters = {}
  candidates = []
  for i in range(len(pi)):
      block = pi[i:i+10]
      hash_prefix = block[:4]
      try:
          bloom = filters[hash_prefix]
      except KeyError:
          filters[hash_prefix] = BloomFilter().add(block)
          candidates.append((i, block))
          continue
      if block in bloom:
          candidates.append((i, block))
      else:
          bloom.add(block)

Basically, make one pass through to build a list of statistically likely candidates to evaluate later. Then use full string matching on those candidates to find the real matching sequences.

That might be helpful here because Bloom filters can say "we might have seen this string before" or "we definitely have not seen it before". That may greatly reduce the amount of storage you need.

ted_dunning · on Aug 28, 2024

Exactly this. Use a Bloom filter to find candidate solutions, write those out and review them later. Since you can write the entire candidate and its location, the review consists of a simple sort.

Digging in to estimate the speed, if your Bloom filter has a 0.1% chance of a collision then a trillion digits of pi will result in a billion candidates. This requires ~15 bits per entry or about 2TB of memory for the full Bloom filter. Using multiple passes on subsets of the data allows you to trade speed for space.

Note also that a 128 bit integer has room for 38 digits. This means that you can do all of the shifting using modulo, multiplication and addition operations. I would be very surprised if the cost of the disk I/O to read the raw digits is as fast as the sieving of candidates. As such, the speed of a single pass should be about the cost of reading about 500GB of data which means that multi-threading will be of limited use. This should be less than an hour on most machines. That means that one of my ancient Intel NUCs with 32GB should be able to scan the entire range in about a day no matter what size sequence we are looking for.

im3w1l · on Aug 28, 2024

This is very nice, but I can see a way to optimize it even further.

Consider pi[i:i+10] and pi[i+1:i+11] they are highly similar. Intuitively it seems that storing them both seems wasteful. But how can we avoid storing them both? This is where we have to get clever.

We are looking for a length 18 match. If position a and position b are a length 18 match, then it holds that (a,b) is a length 10 match. (a+1,b+1) is also a length 10 match, and so is (a+2,b+2). Wouldn't it be nice if we could store only one of a,a+1,a+2 etc?

The naive attempt is to do the loop with a stride of say 3, but this will not work as it may happen that e.g. a=1 mod 3 and b=2 mod 3.

However there is another approach that does work. We need to use a position-independent way of deciding between a,a+1 etc. We can do it thusly. Loop over all the numbers. When we encounter position a we score the length 10 sequences starting at a, a+1.. a+8. Any scoring function will do as long as it doesn't have ties. For simplicity we can score=number. The highest scoring number is stored. As an implementation detail we can use a "sliding window maximum" algorithm to avoid recomputing and recomparing the scores.

Note that in this scheme we may actually end up storing more than one of a, ..a+8. But it will cut down the storage quite a bit at least.

kstrauser · on Aug 28, 2024

That would be O(n^2) though, where n is the max length of the strings to compare, right?

im3w1l · on Aug 28, 2024

No, it's possible to do much better https://www.geeksforgeeks.org/sliding-window-maximum-maximum...

Edit: It should also be mentioned that due to the conjectured normality of pi, comparing two sequences will on average terminate very quickly, just looking at one or two digits will be enough most of the time, so a comparison is expected O(1).

sponaugle · on Aug 28, 2024

Indeed, the comparison length is 90% one hit, 99% two hits, etc.

sponaugle · on Aug 28, 2024

My current method is O( (n/h)^2 ) where h is the hash size, which is really O(n^2) for n>>h. I'll have to think about this to see if it would be better.

sponaugle · on Aug 28, 2024

I need to noodle on this. Hmm.

sponaugle · on Aug 28, 2024

Oh that is interesting!.. and indeed I did consider doing a 'hash compression' where I would generate offsets to numbers that might be similar, then do direct comparison on those ones that are similar.

I'll take a crack at doing this - It could certainly be an improvement and would just be fun to try out.

kstrauser · on Aug 28, 2024

Please let us know!

iambateman · on Aug 28, 2024

I read the original post and thought: "someone knows a more efficient way to do this."

Thanks for being that someone.

kstrauser · on Aug 28, 2024

Full disclosure: that’s a wild guess. It may be helpful, or it might turn out to be a terrible idea for reasons I hadn’t considered. I’m definitely not claiming it is a better way.

That said, I’m curious now. I also refuse to allow myself to dig further into it. I’ve got enough spinning plates at the moment. :-)

sponaugle · on Aug 28, 2024

Yep! I suspect there are lots of different ways to optimize this search, especially with some heuristic approaches. One of the great things about HN - There is always someone with a better way!

nobrains · on Aug 28, 2024

OK cool. But...

Finding random numbers repeating is a simple brute force problem. It is cool, but slightly boring.

Related to this, what I cannot wrap my head across, is the Infinite monkey theorem. Is it not possible that we keep expanding the numbers and never reach a complex enough set of values?

sponaugle · on Aug 28, 2024

It is, as far as being a math problem, somewhat boring. The numbers themselves are equivalent to a random number. There is certainly no significance to the numbers themselves, especially as this is all in the decimal representation, which is very 'us' specific.

It does represent something 'interesting' that is easy to observe and wonder about, and for the non-math person it leads to slightly better understanding of 'normal' numbers, of the difference in representation ( these answers are base specific), and a very small amount of simple computer science.

The notion of world record is of course tongue in cheek. No one really needs to know these numbers, nor will they rememeber them! But at least there is an entry in OEIS!

bee_rider · on Aug 28, 2024

WRT significance, do people ever find the position of socially significant numbers in pi? For example I recall quite a while ago, there was a leak of the key that was used to encrypt blue rays.

Hypothetically that could be found in pi, right? Of course, it ended up being spread widely enough that they gave up on trying to police it. But maybe there are some other not-allowed numbers that could be distributed as a position in pi? I’m not sure what the point would be, mostly just to make a point I guess.

zimpenfish · on Aug 28, 2024

https://www.angio.net/pi/ lets you search for sequences of up to 120 digit in the first 200M digits of pi. I'm sure there's others knocking around that might go further.

clueless · on Aug 28, 2024

in a pure infinite monkey scenario, works of Shakespeare would be generated at some point

ryanmcbride · on Aug 28, 2024

You can't guarantee that monkeys are typing perfectly randomly. We may never see better than "it was the best of times it was the blurst of times"

scarmig · on Aug 28, 2024

It doesn't need to be perfectly random; you just need every sequence to have a nonzero probability of being generated.

"to be or not to be" may be more likely for a monkey on a keyboard to generate than a random string, actually. The keys to generate it are highly repetitive and relatively close together. That's true for any NL string: the lower entropy of NL strings are reflected in the keyboard layout. If the monkeys switch to Dvorak, they could probably generate Shakespeare even faster.

deathanatos · on Aug 28, 2024

  What, you egg?
    [He stabs him.]

(from Macbeth)

Are you certain the real thing isn't "the blurst of times" already?

mauvehaus · on Aug 28, 2024

Your monkeys are pretty bad at Shakespeare if the best they come up with "it was the best of times it was the blurst of times".

qingcharles · on Aug 28, 2024

Their typewriters are kinda janky, TBH.

https://preview.redd.it/it-was-the-best-of-times-it-was-the-...

mr_toad · on Aug 29, 2024

> Your monkeys are pretty bad at Shakespeare

Worse, they can’t even tell their Shakespeare from their Dickens (It’s the opening line of A Tale of Two Cities).

tacoooooooo · on Aug 28, 2024

you misunderstand infinity

chowells · on Aug 28, 2024

So if I randomly type letters from the top row of my keyboard forever, I'll eventually type the entire works of Shakespeare?

Of course not. There are lots of ways you can bias random that remain random, but prevent every possible output from being generated. That's all GP was saying. You can't just say "infinite time", you need to rule out biases that would prevent the desired result.

recursive · on Aug 28, 2024

You don't need "perfect randomness". The claim was about "perfect randomness". You just need a non-zero probability for the outcomes in question. And you need independence between subsequent random events.

clueless · on Aug 28, 2024

how so?

crtified · on Aug 28, 2024

If you have infinite monkeys, then (at least) one of the monkeys is Shakespeare.

dmurray · on Aug 29, 2024

If you have finite monkeys, let's say they number M, but infinite time, then for small values of M it would be faster to breed the monkeys until one of them evolves into Shakespeare.

amelius · on Aug 28, 2024

The binary representation of Pi contains both the MIT license and the GPL. I'm confused.

kstrauser · on Aug 28, 2024

I think I saw the Blu-ray rip of “The Big Levowski” flit by (but it was dubbed in Finnish).

lifthrasiir · on Aug 29, 2024

Extending that to all known digits would be a fun technical challenge. I once wrote a code to calculate positions where k subsequent decimal digits equal to the position index itself [1], but I had no storage to spare so I instead used a stream of digits from the public GCP dataset [2]. You would eventually want to try this approach if you are like me (i.e. don't want to pay for two-digit-TB storage).

[1] https://github.com/lifthrasiir/remote-pi-reader/

[2] https://storage.googleapis.com/pi100t/index.html (used to be `pi50t` back then)

sponaugle · on Aug 29, 2024

Nice... I did that same problem as it was a numberphile episode. Very cool! Fortunately I have over a PB of local storage at home which helps out.

qingcharles · on Aug 28, 2024

This one is fun too:

https://en.wikipedia.org/wiki/Six_nines_in_pi

mihaic · on Aug 28, 2024

This can easily be optimized to use less memory, since pi is thankfully random.

Do N passes over the data, and at each pass only put in your hashmap the values that mod N equal the current iteration. If you take N~100, the runtime would probably be in the same ballpark, since the only thing that increase is streaming all the data N times. With a fast SSD, that's not that much.

altruios · on Aug 28, 2024

Pedantic point here.

PI isn't random. It is exactly the ratio of a radius and the circumference of the circle.

Incommutability is not the same thing as 'random'. PI does not follow any sort of probability or statistical algorithm to be generated. Nothing inside PI is random...

Now... if you take a random number (introducing randomness in the first place) representing the position and another random number to represent the length. From PI those two random numbers will generated a random sequence. But the 'randomness' is not IN PI, the random elements are only in the 2 chosen numbers. I understand this is colloquially what people mean when they say PI is 'random'...

Randomness can only come about from choice, and the digits of PI have no choice as to what they will be. You can not read out a deterministic sequence of numbers in order and expect randomness. The randomness must come from some choice interacting with that defined sequence of numbers.

flir · on Aug 28, 2024

Not taking issue with you, slightly different question:

Do the digits of Pi pass statistical tests for randomness? Uniform distribution, etc?

sponaugle · on Aug 28, 2024

I believe they do.

https://blogs.sas.com/content/iml/2015/03/12/digits-of-pi.ht...

mirashii · on Aug 28, 2024

Statistical tests? Yes. Mathematically proven to be normal? No.

p4bl0 · on Aug 28, 2024

This reminds me of Chaitin's work on the nature of randomness and his halting probability constant: https://en.wikipedia.org/wiki/Chaitin's_constant

tfehring · on Aug 28, 2024

Isn't this functionally the same thing that the author already did, just with N=10 and filtering on the first digit instead of the last?

mihaic · on Aug 28, 2024

Sure, I did read it too quickly, but why didn't he simply extend the generalized implementation instead of saying at the end that he was limited by the amount of RAM he has?

Those runtimes seem quite a lot slower than what I would have expected, and I'm pretty sure they could be optimized by a lot, especially since he sounds like he has enough RAM to hold the entire file.

Not trying to be dismissive, but this doesn't sound like that great of an implementation.

sponaugle · on Aug 28, 2024

Indeed no worries about pointing that out! - there are lots of things that could optimize the RAM requirements, and especially reduce the runtime. Since the overall runtime is pretty quick to getting a single answer (3 hours) vs spending more time optimizing.. but absolutely there is a lot left on the table.

I think with some good optimization you could reduce the runtime significantly, especially on modern hardware.

As for RAM limitations - You are correct the only limitation is just that with the RAM I have I would need to do more iterations. It would be possible to do this in much less RAM with more iterations, or the reverse of course.

I was fond of solving the problem in RAM just as a way to limit the scope of the problem... but SSDs are indeed pretty fast at streaming data like this.

ForOldHack · on Aug 28, 2024

How odd. You have to read and copy the data from the web to your hard disk, to the SSD. Why not just run the program on the web copy, streaming. Tiny amount of ram for all the hash tables, and you do not need to move the data around. Multi threading would improve the runtime, but reading the table is the bottle neck.

He makes a great optimization, by using hash tables, but I was wondering about optimizing about the evaluation of the Chi-Square Test for Equal Proportions.

If the Chi-Square test diverges from the Chi-Square Test for e, then there is little likely hood for pi+e to ever converge on rationality.

My hypothesis is that because Pi is cyclical, and e is exponentially transcendental, that their sum and product are not rational, and nether are their respective powers ( Pi^e and e^Pi ), but those will take a much larger homelab than I have access to.

sponaugle · on Aug 28, 2024

I have a local copy of 10 trillion digits of Pi on the math cluster, so no web access needed.

amingilani · on Aug 29, 2024

Is there a research paper is this?

Seriously, I find the most interesting works self-published in people’s personal blogs and I often wonder if there’s a paper in there and why it didn’t make it into a paper.

Maybe there’s a paper in finding the largest repeating sequence in Pi. There have certainly been more niche papers than that.

marginalia_nu · on Aug 29, 2024

There isn't much incentive to publish academic papers outside of academia. It's a huge hassle and the juice generally isn't worth the squeeze.

34679 · on Aug 28, 2024

Assuming that pi has an infinite number of digits, then the answer to "What is the longest known sequence that repeats in Pi?" is "The longest known sequence of Pi".

Misdicorl · on Aug 28, 2024

This isn't true as you can build an infinite sequence that never repeats. An example sequence in binary is (the number of 0s between each 1 increases by 1 every time)

01001000100001...

poizan42 · on Aug 28, 2024

Exactly. A number with the property that every sequence occurs is called a rich or disjunctive number - a number can be rich in s specific bases or rich for all bases we don't know whether pi is any if that. A number where every sequence occurs equally often (scaled to the length of the sequence) is called a normal number, which is an even stronger property.

sponaugle · on Aug 28, 2024

While Pi is not proven to be a disjunctive number, nor the stronger condition of being normal, it is generally believed to be normal. That being said we don't have a proof of Pi being normal, nor disjunctive.

I am not familiar with how a proof of that would be constructed, as clearly numerical or computational measurements could never be conclusive.

jfoutz · on Aug 28, 2024

Starting from the fourth digit,

01000100

But maybe I don’t understand your example

Misdicorl · on Aug 28, 2024

You didn't understand the original claim which is that because Pi has an infinite decimal representation, every subsequence of it has a repeat

jfoutz · on Aug 28, 2024

Ahh, you’re saying something like every sequence containing at least 2 ones, does not repeat.

There may be some long repeats, but not all sequences repeat. Thanks!

34679 · on Aug 29, 2024

Your example is governed by an arbitrary rule of your choosing. Pi is not.

nimih · on Aug 28, 2024

That assumption is empirically supported, but is, as of now, conjectural and unproven. In fact, our ability to prove facts about the distribution of digits in pi is so weak that we can't even rule out statements such as "after a certain point, all digits of pi represented in base 10 are either 0 or 1."

markusde · on Aug 29, 2024

The longest sequence of digits that eventually repeat (as defined in the article) is unboundedly large for any infinite sequence of digits, not just pi!

Using the pigeonhole principle, there must be at least one length N repeating string in the first N(N!+1) characters of any string.

sponaugle · on Aug 28, 2024

Ha! Yea, that is true.. now if we only had a printout of it!

whartung · on Aug 28, 2024

Post the IP of your printer, and I'll send you one.

sponaugle · on Aug 28, 2024

1f:ca7:5ee5:b12d::ca7:eat5:b12d

csense · on Aug 29, 2024

Breaking news: As of this writing, OP is 23 hours old. OEIS has already been updated with the new terms OP discovered.

cheeze · on Aug 28, 2024

Can you do something fancy with FFTs and convolution here?

I _think_ you can but my brain doesn't have the bandwidth to think past that.

ZiiS · on Aug 28, 2024

Isn't the entire known sequence of pi also known to repeat due to its nature?

IngoBlechschmid · on Aug 28, 2024

You mean because the entire known sequence is finite, and every finite sequence occurs in pi infinitely often?

This is suspected but currently not yet proven.

In fact, we don't even have a proof yet that the digit 7 occurs infinitely often in pi. (Same for any other digit.) Right now it's conceivable, but very unlikely, that there is some final 7 in pi.

drpixie · on Aug 29, 2024

I suspect not. Pi is known to be irrational, and repeating sequences are known to be rational. So pi cannot be truely repeating.

cperciva · on Aug 28, 2024

The right algorithm for this problem starts with a linear time suffix sort.

Beijinger · on Aug 28, 2024

Pi is normal

https://www.unige.ch/~mccoid/presentations/TALK_piday_2019_p...

danbruc · on Aug 28, 2024

Conjectured to be normal.

sponaugle · on Aug 28, 2024

Yep - This really is just a random number problem.. it is only interesting in the sense that Pi is something that has been captivating in public math perception. Any random number would have the same characteristics..

It is however something fun to do in a homelab outside of the normal learning and playing around, and that has merit for me. Hobbies are hobbies in part because of the interest, joy and appreciation they drive.

Spivak · on Aug 28, 2024

In one sense you're right. The set of normal numbers has measure 1 so if you choose a real number randomly it will be normal and have the same properties.

In another sense you're very wrong. The set of uncomputable numbers also has measure 1. So while there are lots of normal numbers Pi belongs to a pretty exclusive club of numbers we can actually look at.

It's not a particularly deep result but the fact that the set of computable numbers is countable will never not make me existential. Every number that we can actually express, that we will ever know, is no bigger than the whole numbers.

drexlspivey · on Aug 28, 2024

How do you generate a random irrational number?

chowells · on Aug 28, 2024

Generate a random Cauchy sequence. One trivial way to do this is to output a random string of digits. It's not perfect, but it works. Approximately 0% of random Cauchy sequences will be computable, so long as your random number generator doesn't have self-correlations.

danbruc · on Aug 28, 2024

Would this not require generating an infinite number of random numbers which you can not do? Or can you actually generate a random Cauchy sequence from a finite set of random numbers?

chowells · on Aug 29, 2024

"which you can not do" is an interesting assertion. In some sense it's trivially true, of course. Generating infinite random numbers would take infinite time. In fact, I'm quite happy to say that approximately 0% of real numbers can exist, for that reason.

But in contrast to that, I'm also happy to grant the existence of constructions with countably infinite length when discussing theories outside of computability. So in that context, is it possible to generate an infinite string of random numbers? Well, it is so long as you believe you can generate any amount random numbers at all. Just repeat the process, whatever it is.

Is that a reasonable thing to believe? That's probably a philosophical question at this point. I'm certainly not equipped to answer it. But assuming it is possible allows for some interesting math, which I support for its own sake.

This is all sort of reminiscent of a mathematician friend's stance on the axiom of choice. If you were to tell them a vehicle is only guaranteed to not explode by the axiom of choice, they wouldn't use it. On the other hand, assuming the axiom of choice is true leads to some useful math, even if it's sort of sketchy ontologically. It's a lot like programming. Sometimes you can't fix the bugs in the underlying system and you just code around them to get your stuff working.

danbruc · on Aug 29, 2024

We were talking about somewhat different things, you meant doing it mathematically, sure, no problem, pick an infinite random sequence, I understood the question to mean how to do it in practice.

Actually I had something even stronger in mind, being able to pass the generated number around. For pi I can have a program that I can pass around and that will give everyone access to all the digits of pi, in base 16 we can even have random access to all the digits.

For generating a random Cauchy sequence things are not that easy. Locally I can just use a true random number generator - in case such a thing exists - and generate all the digits I am interested in on demand, storing them in case I have to look at a previously generated digit again. But I can not easily pass that number around, only the digits I have already generated. We would need a shared database where everyone can share all the generated digits.

Or I could try to replace the true random number generator with a pseudo random number generator, then I could just pass the seed around. Everyone would have to either agree how the sequence of random numbers is mapped to the sequence of digits or we would have to use a seekable random number generator. But this raises the question whether using a [specific] pseudo random number generator would still yield a normal number.

Spivak · on Aug 28, 2024

I think the answer is yes and no.

Let's say we knew we had a normal number, say we could prove pi is normal. Then the set of sequences defined by the decimal expansion of pi starting at the nth place for all n in the natural numbers would be uniformly random over all decimal expansions. I'm going this route because it in theory allows us to define each sequence by virtue of a single number rather than an infinite sequence of randint(0, 9). This I think would be our best case if it were possible. But you would have to select a number uniformly in the range [0, inf) which... you can't.

danbruc · on Aug 28, 2024

We actual know some normal numbers [1] but they seem to all be synthetic. I would guess they have a lot of local correlation because of the way they are constructed and the comment I responded to explicitly mentioned no self-correlations - not sure what exactly that means - but it probably indicates that being a normal number is not a strong enough condition for using its digits as a sequence of random digits.

[1] https://en.wikipedia.org/wiki/Normal_number#Properties_and_e...

rdlw · on Aug 28, 2024

> Pi is normal

ok... click

> Pi may be a normal number

huh...

tialaramex · on Aug 28, 2024

Yeah, it looks really normal. If "it looks about right" is convincing then I guess there a bunch of professions you need to avoid, and mathematics is one of them.

We do not have, and perhaps will never get, proof that Pi is normal.