> But how can a probability distribution over sequences of consecutive tokens create new things?

If you start a sentence with a few words, think about the probability of each possible next word. Imagine a vector (list) with a probability for every word in the model's vocabulary, proper nouns included. (Real models work with tokens, which are often pieces of words, but the idea is the same.) This is a huge list, and almost every entry is near zero. If you always take the single highest-probability word, you get fairly predictable output. But if you sometimes pick words a little lower down the list, you start to get what looks like "creativity" but is really applied statistics plus randomness. (How far down the list the model tends to reach is usually controlled by a tunable parameter called "temperature": the model's raw scores are divided by it before being turned into probabilities, so a low temperature concentrates the odds on the top choices and a high one spreads them out. Related settings such as top-k and top-p cut off the low-probability tail entirely.) But when you consider that the model has absorbed a lot of knowledge about how the world works, and that this knowledge is factored into the relative probabilities, you have true creativity. Creativity is, after all, just trying a lot of random thoughts and throwing out the ones that are too impractical.
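Here's a toy sketch of the difference between always taking the top word and sampling with a temperature. The five-word vocabulary and its scores are made up for illustration (a real model scores tens of thousands of tokens), and the helper names are hypothetical, not any library's API.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(vocab, logits):
    """Always pick the single most likely next word (the predictable option)."""
    best = max(range(len(vocab)), key=lambda i: logits[i])
    return vocab[best]

def sample(vocab, logits, temperature, rng):
    """Draw the next word at random, weighted by temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores for words that might follow "The cat sat on the"
vocab = ["mat", "sofa", "roof", "keyboard", "moon"]
logits = [4.0, 2.5, 2.0, 1.0, -1.0]

rng = random.Random(0)
print(greedy(vocab, logits))  # mat
for t in (0.2, 1.0, 2.0):
    words = [sample(vocab, logits, t, rng) for _ in range(10)]
    print(t, words)
```

At temperature 0.2 you'll mostly see "mat"; at 2.0 the odd "keyboard" or "moon" starts showing up. Top-k and top-p sampling aren't shown here; they would simply zero out the tail of `probs` before drawing.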

Some systems, such as LaMDA, actually generate several random candidate responses and then score each one on other criteria, such as how sensible and on-topic it is and whether it violates certain safety rules. Candidates that break the rules are thrown out, and the best-scoring survivor is the one you see.
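A minimal sketch of that generate-then-filter-then-rank loop, assuming hypothetical stand-ins throughout: the "generator" just picks canned replies at random, the safety check is a keyword rule, and the quality score is crude word overlap with the prompt. LaMDA uses learned classifiers for these scores, not anything this simple.

```python
import random

def generate_candidates(prompt, n, rng):
    """Hypothetical stand-in for sampling n responses from a language model."""
    pool = [
        "Cats often nap on warm surfaces like mats.",
        "I like turtles.",
        "Cats sit on mats because they are soft and warm.",
        "You should insult your neighbour's cat.",
    ]
    return [rng.choice(pool) for _ in range(n)]

def safety_score(response):
    """Hypothetical rule check: 0.0 if a banned word appears, else 1.0."""
    banned = {"insult"}
    return 0.0 if any(w in response.lower() for w in banned) else 1.0

def quality_score(prompt, response):
    """Hypothetical on-topic score: fraction of prompt words reused in the response."""
    prompt_words = set(prompt.lower().split())
    if not prompt_words:
        return 0.0
    response_words = set(response.lower().rstrip(".").split())
    return len(prompt_words & response_words) / len(prompt_words)

def respond(prompt, n=8, safety_threshold=0.5, seed=0):
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n, rng)
    # 1. Throw out candidates that break the rules.
    safe = [c for c in candidates if safety_score(c) >= safety_threshold]
    if not safe:
        return None  # a real system might fall back to a canned reply
    # 2. Rank the survivors by quality and return the best one.
    return max(safe, key=lambda c: quality_score(prompt, c))

print(respond("why do cats sit on mats", n=20))
```

The design point is that the generator is free to be random and occasionally wrong, because a separate scoring step decides what actually gets shown.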

> Is this based on an entirely previous creation?

Yes, it's based entirely on what it learned from its training data, which covers a huge amount of what people have written about the world. That's not so different from us, except that we have personal volition and experience to draw from, and the ability to direct our own experiments and observe the results.