I would recommend the substitution s/randomness/uncertainty/ since it seems to be the more useful concept. With it, the equivalence between the two ways of thinking becomes clearer: the uncertainty you have before learning the value of a bit is equal to the information you gain when learning its value.
Let's use the analogy of a remote observation post with a soldier sending hourly reports:
0 ≝ we're not being attacked
1 ≝ we're being attacked!
Instead of thinking of a particular message x, you have to think of the distribution of messages this soldier sends, which we can model as a random variable X. For example, in peacetime the message will be 0 99.99% of the time, while in wartime, during active conflict, it could be 50-50.
The entropy, denoted H(X), measures how uncertain central command is about the message before receiving it, or equivalently, the information gained upon receiving it. Formally, H(X) = −Σ p(x) log₂ p(x), summed over all possible messages x. Peacetime messages contain virtually no information (very low entropy), while wartime 50-50 messages contain H(X) = 1 bit each.
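To make this concrete, here's a minimal Python sketch (my own illustration, not part of the original analogy) that computes the Shannon entropy of both distributions:

```python
import math

def entropy_bits(probs):
    # H(X) = -sum over x of p(x) * log2(p(x)), skipping zero-probability outcomes
    return -sum(p * math.log2(p) for p in probs if p > 0)

peacetime = [0.9999, 0.0001]  # "not attacked" almost certain
wartime   = [0.5, 0.5]        # active conflict: 50-50

print(entropy_bits(peacetime))  # ~0.00147 bits: nearly no uncertainty
print(entropy_bits(wartime))    # 1.0 bit: maximal for a single binary message
```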
Another useful way to think about information is to ask "how easy would it be to guess the message instead of receiving it?" In peacetime you could just assume the message is 0 and you'd be right 99.99% of the time. In wartime it would be much harder to guess; hence the intuitive notion that wartime messages contain information.
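The guessing intuition can be checked the same way (again just an illustrative sketch): the best strategy for a binary message is to always guess the more likely value, and its accuracy drops to a coin flip exactly when the entropy peaks:

```python
def best_guess_accuracy(p_zero):
    # Always guess the more likely message; correct with probability max(p, 1-p)
    return max(p_zero, 1 - p_zero)

print(best_guess_accuracy(0.9999))  # 0.9999: peacetime reports are trivial to guess
print(best_guess_accuracy(0.5))     # 0.5: wartime reports are as hard as a coin flip
```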