Information Entropy
In the field of Information Theory, we can define the information $I(E)$ conveyed by the occurrence of an event $E$ as $$I(E) = \log \frac{1}{p(E)},$$ where $p(E)$ is the probability of the event $E$. Note that if the event is unlikely, the information gained is large.

Suppose that we have an information source $X$, which can be thought of as a discrete random variable. $X$ can take on many "states" $\{x_i\}$. We can define the entropy of the system as $$H(X) = \sum_{i=1}^{n} p(x_i)\, I(x_i) = -\sum_{i=1}^{n} p(x_i) \log p(x_i),$$ where $n$ is the number of states that $X$ can take. An example of $X$ is an alphabet, with each letter a state.

Qualitatively, the entropy of an information source is a measure of its unpredictability. We can see this by observing the following: if the probabilities of the states of the source are uniform, the entropy is maximised. That is, when we have no prior information about the likelihood of each state, we expect the entropy of the information source to be large. We can prove this f...
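To make the definitions concrete, here is a minimal Python sketch (not from the source) of $I(E)$ and $H(X)$. It assumes base-2 logarithms so that information is measured in bits; the text above leaves the base of $\log$ unspecified. It also illustrates numerically the claim that a uniform distribution maximises entropy.

```python
import math

def information(p):
    """Information I(E) = log2(1 / p(E)) of an event with probability p, in bits."""
    return math.log2(1.0 / p)

def entropy(probs):
    """Entropy H(X) = sum_i p(x_i) * I(x_i) of a discrete distribution."""
    # States with zero probability contribute nothing to the sum.
    return sum(p * information(p) for p in probs if p > 0)

# A skewed source versus a uniform source over the same four states:
skewed  = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25, 0.25, 0.25, 0.25]

print(entropy(skewed))   # ~1.357 bits
print(entropy(uniform))  # 2.0 bits -- the maximum for four states, log2(4)
```

As the output suggests, the more predictable (skewed) source has lower entropy, while the uniform source attains the maximum value $\log n$.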