A couple of years ago, I worked on a natural language processing application along with fellow C# developer Michael Cronin. While working on it, I did extensive research into Markov chains. Although there are many articles on the internet, I found that none of them did a good job of explaining the concepts behind Markov chains.

Markov chains can be used to produce random text that still largely conforms to the rules of the English language. This random text is sometimes used by authors and musicians as a source of inspiration, although it is also used by spammers to trick spam filters.

Markov chains are generated by analyzing a piece of text to determine, for each word, the probability of each word that follows it. Here’s an example, calculating the probability data for the word **Jack** in the nursery rhyme “Jack and Jill”:

**Jack and** Jill went up the hill to fetch a pail of water

**Jack fell** down and broke his crown

*And Jill came tumbling after.*

*Up got Jack, and home did trot*

*As fast as he could caper*

*He went to bed and bound his head*

*With vinegar and brown paper.*

There are three instances of the word **Jack** in the text. There are two cases where **Jack** is followed by **and**, and one instance where it is followed by **fell**.

A Markov chain sees this as a 66% chance (2/3) of the word **Jack** being followed by the word **and**, and a 33% chance (1/3) of **Jack** being followed by the word **fell**.

To produce a Markov chain, you need to calculate probabilities for every word in the text. This produces the following probability data:

| Word | Possible next words |
| --- | --- |
| and | Jill (33%), broke (16%), home (16%), bound (16%), brown (16%) |
| Jack | and (66%), fell (33%) |
| as | fast (50%), he (50%) |
| he | could (50%), went (50%) |
| his | crown (50%), head (50%) |
| Jill | went (50%), came (50%) |
| to | fetch (50%), bed (50%) |
| up | the (50%), got (50%) |
| went | up (50%), to (50%) |

All of the other words in the rhyme are followed by only one word (100% probability).
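The original project was written in C#, but the counting step is easy to sketch in Python. This is a minimal illustration rather than actual production code; `rhyme`, `followers`, and `probabilities` are names chosen for this example:

```python
from collections import Counter, defaultdict

# The rhyme, lowercased and stripped of punctuation so that
# "And"/"and" and "Jack,"/"Jack" count as the same word.
rhyme = """Jack and Jill went up the hill to fetch a pail of water
Jack fell down and broke his crown
And Jill came tumbling after.
Up got Jack, and home did trot
As fast as he could caper
He went to bed and bound his head
With vinegar and brown paper."""

words = rhyme.lower().replace(",", "").replace(".", "").split()

# Count how often each word follows each other word.
followers = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    followers[current][nxt] += 1

# Convert the raw counts into probabilities.
probabilities = {
    word: {nxt: count / sum(counts.values()) for nxt, count in counts.items()}
    for word, counts in followers.items()
}

print(probabilities["jack"])  # {'and': 0.666..., 'fell': 0.333...}
```

Note that the text is treated as one continuous stream of words, so pairs that straddle line breaks (like **water Jack**) are counted too.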

Let’s now create some example chains from this data, using the word **Jack** as a starting point. We’ll stop at 7 words for the sake of simplicity.

*Jack fell down and home did trot (75%)*

*Jack fell down and brown paper (70%)*

*Jack fell down and Jill came tumbling (69%)*

*Jack fell down and broke his crown (67%)*

*Jack fell down and broke his head (67%)*

*Jack fell down and bound his crown (67%)*

*Jack fell down and bound his head (67%)*

All of these chains could potentially be generated from the text, although some are more likely than others.
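Generating such a chain amounts to a weighted random walk over the probability data. Here is a sketch in Python (the `generate_chain` function and its names are my own, not from the original project):

```python
import random
from collections import Counter, defaultdict

# Same counting step as before.
rhyme = """Jack and Jill went up the hill to fetch a pail of water
Jack fell down and broke his crown
And Jill came tumbling after.
Up got Jack, and home did trot
As fast as he could caper
He went to bed and bound his head
With vinegar and brown paper."""

words = rhyme.lower().replace(",", "").replace(".", "").split()
followers = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    followers[current][nxt] += 1

def generate_chain(start, max_words=7):
    """Walk the chain, choosing each next word with its observed probability."""
    chain = [start]
    while len(chain) < max_words and chain[-1] in followers:
        counts = followers[chain[-1]]
        # random.choices picks one key, weighted by how often it followed.
        nxt = random.choices(list(counts), weights=list(counts.values()))[0]
        chain.append(nxt)
    return " ".join(chain)

print(generate_chain("jack"))  # e.g. "jack and jill went up the hill"
```

The walk stops early if it reaches a word with no recorded follower (here, only **paper**, the final word of the rhyme).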

Here’s a breakdown of the first chain:

**Jack** has a 33% chance of being followed by **fell**

**fell** has a 100% chance of being followed by **down**

**down** has a 100% chance of being followed by **and**

**and** has a 16% chance of being followed by **home**

**home** has a 100% chance of being followed by **did**

**did** has a 100% chance of being followed by **trot**

The average of these percentages is around **75%**.

The actual probability of this result is much lower, around **6%**: the probability of a whole chain is the *product* of its step probabilities (1/3 × 1 × 1 × 1/6 × 1 × 1 ≈ 0.056), not their average.
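As a sanity check, both figures for the first chain can be computed directly. The step probabilities below are the fractions counted from the rhyme (Jack is followed by fell in 1 of its 3 occurrences, and is followed by home in 1 of its 6):

```python
# Step probabilities for "Jack fell down and home did trot".
steps = [1/3, 1.0, 1.0, 1/6, 1.0, 1.0]

average = sum(steps) / len(steps)  # mean of the step probabilities
actual = 1.0
for p in steps:
    actual *= p                    # product of the step probabilities

print(f"average: {average:.0%}")   # 75%
print(f"actual:  {actual:.1%}")    # 5.6%
```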