{"id":206,"date":"2012-05-09T11:38:14","date_gmt":"2012-05-09T11:38:14","guid":{"rendered":"http:\/\/www.silverclaw.net\/?p=206"},"modified":"2012-05-09T11:38:14","modified_gmt":"2012-05-09T11:38:14","slug":"markov-chains-explained","status":"publish","type":"post","link":"http:\/\/www.silverclaw.net\/?p=206","title":{"rendered":"Markov Chains Explained"},"content":{"rendered":"<p>A couple of years ago, I worked on a natural language processing application along with fellow C# developer <a href=\"http:\/\/www.linkedin.com\/pub\/mike-cronin\/16\/7aa\/600\" target=\"_blank\">Michael Cronin<\/a>. While working on this, I did extensive research into Markov Chains. Although there are many articles on the internet, I found that none of them did a good job of explaining the concepts behind Markov Chains.<\/p>\n<p>Markov Chains are used to produce random text that still largely conforms to the rules of the English language. This random text is sometimes used by authors and musicians as a source of inspiration, although it is also used by spammers to trick spam filters.<\/p>\n<p>Markov chains are generated by analzying a piece of text to determine the probability of each word. Here&#8217;s an example, calculating the probability data for the word <strong>Jack <\/strong>in the nursery rhyme &#8220;Jack and Jill&#8221;:<\/p>\n<p><em><strong>Jack <span style=\"text-decoration: underline;\">and<\/span><\/strong> Jill went up the hill to fetch a pail of water<\/em><br \/>\n<em><strong>Jack <span style=\"text-decoration: underline;\">fell<\/span><\/strong> down and broke his crown<\/em><br \/>\n<em>And Jill came tumbling after.<\/em><br \/>\n<em>Up got <strong>Jack, <span style=\"text-decoration: underline;\">and<\/span><\/strong> home did trot <\/em><br \/>\n<em>As fast as he could caper<\/em><br \/>\n<em>He went to bed and bound his head<\/em><br \/>\n<em>With vinegar and brown paper.<\/em><\/p>\n<p>There are three instances of the word <strong>Jack <\/strong>in the text. There are two cases where\u00a0<strong>Jack<\/strong> is followed by <span style=\"text-decoration: underline;\"><strong>and<\/strong><\/span>, and one instance where it is followed by <span style=\"text-decoration: underline;\"><strong>fell<\/strong><\/span>.<\/p>\n<p>A Markov chain sees this as a 66% chance (2\/3) of the word <strong>Jack<\/strong> being followed by the word <span style=\"text-decoration: underline;\"><strong>and<\/strong><\/span>, and a 33% chance (1\/3) of <strong>Jack<\/strong> being followed by the word <span style=\"text-decoration: underline;\"><strong>fell<\/strong><\/span>.<\/p>\n<p>To produce a Markov chain, you need to calculate probabilities for every word in the text. This produces the following probability data:<\/p>\n<table border=\"0\" frame=\"VOID\" rules=\"NONE\" cellspacing=\"0\">\n<colgroup>\n<col width=\"86\" \/>\n<col width=\"86\" \/>\n<col width=\"86\" \/>\n<col width=\"86\" \/>\n<col width=\"86\" \/>\n<col width=\"86\" \/><\/colgroup>\n<tbody>\n<tr>\n<td align=\"LEFT\" width=\"86\" height=\"18\"><strong>and<\/strong><\/td>\n<td align=\"LEFT\" width=\"86\">Jill (33%)<\/td>\n<td align=\"LEFT\" width=\"86\">broke (16%)<\/td>\n<td align=\"LEFT\" width=\"86\">home (16%)<\/td>\n<td align=\"LEFT\" width=\"86\">bound (16%)<\/td>\n<td align=\"LEFT\" width=\"86\">brown (16%)<\/td>\n<\/tr>\n<tr>\n<td align=\"LEFT\" height=\"17\"><strong>Jack<\/strong><\/td>\n<td align=\"LEFT\">and (33%)<\/td>\n<td align=\"LEFT\">fell (66%)<\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<\/tr>\n<tr>\n<td align=\"LEFT\" height=\"17\"><strong>as<\/strong><\/td>\n<td align=\"LEFT\">fast (50%)<\/td>\n<td align=\"LEFT\">he (50%)<\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<\/tr>\n<tr>\n<td align=\"LEFT\" height=\"17\"><strong>he<\/strong><\/td>\n<td align=\"LEFT\">could (50%)<\/td>\n<td align=\"LEFT\">went (50%)<\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<\/tr>\n<tr>\n<td align=\"LEFT\" height=\"17\"><strong>his<\/strong><\/td>\n<td align=\"LEFT\">crown (50%)<\/td>\n<td align=\"LEFT\">head (50%)<\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<\/tr>\n<tr>\n<td align=\"LEFT\" height=\"18\"><strong>Jill<\/strong><\/td>\n<td align=\"LEFT\">went (50%)<\/td>\n<td align=\"LEFT\">came (50%)<\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<\/tr>\n<tr>\n<td align=\"LEFT\" height=\"17\"><strong>to<\/strong><\/td>\n<td align=\"LEFT\">fetch (50%)<\/td>\n<td align=\"LEFT\">bed (50%)<\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<\/tr>\n<tr>\n<td align=\"LEFT\" height=\"17\"><strong>up<\/strong><\/td>\n<td align=\"LEFT\">the (50%)<\/td>\n<td align=\"LEFT\">got (50%)<\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<\/tr>\n<tr>\n<td align=\"LEFT\" height=\"17\"><strong>went<\/strong><\/td>\n<td align=\"LEFT\">up (50%)<\/td>\n<td align=\"LEFT\">to (50%)<\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<td align=\"LEFT\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>All of the other words in the rhyme are followed by only one word (100% probability).<\/p>\n<p>Let&#8217;s now create some example chains from this data, using the word <strong>Jack<\/strong> as a starting point. We&#8217;ll stop at 7 words for the sake of simplicity.<\/p>\n<p><em>Jack fell down and home did trot\u00a0 (80%)<\/em><br \/>\n<em>Jack fell down and brown paper (76%)<\/em><br \/>\n<em>Jack fell down and Jill came tumbling (75%)<\/em><br \/>\n<em>Jack fell down and broke his crown (72%)<\/em><br \/>\n<em>Jack fell down and broke his head (72%)<\/em><br \/>\n<em>Jack fell down and bound his crown (72%)<\/em><br \/>\n<em>Jack fell down and bound his head (72%)<\/em><br \/>\n<em><\/em><\/p>\n<p>All of these chains could potentially be generated from the text, although some are more likely than others.<\/p>\n<p>Here&#8217;s a breakdown of the first chain:<\/p>\n<p><strong>Jack<\/strong> has a 66% chance of being followed by <strong>fell<\/strong><br \/>\n<strong>fell <\/strong>has a 100% chance of being followed by <strong>down<br \/>\ndown<\/strong> has a 100% chance of being followed by <strong>and<br \/>\nand<\/strong> has a 16% chance of being followed by <strong>home<br \/>\nhome<\/strong> has a 100% chance of being followed by <strong>did<br \/>\ndid<\/strong> has a 100% chance of being followed by <strong>trot<\/strong><\/p>\n<p><strong><\/strong>The average of these percentages is around <strong>80%<\/strong>.<br \/>\nThe actual probability of this result is around <strong>10%<\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A couple of years ago, I worked on a natural language processing application along with fellow C# developer Michael Cronin. While working on this, I did extensive research into Markov Chains. Although there are many articles on the internet, I &hellip; <a href=\"http:\/\/www.silverclaw.net\/?p=206\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[12],"tags":[],"_links":{"self":[{"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=\/wp\/v2\/posts\/206"}],"collection":[{"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=206"}],"version-history":[{"count":6,"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=\/wp\/v2\/posts\/206\/revisions"}],"predecessor-version":[{"id":213,"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=\/wp\/v2\/posts\/206\/revisions\/213"}],"wp:attachment":[{"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=206"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=206"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.silverclaw.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=206"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}