Andrey Kurenkov's Web World Jekyll 2017-10-23T04:08:22-04:00 / Andrey Kurenkov / contact@andreykurenkov.com <![CDATA[Some Thoughts on Meditation]]> /writing/some-thoughts-on-meditation 2017-10-22T19:19:34-04:00 2017-10-22T19:19:34-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>I’ve had a rough few months. Work, work, work, a failed research project, broken laptops, moving, PhD applications, financial mishaps… it’s been a stressful time. Bummer. But, it’s not big deal, this too shall pass, etc - the bummerness of the situation is not the point. The point is that as has happened before and will happen again, this current tired predicament has given me motivation to take up something I have been meaning to do for a while: writing about meditation.</p> <p>Meditation - or at least the entirely secular non new-agey kind - has long intrigued me as a seemingly scientifically backed method for being calmer, healthier, and perhaps wiser. Particularly for a habitually working-too-much-and-taking-life-too-seriously type such as myself, it’d clearly be the height of rationality to go ahead and take up such an objectively good practice. Still, I did not start trying it out until two years ago after first hearing about and then reading a <a href="http://www.10percenthappier.com/mindfulness-meditation-the-basics/">best-seller</a> specifically aimed at selling anti-new-age skeptics like myself on the idea. The book was alright, a sort of shallow crowd pleaser, but it did convince my rational self it was time to do this thing.</p> <p>So now that I’ve done it on and off for a couple of years, I feel compelled to write this: it works. It calms me down, improves my focus, gives me ideas, and is all around a worthwhile practice. Sitting up with eyes closed and attention focused on the breath, something so simple, invariably leads to an unmistakable change in my physical and emotional state. So much so that it has become a regular goto destressing practice akin to reading, excercise, or spending time with good friends. And as the last few months have been rough, it has been particularly helpful, and I have become particularly motivated to speak of its helpfulness.</p> <p>Mind you, none of this is profound. Meditation has not changed me deeply, has not inched me closer to enlightnment, none of that. And what I refer to as meditation is just about the most basic and straighforward practice that can be called that, nothing at all advanced. And on top of that, much like excercise I am not even that good about consistenly doing it.</p> <p>But, like excercise, whenever I do it I am impressed with how undeniably beneficial it is for me and wish I did it more. It really is a foolproof way of becoming less stressed, especially when the stress is a big heavy pile that’s been building up too long. And it’s so easy, especially in our modern there-is-an-app-for-everything times. I rather liked <a href="https://www.calm.com/">calm</a> when getting started with it, and <a href="https://www.headspace.com/">headspace</a> is similarly good (nowdays I use <a href="https://insighttimer.com/">Insight Timer</a>).</p> <p>So, if you’ve read this far, all this obviously goes to say - maybe you should try it too. I am certainly glad I did.</p> <p><a href="/writing/some-thoughts-on-meditation/">Some Thoughts on Meditation</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on October 22, 2017.</p> <![CDATA[DeformNet, Or A Tale of Broken Chairs]]> /writing/deformnet-or-a-tale-broken-chairs 2017-10-18T19:19:34-04:00 2017-10-18T19:19:34-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>I spent about half of his year so far devoted to my first major research project at Stanford - DeformNet. The gist of the project was to create 3D models of objects based on a single 2D image of the object, by deforming the 3D model of a similar object. It took months to get working properly, but all that work eventually led to a paper - “DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image” - with its very own <a href="https://arxiv.org/abs/1708.04672">Arxiv post</a> and <a href="https://deformnet-site.github.io/DeformNet-website/">project page</a>. But, it also led to much more. It led to horrific, bizzare, and delightful failures.</p> <p>All research projects, and projects in general, face a sequence of failures before their eventual success (or demise). Unlike the eventual success, the failures are typically barely catalogued, and certainly not freely shared with the world. But not DeformNet! Due to its nature as computer vision research, and in particular research on deforming 3D models of objects, DeformNet generated uniquelly broken and surreal imagery in its many months of development. So much so, that I was inspired to capture and share said imagery with my Friends on facebook. And now, dear reader, I shall share it with you! Without further ado, or any further context or explanation, please enjoy:</p> <figure class="figure"><figcaption class="figure__caption"><p>Beautiful Arrows</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/2016-4-15-a-brief-history-of-game-ai/16797445_1438988406114168_9212772470262986498_o.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16797445_1438988406114168_9212772470262986498_o.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Spread 1</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/2016-4-15-a-brief-history-of-game-ai/16807616_1442442285768780_7897771771545166109_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16807616_1442442285768780_7897771771545166109_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Spread 2</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/2016-4-15-a-brief-history-of-game-ai/16830718_1442442282435447_7909518336960080635_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16830718_1442442282435447_7909518336960080635_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Spread 3</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/16831179_1442442279102114_4063186804650971445_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16831179_1442442279102114_4063186804650971445_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Spread 4</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/16832243_1442442312435444_2730660855479305093_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16832243_1442442312435444_2730660855479305093_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Bent Out of Shape</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/2016-4-15-a-brief-history-of-game-ai/16716040_1442443249102017_6471384390721714604_o.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16716040_1442443249102017_6471384390721714604_o.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Split</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/16864203_1447351921944483_8878398950718576730_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16864203_1447351921944483_8878398950718576730_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Modernist Chair</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/16806900_1452334068112935_5140427923114788905_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16806900_1452334068112935_5140427923114788905_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>The First</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17021757_1455820011097674_3621251967274797044_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17021757_1455820011097674_3621251967274797044_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Perfect Correspondence</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17202800_1459767327369609_4492442434375103005_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17202800_1459767327369609_4492442434375103005_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Color Spiral</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/16939623_1459767330702942_4073595802082156353_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/16939623_1459767330702942_4073595802082156353_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Strech</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17309979_1467615326584809_4257681561614777357_o.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17309979_1467615326584809_4257681561614777357_o.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Chair Growths</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17621968_1483241391688869_9139772162230816003_o.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17621968_1483241391688869_9139772162230816003_o.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Chair Growths 2</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17622114_1483241395022202_135454706992583524_o.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17622114_1483241395022202_135454706992583524_o.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Chair Hallucinations 1</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17435863_1483241398355535_2328838522499450722_o.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17435863_1483241398355535_2328838522499450722_o.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Chair Hallucinations 1</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17621744_1483241425022199_7862745114641406764_o.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17621744_1483241425022199_7862745114641406764_o.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Foreshortening</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/18058151_1518098891536452_6182864029848832977_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18058151_1518098891536452_6182864029848832977_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Relaxing Beach Chair</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17992160_1518098894869785_3643613128276992654_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17992160_1518098894869785_3643613128276992654_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Unfriendly Chair</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/18118890_1518098898203118_7556374425933627554_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18118890_1518098898203118_7556374425933627554_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Elephant Chair</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/17992348_151809 8921536449_8297748936321495132_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/17992348_1518098921536449_8297748936321495132_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Perfect Correspondence 2</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/18194139_1521245274555147_6387873167953340635_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18194139_1521245274555147_6387873167953340635_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Uncomfortable Chair</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/18423960_1536082143071460_5562000232042538980_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18423960_1536082143071460_5562000232042538980_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Chill Plane</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/18519900_1542003065812701_69998419652910827_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18519900_1542003065812701_69998419652910827_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Alien Counch</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/18486356_1542003069146034_8832229143760253451_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18486356_1542003069146034_8832229143760253451_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Melted Car</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/18527562_1542003072479367_757465147288662663_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18527562_1542003072479367_757465147288662663_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Old Counch</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/ 18581869_1542003169146024_415512911450005262_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18581869_1542003169146024_415512911450005262_n.jpg" alt="History" /></a></p> </div></figure> <figure class="figure"><figcaption class="figure__caption"><p>Success!</p> </figcaption><div class="figure__main"> <p><a href="/writing/images/ 18556281_1542003282479346_5657773445370981127_n.jpg"><img class="postimageactual" src="/writing/images/2017-10-15-deformnet-or-a-tale-broken-chairs/18556281_1542003282479346_5657773445370981127_n.jpg" alt="History" /></a></p> </div></figure> <p><a href="/writing/deformnet-or-a-tale-broken-chairs/">DeformNet, Or A Tale of Broken Chairs</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on October 18, 2017.</p> <![CDATA[KerasJS3D]]> /projects/major_projects/KerasJS3D 2017-08-18T00:00:00-04:00 2017-08-18T00:00:00-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>Not much to say, click on the demo link (hopefully it still works…)! A fun little project.</p> <p><a href="/projects/major_projects/KerasJS3D/">KerasJS3D</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on August 18, 2017.</p> <![CDATA[DeformNet]]> /projects/research/deformnet 2017-08-11T00:00:00-04:00 2017-08-11T00:00:00-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>I was not the one to have the original idea, but mostly the one to grind out and iterate a lot to get it working. See the Arxiv post and website for more details.</p> <p><a href="/projects/research/deformnet/">DeformNet</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on August 11, 2017.</p> <![CDATA[Moving On, Looking Back]]> /writing/moving-on-looking-back 2017-07-28T19:19:34-04:00 2017-07-28T19:19:34-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>For a while now, I have been living two lives at once: that of a full time software engineer working at Oracle, and that of a CS Masters grad student at Stanford. In one life, finishing the first release of a brand new product with a small team of esoteric engineers scattered across a few different state. In the other, trying to keep up with classwork and delve deeper into research in AI. Granted, I was officially only a grad student part time, but it certainly did not feel like it.</p> <p>Well, no more. As of two weeks ago, my time at Oracle has come to an end - I am now a full time grad student at Stanford. Being the sentimental guy that I am, I decided to write up a little blog post to commemorate the occasion - and hopefully this also mark a return to writing more stuff on this site, as I have not done in more than a year now.</p> <p>It was an odd choice, going to Oracle. I had focused extensively on robotics research and AI in my undergrad, and had a perfect little plan to spend a bit of time working on robotics in industry before most likely going back to grad school. Long story short, the robotics interviews did not pan out and I went with the backup option of just doing some data-related software engineering.</p> <p>Though without a hint of relation to AI, the job at Oracle did offer the exciting prospect of working on a prototype of a product rather than small iterative feature development or bug fixing as would be the case in almost all entry level positions. But, within a short period there it became clear to me the progress being made in robotics and AI is far too exciting to not get back into that world. So I applied to Stanford’s HCP program, which allows unusually enterprising young men such as myself to get a CS Master’s degree while working full time.</p> <p>The plan was to stick around Oracle long enough to hopefully see our product through to release, while getting back into research and exploring possibilities for working in AI. And the plan worked! I managed to survive doing both long enough to <a href="http://www.oracle.com/technetwork/server-storage/sun-unified-storage/downloads/systems-manager-zfs-3711217.html">see our product through to a beta release</a>. At the same time, I have become a research assistent at the Stanford AI lab (my name is listed <a href="http://cvgl.stanford.edu/people.html">here</a> and everything).</p> <p>And here I am, now. Research and working with robots has felt like the right thing for me to do since I’ve gotten back to it, so I expect to continue with this particular focus indefinitely. Now, there is just the small matter of figuring out exactly what specific contributions I can make in this field… all in good time.</p> <p><a href="/writing/moving-on-looking-back/">Moving On, Looking Back</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on July 28, 2017.</p> <![CDATA[OSM]]> /projects/team_projects/oracle-systems-manager 2017-07-14T00:00:00-04:00 2017-07-14T00:00:00-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>As documented in my <a href="http://127.0.0.1:4000/writing/moving-on-looking-back/">more sentimental</a> post on this, my choice to go work at Oracle for a while was a bit odd but ultimately rewarding. Working on a prototype and eventually an entire new product from scratch was a rare opportunity and a rewarding project. We prototyped a microservices driven concept on the side for about nine months with fancy new tools such as Docker and the various popular messaging queues and NoSQL DBs, before being given the green light to go ahead and develop the monitoring and management tool in Java as a proper product. In just about a year, we had finished a polished mostly-bug-free complex product that did a lot of data crunching and visualization. And we did this as a small team of about a dozen engineers, so I got to have broad and significant influence over the development of the backend. Even if I eventually left to go back to research (and the project was ultimately scrapped, after beta release), I think fondly of how we started with zero code and built a really solid management tool in just a year.</p> <p><a href="/projects/team_projects/oracle-systems-manager/">OSM</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on July 14, 2017.</p> <![CDATA[ObjectCropBot]]> /projects/hacks/objectcropbot 2017-02-18T00:00:00-05:00 2017-02-18T00:00:00-05:00 www.andreykurenkov.com contact@andreykurenkov.com <p>I had already had experience with DeepMask when starting on this, since my project for Stanford’s AI class was to modify DeepMask to see if it could be used to crop objects with a single click. My conclusion from that was that I was still better off using Facebook’s approach, but my experience with AWS for that project came in handy. Specifically, I reused my previous AWS DeepMask computing solution: paying for an AWS EC2 instance with a GPU, installing the relevant dependencies on it, and making it host a REST server. I had to modify the DeepMask code slightly, but for the most part I just set it up to run on the cloud and used Facebook’s pretrained models.</p> <p>The bulk of the remaining work was in implementing the http://objectcropbot.com/ web demo. I made this with basic HTML/CSS/JS parts, except for the excellent open source visual cropping library Cropper.js. After an all-nighter tweaking the demo to look and feel right, I set it up to be hosted on GitHub and arranged the domain to be what it is after buying it from NameCheap.</p> <p><a href="/projects/hacks/objectcropbot/">ObjectCropBot</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on February 18, 2017.</p> <![CDATA[IMDB Data Viz]]> /projects/hacks/imdb-data-viz 2017-02-18T00:00:00-05:00 2017-02-18T00:00:00-05:00 www.andreykurenkov.com contact@andreykurenkov.com <p>See the linked writeup. A fun little project - gotta love classes that let the student decide on what to work on for their project.</p> <p><a href="/projects/hacks/imdb-data-viz/">IMDB Data Viz</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on February 18, 2017.</p> <![CDATA[DeepCrop]]> /projects/major_projects/deepcrop 2016-12-16T04:26:22-05:00 2016-12-16T04:26:22-05:00 www.andreykurenkov.com contact@andreykurenkov.com <p>See the quite in-depth attached documents. In the end the outcome was not that impresive, but it was quite fun to do and a good chance to play with Deep Learning.</p> <p><a href="/projects/major_projects/deepcrop/">DeepCrop</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on December 16, 2016.</p> <![CDATA[IMDB Data Visualizations with D3 + Dimple.js]]> /writing/visualizing-imdb-data-with-d3 2016-08-10T19:19:34-04:00 2016-08-10T19:19:34-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p><em>Notes: not optimized for mobile (or much else). Full page version <strong><a href="/writing/files/2016-08-10-visualizing-imdb-data-with-d3/standalone_page.html">here</a></strong>, visualization code <strong><a href="https://github.com/andreykurenkov/imdb-data-viz">here</a></strong>. I don’t get into the technical aspects here, but feel free to take a look.</em></p> <div id="genreChartContainer" class="chartContainer"> <script type="text/javascript"> /*Start on 1915 because prior too few movies are listed to make them a fair comparison to modern times*/ var start_year = 1915; /*End on 2013 due to a strange dive towards zero in 2014 and 2015 I cannot explain or guarantee is not due to flawed data. At first I included the dip but received feedback it is best to remove it to avoid confusion, and then removed it.*/ var end_year = 2013; //Get from localhost, perhaps change to github later var data_source = "/writing/files/2016-08-10-visualizing-imdb-data-with-d3/data/yearly_data.tsv"; var name = "IMDB Yearly Movie And Genre Counts (1915-2013)"; createGenreChart("#genreChartContainer", data_source, name, start_year, end_year); </script> </div> <form class="form" id="genreToggleForm"> <div class="switch-field"> <!-- <div class="switch-title">Display Type</div> --> <input type="radio" id="switch_left" name="switch" value="yes" checked="" /> <label for="switch_left">Counts</label> <input type="radio" id="switch_right" name="switch" value="no" /> <label for="switch_right">Percents</label> </div> </form> <p>And there it is! IMDB data<sup id="fnref:gotten_with"><a href="#fn:gotten_with" class="footnote">1</a></sup> visualized with <a href="https://d3js.org/">D3</a>, or more precisely with the D3-powered <a href="http://dimplejs.org/">Dimple.js</a>. The data is minimally cleaned up by filtering for movies that have at least one vote and associated length information, and info on TV episodes or shows is also not included, but the data is otherwise directly (after parsing) from IMDB. The legend is interactive (try clicking the rectangles!).</p> <p>As you can see this chart visualizes the number of genre movie releases between 1915 and 2013<sup id="fnref:why_years"><a href="#fn:why_years" class="footnote">2</a></sup>, as well as the total number of movies in those years. A single movie may be associated with zero, one, or multiple genres and so the ‘Total Movies’ line corresponds to actual movie counts and every colored-in area represents the number of movies tagged with that genre for that year. The clear conclusion is that there has been an explosion in film production from the 90s onward, for which I have some theories<sup id="fnref:theories"><a href="#fn:theories" class="footnote">3</a></sup> but no definitive explanation. Beyond the big takeaway there are a multitude of possible smaller conclusions regarding the relative popularity of genres and movies overall, which was really my intent in making such an open-ended visualization.</p> <p>There is a ton more that can be done with the data. The direction I decided to go with it was to explore various aspects of more recent data rather than more aspects related to change over time. I would love to eventually add controls to view any year range for all the following charts<sup id="fnref:nontrivial"><a href="#fn:nontrivial" class="footnote">4</a></sup>, but they still reveal some interesting aspects about modern movie production and IMDB metrics.</p> <p>An obvious place to start is with looking at how rating data is distributed, and the answer is delightfully normal:</p> <div id="ratingChartContainer" class="chartContainer"> <script type="text/javascript"> createLineChart("#ratingChartContainer", "/writing/files/2016-08-10-visualizing-imdb-data-with-d3/data/rating_data.tsv", false, "IMDB Average Movie Rating Distribution (2003-2013) ", "rating", false, "Average IMDB User Rating"); </script> </div> <p>Yep, a bell curve-ish<sup id="fnref:bell_curve"><a href="#fn:bell_curve" class="footnote">5</a></sup> kinda shape! Not overly suprising to see that most movies are rated as mediocre/good and the frequency flattens out at either extreme. Next, a slightly more fun shape from the length distribution:</p> <div id="lengthChartContainer" class="chartContainer"> <script type="text/javascript"> createLineChart("#lengthChartContainer", "/writing/files/2016-08-10-visualizing-imdb-data-with-d3/data/length_data.tsv", true, "IMDB Movie Length Distribution (2003-2013)", "length", false, "Length (minutes)", "max"); </script> </div> <p>Ah, what a nice regularly spiky shape<sup id="fnref:fourier"><a href="#fn:fourier" class="footnote">6</a></sup>. It’s logical that most movies hit the 90-minute mark, though it seems likely that simplified data entry also brings about the periodicity here. The chart is a bit of a mess as a line graph, so it makes sense to clean it up by binning the data quite a bit more:</p> <div id="lengthBinChartContainer" class="chartContainer"> <script type="text/javascript"> createHistChart("#lengthBinChartContainer", "/writing/files/2016-08-10-visualizing-imdb-data-with-d3/data/length_data_hist.tsv", true, "IMDB Movie Length Distribution (2003-2013) ", "length", false, "Length (minutes)"); </script> </div> <p>And there it is, hiding in that data was another sort-of bell curve. Except of course for that first bar - IMDB apparently has a large amount of shorter 0-20 minute film entries as well. No doubt short films are part of this, though it’s unclear why there are quite so many. As with many aspects of the data, it could be explored more deeply and filtered more thouroughly to focus on a specific subset of films. But, that’s for another day. For now I continued my visualization quest by looking into the vote distribution:</p> <div id="votesChartContainer" class="chartContainer"> <script type="text/javascript"> createLineChart("#votesChartContainer", "/writing/files/2016-08-10-visualizing-imdb-data-with-d3/data/votes_data.tsv", true, "IMDB Movie Vote Count Distribution (2003-2013) ", "votes", true, "IMDB User Vote Count"); </script> </div> <p>Yes astute reader<sup id="fnref:corny"><a href="#fn:corny" class="footnote">7</a></sup>, that is indeed a log-scale on the x axis. Unsurprisingly, the number of votes for any given film declines exponentially - very few of those thousands of movies in the first graph are blockbusters<sup id="fnref:again"><a href="#fn:again" class="footnote">8</a></sup>. As with the histogram above the continuous data is in fact binned for counting, but in this case there are enough bins that it makes sense to smooth out into a line. Once again the data can also be shown via a histogram with fewer bins:</p> <div id="votesBinChartContainer" class="chartContainer"> <script type="text/javascript"> createHistChart("#votesBinChartContainer", "/writing/files/2016-08-10-visualizing-imdb-data-with-d3/data/votes_data_hist.tsv", true, "IMDB Movie Vote # Distribution (2003-2013) ", "votes", true, "IMDB User Vote Count"); </script> </div> <p>Lastly, I explored the distribution of budgets within the data <sup id="fnref:budgets"><a href="#fn:budgets" class="footnote">9</a></sup>. I was originally inspired to look into movie data based on <a href="http://flavorwire.com/492985/how-the-death-of-mid-budget-cinema-left-a-generation-of-iconic-filmmakers-mia">an article</a> that discussed the death of mid-budget-cinema, and of course I wanted to look into the data and see the phenomenon myself. The result once again demands a log-scale and reveals a certain periodicity:</p> <div id="budgetChartContainer" class="chartContainer"> <script type="text/javascript"> createLineChart("#budgetChartContainer", "/writing/files/2016-08-10-visualizing-imdb-data-with-d3/data/budget_data.tsv", true, "IMDB Movie Budget Distribution (2003-2013) ", "budget", true, "Budget (USD)", "average"); </script> </div> <p>The data does<sup id="fnref:plural"><a href="#fn:plural" class="footnote">10</a></sup> not seem to back the notion of mid-budget movies dying, since one peak is at about 1m, but then again as said before the data is not particularly carefully filtered. There being a ton of less-than one million budget movies certainly explains how such an explosion in movie production might have been possible in the past twenty years. That guess shall hopefully be further explored in future posts, but for now I will finish with a final simplified histogram:</p> <div id="budgetBinChartContainer" class="chartContainer"> <script type="text/javascript"> createHistChart("#budgetBinChartContainer", "/writing/files/2016-08-10-visualizing-imdb-data-with-d3/data/budget_data_hist.tsv", true, "IMDB Movie Budget Distribution (2003-2013) ", "budget", true, "Budget (USD)"); </script> </div> <h2 id="what-i-learned">What I Learned</h2> <p>And now time for everyone’s favorite part of the book report. In truth I prepared the genre chart for Udacity’s an online class, <a href="https://www.udacity.com/course/data-visualization-and-d3js--ud507">Data Visualization with D3.JS</a>. I completed the class as part of Udacity’s Data Analyst Nanodegree, and as with my <a href="http://www.andreykurenkov.com/writing/fun-visualizations-of-stackoverflow/">previous</a> <a href="http://www.andreykurenkov.com/writing/power-of-ipython-pandas-scikilearn/">posts</a> based off projects for the nanodegree I felt that I learned<sup id="fnref:learned"><a href="#fn:learned" class="footnote">11</a></sup> a useful technology and got the chance to complete a fun project with it worthy of cataloguing. I have a few key takeaways from having now completed the project:</p> <ul> <li>It’s way easier to do data exploration and visualization via RStudio or IPython than D3. Perhaps this is not true for others, but I was surprised by how high level and unstreamlined D3 is for typical visualization tasks. Of course this is part of its power and the reason that higher-abstraction libraries like Dimple.js got built, but on balance I still felt that using JavaScript, HTML, and a browser was not nearly as elegant as RStudio. As someone only mildly experienced with web-dev, the prior classes on R and Pandas+IPython made me feel much more empowered to play with data easily.</li> <li>D3 allows for interactivity, but interactvity is not always needed. This one is rather obvious but why not still spell it out. All but the genre chart here could have comfortably been PNGs files (as in my previous visualization posts), and not lost much. Still, allowing for interactivity does open up a considerable amount of possibilies and in particular is good for open ended data visualization without a single particularly concrete point.</li> <li>Aggregation over hundreds of thousands of data points in JS is probably a bad idea. The year-grouped data for the genre graph was originally completely computed in JS when I submitted my project to Udacity. This took painful seconds, which was not helped by my meek laptop. Again unsurprising, but I did feel a tinge of annoyance at realizing I would be best off writing a python script to pre-process my data into multiple CSV files ready for charting without much manipulation.</li> </ul> <h2 id="bloopers">Bloopers</h2> <p>You did not ask for them, and I delivered. Here are a couple of silly moments during the creation of this<sup id="fnref:high_five"><a href="#fn:high_five" class="footnote">12</a></sup>. Hope you enjoyed!</p> <figure class="figure"><div class="figure__main"> <p><a href="/writing/images/2016-08-10-visualizing-imdb-data-with-d3/oops.png"><img class="postimageactual" src="/writing/images/2016-08-10-visualizing-imdb-data-with-d3/oops.png" alt="oops" /></a></p> </div><figcaption class="figure__caption"><p>Making small ordering mistakes unsurisingly had major glitchy implications…</p> </figcaption></figure> <figure class="figure"><div class="figure__main"> <p><a href="/writing/images/2016-08-10-visualizing-imdb-data-with-d3/great.png"><img class="postimageactual" src="/writing/images/2016-08-10-visualizing-imdb-data-with-d3/great.png" alt="great" /></a></p> </div><figcaption class="figure__caption"><p>… which were worse in some cases than others.</p> </figcaption></figure> <figure class="figure"><div class="figure__main"> <p><a href="/writing/images/2016-08-10-visualizing-imdb-data-with-d3/step1.png"><img class="postimageactual" src="/writing/images/2016-08-10-visualizing-imdb-data-with-d3/step1.png" alt="step1" /></a></p> </div><figcaption class="figure__caption"><p>For a while I thought to plot the binned data with tiny little cute bins via step interpolation…</p> </figcaption></figure> <figure class="figure"><div class="figure__main"> <p><a href="/writing/images/2016-08-10-visualizing-imdb-data-with-d3/step2.png"><img class="postimageactual" src="/writing/images/2016-08-10-visualizing-imdb-data-with-d3/step2.png" alt="step2" /></a></p> </div><figcaption class="figure__caption"><p>… but evidently changed my mind.</p> </figcaption></figure> <h2 id="notes">Notes</h2> <div class="footnotes"> <ol> <li id="fn:gotten_with"> <p>(gotten with <a href="https://github.com/andreykurenkov/data-movies">this code</a>) <a href="#fnref:gotten_with" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:why_years"> <p>The cutoff years are chosen due to there being very few movies comparable to modern films prior to 1915, and the data possibly being incomplete post 2013 <a href="#fnref:why_years" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:theories"> <p>Primarily that films are cheaper to make due to digital technology and that IMDB tacks more modern movies better <a href="#fnref:theories" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:nontrivial"> <p>This is nontrivial for various boring reasons and I have too many side-projects as it is… <a href="#fnref:nontrivial" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:bell_curve"> <p>Fine, a somewhat offset and wobbly bell curve, but still looks pretty good. <a href="#fnref:bell_curve" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:fourier"> <p>Don’t you just want to take the fourier transform of it? <a href="#fnref:fourier" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:corny"> <p>Too corny? I shan’t apologize, this is my site! <a href="#fnref:corny" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:again"> <p>Again, something warranting a deeper dive someday. <a href="#fnref:again" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:budgets"> <p>Many movies did not have associated budget data, but that still left thousands that did <a href="#fnref:budgets" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:plural"> <p>I know, ‘do’ is grammatically correct here, but then natural speech is largely nonsensical so who cares <a href="#fnref:plural" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:learned"> <p>Well, learned a little… <a href="#fnref:learned" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:high_five"> <p>Wow, did you actually read all the text and are still reading it? High five. <a href="#fnref:high_five" class="reversefootnote">&#8617;</a></p> </li> </ol> </div> <p><a href="/writing/visualizing-imdb-data-with-d3/">IMDB Data Visualizations with D3 + Dimple.js</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on August 10, 2016.</p> <![CDATA[VividVr]]> /projects/hacks/vividvr 2016-06-17T00:00:00-04:00 2016-06-17T00:00:00-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>The presentation slides or photos below sum it up; basically we tried to glue together some cutting edge open source projects to get something quite novel. Trying to do so in 24 hours proved tough, and we only go as far as running the separate programs but not to the full pipeline. Something like this may still be worth exploring, though I don’t think this is the best approach to go about it.</p> <p><a href="/projects/hacks/vividvr/">VividVr</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on June 17, 2016.</p> <![CDATA[The Power of IPython Notebook + Pandas + and Scikit-learn]]> /writing/power-of-ipython-pandas-scikilearn 2016-06-10T19:19:34-07:00 2016-06-10T19:19:34-07:00 www.andreykurenkov.com contact@andreykurenkov.com <p>IPython Notebook, Numpy, Pandas, MongoDB, R — for the better part of a year now, I have been trying out these technologies as part of Udacity’s <a href="https://www.udacity.com/course/data-analyst-nanodegree--nd002">Data Analyst Nanodegree</a>. My undergrad education barely touched on data visualization or more broadly data science, and so I figured being exposed to the aforementioned technologies would be fun. And fun it has been, with R’s powerful IDE-powered data mundging and visualization techniques having been particularly revelatory. I learned enough of R to create <a href="/writing/fun-visualizations-of-stackoverflow/">some complex visualizations</a>, and was impressed by how easy is to import data into its Dataframe representations and then transform and visualize that data. I also thought RStudio’s paradigm of continuously intermixed code editing and execution was superior to my habitual workflow of just endlessly cycling between tweaking and executing of Python scripts.</p> <figure class="figure"><div class="figure__main"> <p><a href="/writing/images/2016-06-10-power-of-ipython-pandas-scikitlearn/rstudio.png"><img class="postimageactual" src="/writing/images/2016-06-10-power-of-ipython-pandas-scikitlearn/rstudio.png" alt="History" /></a></p> </div><figcaption class="figure__caption"><p>The RStudio IDE</p> </figcaption></figure> <p>Still, R is a not-quite-general-purpose-language and I hit upon multiple instances in which simple things were hard to do. In such times, I could not help but miss the powers of Python, a language I have tons of experience with and which is about as general purpose as it gets. Luckily, the courses also covered the equivalent of an R implementation for Python: the Python Data Analysis Library, Pandas. This let me use the features of R I now liked — dataframes, powerful plotting methods, elegant methods for transforming data — with Python’s lovely syntax and libraries I already knew and loved. And soon I got to do just that, using both Pandas and the supremely good Machine Learning package Scikit-learn for the final project of <a href="https://www.udacity.com/course/intro-to-machine-learning--ud120">Udacity’s Intro to Machine Learning Course</a>. Not only that, but I also used IPython Notebook for RStudio-esque intermixed code editing and execution and nice PDF output.</p> <p>I had such a nice experience with this combination of tools that I decided to dedicate a post to it, and what follows is mostly a summation of that experience. Reading it should be sufficient to get a general idea for why these tools are useful, whereas a much more detailed introdution and tutorial for Pandas can be found elsewhere (for instance <a href="http://nbviewer.jupyter.org/github/fonnesbeck/pytenn2014_tutorial/blob/master/Part%201.%20Data%20Wrangling%20with%20Pandas.ipynb">here</a>). Incidentally, this whole post was written in IPython Notebook and the source of that <a href="http://www.andreykurenkov.com/writing/files/2016-06-10-power-of-ipython-pandas-scikilearn/post.ipynb">can be found here</a> with the produced HTML <a href="http://www.andreykurenkov.com/writing/files/2016-06-10-power-of-ipython-pandas-scikilearn/post.html">here</a>.</p> <h2 id="data-summarization">Data Summarization</h2> <p>First, a bit about the project. The task was to first explore and clean a given dataset, and then train classification models using it. The dataset contained dozens of features about roughly 150 important employees from the <a href="https://en.wikipedia.org/wiki/Enron_scandal">notoriously corrupt</a> company Enron, witch were classified as either a “Person of Interest” or not based on the outcome of investigations into Enron’s corruption. It’s a tiny dataset and not what I would have chosen, but such were the instructions. The data was provided in a bunch of Python dictionaries, and at first I just used a Python script to change it into a CSV and started exploring it in RStudio. But, it soon dawned on me that I would be much better off just working entirely in Python, and the following code is taken verbatim from my final project submission.</p> <p>And so, the code. Following some imports and a ‘%matplotlib notebook’ comment to allow plotting within IPython, I loaded the data using pickle and printed out some basic things about it (not yet using Pandas):</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">as</span> <span class="nn">plt</span> <span class="kn">import</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="nn">pickle</span> <span class="kn">import</span> <span class="nn">pandas</span> <span class="kn">as</span> <span class="nn">pd</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span> <span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">display</span> <span class="o">%</span><span class="n">matplotlib</span> <span class="n">notebook</span></code></pre></div> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">enron_data</span> <span class="o">=</span> <span class="n">pickle</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">&quot;./ud120-projects/final_project/final_project_dataset.pkl&quot;</span><span class="p">,</span> <span class="s">&quot;rb&quot;</span><span class="p">))</span> <span class="k">print</span><span class="p">(</span><span class="s">&quot;Number of people: </span><span class="si">%d</span><span class="s">&quot;</span><span class="o">%</span><span class="nb">len</span><span class="p">(</span><span class="n">enron_data</span><span class="o">.</span><span class="n">keys</span><span class="p">()))</span> <span class="k">print</span><span class="p">(</span><span class="s">&quot;Number of features per person: </span><span class="si">%d</span><span class="s">&quot;</span><span class="o">%</span><span class="nb">len</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">enron_data</span><span class="o">.</span><span class="n">values</span><span class="p">())[</span><span class="mi">0</span><span class="p">]))</span> <span class="k">print</span><span class="p">(</span><span class="s">&quot;Number of POI: </span><span class="si">%d</span><span class="s">&quot;</span><span class="o">%</span><span class="nb">sum</span><span class="p">([</span><span class="mi">1</span> <span class="k">if</span> <span class="n">x</span><span class="p">[</span><span class="s">&#39;poi&#39;</span><span class="p">]</span> <span class="k">else</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">enron_data</span><span class="o">.</span><span class="n">values</span><span class="p">()]))</span></code></pre></div> <pre><code>Number of people: 146 Number of features per person: 21 Number of POI: 18 </code></pre> <p>But working with this set of dictionaries would not be nearly as fast or easy as a Pandas dataframe, so I soon converted it to that and went ahead and summarized all the features with a single method call:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="o">.</span><span class="n">from_dict</span><span class="p">(</span><span class="n">enron_data</span><span class="p">)</span> <span class="k">del</span> <span class="n">df</span><span class="p">[</span><span class="s">&#39;TOTAL&#39;</span><span class="p">]</span> <span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">transpose</span><span class="p">()</span> <span class="n">numeric_df</span> <span class="o">=</span> <span class="n">df</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">pd</span><span class="o">.</span><span class="n">to_numeric</span><span class="p">,</span> <span class="n">errors</span><span class="o">=</span><span class="s">&#39;coerce&#39;</span><span class="p">)</span> <span class="k">del</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;email_address&#39;</span><span class="p">]</span> <span class="n">numeric_df</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span></code></pre></div> <div class="post_table_div"> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>bonus</th> <th>deferral_payments</th> <th>deferred_income</th> <th>director_fees</th> <th>exercised_stock_options</th> <th>expenses</th> <th>from_messages</th> <th>from_poi_to_this_person</th> <th>from_this_person_to_poi</th> <th>loan_advances</th> <th>long_term_incentive</th> <th>other</th> <th>poi</th> <th>restricted_stock</th> <th>restricted_stock_deferred</th> <th>salary</th> <th>shared_receipt_with_poi</th> <th>to_messages</th> <th>total_payments</th> <th>total_stock_value</th> </tr> </thead> <tbody> <tr> <th>count</th> <td>81.000000</td> <td>38.000000</td> <td>48.000000</td> <td>16.000000</td> <td>101.000000</td> <td>94.000000</td> <td>86.000000</td> <td>86.000000</td> <td>86.000000</td> <td>3.000000</td> <td>65.000000</td> <td>92.000000</td> <td>145</td> <td>109.000000</td> <td>17.000000</td> <td>94.000000</td> <td>86.000000</td> <td>86.000000</td> <td>1.240000e+02</td> <td>125.000000</td> </tr> <tr> <th>mean</th> <td>1201773.074074</td> <td>841602.526316</td> <td>-581049.812500</td> <td>89822.875000</td> <td>2959559.257426</td> <td>54192.010638</td> <td>608.790698</td> <td>64.895349</td> <td>41.232558</td> <td>27975000.000000</td> <td>746491.200000</td> <td>465276.663043</td> <td>0.124138</td> <td>1147424.091743</td> <td>621892.823529</td> <td>284087.542553</td> <td>1176.465116</td> <td>2073.860465</td> <td>2.623421e+06</td> <td>3352073.024000</td> </tr> <tr> <th>std</th> <td>1441679.438330</td> <td>1289322.626180</td> <td>942076.402972</td> <td>41112.700735</td> <td>5499449.598994</td> <td>46108.377454</td> <td>1841.033949</td> <td>86.979244</td> <td>100.073111</td> <td>46382560.030684</td> <td>862917.421568</td> <td>1389719.064851</td> <td>0.330882</td> <td>2249770.356903</td> <td>3845528.349509</td> <td>177131.115377</td> <td>1178.317641</td> <td>2582.700981</td> <td>9.488106e+06</td> <td>6532883.097201</td> </tr> <tr> <th>min</th> <td>70000.000000</td> <td>-102500.000000</td> <td>-3504386.000000</td> <td>3285.000000</td> <td>3285.000000</td> <td>148.000000</td> <td>12.000000</td> <td>0.000000</td> <td>0.000000</td> <td>400000.000000</td> <td>69223.000000</td> <td>2.000000</td> <td>False</td> <td>-2604490.000000</td> <td>-1787380.000000</td> <td>477.000000</td> <td>2.000000</td> <td>57.000000</td> <td>1.480000e+02</td> <td>-44093.000000</td> </tr> <tr> <th>25%</th> <td>425000.000000</td> <td>79644.500000</td> <td>-611209.250000</td> <td>83674.500000</td> <td>506765.000000</td> <td>22479.000000</td> <td>22.750000</td> <td>10.000000</td> <td>1.000000</td> <td>1200000.000000</td> <td>275000.000000</td> <td>1209.000000</td> <td>0</td> <td>252055.000000</td> <td>-329825.000000</td> <td>211802.000000</td> <td>249.750000</td> <td>541.250000</td> <td>3.863802e+05</td> <td>494136.000000</td> </tr> <tr> <th>50%</th> <td>750000.000000</td> <td>221063.500000</td> <td>-151927.000000</td> <td>106164.500000</td> <td>1297049.000000</td> <td>46547.500000</td> <td>41.000000</td> <td>35.000000</td> <td>8.000000</td> <td>2000000.000000</td> <td>422158.000000</td> <td>51984.500000</td> <td>0</td> <td>441096.000000</td> <td>-140264.000000</td> <td>258741.000000</td> <td>740.500000</td> <td>1211.000000</td> <td>1.100246e+06</td> <td>1095040.000000</td> </tr> <tr> <th>75%</th> <td>1200000.000000</td> <td>867211.250000</td> <td>-37926.000000</td> <td>112815.000000</td> <td>2542813.000000</td> <td>78408.500000</td> <td>145.500000</td> <td>72.250000</td> <td>24.750000</td> <td>41762500.000000</td> <td>831809.000000</td> <td>357577.250000</td> <td>0</td> <td>985032.000000</td> <td>-72419.000000</td> <td>308606.500000</td> <td>1888.250000</td> <td>2634.750000</td> <td>2.084663e+06</td> <td>2606763.000000</td> </tr> <tr> <th>max</th> <td>8000000.000000</td> <td>6426990.000000</td> <td>-833.000000</td> <td>137864.000000</td> <td>34348384.000000</td> <td>228763.000000</td> <td>14368.000000</td> <td>528.000000</td> <td>609.000000</td> <td>81525000.000000</td> <td>5145434.000000</td> <td>10359729.000000</td> <td>True</td> <td>14761694.000000</td> <td>15456290.000000</td> <td>1111258.000000</td> <td>5521.000000</td> <td>15149.000000</td> <td>1.035598e+08</td> <td>49110078.000000</td> </tr> </tbody> </table> </div> <p>This high level summarization of data is one example of what Pandas can do for you. But the main strength is in how easy it is to manipulate the data and derive new things from it. The project instructed me to first summarize some things about the data, and then handle outliers. The summary indicated a large standard deviation for many of the features, and also a lot of missing values in the data for various features. First I dropped features with almost no non-null values, such as loan_advances and restricted_stock_deferred. Then, in order to investigate if any features are particularly bad in terms of outliers, I went ahead computed the standard deviation of each feature for each entry in the data, and easily got summary statistics for this data as well:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">del</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;loan_advances&#39;</span><span class="p">]</span> <span class="k">del</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;restricted_stock_deferred&#39;</span><span class="p">]</span> <span class="k">del</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;director_fees&#39;</span><span class="p">]</span> <span class="n">std</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">x</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span> <span class="o">/</span> <span class="n">x</span><span class="o">.</span><span class="n">std</span><span class="p">())</span> <span class="n">std</span> <span class="o">=</span> <span class="n">std</span><span class="o">.</span><span class="n">fillna</span><span class="p">(</span><span class="n">std</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span> <span class="n">std</span><span class="o">.</span><span class="n">describe</span><span class="p">()</span></code></pre></div> <div class="post_table_div"> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>bonus</th> <th>deferral_payments</th> <th>deferred_income</th> <th>exercised_stock_options</th> <th>expenses</th> <th>from_messages</th> <th>from_poi_to_this_person</th> <th>from_this_person_to_poi</th> <th>long_term_incentive</th> <th>other</th> <th>poi</th> <th>restricted_stock</th> <th>salary</th> <th>shared_receipt_with_poi</th> <th>to_messages</th> <th>total_payments</th> <th>total_stock_value</th> </tr> </thead> <tbody> <tr> <th>count</th> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> <td>145.000000</td> </tr> <tr> <th>mean</th> <td>0.612134</td> <td>0.670659</td> <td>0.690552</td> <td>0.558364</td> <td>0.739307</td> <td>0.487468</td> <td>0.694769</td> <td>0.532234</td> <td>0.670577</td> <td>0.444004</td> <td>0.657200</td> <td>0.525893</td> <td>0.568830</td> <td>0.794256</td> <td>0.648079</td> <td>0.287221</td> <td>0.547885</td> </tr> <tr> <th>std</th> <td>0.587181</td> <td>0.371822</td> <td>0.409188</td> <td>0.689763</td> <td>0.537626</td> <td>0.669599</td> <td>0.549542</td> <td>0.648923</td> <td>0.491393</td> <td>0.711333</td> <td>0.751724</td> <td>0.735294</td> <td>0.659254</td> <td>0.462087</td> <td>0.582615</td> <td>0.884946</td> <td>0.774945</td> </tr> <tr> <th>min</th> <td>0.001230</td> <td>0.001025</td> <td>0.002415</td> <td>0.040311</td> <td>0.005314</td> <td>0.028674</td> <td>0.010294</td> <td>0.032302</td> <td>0.027083</td> <td>0.000058</td> <td>0.375173</td> <td>0.044846</td> <td>0.025148</td> <td>0.037736</td> <td>0.041484</td> <td>0.003077</td> <td>0.014143</td> </tr> <tr> <th>25%</th> <td>0.380270</td> <td>0.670659</td> <td>0.611358</td> <td>0.346078</td> <td>0.510059</td> <td>0.310038</td> <td>0.481671</td> <td>0.342075</td> <td>0.546392</td> <td>0.297679</td> <td>0.375173</td> <td>0.302841</td> <td>0.250755</td> <td>0.605495</td> <td>0.455283</td> <td>0.130231</td> <td>0.296228</td> </tr> <tr> <th>50%</th> <td>0.612134</td> <td>0.670659</td> <td>0.690552</td> <td>0.470558</td> <td>0.739307</td> <td>0.324161</td> <td>0.694769</td> <td>0.412024</td> <td>0.670577</td> <td>0.334411</td> <td>0.375173</td> <td>0.417338</td> <td>0.568830</td> <td>0.794256</td> <td>0.648079</td> <td>0.196170</td> <td>0.423551</td> </tr> <tr> <th>75%</th> <td>0.612134</td> <td>0.670659</td> <td>0.690552</td> <td>0.558364</td> <td>0.817162</td> <td>0.487468</td> <td>0.694769</td> <td>0.532234</td> <td>0.670577</td> <td>0.444004</td> <td>0.375173</td> <td>0.525893</td> <td>0.568830</td> <td>0.847365</td> <td>0.648079</td> <td>0.271301</td> <td>0.508700</td> </tr> <tr> <th>max</th> <td>4.715491</td> <td>4.332032</td> <td>3.103078</td> <td>5.707630</td> <td>3.786101</td> <td>7.473631</td> <td>5.324312</td> <td>5.673526</td> <td>5.097756</td> <td>7.119750</td> <td>2.647054</td> <td>6.051404</td> <td>4.669820</td> <td>3.687066</td> <td>5.062584</td> <td>10.638201</td> <td>7.004259</td> </tr> </tbody> </table> </div> <p>This result suggested that most features have large outliers (larger than 3 standard deviations). In order to be careful not to remove any useful data, I manually inspected all rows with large outliers to see any values that seem appropriate for removal:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">outliers</span> <span class="o">=</span> <span class="n">std</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">&gt;</span> <span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">any</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="n">outlier_df</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="n">numeric_df</span><span class="p">[</span><span class="n">outliers</span><span class="p">]</span><span class="o">.</span><span class="n">index</span><span class="p">)</span> <span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="n">numeric_df</span><span class="o">.</span><span class="n">columns</span><span class="p">:</span> <span class="n">outlier_df</span><span class="p">[</span><span class="nb">str</span><span class="p">((</span><span class="n">col</span><span class="p">,</span><span class="n">col</span><span class="o">+</span><span class="s">&#39;_std&#39;</span><span class="p">))]</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">numeric_df</span><span class="p">[</span><span class="n">outliers</span><span class="p">][</span><span class="n">col</span><span class="p">],</span><span class="n">std</span><span class="p">[</span><span class="n">outliers</span><span class="p">][</span><span class="n">col</span><span class="p">]))</span> <span class="n">display</span><span class="p">(</span><span class="n">outlier_df</span><span class="p">)</span> <span class="n">numeric_df</span><span class="o">.</span><span class="n">drop</span><span class="p">(</span><span class="s">&#39;FREVERT MARK A&#39;</span><span class="p">,</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">df</span><span class="o">.</span><span class="n">drop</span><span class="p">(</span><span class="s">&#39;FREVERT MARK A&#39;</span><span class="p">,</span><span class="n">inplace</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span></code></pre></div> <div class="post_table_div"> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>('bonus', 'bonus_std')</th> <th>('deferral_payments', 'deferral_payments_std')</th> <th>('deferred_income', 'deferred_income_std')</th> <th>('exercised_stock_options', 'exercised_stock_options_std')</th> <th>('expenses', 'expenses_std')</th> <th>('from_messages', 'from_messages_std')</th> <th>('from_poi_to_this_person', 'from_poi_to_this_person_std')</th> <th>('from_this_person_to_poi', 'from_this_person_to_poi_std')</th> <th>('long_term_incentive', 'long_term_incentive_std')</th> <th>('other', 'other_std')</th> <th>('poi', 'poi_std')</th> <th>('restricted_stock', 'restricted_stock_std')</th> <th>('salary', 'salary_std')</th> <th>('shared_receipt_with_poi', 'shared_receipt_with_poi_std')</th> <th>('to_messages', 'to_messages_std')</th> <th>('total_payments', 'total_payments_std')</th> <th>('total_stock_value', 'total_stock_value_std')</th> </tr> </thead> <tbody> <tr> <th>DELAINEY DAVID W</th> <td>(3000000.0, 1.24731398542)</td> <td>(nan, 0.67065886001)</td> <td>(nan, 0.690552246623)</td> <td>(2291113.0, 0.121547846815)</td> <td>(86174.0, 0.6936264325)</td> <td>(3069.0, 1.3363193564)</td> <td>(66.0, 0.0127001697143)</td> <td>(609.0, 5.67352642171)</td> <td>(1294981.0, 0.635622582522)</td> <td>(1661.0, 0.333603873451)</td> <td>(True, 2.64705431598)</td> <td>(1323148.0, 0.078107486712)</td> <td>(365163.0, 0.457714373186)</td> <td>(2097.0, 0.781228126919)</td> <td>(3093.0, 0.394602217763)</td> <td>(4747979.0, 0.22391802188)</td> <td>(3614261.0, 0.0401335784062)</td> </tr> <tr> <th>FREVERT MARK A</th> <td>(2000000.0, 0.553678511813)</td> <td>(6426990.0, 4.33203246439)</td> <td>(-3367011.0, 2.95725609803)</td> <td>(10433518.0, 1.35903759241)</td> <td>(86987.0, 0.711258803121)</td> <td>(21.0, 0.319272057897)</td> <td>(242.0, 2.03617142019)</td> <td>(6.0, 0.352068179278)</td> <td>(1617011.0, 1.00881008801)</td> <td>(7427621.0, 5.00989337561)</td> <td>(False, 0.375173052658)</td> <td>(4188667.0, 1.3518014845)</td> <td>(1060932.0, 4.38570296241)</td> <td>(2979.0, 1.5297529467)</td> <td>(3275.0, 0.465071080146)</td> <td>(17252530.0, 1.54183664695)</td> <td>(14622185.0, 1.72513602468)</td> </tr> <tr> <th>HIRKO JOSEPH</th> <td>(nan, 0.612134343218)</td> <td>(10259.0, 0.644790923106)</td> <td>(nan, 0.690552246623)</td> <td>(30766064.0, 5.05623412708)</td> <td>(77978.0, 0.515871316129)</td> <td>(nan, 0.487467982744)</td> <td>(nan, 0.694769235346)</td> <td>(nan, 0.532233915598)</td> <td>(nan, 0.670576589457)</td> <td>(2856.0, 0.332743987428)</td> <td>(True, 2.64705431598)</td> <td>(nan, 0.52589323995)</td> <td>(nan, 0.568830375372)</td> <td>(nan, 0.794256482633)</td> <td>(nan, 0.648079292459)</td> <td>(91093.0, 0.266895026444)</td> <td>(30766064.0, 4.19630821004)</td> </tr> <tr> <th>KAMINSKI WINCENTY J</th> <td>(400000.0, 0.556138245963)</td> <td>(nan, 0.67065886001)</td> <td>(nan, 0.690552246623)</td> <td>(850010.0, 0.383592797689)</td> <td>(83585.0, 0.637476115725)</td> <td>(14368.0, 7.47363149225)</td> <td>(41.0, 0.274724723819)</td> <td>(171.0, 1.29672636328)</td> <td>(323466.0, 0.490226746415)</td> <td>(4669.0, 0.331439407211)</td> <td>(False, 0.375173052658)</td> <td>(126027.0, 0.454000599932)</td> <td>(275101.0, 0.0507338450054)</td> <td>(583.0, 0.503654613618)</td> <td>(4607.0, 0.980810226817)</td> <td>(1086821.0, 0.161950156636)</td> <td>(976037.0, 0.363704047455)</td> </tr> <tr> <th>LAVORATO JOHN J</th> <td>(8000000.0, 4.71549135347)</td> <td>(nan, 0.67065886001)</td> <td>(nan, 0.690552246623)</td> <td>(4158995.0, 0.21810105193)</td> <td>(49537.0, 0.100958023148)</td> <td>(2585.0, 1.07342360688)</td> <td>(528.0, 5.32431220222)</td> <td>(411.0, 3.69497297064)</td> <td>(2035380.0, 1.4936409531)</td> <td>(1552.0, 0.33368230657)</td> <td>(False, 0.375173052658)</td> <td>(1008149.0, 0.0619063591605)</td> <td>(339288.0, 0.311636142127)</td> <td>(3962.0, 2.3639931937)</td> <td>(7259.0, 2.00764222154)</td> <td>(10425757.0, 0.822328102755)</td> <td>(5167144.0, 0.277836132837)</td> </tr> <tr> <th>LAY KENNETH L</th> <td>(7000000.0, 4.02185587986)</td> <td>(202911.0, 0.495369827029)</td> <td>(-300000.0, 0.29833016899)</td> <td>(34348384.0, 5.70763022327)</td> <td>(99832.0, 0.98984158372)</td> <td>(36.0, 0.311124462355)</td> <td>(123.0, 0.668028926971)</td> <td>(16.0, 0.252141237305)</td> <td>(3600000.0, 3.30681561025)</td> <td>(10359729.0, 7.11975001798)</td> <td>(True, 2.64705431598)</td> <td>(14761694.0, 6.05140425399)</td> <td>(1072321.0, 4.44999996622)</td> <td>(2411.0, 1.0477097521)</td> <td>(4273.0, 0.851488248598)</td> <td>(103559793.0, 10.6382007936)</td> <td>(49110078.0, 7.00425896119)</td> </tr> <tr> <th>MARTIN AMANDA K</th> <td>(nan, 0.612134343218)</td> <td>(85430.0, 0.586488215565)</td> <td>(nan, 0.690552246623)</td> <td>(2070306.0, 0.16169859209)</td> <td>(8211.0, 0.997237664333)</td> <td>(230.0, 0.205748893335)</td> <td>(8.0, 0.654125583284)</td> <td>(0.0, 0.412024344462)</td> <td>(5145434.0, 5.09775639018)</td> <td>(2818454.0, 1.6932755666)</td> <td>(False, 0.375173052658)</td> <td>(nan, 0.52589323995)</td> <td>(349487.0, 0.36921495869)</td> <td>(477.0, 0.593613378808)</td> <td>(1522.0, 0.21367570973)</td> <td>(8407016.0, 0.609562657351)</td> <td>(2070306.0, 0.196202351233)</td> </tr> <tr> <th>SHAPIRO RICHARD S</th> <td>(650000.0, 0.382729377561)</td> <td>(nan, 0.67065886001)</td> <td>(nan, 0.690552246623)</td> <td>(607837.0, 0.427628659031)</td> <td>(137767.0, 1.81257710587)</td> <td>(1215.0, 0.329276547308)</td> <td>(74.0, 0.104676135645)</td> <td>(65.0, 0.237500778364)</td> <td>(nan, 0.670576589457)</td> <td>(705.0, 0.33429178227)</td> <td>(False, 0.375173052658)</td> <td>(379164.0, 0.341483782727)</td> <td>(269076.0, 0.0847481963923)</td> <td>(4527.0, 2.84349038551)</td> <td>(15149.0, 5.06258356331)</td> <td>(1057548.0, 0.165035387918)</td> <td>(987001.0, 0.362025768533)</td> </tr> <tr> <th>WHITE JR THOMAS E</th> <td>(450000.0, 0.521456472283)</td> <td>(nan, 0.67065886001)</td> <td>(nan, 0.690552246623)</td> <td>(1297049.0, 0.302304844785)</td> <td>(81353.0, 0.58906842664)</td> <td>(nan, 0.487467982744)</td> <td>(nan, 0.694769235346)</td> <td>(nan, 0.532233915598)</td> <td>(nan, 0.670576589457)</td> <td>(1085463.0, 0.446267416662)</td> <td>(False, 0.375173052658)</td> <td>(13847074.0, 5.64486498335)</td> <td>(317543.0, 0.188873972681)</td> <td>(nan, 0.794256482633)</td> <td>(nan, 0.648079292459)</td> <td>(1934359.0, 0.072623789327)</td> <td>(15144123.0, 1.80502999986)</td> </tr> </tbody> </table> </div> <p>Looking through these, I found one instance of a valid outlier - Mark A. Frevert (CEO of Enron), and removed him from the dataset.</p> <p>I should emphasize the benefits of doing all this in IPython Notebook. Being able to tweak parts of the code without reexecuting all of it and reloading all the data made iterating on ideas much faster, and iterating on ideas fast is essential for exploratory data analysis and development of machine learned models. It’s no accident that the Matlab IDE and RStudio, both tools commonly used in the sciences for data processing and analysis, have essentially the same structure. I did not understand the benefits of IPython Notebook when I was first made to use it for class assignments in College, but now it has finally dawned on me that it fills the same role as those IDEs and became popular because it is similaly well suited for working with data.</p> <h2 id="feature-visualization-engineering-and-selection">Feature Visualization, Engineering and Selection</h2> <p>The project also instructed me to choose a set of features, and to engineer some of my own. In order to get an initial idea of possible promising features and how I could use them to create new features, I computed the correlation of each feature to the Person of Interest classification:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">corr</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="o">.</span><span class="n">corr</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="s">&#39;</span><span class="se">\n</span><span class="s">Correlations between features to POI:</span><span class="se">\n</span><span class="s"> &#39;</span> <span class="o">+</span><span class="nb">str</span><span class="p">(</span><span class="n">corr</span><span class="p">[</span><span class="s">&#39;poi&#39;</span><span class="p">]))</span></code></pre></div> <pre><code>Correlations between features to POI: bonus 0.306907 deferral_payments -0.075632 deferred_income -0.334810 exercised_stock_options 0.513724 expenses 0.064293 from_messages -0.076108 from_poi_to_this_person 0.183128 from_this_person_to_poi 0.111313 long_term_incentive 0.264894 other 0.174291 poi 1.000000 restricted_stock 0.232410 salary 0.323374 shared_receipt_with_poi 0.239932 to_messages 0.061531 total_payments 0.238375 total_stock_value 0.377033 Name: poi, dtype: float64 </code></pre> <p>The results indicated that ‘exercised_stock_options’, ‘total_stock_value’, and ‘bonus’ are the most promising features. Just for fun, I went ahead and plotted these features to see if I could visually verify their significance:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">numeric_df</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s">&#39;exercised_stock_options&#39;</span><span class="p">,</span><span class="n">by</span><span class="o">=</span><span class="s">&#39;poi&#39;</span><span class="p">,</span><span class="n">bins</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span><span class="n">sharex</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span><span class="n">sharey</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">plt</span><span class="o">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">&quot;exercised_stock_options by POI&quot;</span><span class="p">)</span></code></pre></div> <p><img src="" /></p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">numeric_df</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s">&#39;total_stock_value&#39;</span><span class="p">,</span><span class="n">by</span><span class="o">=</span><span class="s">&#39;poi&#39;</span><span class="p">,</span><span class="n">bins</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span><span class="n">sharex</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span><span class="n">sharey</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">plt</span><span class="o">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">&quot;total_stock_value by POI&quot;</span><span class="p">)</span></code></pre></div> <p><img src="" /></p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">numeric_df</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s">&#39;bonus&#39;</span><span class="p">,</span><span class="n">by</span><span class="o">=</span><span class="s">&#39;poi&#39;</span><span class="p">,</span><span class="n">bins</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span><span class="n">sharex</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span><span class="n">sharey</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">plt</span><span class="o">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">&quot;bonus by POI&quot;</span><span class="p">)</span></code></pre></div> <p><img src="" /></p> <p>As well as one that is not strongly correlated:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">numeric_df</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s">&#39;to_messages&#39;</span><span class="p">,</span><span class="n">by</span><span class="o">=</span><span class="s">&#39;poi&#39;</span><span class="p">,</span><span class="n">bins</span><span class="o">=</span><span class="mi">25</span><span class="p">,</span><span class="n">sharex</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span><span class="n">sharey</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="n">plt</span><span class="o">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">&quot;to_messages by POI&quot;</span><span class="p">)</span></code></pre></div> <p><img src="" /></p> <p>The data and plots above indicated that the exercised_stock_options, total_stock_value, and restricted_stock, and to a lesser extent to payment related information (total_payments, salary, bonus, and expenses), are all correlated to Persons of Interest. Therefore, I created new features as sums and ratios of these ones. Working with Pandas made this incredibely easy due to vectorized operations, and though Numpy could similarly make this easy I think Pandas’ Dataframe construct makes it especially easy.</p> <p>It was also easy to fix any problems with the data before starting to train machine learning models. In order to use the data for evaluation and training, I replaced null values with the mean of each feature so as to be able to use the dataset with Scikit-learn. I also scaled all features to a range of 1-0, to better work with Support Vector Machines:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c">#Get rid of label</span> <span class="k">del</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;poi&#39;</span><span class="p">]</span> <span class="n">poi</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">&#39;poi&#39;</span><span class="p">]</span> <span class="c">#Create new features</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;stock_sum&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;exercised_stock_options&#39;</span><span class="p">]</span> <span class="o">+</span>\ <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;total_stock_value&#39;</span><span class="p">]</span> <span class="o">+</span>\ <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;restricted_stock&#39;</span><span class="p">]</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;stock_ratio&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;exercised_stock_options&#39;</span><span class="p">]</span><span class="o">/</span><span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;total_stock_value&#39;</span><span class="p">]</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;money_total&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;salary&#39;</span><span class="p">]</span> <span class="o">+</span>\ <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;bonus&#39;</span><span class="p">]</span> <span class="o">-</span>\ <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;expenses&#39;</span><span class="p">]</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;money_ratio&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;bonus&#39;</span><span class="p">]</span><span class="o">/</span><span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;salary&#39;</span><span class="p">]</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;email_ratio&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;from_messages&#39;</span><span class="p">]</span><span class="o">/</span><span class="p">(</span><span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;to_messages&#39;</span><span class="p">]</span><span class="o">+</span><span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;from_messages&#39;</span><span class="p">])</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;poi_email_ratio_from&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;from_poi_to_this_person&#39;</span><span class="p">]</span><span class="o">/</span><span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;to_messages&#39;</span><span class="p">]</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;poi_email_ratio_to&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;from_this_person_to_poi&#39;</span><span class="p">]</span><span class="o">/</span><span class="n">numeric_df</span><span class="p">[</span><span class="s">&#39;from_messages&#39;</span><span class="p">]</span> <span class="c">#Feel in NA values with &#39;marker&#39; value outside range of real values</span> <span class="n">numeric_df</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="o">.</span><span class="n">fillna</span><span class="p">(</span><span class="n">numeric_df</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span> <span class="c">#Scale to 1-0</span> <span class="n">numeric_df</span> <span class="o">=</span> <span class="p">(</span><span class="n">numeric_df</span><span class="o">-</span><span class="n">numeric_df</span><span class="o">.</span><span class="n">min</span><span class="p">())</span><span class="o">/</span><span class="p">(</span><span class="n">numeric_df</span><span class="o">.</span><span class="n">max</span><span class="p">()</span><span class="o">-</span><span class="n">numeric_df</span><span class="o">.</span><span class="n">min</span><span class="p">())</span></code></pre></div> <p>Then, I scored features using Scikit-learn’s SelectKBest to get an ordering of them to test with multiple algorithms afterward. Pandas Dataframes can be used directly with Scikit-learn, which is another great benefit of it:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">sklearn.feature_selection</span> <span class="kn">import</span> <span class="n">SelectKBest</span> <span class="n">selector</span> <span class="o">=</span> <span class="n">SelectKBest</span><span class="p">()</span> <span class="n">selector</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">numeric_df</span><span class="p">,</span><span class="n">poi</span><span class="o">.</span><span class="n">tolist</span><span class="p">())</span> <span class="n">scores</span> <span class="o">=</span> <span class="p">{</span><span class="n">numeric_df</span><span class="o">.</span><span class="n">columns</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span><span class="n">selector</span><span class="o">.</span><span class="n">scores_</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">numeric_df</span><span class="o">.</span><span class="n">columns</span><span class="p">))}</span> <span class="n">sorted_features</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span><span class="n">key</span><span class="o">=</span><span class="n">scores</span><span class="o">.</span><span class="n">get</span><span class="p">,</span> <span class="n">reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span> <span class="k">for</span> <span class="n">feature</span> <span class="ow">in</span> <span class="n">sorted_features</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="s">&#39;Feature </span><span class="si">%s</span><span class="s"> has value </span><span class="si">%f</span><span class="s">&#39;</span><span class="o">%</span><span class="p">(</span><span class="n">feature</span><span class="p">,</span><span class="n">scores</span><span class="p">[</span><span class="n">feature</span><span class="p">]))</span></code></pre></div> <pre><code>Feature exercised_stock_options has value 30.528310 Feature total_stock_value has value 22.901164 Feature stock_sum has value 16.090150 Feature salary has value 14.428640 Feature poi_email_ratio_to has value 13.619580 Feature bonus has value 11.771121 Feature money_total has value 11.005135 Feature deferred_income has value 9.058555 Feature total_payments has value 8.334006 Feature restricted_stock has value 7.335986 Feature long_term_incentive has value 6.448285 Feature shared_receipt_with_poi has value 6.340473 Feature other has value 4.067974 Feature money_ratio has value 3.781568 Feature from_poi_to_this_person has value 3.626045 Feature email_ratio has value 2.176411 Feature from_this_person_to_poi has value 1.318493 Feature poi_email_ratio_from has value 1.279491 Feature from_messages has value 0.613342 Feature expenses has value 0.543049 Feature to_messages has value 0.400295 Feature deferral_payments has value 0.223368 Feature stock_ratio has value 0.013109 </code></pre> <p>It appeared that several of my features are among the most useful, as ‘poi_email_ratio_to’, ‘stock_sum’, and ‘money_total’ are all ranked highly. But, since the data is so small I had no need to get rid of any of the features and went ahead with testing several classifiers with several sets of features.</p> <h1 id="training-and-evaluating-models">Training and Evaluating Models</h1> <p>Proceding with the project, I selected three algorithms to test and compare: Naive Bayes, Decision Trees, and Support Vector Machines. Naive Bayes is a good baseline for any ML task, and the other two fit well into the task of binary classification with many features and can both be automatically tuned using sklearn classes. A word on SkLearn: it is simply a very well designed Machine Learning toolkit, with great compatibility with Numpy (and therefore also Pandas) and an elegant and smart API structure that makes trying out different models and evaluating features and just about anything one might want short of Deep Learning easy.</p> <p>I think the code that follows will attest to that. I tested those three algorithms with a variable number of features, from one to all of them ordered by the SelectKBest scoring. Because the data is so small, I could afford an extensive validation scheme and did multiple random splits of the data into training and testing to get an average that best indicated the strength of each algorithm. I also went ahead and evaluated precision and recall besides accuracy, since those were to be the metric of performance. And all it took to do all that is maybe 50 lines of code:</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">sklearn.naive_bayes</span> <span class="kn">import</span> <span class="n">GaussianNB</span> <span class="kn">from</span> <span class="nn">sklearn.svm</span> <span class="kn">import</span> <span class="n">SVC</span> <span class="kn">from</span> <span class="nn">sklearn.grid_search</span> <span class="kn">import</span> <span class="n">RandomizedSearchCV</span><span class="p">,</span> <span class="n">GridSearchCV</span> <span class="kn">from</span> <span class="nn">sklearn.tree</span> <span class="kn">import</span> <span class="n">DecisionTreeClassifier</span> <span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">precision_score</span><span class="p">,</span> <span class="n">recall_score</span><span class="p">,</span> <span class="n">accuracy_score</span> <span class="kn">from</span> <span class="nn">sklearn.cross_validation</span> <span class="kn">import</span> <span class="n">StratifiedShuffleSplit</span> <span class="kn">import</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="nn">warnings</span> <span class="n">warnings</span><span class="o">.</span><span class="n">filterwarnings</span><span class="p">(</span><span class="s">&#39;ignore&#39;</span><span class="p">)</span> <span class="n">gnb_clf</span> <span class="o">=</span> <span class="n">GridSearchCV</span><span class="p">(</span><span class="n">GaussianNB</span><span class="p">(),{})</span> <span class="c">#No params to tune for for linear bayes, use for convenience</span> <span class="n">svc_clf</span> <span class="o">=</span> <span class="n">SVC</span><span class="p">()</span> <span class="n">svc_search_params</span> <span class="o">=</span> <span class="p">{</span><span class="s">&#39;C&#39;</span><span class="p">:</span> <span class="n">scipy</span><span class="o">.</span><span class="n">stats</span><span class="o">.</span><span class="n">expon</span><span class="p">(</span><span class="n">scale</span><span class="o">=</span><span class="mi">1</span><span class="p">),</span> <span class="s">&#39;gamma&#39;</span><span class="p">:</span> <span class="n">scipy</span><span class="o">.</span><span class="n">stats</span><span class="o">.</span><span class="n">expon</span><span class="p">(</span><span class="n">scale</span><span class="o">=.</span><span class="mi">1</span><span class="p">),</span> <span class="s">&#39;kernel&#39;</span><span class="p">:</span> <span class="p">[</span><span class="s">&#39;linear&#39;</span><span class="p">,</span><span class="s">&#39;poly&#39;</span><span class="p">,</span><span class="s">&#39;rbf&#39;</span><span class="p">],</span> <span class="s">&#39;class_weight&#39;</span><span class="p">:[</span><span class="s">&#39;balanced&#39;</span><span class="p">,</span><span class="bp">None</span><span class="p">]}</span> <span class="n">svc_search</span> <span class="o">=</span> <span class="n">RandomizedSearchCV</span><span class="p">(</span><span class="n">svc_clf</span><span class="p">,</span> <span class="n">param_distributions</span><span class="o">=</span><span class="n">svc_search_params</span><span class="p">,</span> <span class="n">n_iter</span><span class="o">=</span><span class="mi">25</span><span class="p">)</span> <span class="n">tree_clf</span> <span class="o">=</span> <span class="n">DecisionTreeClassifier</span><span class="p">()</span> <span class="n">tree_search_params</span> <span class="o">=</span> <span class="p">{</span><span class="s">&#39;criterion&#39;</span><span class="p">:[</span><span class="s">&#39;gini&#39;</span><span class="p">,</span><span class="s">&#39;entropy&#39;</span><span class="p">],</span> <span class="s">&#39;max_leaf_nodes&#39;</span><span class="p">:[</span><span class="bp">None</span><span class="p">,</span><span class="mi">25</span><span class="p">,</span><span class="mi">50</span><span class="p">,</span><span class="mi">100</span><span class="p">,</span><span class="mi">1000</span><span class="p">],</span> <span class="s">&#39;min_samples_split&#39;</span><span class="p">:[</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">],</span> <span class="s">&#39;max_features&#39;</span><span class="p">:[</span><span class="mf">0.25</span><span class="p">,</span><span class="mf">0.5</span><span class="p">,</span><span class="mf">0.75</span><span class="p">,</span><span class="mf">1.0</span><span class="p">]}</span> <span class="n">tree_search</span> <span class="o">=</span> <span class="n">GridSearchCV</span><span class="p">(</span><span class="n">tree_clf</span><span class="p">,</span> <span class="n">tree_search_params</span><span class="p">,</span> <span class="n">scoring</span><span class="o">=</span><span class="s">&#39;recall&#39;</span><span class="p">)</span> <span class="n">search_methods</span> <span class="o">=</span> <span class="p">[</span><span class="n">gnb_clf</span><span class="p">,</span><span class="n">svc_search</span><span class="p">,</span><span class="n">tree_search</span><span class="p">]</span> <span class="n">average_accuracies</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">0</span><span class="p">],[</span><span class="mi">0</span><span class="p">],[</span><span class="mi">0</span><span class="p">]]</span> <span class="n">average_precision</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">0</span><span class="p">],[</span><span class="mi">0</span><span class="p">],[</span><span class="mi">0</span><span class="p">]]</span> <span class="n">average_recall</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">0</span><span class="p">],[</span><span class="mi">0</span><span class="p">],[</span><span class="mi">0</span><span class="p">]]</span> <span class="n">num_splits</span> <span class="o">=</span> <span class="mi">10</span> <span class="n">train_split</span> <span class="o">=</span> <span class="mf">0.9</span> <span class="n">indices</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">StratifiedShuffleSplit</span><span class="p">(</span><span class="n">poi</span><span class="o">.</span><span class="n">tolist</span><span class="p">(),</span> <span class="n">num_splits</span><span class="p">,</span> <span class="n">test_size</span><span class="o">=</span><span class="mi">1</span><span class="o">-</span><span class="n">train_split</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="mi">0</span><span class="p">))</span> <span class="n">best_features</span> <span class="o">=</span> <span class="bp">None</span> <span class="n">max_score</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">best_classifier</span> <span class="o">=</span> <span class="bp">None</span> <span class="n">num_features</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">num_features</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">sorted_features</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span> <span class="n">features</span> <span class="o">=</span> <span class="n">sorted_features</span><span class="p">[:</span><span class="n">num_features</span><span class="p">]</span> <span class="n">feature_df</span> <span class="o">=</span> <span class="n">numeric_df</span><span class="p">[</span><span class="n">features</span><span class="p">]</span> <span class="k">for</span> <span class="n">classifier_idx</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span> <span class="n">sum_values</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">]</span> <span class="c">#Only do parameter search once, too wasteful to do a ton</span> <span class="n">search_methods</span><span class="p">[</span><span class="n">classifier_idx</span><span class="p">]</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">feature_df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">indices</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">],:],</span> <span class="n">poi</span><span class="p">[</span><span class="n">indices</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">]]</span><span class="o">.</span><span class="n">tolist</span><span class="p">())</span> <span class="n">classifier</span> <span class="o">=</span> <span class="n">search_methods</span><span class="p">[</span><span class="n">classifier_idx</span><span class="p">]</span><span class="o">.</span><span class="n">best_estimator_</span> <span class="k">for</span> <span class="n">split_idx</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_splits</span><span class="p">):</span> <span class="n">train_indices</span><span class="p">,</span> <span class="n">test_indices</span> <span class="o">=</span> <span class="n">indices</span><span class="p">[</span><span class="n">split_idx</span><span class="p">]</span> <span class="n">train_data</span> <span class="o">=</span> <span class="p">(</span><span class="n">feature_df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">train_indices</span><span class="p">,:],</span><span class="n">poi</span><span class="p">[</span><span class="n">train_indices</span><span class="p">]</span><span class="o">.</span><span class="n">tolist</span><span class="p">())</span> <span class="n">test_data</span> <span class="o">=</span> <span class="p">(</span><span class="n">feature_df</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="n">test_indices</span><span class="p">,:],</span><span class="n">poi</span><span class="p">[</span><span class="n">test_indices</span><span class="p">]</span><span class="o">.</span><span class="n">tolist</span><span class="p">())</span> <span class="n">classifier</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_data</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="n">train_data</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="n">predicted</span> <span class="o">=</span> <span class="n">classifier</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">test_data</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="n">sum_values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">+=</span><span class="n">accuracy_score</span><span class="p">(</span><span class="n">predicted</span><span class="p">,</span><span class="n">test_data</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="n">sum_values</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">+=</span><span class="n">precision_score</span><span class="p">(</span><span class="n">predicted</span><span class="p">,</span><span class="n">test_data</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="n">sum_values</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="o">+=</span><span class="n">recall_score</span><span class="p">(</span><span class="n">predicted</span><span class="p">,</span><span class="n">test_data</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="n">avg_acc</span><span class="p">,</span><span class="n">avg_prs</span><span class="p">,</span><span class="n">avg_recall</span> <span class="o">=</span> <span class="p">[</span><span class="n">val</span><span class="o">/</span><span class="n">num_splits</span> <span class="k">for</span> <span class="n">val</span> <span class="ow">in</span> <span class="n">sum_values</span><span class="p">]</span> <span class="n">average_accuracies</span><span class="p">[</span><span class="n">classifier_idx</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">avg_acc</span><span class="p">)</span> <span class="n">average_precision</span><span class="p">[</span><span class="n">classifier_idx</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">avg_prs</span><span class="p">)</span> <span class="n">average_recall</span><span class="p">[</span><span class="n">classifier_idx</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">avg_recall</span><span class="p">)</span> <span class="n">score</span> <span class="o">=</span> <span class="p">(</span><span class="n">avg_prs</span><span class="o">+</span><span class="n">avg_recall</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span> <span class="k">if</span> <span class="n">score</span><span class="o">&gt;</span><span class="n">max_score</span> <span class="ow">and</span> <span class="n">avg_prs</span><span class="o">&gt;</span><span class="mf">0.3</span> <span class="ow">and</span> <span class="n">avg_recall</span><span class="o">&gt;</span><span class="mf">0.3</span><span class="p">:</span> <span class="n">max_score</span> <span class="o">=</span> <span class="n">score</span> <span class="n">best_features</span> <span class="o">=</span> <span class="n">features</span> <span class="n">best_classifier</span> <span class="o">=</span> <span class="n">search_methods</span><span class="p">[</span><span class="n">classifier_idx</span><span class="p">]</span><span class="o">.</span><span class="n">best_estimator_</span> <span class="k">print</span><span class="p">(</span><span class="s">&#39;Best classifier found is </span><span class="si">%s</span><span class="s"> </span><span class="se">\n\</span> <span class="s"> with score (recall+precision)/2 of </span><span class="si">%f</span><span class="se">\n\</span> <span class="s"> and feature set </span><span class="si">%s</span><span class="s">&#39;</span><span class="o">%</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">best_classifier</span><span class="p">),</span><span class="n">max_score</span><span class="p">,</span><span class="n">best_features</span><span class="p">))</span></code></pre></div> <pre><code>Best classifier found is DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None, max_features=0.25, max_leaf_nodes=25, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=None, splitter='best') with score (recall+precision)/2 of 0.370000 and feature set ['exercised_stock_options', 'total_stock_value', 'stock_sum', 'salary', 'poi_email_ratio_to', 'bonus'] </code></pre> <p>Then, I could go right back to Pandas to plot the results. Sure, I could do this with matplotlib just as well, but the flexibility and simplicity of the ‘plot’ function call on a DataFrame is more elegant in my opinion.</p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">results</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="o">.</span><span class="n">from_dict</span><span class="p">({</span><span class="s">&#39;Naive Bayes&#39;</span><span class="p">:</span> <span class="n">average_accuracies</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s">&#39;SVC&#39;</span><span class="p">:</span><span class="n">average_accuracies</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;Decision Tree&#39;</span><span class="p">:</span><span class="n">average_accuracies</span><span class="p">[</span><span class="mi">2</span><span class="p">]})</span> <span class="n">results</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xlim</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">sorted_features</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span><span class="n">ylim</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="n">plt</span><span class="o">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">&quot;Classifier accuracy by # of features&quot;</span><span class="p">)</span></code></pre></div> <p><img src="" /></p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">results</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="o">.</span><span class="n">from_dict</span><span class="p">({</span><span class="s">&#39;Naive Bayes&#39;</span><span class="p">:</span> <span class="n">average_precision</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s">&#39;SVC&#39;</span><span class="p">:</span><span class="n">average_precision</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;Decision Tree&#39;</span><span class="p">:</span><span class="n">average_precision</span><span class="p">[</span><span class="mi">2</span><span class="p">]})</span> <span class="n">results</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xlim</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">sorted_features</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span><span class="n">ylim</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="n">plt</span><span class="o">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">&quot;Classifier precision by # of features&quot;</span><span class="p">)</span></code></pre></div> <p><img src="" /></p> <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">results</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DataFrame</span><span class="o">.</span><span class="n">from_dict</span><span class="p">({</span><span class="s">&#39;Naive Bayes&#39;</span><span class="p">:</span> <span class="n">average_recall</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s">&#39;SVC&#39;</span><span class="p">:</span><span class="n">average_recall</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="s">&#39;Decision Tree&#39;</span><span class="p">:</span><span class="n">average_recall</span><span class="p">[</span><span class="mi">2</span><span class="p">]})</span> <span class="n">results</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xlim</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">sorted_features</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span><span class="n">ylim</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="n">plt</span><span class="o">.</span><span class="n">suptitle</span><span class="p">(</span><span class="s">&quot;Classifier recall by # of features&quot;</span><span class="p">)</span></code></pre></div> <p><img src="" /></p> <p>As output by my code, the best algorithm was consistently found to be Decision Trees and so I could finally finish up the project by submitting that as my model.</p> <h2 id="conclusion">Conclusion</h2> <p>I did not much care for the project’s dataset and overall structure, but I still greatly enjoyed completing it because of how fun it was to combine Pandas data processing with Scikit-learn model training in the process, with IPython Notebook making that process even more fluid. While not at all a well written introduction or tutorial for these packages, I do hope that this write up about a single project I finished using them might inspire some readers to try out doing that as well.</p> <p><a href="/writing/power-of-ipython-pandas-scikilearn/">The Power of IPython Notebook + Pandas + and Scikit-learn</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on June 10, 2016.</p> <![CDATA[Verbosify]]> /projects/hacks/verbosify 2016-05-07T00:00:00-04:00 2016-05-07T00:00:00-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>We made a quick flask app, used NLTP to swap some words with their definitions, and accessed it using JQuery in a Chrome app. Nothing too tricky, but fun.</p> <p><a href="/projects/hacks/verbosify/">Verbosify</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on May 07, 2016.</p> <![CDATA[The Distinct Artistry of Trailers]]> /writing/the-distinct-artistry-of-trailers 2016-05-05T19:19:34-07:00 2016-05-05T19:19:34-07:00 www.andreykurenkov.com contact@andreykurenkov.com <figure> <iframe width="560" height="315" src="https://www.youtube.com/embed/d-S9nKByu5w" frameborder="0" allowfullscreen=""></iframe> <figcaption>The ultimate example of the conventional Hollywood trailer, which the trailers below will stand in contrast to. </figcaption> </figure> <p>Some time ago, I wrote <a href="http://www.andreykurenkov.com/writing/in-my-head-music-videos-i-dont-forget/">a post</a> that was little more than a list of music videos I have long remembered for being great. What compelled me to write such a thing, rather than the sort of essay or technical post I usually peddle, is that there really was a discrete number of music videos that stuck with me and seemed to represent the best of the medium. And so it is with another medium not so dissimilar from the music video - the trailer.</p> <p>The similarities are obvious - like music videos, trailers tend to be short, are free to not represent reality but bound to reflect another piece of art, and tend to make a strong aesthetic impression with imagery and sound. But, then again trailers also have a practical purpose music videos do not have - to inform you about the nature of some movie and make you want to watch it. As a result of this a great majority of trailers take up a fairly predictable form, like the one above, and work well as an advertisement for the movie but not as its own piece of art apart from that movie.</p> <p>Still, I have come across many trailers in my life that did not just make me want to watch the movie, but also excited, entertained, and impressed me wholly on their own. This is not so surprising, since (as stated in <a href="http://filmmakermagazine.com/37093-first-impressions/#.VypVA14oBC0">“The Art of First Impressions: How to Cut a Movie Trailer”</a>) a trailer is really its own film - it has rhythm, appeal, and impact that are utterly distinct from the movie it is derived from. So here I am, posting another list of things I like, and little more. But here’s a twist - thinking back on the trailers that made the greatest lasting impression on me, I noticed they tended to fit into one of a few styles that that differed from the above ‘conventional’ trailer. And so, read on for those greatest trailers stuck in my head excitingly grouped by the styles it occured to me they exemplify.</p> <h2 id="the-music-over-montage-trailer">The Music Over Montage Trailer</h2> <figure> <iframe width="560" height="315" src="https://www.youtube.com/embed/JFbo9uj-TrY" frameborder="0" allowfullscreen=""></iframe> <figcaption>The Double Teaser Trailer. This song is not in the final movie.</figcaption> </figure> <p>This is my favorite style of trailer, perhaps because it tends to least resemble advertising and most function as its own work of art. The cutting together of footage without any continuity, and the muting of dialogue in favor of a single musical track, together almost completely dissasociate the images from the movie they are from. Instead, the audio track and spliced footage make for a new composition that can be quite different from the movie while hinting at it’s overall mood and spirit.</p> <p>I think this is particularly true of the above trailer; The Double is a slow dark comedy about heady themes concerning our need to be acknowledged, but you could hardly guess that from the trailer alone. The normal exposition about themes and characters is totally absent, and about the only thing that is conveyed is the creepy surreal aspect of the doppleganger and the fantastic visual style. It is not telling us what the movie <em>will</em> be about, who these characters <em>will</em> be, but just <em>is</em> its own little film representing the style and spirit of the whole movie. Of course, this is also why this style is reserved for teasers and not ‘real’ trailers that actually need to explain what the movie is about. Still, I think this style deserves recognition since in my humble opinion it is basis for some of the best movie trailers of all time:</p> <figure> <iframe width="560" height="315" src="https://www.youtube.com/embed/WVLvMg62RPA" frameborder="0" allowfullscreen=""></iframe> <figcaption>The Girl With The Dragon Tattoo Trailer #1. Again, we at most see Fincher's visual style, that this will be some sort of gritty thriller-sort of thing.</figcaption> </figure> <figure> <iframe width="560" height="315" src="https://www.youtube.com/embed/SPRzm8ibDQ8" frameborder="0" allowfullscreen=""></iframe> <figcaption>The Clockwork Orange trailer, an especially extreme example of this style. Kubrick, ever the auteur, was among the first to do these sorts of crazy things. </figcaption> </figure> <figure> <iframe width="560" height="315" src="https://www.youtube.com/embed/jQ5lPt9edzQ" frameborder="0" allowfullscreen=""></iframe> <figcaption>The trailer for Alien. This is even more its own thing than the others, as it includes footage not at all in the movie. </figcaption> </figure> <h2 id="the-single-track-trailer">The Single Track Trailer</h2> <iframe width="560" height="315" src="https://www.youtube.com/embed/7iggyFPls4w" frameborder="0" allowfullscreen=""></iframe> <p>A similar but distinct style, in which a single track of dialogue or beat is repeated and intensified over a montage of unrelated footage. It is not quite as divorced from the movie as the previous style, since it is typically the main character monologuing and there may be some cuts to dialogue, but is still far less expository than the likes of the Independnece Day trailer. And, crucially, these trailers can also include an element entirely not in the movie. In the above trailer, the continuous blackboard beat and repeated scenes are elements unique to the trailer. And then there is Inception’s <a href="https://www.youtube.com/watch?v=830I9w7I7wM">now ubiquitous BRAAAM</a>, which was <a href="http://blogs.indiewire.com/theplaylist/who-really-created-the-inception-braaam-composer-mike-zarin-sets-the-record-straight-20131113">in fact conceived</a> for the teaser trailer rather than for the movie itself.</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/Z564VzbQ9Hc" frameborder="0" allowfullscreen=""></iframe> <h2 id="the-condensed-movie-trailer">The Condensed Movie Trailer</h2> <figure> <iframe width="560" height="315" src="https://www.youtube.com/embed/XG8qATRtNuU" frameborder="0" allowfullscreen=""></iframe> <figcaption>The first full trailer for The Double, notably different from the teaser. </figcaption> </figure> <p>Sometimes, just taking all the best elements and plot beats of a movie and succintly showing them off works wonders. In Hopes&amp;Fears’ <a href="http://www.hopesandfears.com/hopes/culture/film/214473-epic-history-movie-trailers-mad-max-independence-day">“An epic history of the movie trailer”</a> this is referred to as the mini-movie, and said to come about in the last few decades:</p> <blockquote> <p>“By the end of the 1980s, movies were making more money than ever. Year after year total grosses and movie budgets were higher. As such, studios took fewer risks on trailers. They would make multiple trailers, premiere them at different times, test them on different markets, and find just the right way to sell their product.</p> </blockquote> <blockquote> <p>More importantly, they started honing in on an abridged version of the movie, advertising with a short version of the film’s three acts: setup, confrontation, and climax—essentially everything but the resolution.”. This may sound like pretty much every trailer out there nowdays, but it takes skill to do well, and is doomed to fail if the movie has nothing interesting going on to begin with. These trailers, at least the ones I consider great, do more than just introduce the characters and plot of their movie - they simultaneously weave together the strongest elements of the movie, leave out just enough to keep the viewer interested, and have a crazy amount of editing flourish that is entertatining even apart from the footage it is executed with. “</p> </blockquote> <p>The above trailer for The Double is wholly different from the teaser in that it introduces the characters and plot at length. But it does more than that - it combines multiple songs from the movie’s memorable score, quickly executes multiple character arcs, and builds to an incredibely energetic conclusion that combines moments from multiple distinct scenes in the source. The following trailers, too, combine exposition with a melding of multiple scenes that is not found in the movie but represents its unique strengths:</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/JJkPLYmUyzg" frameborder="0" allowfullscreen=""></iframe> <iframe width="560" height="315" src="https://www.youtube.com/embed/0nU7dC9bIDg" frameborder="0" allowfullscreen=""></iframe> <iframe width="560" height="315" src="https://www.youtube.com/embed/FJuaAWrgoUY" frameborder="0" allowfullscreen=""></iframe> <iframe width="560" height="315" src="https://www.youtube.com/embed/hEJnMQG9ev8" frameborder="0" allowfullscreen=""></iframe> <h2 id="the-cinematic-video-game-trailer">The Cinematic Video Game Trailer</h2> <iframe width="560" height="315" src="https://www.youtube.com/embed/Kq5KWLqUewc" frameborder="0" allowfullscreen=""></iframe> <p>It should be clear that the above three styles represent increasing levels of exposition about and representation of their source materials. But, trailers don’t only exist for movies. They exist for video games as well, and for video games there is a unique style of trailer that is literally a wholly distinct work of art - the cinematic trailer. Usually these short films are not much more than eye candy, but in some cases - as in these cases - they are quite good short films in their own right. I have seen the above Deus Ex trailer many times now, purely because I think it is marvelously conceived and executed.</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/xt_65k-gv1U" frameborder="0" allowfullscreen=""></iframe> <p>Advertisements are often seen as something to instinctively avoid these days, as something tasteless that intrudes into our lives only to cause annoyane. But, it should be acknowledged that ads can also be this good looking, this well produced, this distinctly artistic. I can only hope to see more like them.</p> <p><a href="/writing/the-distinct-artistry-of-trailers/">The Distinct Artistry of Trailers</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on May 05, 2016.</p> <![CDATA[Planr]]> /projects/major_projects/planr 2016-05-05T05:26:22-04:00 2016-05-05T05:26:22-04:00 www.andreykurenkov.com contact@andreykurenkov.com <p>An app and a server, no big deal right. We used Flask for the server (some basic REST stuff), and a whole lot of Android libraries to make the app. Sadly we did not win (probably because we did not quite finish), but we were close!</p> <p><a href="/projects/major_projects/planr/">Planr</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on May 05, 2016.</p> <![CDATA[A 'Brief' History of Game AI Up To AlphaGo, Part 3]]> /writing/a-brief-history-of-game-ai-part-3 2016-04-18T19:19:34-07:00 2016-04-18T19:19:34-07:00 www.andreykurenkov.com contact@andreykurenkov.com <p>This is the third and final part of ‘A Brief History of Game AI Up to AlphaGo’. Part 1 is <a href="/writing/a-brief-history-of-game-ai">here</a> and part 2 is <a href="/writing/a-brief-history-of-game-ai-part-2">here</a>. In this part, we shall cover the intellectual innovations necessary to finally achieve strong Go programs, and the final steps beyond those that get us all the way to the present and to AlphaGo.</p> <h1 id="and-then-there-was-go">And Then There Was Go</h1> <p>Although Go playing programs by the late 90s were still far from impressive, this was not due to a lack of people trying. Following Bruce Wilcox’s 70s work on Go programs (mentioned in <a href="http://www.andreykurenkov.com/writing/a-brief-history-of-game-ai-up-to-alphago-2">part 2</a>), numerous people continued to dedicate their time and CPU cycles to implementing better Go programs throughout the 80s and 90s. One of the best programs of the 90s, The Many Faces of Go, achieved 13-kyu (good non-professional) performance. It took 30 thousand lines of code written over a decade by its developer, David Fotland, to implement the many Go-specific components it used on top of traditional alpha-beta search<sup id="fnref:ManyFaces"><a href="#fn:ManyFaces" class="footnote">1</a></sup>. The combination of a larger branching factor, longer games, and trickier-to-evaluate board positions rendered Go resistant to the techniques that by this time were achieving master-level Chess play. The so-called “knowledge-based” approach of Many Faces of Go (encoding many human-designed strategies into the program) worked, but by the 2000s hit a point of diminishing returns. Traditional techniques were simply not enough.</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/16-go90s.png" alt="Go90s" /> <figcaption>Go AIs in the 90s were at the level of good non-professionals; the 'master' ranks are actually obtained by relatively few people who typically play in tournaments, and the 'professional' rank is reserved for incredibly good players. <a href="http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Applications_files/grand-challenge.pdf"><b>(Source)</b></a></figcaption> </figure> <p>Fortunately, another family of techniques did exist — <a href="https://simple.wikipedia.org/wiki/Monte_Carlo_algorithm">Monte Carlo algorithms</a>. The general idea of Monte Carlo algorithms is to simulate a process, usually with some randomness thrown in, in order to statistically get a good approximation of some value that is very hard to calculate directly. For instance, <a href="https://en.wikipedia.org/wiki/Simulated_annealing">simulated annealing</a> is a classic optimization technique for finding the optimal parameters for some function. It boils down to starting with some random values and semi-randomly changing them to gradually get more optimal values, as shown in the graphic below. If this is done for long enough, it turns out a solution close to the optimal one can most likely be found with much less computation than the actually optimal one would have taken to find.</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/17-simulated_annealing.gif" alt="SA" /> <figcaption> An example of simulated annealing. Just 'hill climbing' in one direction from any starting point does not work to find the optimal value since locally many places look like a peak, so at first big random jumps in location are allowed. Over time, less randomness is allowed as the algorithm converges on the globally highest peak. <a href="https://en.wikipedia.org/wiki/File:Hill_Climbing_with_Simulated_Annealing.gif"><b>(Source)</b></a></figcaption> </figure> <p class="sidenoteleftlarge">1993</p> <p>Monte Carlo techniques had been applied to games of chance such as Blackjack since the 70s, but they were not suggested for Go until 1993 with Bernd Brügmann’s <a href="http://www.ideanest.com/vegos/MonteCarloGo.pdf">“Monte Carlo Go”</a> <sup id="fnref:MonteCarloGo"><a href="#fn:MonteCarloGo" class="footnote">2</a></sup>. The idea he presented was suspiciously simple: Instead of implementing a complex evaluation function and executing typical tree search, have a program just simulate playing many random games (using a version of simulated annealing) and pick the move that leads to the best outcome on average. Remember, the whole reason for evaluation functions is to search only a few moves ahead and not until the end of the game, since the number of possible ways to get to the end is far too immense to thoroughly explore. But, this does not work well for Go since evaluation functions are harder to write for it and typically take much longer to compute than for Chess. So, it turns out that a great alternative is to use that computation time to randomly play out only a subset of possible games (typically described as doing many <strong>rollouts</strong>), and then evaluate moves based on just counting victories and losses. Despite being radically simple (having little more than the rules of Go), Brügmann’s implementation of the idea showed decent beginner-level play not too far from the vastly more complex Many Faces of Go.</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/18-hugetrees.png" /> <figcaption>Monte Carlo approach compared to classical tree search. <a href="http://www.remi-coulom.fr/JFFoS/JFFoS.pdf"><b>(Source)</b></a></figcaption> </figure> <p>Although it was a novel approach that showed promise for overcoming the problems that stumped traditional Go AI, Brügmann’s work was initially not taken seriously. It was not until the 2000s that a group in Paris made up of Bruno Bouzy, Tristan Cazenave, and Bernard Helmstetter started seriously exploring the potential of Monte Carlo techniques for Go through multiple papers and programs<sup id="fnref:ParisSchool1"><a href="#fn:ParisSchool1" class="footnote">3</a></sup><sup id="fnref:ParisSchool2"><a href="#fn:ParisSchool2" class="footnote">4</a></sup><sup id="fnref:ParisSchool3"><a href="#fn:ParisSchool3" class="footnote">5</a></sup><sup id="fnref:MonteCarloRevolution"><a href="#fn:MonteCarloRevolution" class="footnote">6</a></sup>. Though they simplified and expanded upon the Brügmann approach (stripping out simulated annealing and adding some heuristics and pruning), their programs were still inferior to the strongest standard tree search ones such as GNU Go. However this lasted only a few years, until several milestone achievements in 2006:</p> <ul> <li>Rémi Coulom further improved on the use of Monte Carlo evaluation with tree search, and coined the term <strong>Monte-Carlo Tree Search</strong> (MCTS)<sup id="fnref:MCTS"><a href="#fn:MCTS" class="footnote">7</a></sup>. His program CrazyStone won that year’s KGS computer-Go tournament for the small 9x9 variant of Go, beating other programs such as NeuroGo and GNU Go and thus proving the potential of MCTS.</li> <li>Levente Kocsis and Csaba Szepesvári developed the <strong>UCT</strong> (Upper Confidence Bounds for Trees) algorithm <sup id="fnref:BanditMonte"><a href="#fn:BanditMonte" class="footnote">8</a></sup>. This algorithm solves the choice between exploitation (simulating already good-looking moves to get a better estimate of how good they are) and exploration (trying new or bad-looking moves to try to get something better) in MCTS. The same tradeoff has long been studied in Computer Science in the form of the <a href="https://en.wikipedia.org/wiki/Multi-armed_bandit">multi-arm bandit problem</a>, and UCT was a modification of a theoretically-sound Upper Confidence Bounds formula developed for that problem a few years prior.</li> <li>Lastly, Sylvain Gelly et al. combined MCTS with UCT as well as the older ideas of local pattern matching and tree pruning<sup id="fnref:MoGo"><a href="#fn:MoGo" class="footnote">9</a></sup>. Their program, MoGo, quickly surpassed CrazyStone and became the best computer Go AI.</li> </ul> <p class="sidenoteleftlarge">2006</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/19-monte_carlo.png" /> <figcaption>A nice visualization of (basic) Monte Carlo Tree Search. <a href="https://commons.wikimedia.org/w/index.php?curid=25382061"><b>(Source)</b></a>, By <a href="//commons.wikimedia.org/w/index.php?title=User:Mciura&amp;action=edit&amp;redlink=1" class="new">Mciura</a> - <span class="int-own-work" lang="en">Own work</span>, <a title="Creative Commons Attribution-Share Alike 3.0" href="http://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a></figcaption> </figure> <p>The rapid success of CrazyStone and MoGo was impressive enough, but what happened next was truly groundbreaking. In 2008, MoGo beat professional 8 dan player Kim Myungwan in 19x19 Go. Granted, Myungwan was playing with a large handicap and MoGo was running on an 800-node supercomputer, but the feat was nevertheless historic, given that previous Go programs could at best beat highly-ranked amateurs with large handicaps. In the same year, CrazyStone defeated professional Japanese 4 dan player Kaori Aoba with a smaller handicap while running on a normal PC<sup id="fnref:GoHistory"><a href="#fn:GoHistory" class="footnote">10</a></sup>. And so on it went, with faster computers and various tweaks to MCTS making Go programs rapidly become much, much better.</p> <p class="sidenoteleftlarge">2012</p> <p>By 2012, Go programs got good enough for an amateur player to write a post titled <a href="http://blog.printf.net/articles/2012/02/23/computers-are-very-good-at-the-game-of-go/">“Computers are very good at the game of Go”</a>. The post highlighted the fact that the standout program at the time, Zen19, had improved from a 1 dan ranking (entry-level master) to 5 dan (higher-level master) in the span of 4 years. This is a big deal:</p> <blockquote> <p>“To put the 5-dan rank in perspective: amongst the players who played American Go Association rated games in 2011, there were only 105 players that are 6-dan and above. This suggests that there are only around 100 active tournament players in the US who are significantly stronger than Zen19. I’m sure I’ll never become that strong myself.”</p> </blockquote> <figure class="sidefigureright"> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/20-go_ratings.png" /> <figcaption>The pace of betterment for Go programs after the introduction of MCTS <a href="https://www.usgo.org/files/bh_library/Supercomputer%20Go.pdf"><b>(Source)</b></a></figcaption> </figure> <figure class="sidefigureleft"> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/21-crazystone.jpeg" /> <figcaption>Progress of Go AIs compared to Chess - despite being better than that vast majority of Go players, the best Go AIs were still nowhere near the best humans by this point. But they were getting better and better, very fast. From 2014's <a href="http://spectrum.ieee.org/robotics/artificial-intelligence/ais-have-mastered-chess-will-go-be-next"><b>"AIs Have Mastered Chess. Will Go Be Next?" by Jonathan Schaeffer, Martin Müller &amp; Akihiro Kishimoto</b></a></figcaption> </figure> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/22.5-go2012.png" /> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/22-progress.jpeg" /> <figcaption>An image from 2013 of Rémi Coulom (left) placing stones for Crazy Stone, against professional player Ishida Yoshio. Crazy Stone won. <b><a src="https://gogameguru.com/crazy-stone-computer-go-ishida-yoshio-4-stones/">(Source)</a></b></figcaption> </figure> <p>Still, Go programs were only beating lower-level professionals, and only with significant handicaps. A revolutionary idea — Monte Carlo Tree Search — enabled the ability to brute-force good Go play, and solved the ‘type A’ (smart use of brute-force) strategy part of the problem. It accomplished roughly the same feat as alpha-beta search for Chess programs, but worked better for Go because it replaced the human-implemented evaluation functions component with many random rollouts of the game that were easy to do fast and parallelize to multiple processing cores. But in order to go up against the best humans at, ‘type B’ intelligence (emulation of human-like learned instincts) would also be needed. That would soon be developed, but it would require a second and wholly different revolution - deep learning.</p> <h1 id="go-ais-ascend-to-divinity-with-deep-learning">Go AIs Ascend to Divinity with Deep Learning</h1> <p class="sidenoteleftlarge">1994</p> <p>To understand the revolution of deep learning, we must revisit the 90s. We previously covered the success of Neurogammon, a backgammon program powered by neural nets and supervised learning, as well as TD-Gammon, its successor that was also powered by neural nets but based on reinforcement learning. These programs demonstrated that machine learning was a viable alternative to the knowledge-based approach of hand-coding complex strategies. Indeed, there was an attempt to to apply the approach behind TD-Gammon to Go with 1994’s <a href="http://www.gatsby.ucl.ac.uk/~dayan/papers/sds94.pdf">“Temporal difference learning of position evaluation in the game of Go”</a><sup id="fnref:TDGo"><a href="#fn:TDGo" class="footnote">11</a></sup> by Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski.</p> <p>Schraudolph et al. noted that efforts to make a Go AI with supervised learning (like Neurogammon) were hindered by the difficulty of generating enough examples of scored Go boards. So they suggested an approach similar to TD-Gammon — training a neural net to evaluate a given board position though playing itself and other programs. However, the team found that using a plain neural net as in TD-Gammon was inefficient, since it did not capture the fact that many patterns on Go boards can be rotated or moved around and still hold the same significance. So they used a <strong>convolutional neural net</strong> (CNN), a type of neural net constrained to perform the same computation for different parts of the input. Typically, the constraint is to apply the same processing to small patches all over the input, and then have some number of additional <strong>layers</strong> of looking for specific combinations of patterns, and then eventually use those combinations to compute the overall output. The CNN in Schraudolph et al’s work was simpler, but helped their result by exploiting rotational, reflectional, and color inversion symmetries in Go.</p> <figure> <img class="postimagesmall" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/23-CNN.jpg" alt="CNN" /> <figcaption>A nice visualization of how multi-layer CNNs work. They are typically much better than plain neural nets for computer vision tasks such as number recognition, since it helps to find small features such as circles first and then use those to compute the output. A Go board is somewhat like an image, in that it is a grid of 'pixels' (Go stones) that are not in themselves significant but the combination of which forms the 'image' (game position). <a href="http://image.slidesharecdn.com/bp2slides-090922011749-phpapp02/95/the-back-propagation-learning-algorithm-10-728.jpg?cb=1253582278">(Source)</a></figcaption> </figure> <p>Though this approach could potentially learn to intuitively ‘see’ the value of a Go position (as Zobrist sought to achieve in the 60s), neither the computer power nor the understanding of neural nets were sufficient to make this effective in 1994. In fact, a large wave of hype for neural nets in general died out in the mid 90s as limited computing power and missing algorithmic insights led to underwhelming results. So, the CNN-based program could only beat Many Faces of Go at a low level, and was therefore not even close to good human players.</p> <p class="sidenoteleftlarge">2006</p> <p>Neural nets were viewed unfavorably well into the 2000s, while other machine learning methods gained favor and Go programs failed to get much better. The history here is not exact, but a paper titled <a href="https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf">“A fast learning algorithm for deep belief nets”</a><sup id="fnref:DBN"><a href="#fn:DBN" class="footnote">12</a></sup> is often credited with rekindling interest in neural nets by suggesting an approach for successfully training “deep” neural nets with many layers of computing units. This was the start of what would become the huge phenomenon of deep learning (which is just a term for large neural nets with many layers). In a wonderful historic coincidence, this paper was published in 2006 — the same year as all those MCTS papers!</p> <p>Deep neural nets continued to gain attention in the following years, and by 2009 achieved record-setting results in speech recognition. Besides algorithmic improvements, their resurgence was in large part due to the availability of large amounts of training data and the use of the massively parallel computational capabilities of GPUs - both things that did not really exist in the 90s. Larger neural nets were trained with more data and more layers, more quickly than before thanks to modern hardware, and benchmark after broken benchmark showed this to be a powerful methodology. But the event that really set off a huge wave of research and investment in deep learning was the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) 2012 computer vision competition. A CNN-based submission performed far, far better than the next-best entry, surpassing the record on that competition’s ImageNet benchmark problem. This, the first and only CNN entry in that competition, was an undisputed sign that deep learning was a big deal. Now, almost all entries to the competition use CNNs.</p> <p class="sidenoteleftlarge">2012</p> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/24-cnn2.png" alt="CNN2" /> <figcaption>Another good visualization of a basic deep CNN. The Imagenet benchmark is precisely this, a set of images with thousands of categories of objects. <a href="https://www.clarifai.com/technology">(Source)</a></figcaption> </figure> <p>Up to this point, deep learning was largely applied to supervised learning tasks unrelated to game-playing, such as outputting the right category for a given image or recognizing human speech. But in 2013, a company called DeepMind made a big splash with the publication of <a href="http://arxiv.org/abs/1312.5602">“Playing Atari with Deep Reinforcement Learning”</a><sup id="fnref:Atari"><a href="#fn:Atari" class="footnote">13</a></sup>. Yep, they trained a neural net to play Atari games. More specifically, they presented a new approach to doing reinforcement learning with deep neural nets, which was largely abandoned as a research direction since the failure of TD-Gammon’s approach to work for other games. With just the input of the pixels you or I would see on screen, and the game’s score, their Deep Q-Networks (named after Q-learning, the basis for their algorithm) learned to play Breakout, Pong, and more. It bested all other reinforcement learning schemes, and in some cases humans as well!</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/25-breakout.gif" alt="breakout" /> <figcaption>DeepMind's Atari player playing Breakout. <a href="https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner">(Source)</a></figcaption> </figure> <p class="sidenoteleftlarge">2014</p> <p>This Atari work was a great innovation in AI research, so much so that in 2014 Google payed $400 million to acquire DeepMind. But I digress - as with IBM’s 2011 win in Jeopardy with Watson, this Atari feat is not directly relevant to the history of AI for chess-like board games, so let’s get back to that. Since the 90s, there were several more papers on learning to predict moves or to evaluate positions in Go with machine learning<sup id="fnref:Go2000sA"><a href="#fn:Go2000sA" class="footnote">14</a></sup><sup id="fnref:Go2000sB"><a href="#fn:Go2000sB" class="footnote">15</a></sup><sup id="fnref:Go2000sC"><a href="#fn:Go2000sC" class="footnote">16</a></sup>, but up to this point none really used large and deep neural nets. That changed in 2014, when two groups independently trained large CNNs to predict, with great accuracy, what move expert Go players would make in a given position.</p> <p>The first to publish were Christopher Clark and Amos Storkey at the University of Edinburgh, on December 10th of 2014 with <a href="https://arxiv.org/abs/1412.3409">“Teaching Deep Convolutional Neural Networks to Play Go”</a><sup id="fnref:Go2014A"><a href="#fn:Go2014A" class="footnote">17</a></sup>. Unlike the DeepMind Atari AI, here the researchers did purely supervised learning: Using two datasets of 16.5 million move-position pairs from human-played games, they trained a neural net to produce the probability of a human Go player making each possible move from a given position. Their deep CNN surpassed all prior results on move prediction for both datasets, and even defeated the conventionally-implemented GNU Go 85% of the time. Prior research on move prediction with weaker accuracy could never achieve better play than GNU Go, but this research showed that it could be done by just playing the moves deemed likely to be the choice of a skilled player by a neural net trained on lots of data!</p> <p>It’s important to understand that the neural net had no prior knowledge of Go before being trained with the data - it did not know the rules of the game and did not simulate playing Go for its move selection. So, you could say it was playing purely by ‘intuition’ for good moves, based on the data it had seen in training. Therefore it is hugely impressive that Clark et al’s neural net was able to beat a program specifically written to play Go well, but also not surprising it alone could not beat the stronger existing MCTS-based Go programs (which by that point played as well as very skilled people). The researchers concluded by suggesting the ‘intuition’ of machine-learned move prediction could be combined with brute-force MCTS move evaluation to make a much stronger Go AI than had ever existed:</p> <blockquote> <p>“Our networks are state of the art at move prediction, despite not using the previous moves as input, and can play with an impressive amount of skill even though future positions are not explicitly examined. … The most obvious next step is to integrate a DCNN into a full fledged Go playing system. For example, a DCNN could be run on a GPU in parallel with a MCTS Go program and be used to provide high quality priors for what the strongest moves to consider are. Such a system would both be the first to bring sophisticated pattern recognitions abilities to playing Go, and have a strong potential ability to surpass current computer Go programs.”</p> </blockquote> <p>Within a mere two weeks of the publication of Clark et al’s work, a second paper was released describing research with precisely the same ambitions, by Chris J. Maddison (University of Toronto), Ilya Sutskever (Google), and Aja Huang and David Silver (DeepMind, now also part of Google)<sup id="fnref:Go2014B"><a href="#fn:Go2014B" class="footnote">18</a></sup>. They trained a larger, deeper CNN and achieved even more impressive prediction results plus the ability to beat GNU Go a whopping 97% of the time. Still, when faced with MCTS programs operating at full brute force capacity — 100,000 rollouts per move — the deep neural net won only 10% of the time. As with the first paper, it was suggested that neural nets could be used together with MCTS, that the former could serve as ‘intuition’ to quickly identify potentially good moves, and the latter could serve as ‘reasoning’ to more accurately evaluate how good those moves really are. This team went further by implementing a prototype program that did just that, and showed it could beat their lone neural net 87% of the time. They concluded:</p> <blockquote> <p>“We have provided a preliminary proof-of-concept that MCTS and deep neural networks may be combined effectively. It appears that we now have two core elements that scale effectively with increased computational resource: scalable planning, using Monte-Carlo search; and scalable evaluation functions, using deep neural networks. In the future, as parallel computation units such as GPUs continue to increase in performance, we believe that this trajectory of research will lead to considerably stronger programs than are currently possible.”</p> </blockquote> <p class="sidenoteleftlarge">2016</p> <p>They did not just believe in that trajectory of research — they pursued it. And just shy of a year later, their belief was proven right. In January of 2016, a <a href="http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html">paper in the prestigious publication Nature</a> by these four authors and some sixteen more (all from DeepMind or Google) announced the development of AlphaGo, the first Go AI to beat a high ranked professional player without any handicaps<sup id="fnref:AlphaGo"><a href="#fn:AlphaGo" class="footnote">19</a></sup>. Specifically, they reported having beaten Fan Hui, a European Go champion, in 5 out of 5 no-handicap games.</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/26-fanhui.jpg" alt="CNN2" /> <figcaption>An image of a game between DeepMind's AlphaGo and Fan Hui. <b><a href="http://gardinerchess.com.au/gm-rogers-chess-ego-boost-from-alphago/">(Source)</a></b></figcaption> </figure> <p>How did they build a Go AI this good? With a small battalion of incredibly intelligent people working to achieve quite a list of innovative ideas:</p> <ol> <li>First, they created a much better neural net for predicting the best move for a given position. They did this by starting with supervised learning from a dataset of 30 million position-move pairs from games played by people (aided by some simple Go features), and then improving this neural net with reinforcement learning (making it play against older versions of itself to learn to get better, TD-Gammon style). This singular neural net — the <strong>policy network</strong> — could by itself beat the best MCTS Go programs 85% of the time, a huge leap from the 10% that was achieved before. But they did not stop there.</li> <li>A natural next step to get even better performance was to add MCTS into the mix, but the policy neural net is far too slow to be evaluated continually for each move of the tens of thousands of game rollouts typically done with MCTS. So, the supervised learning data was also used to train a second network that is much faster to evaluate - the <strong>rollout network</strong>. The full policy network is only ever used once to get an initial estimate on how good a move is, and then the much faster rollout policy is used in choosing the many more moves needed to get to an end of the game in an MCTS rollout. This makes the move selections in simulation better than random but fast enough to have the benefits of MCTS. These two components together with MCTS already made for a far better Go AI than has every been achieved, but there was a third trick that really pushed AlphaGo into the highest ranks of human skill.</li> <li>A huge part of why MCTS works so well for Go is that it removes the need to write evaluation functions for positions in Go, which is hard. Well, if we can train a neural net to predict what good moves are, it’s not hard to imagine we could also train a neural net to evaluate a Go position. And that is precisely why this group did: They used the already-trained high quality policy network to generate a dataset of positions and final outcomes in that game, and trained a <strong>value network</strong> that evaluated a position based on the overall probability of winning the game from that position. So the policy net suggests promising moves to evaluate, which is then done through a combination of MCTS rollouts (using the rollout net) and the value network prediction, which together turn out to work significantly better than either by itself.</li> <li>To top everything off, all of this was implemented in a hugely scalable manner that can and did leverage hardware that easily surpassed anything that was used for Go play in the past. AlphaGo used 40 search threads running on 48 CPUs, with 8 GPUs for neural net computations being done in parallel. And that’s just one one computer! AlphaGo was also implemented in a distributed version which could run on multiple machines, and scale to more than a thousand CPUs and close to 200 GPUs.</li> </ol> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/27-alphagonets.jpg" alt="AlphaGoA" /> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/28-alphagosearch.jpg" alt="AlphaGoB" /> <figcaption>(A) A great visual breakdown of all the neural nets involved <a href="http://www.nature.com/nature/journal/v529/n7587/fig_tab/nature16961_F1.html"><b>(From the AlphaGo Nature Paper)</b></a><br /> (B) And a breakdown of how these networks are used in conjunction with MCTS. I recommend <a href="https://xcorr.net/2016/02/03/5-easy-pieces-how-deepmind-mastered-go/"><b>this technical summary</b></a> for more detail. <b>(Source: also from the Nature Paper, with modified annotations by <a href="http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works">Shane (Seungwhan) Moon)</a></b></figcaption> </figure> <p>Besides training a state-of-the-art move predictor through a combination of supervised and reinforcement learning, probably the most novel aspect of AlphaGo is the strong integration of machine-learned intelligence with MCTS. Whereas the 2014 paper only contained a very preliminary combination of MCTS with a CNN, and before that MCTS programs contained at most modest integrations with supervised learning, AlphaGo gets its strength from being a well designed hybrid AI approach that is run on modern-day supercomputer hardware. Put simply, AlphaGo is a huge successful synthesis of multiple prior effective approaches that is far better than any of those individual elements alone. At least one other team, Yuandong Tian and Yan Zhu from Facebook, explored the same MCTS+deep learning idea in the same timeframe and likewise achieved impressive results <sup id="fnref:DarkForest"><a href="#fn:DarkForest" class="footnote">20</a></sup>. But the Google team definitely invested much more effort and resources into AlphaGo, and that ultimately paid off when it achieved a historic milestone for AI.</p> <blockquote> <p>AlphaGo = Deep Supervised Learning (with domain-specific features) + Deep Reinforcement Learning + Monte Carlo Tree Search + Hugely parallel processing</p> </blockquote> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/29-go2016.png" alt="AlphaGoRankA" /> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/30-alphagoranking.jpg" alt="AlphaGoRankB" /> <figcaption>AlphaGo, now at the 'divine' top rank of Go play. <b>(From the AlphaGo Nature Paper)</b> </figcaption> </figure> <p>And now, finally, we are back to where we began: Just a few months after the announcement of AlphaGo’s existence, it faced off against one of the best living Go players and decidedly won. This will perhaps go down as a historic moment not just for the field of AI, but for the whole history of human invention and ingenuity. I think it is fascinating to see how we can trace the intellectual efforts to accomplish this decades back, with every idea used to build AlphaGo being a refinement on one that came before it, and see that (as always in science and engineering) credit here is due to both a large team at Google and dozens of people that made their work possible. Now, what of the future?</p> <figure> <a href="/writing/images/2016-4-15-a-brief-history-of-game-ai/0-history.png"> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/0-history.png" alt="History" /> </a> <figcaption>The past - just about the scope of this series of posts, as promised. Created with <a src="http://www.readwritethink.org/classroom-resources/student-interactives/timeline-30007.html">Timeline</a>.</figcaption> </figure> <h1 id="epilogue-ai-after-alphago">Epilogue: AI After AlphaGo</h1> <figure> <a href="/writing/images/2016-4-15-a-brief-history-of-game-ai/31-venn.png"> <img class="postimagesmall" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/31-venn.png" alt="Games" /> </a> <figcaption>Go and Chess in the spheres of AI problems. I recommend <a href="https://xcorr.net/2016/02/03/5-easy-pieces-how-deepmind-mastered-go/"><b>this for more quality discussion how significant AlphaGo is for AI as a research field.</b></a></figcaption> </figure> <p>Congratulations, dear reader, you made it. I am impressed and appreciative that you got to this point. With this full history well trodden, I hope that my secret motivation in writing this clear — to show that AlphaGo is not some scary incomprehensible AI program, but really a quite reasonable feat of human research and engineering. And also to note that Go, ultimately, is still a strategic board game. Yes, it is a game for which it is hugely challenging to make good computer programs. But, in being a game, it also possesses multiple nice properties that make AlphaGo possible, as shown in the above Venn Diagram.</p> <p>There are many, many problems still in the realms of AI that are outside the intersection of those nice properties. To name just a few: playing Starcraft or Soccer, human-like conversation, (fully) autonomous driving. This is an incredible time for AI research, as we are starting to see these problems being tackled by well funded researchers like the ones at Google. In fact, I think it is appropriate to end with this quote <a href="https://googleblog.blogspot.com/2016/03/what-we-learned-in-seoul-with-alphago.html?m=1">from the AlphaGo team itself</a>:</p> <blockquote> <p>“But as they say about Go in Korean: “Don’t be arrogant when you win or you’ll lose your luck.” This is just one small, albeit significant, step along the way to making machines smart. We’ve demonstrated that our cutting edge deep reinforcement learning techniques can be used to make strong Go and Atari players. Deep neural networks are already used at Google for specific tasks — like image recognition, speech recognition, and Search ranking. However, we’re still a long way from a machine that can learn to flexibly perform the full range of intellectual tasks a human can — the hallmark of true artificial general intelligence.<br /><br /> With this tournament, we wanted to test the limits of AlphaGo. The genius of Lee Sedol did that brilliantly — and we’ll spend the next few weeks studying the games he and AlphaGo played in detail. And because the machine learning methods we’ve used in AlphaGo are general purpose, we hope to apply some of these techniques to other challenges in the future. Game on!”</p> </blockquote> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/32-leesedol.jpg" alt="LeeSedol" /> <figcaption>"Demis and Lee Sedol hold up the signed Go board from the Google DeepMind Challenge Match" <a href="https://googleblog.blogspot.com/2016/03/what-we-learned-in-seoul-with-alphago.html?m=1"><b>(From the AlphaGo team's post quoted)</b></a></figcaption> </figure> <h2 id="acknowledgements">Acknowledgements</h2> <p>Big thanks to <a href="http://cs.stanford.edu/people/abisee/">Abi See</a> and <a href="https://www.linkedin.com/in/pavel-komarov-a2834048">Pavel Komarov</a> for helping to edit this.</p> <h2 id="references">References</h2> <div class="footnotes"> <ol> <li id="fn:ManyFaces"> <p>David Fotland (1993). Knowledge representation in the Many Faces of Go. <a href="#fnref:ManyFaces" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:MonteCarloGo"> <p>Brügmann, B. (1993). Monte carlo go (Vol. 44). Syracuse, NY: Technical report, Physics Department, Syracuse University. <a href="#fnref:MonteCarloGo" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:ParisSchool1"> <p>Bouzy, B., &amp; Helmstetter, B. (2004). <a href="http://www.ai.univ-paris8.fr/~bh/articles/acg10-mcgo.pdf">Monte-carlo go developments</a>. In Advances in computer games (pp. 159-174). Springer US. <a href="#fnref:ParisSchool1" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:ParisSchool2"> <p>Cazenave, T., &amp; Helmstetter, B. (2005). <a href="http://www.ai.univ-paris8.fr/~bh/articles/searchmcgo.pdf">Combining Tactical Search and Monte-Carlo in the Game of Go</a>. CIG, 5, 171-175. <a href="#fnref:ParisSchool2" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:ParisSchool3"> <p>Bouzy, B. (2005). <a href="http://web.mi.parisdescartes.fr/~bouzy/publications/Bouzy-JCIS03.pdf">Associating domain-dependent knowledge and Monte Carlo approaches within a Go program</a>. Information Sciences, 175(4), 247-257. <a href="#fnref:ParisSchool3" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:MonteCarloRevolution"> <p>Coulom, R. (2009, January). <a href="http://www.remi-coulom.fr/JFFoS/JFFoS.pdf">The Monte-Carlo Revolution in Go</a>. In The Japanese-French Frontiers of Science Symposium (JFFoS 2008), Roscoff, France. <a href="#fnref:MonteCarloRevolution" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:MCTS"> <p>Coulom, R. (2006). Efficient selectivity and backup operators in Monte-Carlo tree search. In Computers and games (pp. 72-83). Springer Berlin Heidelberg. <a href="#fnref:MCTS" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:BanditMonte"> <p>Kocsis, L., &amp; Szepesvári, C. (2006). <a href="http://www.sztaki.hu/~szcsaba/papers/ecml06.pdf">Bandit based monte-carlo planning</a>. In Machine Learning: ECML 2006 (pp. 282-293). Springer Berlin Heidelberg. <a href="#fnref:BanditMonte" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:MoGo"> <p>Gelly, S., Wang, Y., Teytaud, O., Patterns, M. U., &amp; Tao, P. (2006). <a href="https://hal.inria.fr/inria-00117266/document">Modification of UCT with patterns in Monte-Carlo Go</a>. Technical Report RR-6062, 32, 30-56. <a href="#fnref:MoGo" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:GoHistory"> <p><a href="http://www.britgo.org/computergo/history">History of Go-playing programs.</a> http://www.britgo.org/computergo/history <a href="#fnref:GoHistory" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:TDGo"> <p>Schraudolph, N. N., Dayan, P., &amp; Sejnowski, T. J. (1994). <a href="http://www.gatsby.ucl.ac.uk/~dayan/papers/sds94.pdf">Temporal difference learning of position evaluation in the game of Go</a>. Advances in Neural Information Processing Systems, 817-817. <a href="#fnref:TDGo" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:DBN"> <p>Hinton, G. E., Osindero, S., &amp; Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527-1554. <a href="#fnref:DBN" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:Atari"> <p>Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., &amp; Riedmiller, M. (2013). <a href="http://arxiv.org/abs/1312.5602">Playing atari with deep reinforcement learning</a>. arXiv preprint arXiv:1312.5602. <a href="#fnref:Atari" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:Go2000sA"> <p>Sutskever, I., &amp; Nair, V. (2008). <a href="http://www.cs.utoronto.ca/~ilya/pubs/2008/go_paper.pdf">Mimicking Go experts with convolutional neural networks</a>. In Artificial Neural Networks-ICANN 2008 (pp. 101-110). Springer Berlin Heidelberg. <a href="#fnref:Go2000sA" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:Go2000sB"> <p>Stern, D., Herbrich, R., &amp; Graepel, T. (2006, June). <a href="http://www.autonlab.org/icml_documents/camera-ready/110_Bayesian_Pattern_Ran.pdf">Bayesian pattern ranking for move prediction in the game of Go</a>. In Proceedings of the 23rd international conference on Machine learning (pp. 873-880). ACM. <a href="#fnref:Go2000sB" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:Go2000sC"> <p>Wu, L., &amp; Baldi, P. F. (2006). <a href="https://papers.nips.cc/paper/3094-a-scalable-machine-learning-approach-to-go.pdf">A scalable machine learning approach to go</a>. In Advances in Neural Information Processing Systems (pp. 1521-1528). <a href="#fnref:Go2000sC" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:Go2014A"> <p>Clark, C., &amp; Storkey, A. (2014). <a href="https://arxiv.org/abs/1412.3409">Teaching deep convolutional neural networks to play go</a>. arXiv preprint arXiv:1412.3409. <a href="#fnref:Go2014A" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:Go2014B"> <p>Maddison, C. J., Huang, A., Sutskever, I., &amp; Silver, D. (2014). <a href="https://arxiv.org/abs/1412.6564">Move evaluation in go using deep convolutional neural networks</a>. arXiv preprint arXiv:1412.6564. <a href="#fnref:Go2014B" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:AlphaGo"> <p>David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel &amp; Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-503. <a href="#fnref:AlphaGo" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:DarkForest"> <p>Tian, Y., &amp; Zhu, Y. (2015). Better Computer Go Player with Neural Network and Long-term Prediction. arXiv preprint arXiv:1511.06410. <a href="#fnref:DarkForest" class="reversefootnote">&#8617;</a></p> </li> </ol> </div> <p><a href="/writing/a-brief-history-of-game-ai-part-3/">A 'Brief' History of Game AI Up To AlphaGo, Part 3</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on April 18, 2016.</p> <![CDATA[A 'Brief' History of Game AI Up To AlphaGo, Part 2]]> /writing/a-brief-history-of-game-ai-part-2 2016-04-18T19:19:34-07:00 2016-04-18T19:19:34-07:00 www.andreykurenkov.com contact@andreykurenkov.com <p>This is the second part of ‘A Brief History of Game AI Up to AlphaGo’. Part 1 is <a href="/writing/a-brief-history-of-game-ai">here</a> and part 3 is <a href="/writing/a-brief-history-of-game-ai-part-3">here</a>. In this part, we shall cover just about four decades of progress, from the first victories of computers against people at Checkers and Chess all the way up to DeepBlue’s victory against humanity’s then-best living Chess player.</p> <h1 id="computers-start-to-win">Computers Start To Win</h1> <p class="sidenoteleftlarge">1958</p> <p>By the late 1950s, the industrious engineers at IBM were far from the only ones working on AI — excitement for the new field filled research groups in universities from the US to the Soviet Union. One such group was made up of Allen Newell and Herbert Simon (both attendants of the Dartmouth Conference) from Carnegie Mellon University, and Cliff Shaw from RAND Corporation. They collaborated on Chess AI from 1955 to 1958, culminating in <a href="http://aitopics.org/sites/default/files/classic/Feigenbaum_Feldman/C&amp;T-Newll-Shaw-Simon.pdf">“Chess Playing Programs and the Problem of Complexity”</a><sup id="fnref:NSS"><a href="#fn:NSS" class="footnote">1</a></sup> which both summarized existing Chess AI research and contributed new ideas that they tested with the NSS (Newell, Shaw, and Simon) Chess program.</p> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/7-NSS.png" alt="bernstein_chess" /> <figcaption> A summary of 50s work on Chess AI from the NSS group. <a href="http://aitopics.org/sites/default/files/classic/Feigenbaum_Feldman/C&amp;T-Newll-Shaw-Simon.pdf"><b>(Source)</b></a></figcaption> </figure> <p>Just as Shannon noted that master players use intuition to think selectively about moves, Newell, Shaw and Simon considered heuristics to be an important aspect of human Chess-playing. Like Bernstein’s program, the NSS algorithm used a type of simple “intelligence” to choose which moves to explore. The group’s most significant addition to Minimax was an approximation of something that became an essential part of future Chess playing programs: <strong>alpha-beta pruning</strong>. This is a small modification to Minimax that makes the algorithm avoid simulating moves that are clearly bad (‘pruning’ branches of the tree that need not be simulated), thus saving those precious computing resources for more promising moves. The efficiency gains can be huge, and alpha-beta pruning became as standard for future Chess programs as Minimax itself.</p> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/8A-alphabeta.jpg" /> <figcaption>A tiny example of alpha-beta pruning. You are currently at position A and have three move options: B, C and D. You want to maximize your end-game score. For any move you make, the opponent will choose a move so as to minimize your score. The worst score you might get with option B is 3, so as soon as you see your opponent has a response in option C that only nets you a score of 2, you can cease to explore option C because option B is already definitely better. <a href="http://cs.stackexchange.com/questions/1069/what-use-are-the-minimum-values-on-minimax-trees"><b>(Source)</b></a></figcaption> </figure> <p>In emphasizing the need for such heuristics, the NSS group also argued that implementing them would be much easier with higher-level “interpreted” programming languages — again, this is in 1958! Back then programmers worked in the binary language of the computer, so another notable aspect of the NSS group’s work is their use of a symbolic compiled programming language to implement a more complex program. As with Bernstein’s program, the limitations of the hardware and of the code resulted in a rather shoddy Chess player. Still, it <a href="https://chessprogramming.wikispaces.com/NSS">has been said</a> to be the first chess program to beat (an almost humorously inexperienced) human player:</p> <blockquote> <p>“In 1958, a chess program (NSS) beat a human player for the first time. The human player was a secretary who was taught how to play chess one hour before her game with the computer. The computer program was played on an IBM 704. The computer displayed a level of chess-playing expertise greater than an adult human could gain from one hour of chess instruction.”</p> </blockquote> <p class="sidenoteleftlarge">1962</p> <p>Meanwhile, Arthur Samuel’s Checkers program played Checkers well already, and continued to get better. In 1962, Samuel and IBM had enough faith in the program to publicly pit it against a good human player. As described in a wonderful <a href="https://webdocs.cs.ualberta.ca/~chinook/project/legacy.html">retrospective about the event</a>, they strangely chose the human opponent to be Robert Nealy, who considered himself a master but was not ranked highly as a tournament player. Partially because of this, and partially because the program was good at Checkers, Nealy lost. Though it would soon be clear Samuel’s program was no match for the best human players at the game — it was easily beaten by two of them at the 1966 world championship — the public and media reaction to its win in 1962 was not unlike the current media frenzy around AlphaGo:</p> <blockquote> <p>“Wait! Hold the presses! A computer defeated a master checkers player! This was a major news story. Computers could solve the game of checkers. Mankind’s intellectual superiority was being challenged by electronic monsters. To the technology-illiterate public of 1962, this was a major event. It was a precursor to machines doing other intelligent things better than man. How long could it possibly be before computers would be smarter than man? After all, comput­ers have only been around for a few years, and already rapid progress was being made in the fledgling computer field of artificial intelligence. Paranoia.” <a href="https://webdocs.cs.ualberta.ca/~chinook/project/legacy.html">Source</a></p> </blockquote> <figure class="sidefigureleft"> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/8B-GoChess.jpg" alt="Chess vs Go" /> <figcaption>A comparison of Chess vs Go. Go has a much larger branching factor and a set of rules for which it is much harder to write an evaluation function (and here's <a href="https://en.wikipedia.org/wiki/Go_%28game%29">a handy link to Wikipedia</a> for those). <a href="http://spectrum.ieee.org/computing/software/cracking-go/chess-vs-go"><b>(Source)</b></a></figcaption> </figure> <p>AlphaGo’s victory is of course in a different league - Lee Sedol is unquestionably among the best players in the world and our computer-acclimated culture is less shocked by such a feat — but it is interesting to note the similarities between the two highly publicized events. Despite the fact Samuel’s program was nowhere near as good as the best humans, its win gave the lasting impression Checkers was a ‘simple’ game that computers had already conquered and that Chess was the real challenge, much as Go was seen after Deep Blue’s success with Chess. Speaking of which, 1962 was the year the first computer Go program was attempted with <a href="http://www.britgo.org/files/computergo/remus.pdf">“Simulation of a Learning Machine For Playing Go”</a><sup id="fnref:RemusGo"><a href="#fn:RemusGo" class="footnote">2</a></sup> by H. Remus (also at IBM!), though the resulting program was incomplete and never played a full game of Go. It would be half a decade more until a true Go program akin to Bernstein’s Chess program or Samuel’s Checkers program would play human players.</p> <p>Meanwhile, yet more research teams in the Soviet Union and in the US were working on implementing Chess AIs. Notably, a group of students at MIT led by AI legend John McCarthy developed a Chess-playing program based on Minimax with alpha-beta pruning, and in 1966 faced it off against a program developed at the Moscow Institute of Theoretical and Experimental Physics (ITEP) by telegram. The Kotok-McCarthy program lost 3-1, and was in general very weak due to being limited to searching very few positions (fewer than Bernstein’s program, even). But, another student named Richard Greenblatt saw the program and, being a skilled chess player, was inspired to write his own - the <a href="https://en.wikipedia.org/wiki/Mac_Hack">Mac Hack</a>. This program searched through many more positions and had other refinements, to the point that it could beat a ranked human player in a tournament in 1967 and win or draw multiple times more in succeeding tournaments. But it was still nowhere near as good as the best players.</p> <div><button class="btn" data-toggle="collapse" data-target="#greenblatt"> Aside: more on Richard Greenblatt's Chess Program &raquo; </button></div> <blockquote class="aside"><p id="greenblatt" class="collapse" style="height: 0px;"> There is <a href="http://archive.computerhistory.org/resources/text/Oral_History/Greenblatt_Richard/greenblatt.oral_history_transcript.2005.102657935.pdf">a fun oral history of Richard Greenblatt's</a> that is quite worth looking over if you are curious. Here are some choice excerpts:<br /><br /> "Anyway, I looked at this thing and I could see that the quality of the analysis was not good. And I said, gee, I can do better than that. And so I immediately set to it, and within just a few weeks, after I got back, I had the thing playing chess.<br /> ...<br /> And so then as word got around — Well, there was a guy a MIT in those days named Hubert Dreyfuss, who was a prominent critic of artificial intelligence, and made some statements of the form, you know, computers will never be any good for chess, and so forth. And, of course, he was, again, very romanticized. He was not a strong chess player. However, he thought he was, or I guess he knew he wasn’t world class, but he thought he was a lot better than he was. So anyway, I had this chess program and basically Jerry Sussman, who’s a professor at MIT now, and who was also a member of our group, had played. It was around and it was available on the machine. People played it, and so forth. And basically Sussman brought over Dreyfuss and said, well, how would you like to have a friendly game or something. Dreyfuss said, oh, sure. And sure enough, Dreyfuss sat down and got beat. So this immediately got quite a bit of publicity. " </p></blockquote> <figure class="sidefigureright"> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/8C-1968go.jpg" /> <figcaption>A figure showing Zobrist's visual representation from <a href="http://www.computer.org/csdl/proceedings/afips/1969/5073/00/50730103.pdf">the paper</a>.</figcaption> </figure> <p>Then, in 1968 a Go playing program reached the milestone that was conquered for Chess a whole decade earlier: beating a wholly inexperienced amateur. The program did not rely on tree search, but was rather based on emulating the way a human player “sees” an internal representation of a game position in Go so as to recognize patterns that matter for choosing the correct move. Interestingly, much of the power of AlphaGo is based on creating powerful internal representations of the board with Machine Learning techniques commonly applied to visual tasks, so the intuition here was in a way quite right. This feat was achieved by Alfred Zobrist, as described in <a href="http://www.computer.org/csdl/proceedings/afips/1969/5073/00/50730103.pdf">“A novel of visual organization for the game of GO”</a><sup id="fnref:ZobristGo"><a href="#fn:ZobristGo" class="footnote">3</a></sup>:</p> <blockquote> <p>“Given that a player “sees” a fairly stable and uniform internal representation, it follows that familiar and meaningful configurations may be recognized in terms of it. The result of visual organization is to classify a tremendous number of possible board situations into a much smaller number of recognizable or familiar board situations. Thus a player can respond to a board position he has never encountered, because it has been mapped into a familiar internal representation. This report will describe a simulation model for visual organization. … The program now has a record of two wins and two losses against human opponents. The opponents can best be described as intelligent adults who know how to play GO, have played from two to twenty games but have not studied the game. The program appears to have reached the bottom rung of the ladder of human GO players. “</p> </blockquote> <p>Because tricky ideas like this were necessary in order to cope with the huge branching factor and hard-to-codify heuristics of the game, progress for Go playing programs was much slower than for Chess or Checkers. It would be another decade until Bruce Wilcox developed a stronger program, again without reliance on traditional game AI techniques but with some limited tree search (as covered in <a href="http://www.wired.com/2014/05/the-world-of-computer-go/">this great Wired story</a>). The approach there was to subdivide the bigger board into smaller regions that were easier to reason about, which would continue to be a hallmark of Go AIs. But even then, it was nowhere near even decent human play.</p> <p>The same could not be said of Chess programs. Throughout the 70s, Chess AI progressed mostly by refining previously successful approaches. For instance, in the early 70s the Chess AI group at ITEP refined their program into a better version they named Kaissa, which went on to become the first computer Chess champion of the world in 1974 after squaring off against US programs. The program significantly benefited from faster computers and an efficient implementation that included alpha-beta pruning and some other tricks, for the first time showing the strength of the Shannon ‘type A’ AI strategy that relied more on fast search than smart heuristics or position evaluation.</p> <p>But also by this point, it was becoming typical to use extra ‘type B’ ideas such as <a href="https://chessprogramming.wikispaces.com/Quiescence+Search">quiescence</a> (basically searching further only after moves involving captures or checks, to not mistake trades for piece captures). It turned out to be very beneficial to selectively search further in certain tree paths than in others, so as to not miss critical turns in the game. As we’ll see, these techniques ultimately proved sufficient to write Chess AIs that can beat all of humanity — though it would take a while longer to get there…</p> <h1 id="humans-stop-winning-except-for-go">Humans Stop Winning… except for Go</h1> <p class="sidenoteleftlarge">1989</p> <p>The first computer program to completely dominate humans at a complex game was not developed until about 3 decades after Samuel’s Checkers program won that one game against Robert Nealy, and it was the Checkers program CHINOOK. The program was developed by a team at the University of Alberta led by Jonathan Schaeffer, starting in 1989. By 1994 the best Checkers player on the planet only managed to play CHINOOK to a draw <sup id="fnref:Chinook"><a href="#fn:Chinook" class="footnote">4</a></sup>.</p> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/9-chinook.jpg" alt="games_history" /> <figcaption>Chinook being put to the test against the world champion in Checkers. <a href="http://afflictor.com/2015/12/16/within-the-decade-the-computer-will-know-how-the-game-will-turn-out-even-before-it-begins/"><b>(Source)</b></a></figcaption> </figure> <p>By the 90s computers got orders of magnitude faster, and computer memory orders of magnitude larger, compared even to the computers of the 70s. This both enabled and enhanced the several techniques that powered CHINOOK: (A) a database of opening moves from games played by grandmasters, (B) alpha-beta tree search with an evaluation function based on a linear combination of many handcrafted features, and (C) an end-game database for all positions with fewer than eight pieces. And that’s it! That’s the recipe to a world-class Checkers playing program.</p> <p>A similar recipe also powered a world-class Chess program developed around that time - Deep Thought. Developed by a team headed by Feng-hsiung Hsu, it incorporated all these ideas and had two notable extra strengths: custom hardware and smart selective extensions. According to a <a href="http://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/753/671">retrospective about its success</a>, it was the fastest Chess program up to that point in terms of how many positions it could consider per second. This was achieved by performing move simulation and evaluation with custom circuit boards, which worked in tandem with software running on a powerful computer. In addition to being fast, Deep Thought was also smart: it had <em>singular extensions</em>, a nice type of <a href="https://chessprogramming.wikispaces.com/Extensions">selective extension</a> of search past the default depth at promising positions. This allowed search depth to be extended considerably: “The result is that on the average, an N ply search penetrates along the principal variation to a depth of 1.5N and reaches a depth of 3N about once in a game”<sup id="fnref:DeepThoughtWins"><a href="#fn:DeepThoughtWins" class="footnote">5</a></sup><sup id="fnref:DeepThoughtExtensions"><a href="#fn:DeepThoughtExtensions" class="footnote">6</a></sup>.</p> <p>So, Deep Thought was successful precisely because it was a combination of ‘type A’ brute force AI (searching all positions up to a certain depth) and ‘type B’ selective search (searching past that depth in certain cases). By 1988, Deep Thought became the computer Chess champion of the world and, more impressively, beat Chess grandmaster Bent Larsen.</p> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/10-deepthought.jpg" alt="deep_thought" /> <figcaption>The Deep Thought team showing off their custom hardware when they won the Fredkin Intermediate Prize: "In 1988 Deep Thought and Grandmaster Tony Miles shared first place in the Software Toolworks Open in Los Angeles. Deep Thought had a 2745 performance rating, and moved its U.S. Chess Federation (USCF) rating up to 2551, and qualified for the $10,000 Fredkin Intermediate Prize as the first computer to achieve a USCF performance rating of 2500 over a set of 25 contiguous games in human tournaments." <a href="https://chessprogramming.wikispaces.com/Deep+Thought"><b>(Source)</b></a></figcaption> </figure> <p>Another interesting aspect of Deep Thought is that its evaluation function was automatically tuned using a database of games between master chess players, rather than having all the function’s parameters hardcoded by its programmers. In this respect it harkened all the way back to Arthur Samuel’s Checkers program, which also had the ability to ‘learn’ by tuning its evaluation function from experience. Though Chess programs improved over the decades due to increased computer speeds and ideas such as alpha-beta pruning and selective extensions, almost all programs still had no learning component and ultimately derived all their intelligence fully from their human creators. Deep Thought was a notable break from this trend.</p> <p>Still, the structure of Deep Thought’s evaluation function encoded a lot of human intuition and knowledge about the game of chess, as was the norm. This is problematic, because the tough part in playing a game (how to evaluate positions and select moves) is essentially still solved by the programmer and not the program itself. It should be possible to write AI programs that could just learn this stuff by themselves, right? Right, and this was soon done for the first time using an algorithm that was later essential to AlphaGo: <strong>neural networks</strong>. Neural networks are a technique for <strong>supervised machine learning</strong>, which is just a category of algorithms that can learn to produce a desired output for some type of input by viewing many training examples of known input/output pairs of the same type. (For an in-depth explanation feel free to look at <a href="http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning/">my little neural net writeup</a>). Following a major 1986 paper describing how larger neural nets can be trained for tougher problems, they were all the rage in the late 80s and were being applied to many sorts of problems. One such application is the backgammon AI dubbed Neurogammon.</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/11.5-supervised.png" alt="Supervised Leraning" /> <figcaption>Visualization of supervised learning. The inputs are size and domestication, and the output is a classification of 'dog' or 'cat'. The dots already on the graph are the <b>training set</b> for learning, and the lines are the learned functions for getting an output for inputs not in the training set. <a href="https://en.wikipedia.org/wiki/Perceptron#/media/File:Perceptron_example.svgl"><b>(Source)</b></a>, By <a href="//commons.wikimedia.org/w/index.php?title=User:Elizabeth_goodspeed&amp;action=edit&amp;redlink=1" class="new" title="User:Elizabeth goodspeed (page does not exist)">Elizabeth Goodspeed</a> - <span class="int-own-work" lang="en">Own work</span>, <a title="Creative Commons Attribution-Share Alike 4.0" href="http://creativecommons.org/licenses/by-sa/4.0">CC BY-SA 4.0</a></figcaption> </figure> <p>Like Go, Backgammon has a huge branching factor and the traditional tree-search-with-handcrafted-evalution-function approach does not work well. A large branching factor makes it impossible to search many moves ahead, and it is very difficult to write a great evaluation function to compensate. Gerald Tesauro, a researcher at the University of Illinois and later IBM (surprise!), and renowned Machine Learning researcher Terrence Sejnowski explored an approach based on learning a good evaluation function (a goal that had been abandoned since Arthur Samuel’s work). As explained in their 1989 paper <a href="http://papers.cnl.salk.edu/PDFs/A%20Parallel%20Network%20That%20Learns%20to%20Play%20Backgammon%201989-2965.pdf">“A parallel network that learns to play backgammon”</a>, they trained a neural net to accept as input a backgammon game position and a potential move, and to output a score measuring the quality of that move <sup id="fnref:NeuroGammon"><a href="#fn:NeuroGammon" class="footnote">7</a></sup>. This approach removes the need for engineers to attempt to encode human intuition when writing the program, which is ideal. However, to make the approach work well some human intuition was still encoded in the system in the form of <strong>features</strong> — derived aspects about the game position, for example piece counts in Chess — also used as input in addition to the raw game position.</p> <figure class="sidefigureleft"> <div><button class="btn" data-toggle="collapse" data-target="#features"> Aside: why use features? &raquo; </button></div> <blockquote class="aside"><p id="features" class="collapse" style="height: 0px;"> When building machine learning systems with large 'raw data' (images, audio, or game states), it is typical to use informative <b>features</b> extracted from the data with human-written code. These features are then used as the input instead of the raw data. Intuitively, this makes the learning program easier by giving the machine learning algorithm only the useful information from the input and not forcing it to figure that bit out by itself. So-called <b>feature-engineering</b> used to be a standard step in building machine learning systems, and was possibly one of the most time-consuming steps since (as with evaluation functions) coming up and implementing good features is not always easy. Nowadays 'deep learning' (which we shall get to soon) has made it more typical to learn directly from the raw data. Indeed deep learning seems to derive much of its power from its ability to learn useful features better than the ones humans can implement. </p></blockquote> </figure> <figure> <img class="postimagesmall" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/11-nntraining.png" alt="Backprop" /> <figcaption>Supervised learning with neural nets. Basically, neural nets are made up of a bunch of units that each just output a weighted sum of their input, and the correct weights for a given application are learned from training data. Neurogammon worked exactly like this, except that the inputs were backgammon game positions, as well as derived features of the game positions, and the outputs were scores for the game position. <a href="http://devblogs.nvidia.com/parallelforall/inference-next-step-gpu-accelerated-deep-learning/">(Source)</a></figcaption> </figure> <p class="sidenoteleftlarge">1992</p> <p>With further improvements, the program was dubbed Neurogammon 1.0 and went on to win against all other programs at the 1989 First Computer Olympiad <sup id="fnref:NeuroGammonWins"><a href="#fn:NeuroGammonWins" class="footnote">8</a></sup>. However, it was still not as strong as the best human players, a feat that would soon go to another neural net based program by Gerald Tesauro: TD-Gammon. First unveiled to the world in 1992, TD-Gammon was a hugely successful application of <strong>reinforcement learning</strong>. Unlike supervised learning, which approximates some function with particular types of input and outputs, reinforcement learning deals with finding optimal choices in different situations. More specifically, we think in terms of states (situations), in which an agent (the program) can take actions that change the agent’s state in a known way (choices). Every transition between states results in a numeric ‘reward’, and figuring out the right action to take in a given state in order to get the highest reward in the long term, is what reinforcement learning is broadly about. Whereas supervised learning learns to approximate a function via examples of inputs and outputs, reinforcement learning generally learns from ‘experience’ of receiving rewards after trying actions in different states.</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/12-rl.png" alt="RL" /> <figcaption>A visualization of the general idea of reinforcement learning. Rather than learning to compute a correct output given some input, as in supervised learning, the goal is to learn to choose a correct action in any state in order to obtain the maximum reward in the long term. <a href="http://www2.hawaii.edu/~chenx/ics699rl/grid/rl.html"><b>(Source)</b></a></figcaption> </figure> <p>So, TD-Gammon learned by just playing games of backgammon against prior versions of itself, observing which player won, and using that experience to tune a neural net to produce a probability of winning from any given position. This is fundamentally different from Neurogammon, which required compiling a dataset of hundreds of moves with human-assigned scores and was thus much more cumbersome than just letting the program play games against itself for a few hours. Note this is very similar to what Arthur Samuel was trying to do all the way back in 1957 with his Checkers program that learned from self-play. In fact, the type of reinforcement learning TD-Gammon is based on, Temporal Difference Learning, was developed in 1986 by Richard Sutton as a formalization of the learning in Samuel’s work<sup id="fnref:TDSutton"><a href="#fn:TDSutton" class="footnote">9</a></sup>.</p> <p class="sidenoteleftlarge">1994</p> <p>Besides learning through self-play, there was nothing fancy in the approach - instead of using tuned tree search the program just exhaustively looked at all positions two steps ahead and used the move that led to the largest probability of winning. With just the raw board positions as input — essentially no human intuition engineered into it — TD-Gammon achieved a level of play comparable to Neurogammon. And with the addition of Neurogammon’s features, it became comparable to the best human players in the world<sup id="fnref:TDGammon"><a href="#fn:TDGammon" class="footnote">10</a></sup>.</p> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/13-tdgammon.png" alt="TDGammon" /> <figcaption>The TD-Gammon neural net that learned to play expert-level Backgammon. The input later included features in addition to the raw board positions. <a href="https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node108.html"><b>(Source)</b></a></figcaption> </figure> <p>TD-Gammon is to this day a milestone in the history of AI. But, when researchers naturally tried to use the same approach for other games, the results were not quite as impressive. Sebastian Thrun’s NeuroChess<sup id="fnref:NeuroChess"><a href="#fn:NeuroChess" class="footnote">11</a></sup> (1995) was only comparable to commercial Chess programs on a low difficulty setting, and Markus Enzenberger’s NeuroGo<sup id="fnref:NeuroGo"><a href="#fn:NeuroGo" class="footnote">12</a></sup> (1996) likewise did not match the skill of existing (poor) Go AIs. In the case of NeuroChess, the discrepancy was surmised to be in large part due to the large amount of time it took to compute the evaluation function (“Computing a large neural network function takes two orders of magnitude longer than evaluating an optimized linear evaluation function (like that of GNU-Chess)”), making NeuroChess unable to explore nearly as many moves ahead as the commercial Chess program. The benefit of a better evaluation function just did not win out over a simpler one that allowed for many more positions to be explored during search.</p> <p class="sidenoteleftlarge">1997</p> <p>Which brings us back to Deep Thought. After the success of that program, some of the same team were hired by IBM and set out to create Deep Thought II, which was later renamed Deep Blue (Deep Thought x Big Blue = Deep Blue). By and large, Deep Blue was conceptually the same as Deep Thought but much, much beefier in terms of computing power — it was a custom- built supercomputer! Still, when it played Kasparov in 1996 Deep Blue lost with a score of 2-4. The team then spent a year making Deep Blue yet more powerful and tuning its evaluation function, and it was this version that historically beat Kasparov with a score of 3.5-2.5 on May 11th of 1997<sup id="fnref:DeepBlue"><a href="#fn:DeepBlue" class="footnote">13</a></sup>.</p> <figure class="sidefigureright"> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/15-deepblue.jpg" alt="Kasparov" /> <figcaption>The supercomputer that powered Deep Blue <a href="https://www-03.ibm.com/ibm/history/exhibits/vintage/vintage_4506VV1001.html"><b>(Source)</b></a></figcaption> </figure> <figure class="sidefigureleft"> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/14-kasparov.jpg" alt="Kasparov" /> <figcaption>Kasparov vs Deep Blue <a href="http://stanford.edu/~cpiech/cs221/apps/deepBlue.html"><b>(Source)</b></a></figcaption> </figure> <figure> <figure> <iframe src="https://www.youtube.com/embed/NJarxpYyoFI" frameborder="0" allowfullscreen=""></iframe> </figure> <figcaption>A short documentary about Kasparov vs Deep Blue.</figcaption> </figure> <p>The team credited many things with getting Deep Blue to the point that it could score this victory<sup id="fnref:DeepBlue:1"><a href="#fn:DeepBlue" class="footnote">13</a></sup>:</p> <blockquote> <p>“There were a number of factors that contributed to this success, including:<br /> 1. a single-chip chess search engine,<br /> 2. a massively parallel system with multiple levels of parallelism,<br /> 3. a strong emphasis on search extensions,<br /> 4. a complex evaluation function, and<br /> 5. effective use of a Grandmaster game database”<br /></p> </blockquote> <p>So, it would be wrong to claim DeepBlue won purely through “brute-force”, since it included decades of ideas about how to tackle the AI problem of Chess. But brute-force surely was hugely important - DeepBlue was run with thirty processors inside a supercomputer working jointly with 480 single-chip chess search engines (16 per processor). When playing Kasparov it observed 126 million positions per second on average, and typically searched to a depth of between 6 and 12 plies and to a maximum of forty plies. All this allowed it to barely win, arguably due to uncharacteristic blunders on Kasparov’s part. But, all that hardly matters; since then computers have continued to become exponentially faster, and today humanity’s best Chess players are likely no match for programs you can run on your smartphone.</p> <p>So, Checkers, Chess, and Backgammon had all been mastered by AI programs by the late 90s - what about Go? Even the best computer programs were poor matches for amateurs with some experience. The techniques we’ve seen so far — supervised learning, reinforcement learning, and well-tuned tree search — were all attempted and found insufficient to make a Go program that could challenge serious human players. To see why these approaches failed, and how their defects were addressed over the span of two decades culminating in the creation of AlphaGo, go on ahead to <a href="/writing/a-brief-history-of-game-ai-part-3">the final part of this history</a>.</p> <h2 id="acknowledgements">Acknowledgements</h2> <p>Big thanks to <a href="http://cs.stanford.edu/people/abisee/">Abi See</a> and <a href="https://www.linkedin.com/in/pavel-komarov-a2834048">Pavel Komarov</a> for helping to edit this.</p> <h2 id="references">References</h2> <div class="footnotes"> <ol> <li id="fn:NSS"> <p>Allen Newell, Cliff Shaw, Herbert Simon (1958). Chess Playing Programs and the Problem of Complexity. IBM Journal of Research and Development, Vol. 4, No. 2, pp. 320-335. Reprinted (1963) in Computers and Thought (eds. Edward Feigenbaum and Julian Feldman), pp. 39-70. McGraw-Hill, New York, N.Y. pdf <a href="#fnref:NSS" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:RemusGo"> <p>Remus, H. (1962, January). Simulation of a learning machine for playing Go. In COMMUNICATIONS OF THE ACM (Vol. 5, No. 6, pp. 320-320). 1515 BROADWAY, NEW YORK, NY 10036: ASSOC COMPUTING MACHINERY. <a href="#fnref:RemusGo" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:ZobristGo"> <p>Zobrist, A. L. (1969, May). A model of visual organization for the game of Go. In Proceedings of the May 14-16, 1969, spring joint computer conference (pp. 103-112). ACM. <a href="#fnref:ZobristGo" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:Chinook"> <p>Schaeffer, J., Lake, R., Lu, P., &amp; Bryant, M. (1996). <a href="https://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/1208/1109">CHINOOK, The World Man-Machine Checkers Champion</a>. AI Magazine, 17(1), 21. <a href="#fnref:Chinook" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:DeepThoughtWins"> <p>Berliner, H. J. (1989). <a href="https://pdfs.semanticscholar.org/bf2d/10d4bc292762f8ca5e648a0668baafd2e551.pdf">Deep Thought Wins Fredkin Intermediate Prize</a>. AI Magazine, 10(2), 89. <a href="#fnref:DeepThoughtWins" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:DeepThoughtExtensions"> <p>Anantharaman, T., Campbell, M. S., &amp; Hsu, F. H. (1990). Singular extensions: Adding selectivity to brute-force searching. Artificial Intelligence, 43(1), 99-109. <a href="#fnref:DeepThoughtExtensions" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:NeuroGammon"> <p>Tesauro, G., &amp; Sejnowski, T. J. (1989). <a href="http://papers.cnl.salk.edu/PDFs/A%20Parallel%20Network%20That%20Learns%20to%20Play%20Backgammon%201989-2965.pdf">A parallel network that learns to play backgammon. Artificial Intelligence</a>, 39(3), 357-390. <a href="#fnref:NeuroGammon" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:NeuroGammonWins"> <p>Tesauro, G. (1989). Neurogammon wins computer olympiad. Neural Computation, 1(3), 321-323. <a href="#fnref:NeuroGammonWins" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:TDSutton"> <p>Sutton, R. S. (1988). <a href="https://webdocs.cs.ualberta.ca/~sutton/papers/sutton-88-with-erratum.pdf">Learning to predict by the methods of temporal differences</a>. Machine learning, 3(1), 9-44. <a href="#fnref:TDSutton" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:TDGammon"> <p>Tesauro, G. (1994). <a href="http://www.aaai.org/Papers/Symposia/Fall/1993/FS-93-02/FS93-02-003.pdf">TD-Gammon, a self-teaching backgammon program, achieves master-level play</a>. Neural computation, 6(2), 215-219. <a href="#fnref:TDGammon" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:NeuroChess"> <p>Thrun, S. (1995). <a href="http://www-preview.ri.cmu.edu/pub_files/pub1/thrun_sebastian_1995_8/thrun_sebastian_1995_8.pdf">Learning to play the game of chess</a>. Advances in neural information processing systems, 7. <a href="#fnref:NeuroChess" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:NeuroGo"> <p>Enzenberger, Markus. <a href="http://www.cgl.ucsf.edu/go/Programs/neurogo-html/neurogo.html">“The integration of a priori knowledge into a Go playing neural network.”</a> (1996). <a href="#fnref:NeuroGo" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:DeepBlue"> <p>Campbell, M., Hoane, A. J., &amp; Hsu, F. H. (2002). <a href="http://www.mimuw.edu.pl/~ewama/zsi/deepBlue.pdf">Deep blue</a>. Artificial intelligence, 134(1), 57-83. <a href="#fnref:DeepBlue" class="reversefootnote">&#8617;</a> <a href="#fnref:DeepBlue:1" class="reversefootnote">&#8617;<sup>2</sup></a></p> </li> </ol> </div> <p><a href="/writing/a-brief-history-of-game-ai-part-2/">A 'Brief' History of Game AI Up To AlphaGo, Part 2</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on April 18, 2016.</p> <![CDATA[A 'Brief' History of Game AI Up To AlphaGo, Part 1]]> /writing/a-brief-history-of-game-ai 2016-04-18T19:19:34-07:00 2016-04-18T19:19:34-07:00 www.andreykurenkov.com contact@andreykurenkov.com <figure class="figure"><div class="figure__main"> <p><a href="/writing/images/2016-4-15-a-brief-history-of-game-ai/0-history.png"><img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/0-history.png" alt="History" /></a></p> </div><figcaption class="figure__caption"><p>Just about the scope of this series of posts. Created with <a href="http://www.readwritethink.org/classroom-resources/student-interactives/timeline-30007.html">Timeline</a>.</p> </figcaption></figure> <p>This is the first part of ‘A Brief History of Game AI Up to AlphaGo’. Part 2 is <a href="/writing/a-brief-history-of-game-ai-part-2">here</a> and part 3 is <a href="/writing/a-brief-history-of-game-ai-part-3">here</a>. In this part, we shall cover the birth of AI and the very first game-playing AI programs to run on digital computers.</p> <h1 id="prologue-at-long-last-algorithms-triumph-over-humans-at-go">Prologue: At Long Last, Algorithms Triumph Over Humans At Go</h1> <p class="sidenoteleftlarge">2016</p> <p>On March 9th of 2016, a historic milestone for AI was reached when the Google-engineered program AlphaGo defeated the world-class Go champion Lee Sedol. Go is a two-player strategy board game like Chess, but the larger number of possible moves and difficulty of evaluation make Go the harder problem for AI. So it was a big deal when, a week and four more games against Lee Sedol later, AlphaGo was crowned the undisputed winner of their match having lost only one game. How big a deal? Media coverage accurately described AlphaGo as a <a href="http://www.theguardian.com/technology/2016/mar/09/google-deepmind-alphago-ai-defeats-human-lee-sedol-first-game-go-contest">“major breakthrough for AI”</a> that achieved <a href="http://www.theverge.com/2016/3/12/11210650/alphago-deepmind-go-match-3-result">“one of the most sought-after milestones in the field of AI research”</a>. Comment boards were less reserved, with many describing AlphaGo’s victory as scary or even a sign that superhuman AI was now imminent.</p> <p>Months before that day, I was excitedly skimming <a href="https://research.google.com/pubs/pub44806.html">the paper on AlphaGo</a> after Google <a href="http://googleresearch.blogspot.com/2016/01/alphago-mastering-ancient-game-of-go.html">first announced its development</a> <sup id="fnref:AlphaGo"><a href="#fn:AlphaGo" class="footnote">1</a></sup>. It struck me as a hugely impressive leap over state-of-the-art Go play, achieved through a cool combination of already successful techniques. So, when the media frenzy over AlphaGo broke out I thought to write a little post explaining why it is cool from an AI perspective, but also why it is not some scary baby-Skynet AI. In doing so, I stumbled on so many noteworthy developments and details in the 60-year history of game AI that I could not stop at writing that short little blog post. So, I hope you enjoy this ‘brief’ summary of all those exciting ideas and historic milestones that preceded AlphaGo and led to this latest marvel of human ingenuity.</p> <div><button class="btn" data-toggle="collapse" data-target="#sources"> Disclaimer: not an expert, more in depth sources, corrections &raquo; </button></div> <blockquote class="aside"> <p id="sources" class="collapse" style="height: 0px;"> As with my previous 'brief' history, I should emphasize I am not expert on the topic and just wrote it out of personal interest. I have not covered all periods or aspects of this history, so some other good resources are <a href="http://academicworks.cuny.edu/cgi/viewcontent.cgi?article=1181&amp;context=gc_pubs">"The History of Computer Games"</a>, <a href="https://www.chess.com/article/view/a-short-history-of-computer-chess">"A Short History of Computer Chess"</a>, and <a href="http://www.britgo.org/computergo/history">"History of Go-playing Programs"</a>. I am also not a professional writer, and consulted some good pieces written on the topic by professional writers such as <a href="http://www.wired.com/2014/05/the-world-of-computer-go/">"The Mystery of Go, the Ancient Game That Computers Still Can’t Win"</a> by Alan Levinovitz. I also will stay away from getting too technical here, but there is a plethora of tutorials on the internet on all the major topics covered in brief by me. <br /> Any corrections would be greatly appreciated, though I will note some omissions are intentional since I want to try and keep this 'brief' through a good mix of simple technical explanations and storytelling. </p> </blockquote> <p><br /></p> <h1 id="humble-beginnings">Humble Beginnings</h1> <p class="sidenoteleftlarge">1949</p> <p>Since the inception of the modern computer, there were people pondering whether it could match — or supersede — human intelligence. And since measuring human intelligence is difficult, many of those people reasoned they could tackle the question by first making computers good at certain tasks that challenged the human intellect. So, strategy games. As early as 1949, no less than <a href="https://en.wikipedia.org/wiki/Claude_Shannon">Claude Shannon</a> published his thoughts on the topic of how a computer might be made to play Chess <sup id="fnref:ShannonChess"><a href="#fn:ShannonChess" class="footnote">2</a></sup>. He both justified the usefulness of solving such a problem and defined its scope:</p> <blockquote> <p>“The chess machine is an ideal one to start with, since: (1) the problem is sharply defined both in allowed operations (the moves) and in the ultimate goal (checkmate); (2) it is neither so simple as to be trivial nor too difficult for satisfactory solution; (3) chess is generally considered to require ‘thinking’ for skillful play; a solution of this problem will force us either to admit the possibility of a mechanized thinking or to further restrict our concept of ‘thinking’; (4) the discrete structure of chess fits well into the digital nature of modern computers. … It is clear then that the problem is not that of designing a machine to play perfect chess (which is quite impractical) nor one which merely plays legal chess (which is trivial). We would like to play a skillful game, perhaps comparable to that of a good human player.”</p> </blockquote> <p>The approach Shannon suggested is today called <strong>Minimax</strong> (named after John Vonn Neuman’s minimax theorem, proven by him in 1928) and would be hugely influential for the future game-playing AIs. It is perhaps the most obvious approach one can take to making a game-playing AI. The idea is to assume both players will consider all future moves of the whole game, and so play optimally. In other words, you should always choose a move such that, even if the opponent chooses the absolute best response to that move and to every future move of yours, you will still get the highest score possible at the end of the game.</p> <p>It’s easy to make a computer do this. With a representation of the positions (or <strong>states</strong>) and the rules of the game, all one needs to do is write a program to generate all possible next game states from the current state, and the possible states for those states, and so on. By doing this, the program can simulate the game past the current point and build a <strong>tree</strong> of possible paths toward the end of the game. Then, it just needs to follow the best-case path of moves to get to the best end game. This has the flaw of not capitalizing on potential mistakes the opponent might make, but on the whole is a very safe and sensible strategy.</p> <figure class="sidefigureleft"> <div><button class="btn" data-toggle="collapse" data-target="#minimax"> Aside: minimax step by step &raquo; </button></div> <blockquote class="aside"><p id="minimax" class="collapse" style="height: 0px;"> Broken down, minimax works as follows:<br /><br /> 1. At the start of your move, consider all the possible moves you can take.<br /> 2. For each of your possible moves, consider each of your opponent's response moves.<br /> 3. Now consider every possible response to the opponents' response, and keep thinking into the future until you reach the end of the game in your head when you can get a score.<br /> 4. Assume that just like you, the opponent can think through all the moves right through the end of the game, and so will never make a mistake — they will play optimally, and assume you will play optimally. Choose the current move to maximize your final score, assuming your opponent will always choose all future moves to minimize your score in response to whatever moves you make. </p></blockquote> </figure> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/1-TicTacToe.gif" alt="MinimaxTree" /> <figcaption>Example minimax game tree on the simple Tic-Tac-Toe game. Each successive layer in this tree represents possible game states some number of moves ahead of the current one and is traditionally called a <b>ply</b>. <a href="https://www.cs.cmu.edu/~adamchik/15-121/lectures/Game%20Trees/Game%20Trees.html"><b>(Source)</b></a></figcaption> </figure> <p>One more detail: though in theory Minimax search involves finding all paths to the end of the game, in practice this is impossible due to the combinatorial explosion of game states to keep track of with each additional move simulated into the future. That is, for every move there is some number of options (known as the <strong>branching factor</strong>) and so every additional ply (i.e. layer) in the tree roughly multiplies its size by that branching factor. If the branching factor is 10, looking one move ahead would require considering 10 game states, two moves ahead requires 10+10*10=110, three moves gets us to 1110, and so on. So, we use an <strong>evaluation function</strong> to evaluate positions that are not yet the end of the game. An evaluation function can be as simple as counting the number of pieces each player has, or much more complicated, making it possible to only search 3 or 6 moves ahead (or, a <strong>depth</strong> of 3 or 6 <strong>plies</strong>) rather than the 40-ish moves involved in the average Chess game.</p> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/2-evalfunc.png" alt="chess_eval func" /> <figcaption>Evaluation non-end game positions in a search tree. <a href="http://stanford.edu/~cpiech/cs221/apps/deepBlue.html"><b>(Source)</b></a></figcaption> </figure> <p>Shannon’s paper defined how one could use Minimax with an evaluation function, and set the course for future work on Chess AI by proposing two possible strategies to go about it: (A) doing brute-force Minimax tree search with an evaluation function, OR (B) using a ‘plausible move generator’ rather than just the rules of the game to look at a small subset of next moves at each ply during tree search. Future Chess playing programs would often be categorized as ‘type A’ or ‘type B’ based on which strategy they were mainly based on. Shannon specifically noted the first strategy was simple but not practical since the number of states grows exponentially with each additional ply and the overall number of possible positions (the <strong>state-space</strong>) is huge. For the second strategy, Shannon took inspiration from master Chess players, who selectively consider only promising moves. However, a good ‘plausible move generator’ is not at all trivial to write, so massive-scale search as in strategy (A) is still useful. As we shall see later, Deep Blue (the program that beat Chess world champion Gary Kasparov) was basically a combination of both approaches.</p> <figure> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/2-Shannon.jpg" alt="shannon_machine" /> <figcaption>Shannon demonstrating a machine he built to try programming rules for a limited version of Chess. <a href="https://videogamehistorian.wordpress.com/tag/computer-game/"><b>(Source)</b></a></figcaption> </figure> <p class="sidenoteleftlarge">1951</p> <p>But, the supercomputer that powered Deep Blue was still decades away from existence, at most a dream in the minds of the early computer pioneers of the early 50s (in fact, the famous <a href="https://en.wikipedia.org/wiki/Moore's_law">Moore’s law</a> would not be defined until a decade later). In fact, the first Chess program was run not with silicon or vacuum tubes, nor any sort of digital computer, but rather by the gooey fleshy neurons of the human brain — that of Alan Turing, to be precise. Turing, a mathematician and pioneering AI thinker, spent years working on a Chess algorithm he completed in 1951 and called TurboChamp<sup id="fnref:short_chess_history"><a href="#fn:short_chess_history" class="footnote">3</a></sup>.</p> <p>TurboChamp was not as extensive as Shannon’s proposed systems, and very basic by future standards, but still, it could play Chess. In 1952, Turing manually executed the algorithm in what must have been an excruciatingly slow game, which the program ultimately lost. Still, Turing also published his thoughts on Chess AI and posited that <em>in principle</em> a program that could learn from experience and play at the level of humans ought to be completely possible<sup id="fnref:TuringChess"><a href="#fn:TuringChess" class="footnote">4</a></sup>. Just a few years later, the first ever computer Chess program would be executed…</p> <h1 id="theory-becomes-code">Theory Becomes Code</h1> <p class="sidenoteleftlarge">1956</p> <p>All of this happened before AI — the field of Artificial Intelligence — was really born. This can be said to have happened at the 1956 Dartmouth Conference, a sort of month-long brainstorming session among future AI luminaries where the term “Artificial Intelligence” was coined (or so claims <a href="https://en.wikipedia.org/wiki/Dartmouth_Conferences">Wikipedia</a>). Besides the University mathematicians and researchers in attendance (Claude Shannon among them), there were also two engineers from IBM: Nathaniel Rochester and Arthur Samuel. Nathaniel Rochester headed a small group that began a long tradition of people at IBM achieving breakthroughs in AI, with Arthur Samuel being the first.</p> <p>Samuel had been thinking about Machine Learning (algorithms that enable computers to solve problems through learning rather than through hand-coded human solutions) since 1949, and was particularly focused on developing an AI that could learn to play the game of Checkers. Checkers, which has 10<sup>20</sup> possible board positions, is simpler than Chess (10<sup>47</sup>) or Go (10<sup>250</sup> ! We’ll get to that one in a bit…) but still complicated enough that it is not easy to master. With the slow and unwieldy computers of the time, Checkers was a good first target. Working with the resources he had at IBM, and particularly their first commercial computer (the IBM 701), Samuel developed a program that could play the game of Checkers well, the first such game-playing AI to run on a computer. He summarized his accomplishments in the seminal <a href="https://www.cs.virginia.edu/~evans/greatworks/samuel1959.pdf">“Some studies in machine learning using the game of Checkers”</a><sup id="fnref:SamulCheckers"><a href="#fn:SamulCheckers" class="footnote">5</a></sup>:</p> <blockquote> <p>“Two machine-learning procedures have been investigated in some detail using the game of checkers. Enough work has been done to verify the fact that a computer can be programmed so that it will learn to play a better game of checkers than can be played by the person who wrote the program. Furthermore, it can learn to do this in a remarkably short period of time (8 or 10 hours of machine-playing time) when given only the rules of the game, a sense of direction, and a redundant and incomplete list of parameters which are thought to have something to do with the game, but whose correct signs and relative weights are unknown and unspecified. The principles of machine learning verified by these experiments are, of course, applicable to many other situations.”</p> </blockquote> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/3-minimax.png" alt="samuel_minimax" /> <figcaption>A great visual from Samuel's paper explaining Minimax. <a href="https://www.cs.virginia.edu/~evans/greatworks/samuel1959.pdf"><b>(Source)</b></a></figcaption> </figure> <p>Fundamentally, the program was based on Minimax, but had an additional hugely important aspect: <strong>learning</strong>. The program became better over time without human intervention, through two novel methods: (A) “rote-learning”,— meaning it could store the values of certain positions as previously evaluated with Minimax, and so not need to spend computational resources considering moves further down those branches — and (B) “learning-by-generalization”, i.e. modifying the multipliers for different parameters (thus modifying the evaluation function) based on previous games played by the program. The multipliers were changed so as to lower the difference between the calculated goodness of a given board position (according to the evaluation function) and its actual goodness (found through playing out the game to completion).</p> <p>Rote learning was a fairly obvious way to make the program more efficient and capable over time, and it worked well. But it was learning-by-generalization that was particularly groundbreaking, as it showed that a program could learn to ‘intuitively’ know how good a game position was without tons of simulation of future moves. Not only that, but the program was made to learn by playing past versions of itself, which would one day be a key component of AlphaGo! But let’s not get ahead of ourselves…</p> <figure class="sidefigureright"> <div><button class="btn" data-toggle="collapse" data-target="#samuel_learning"> Aside: Quote from Samuel about learning procedures &raquo; </button></div> <blockquote class="aside"><p id="samuel_learning" class="collapse" style="height: 0px;"> Here is a fun excerpt from the paper about the advantages of each learning strategy:<br /> "Some interesting comparisons can be made between the playing style developed by the learning-by-generalization program and that developed by the earlier rote-learning procedure. The program with rote learning soon learned to imitate master play during the opening moves. It was always quite poor during the middle game, but it easily learned how to avoid most of the obvious traps during end-game play and could usually drive on toward a win when left with a piece advantage. The program with the generalization procedure has never learned to play in a conventional manner and its openings are apt to be weak. On the other hand, it soon learned to play a good middle game, and with a piece advantage it usually polishes off its opponent in short order.<br /> Apparently, rote learning is of the greatest help, either under conditions when the result of any specific action are long delayed, or in those situations where highly specialized techniques are required. Contrasting with this, the generalization procedure is most helpful in situations in which the available permutations of conditions are large in number and when the consequences of any specific action are not long delayed." </p></blockquote> </figure> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/4-rote_learning.png" /> <figcaption>Another great figure from Samuel's paper showing the use of rote learning. <a href="https://www.cs.virginia.edu/~evans/greatworks/samuel1959.pdf"><b>(Source)</b></a></figcaption> </figure> <p>Not only were these ideas groundbreaking, but they also worked in practice: The program could play a respectable game of Checkers, which was no small feat given the limited computing power at the time. And so, as this <a href="https://webdocs.cs.ualberta.ca/~chinook/project/legacy.html">great retrospective</a> details, when Samuel’s program was first demonstrated in the very early days of AI (in the same year as the Dartmouth Conference, in fact) it made a strong impression:</p> <blockquote> <p>“It didn’t take long before Samuel had a program that played a respectable game of checkers, capable of easily defeating novice players. It was first publicly demonstrated on television on February 24, 1956. Thomas Watson, President of IBM, arranged for the program to be exhibited to shareholders. He predicted that it would result in a fifteen-point rise in the price of IBM stock. It did.”</p> </blockquote> <figure> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/5-samuel.jpg" alt="samuel_checkers" /> <figcaption>"On February 24, 1956, Arthur Samuel’s Checkers program, which was developed for play on the IBM 701, was demonstrated to the public on television." <a href="http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/ibm700series/impacts/"><b>(Source)</b></a></figcaption> </figure> <p class="sidenoteleftlarge">1957</p> <p>But, this is Checkers — what of the game everyone really cared about, Chess? Well, once again it was employees at IBM who pioneered the first Chess AI, and as with Samuel those employees were supervised by Nathaniel Rochester. The work was chiefly led by Alex Bernstein, a mathematician and experienced Chess player. Like Samuel, he decided to explore the problem out of personal interest and ultimately led the implementation of a fully functional Chess playing AI on the IBM 701, which was completed in 1957 <sup id="fnref:BernsteinChess"><a href="#fn:BernsteinChess" class="footnote">6</a></sup>. The program also used Minimax, but lacked any learning capability and was constrained to look at most 4 moves ahead, and consider only 7 options per move. Until the 70s, most Chess-playing programs would be similarly constrained, plus perhaps some extra logic to choose which moves to simulate, rather like the type (B) strategy outlined by Shannon in 1949. Bernstein’s program had some <b>heuristics</b> (cheap to compute ‘rules of thumb’) to select the best 7 moves to simulate, which in itself was a new contribution. Still, these limitations meant the program only achieved very basic Chess play.</p> <figure class="sidefigureleft"> <img class="postimage" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/6-chessai.png" alt="bernstein_chess" /> <figcaption>Illustration of the limited Minimax search the Bernstein program did, from the article on it. <a href="http://archive.computerhistory.org/projects/chess/related_materials/text/2-2.Computer_V_ChessPlayer.Bernstein_Roberts.Scientific_American.June-1958/Computer_V_ChessPlayer.Bernstein_Roberts.Scientific_American.June-1958.062303059.sm.pdf"><b>(Source)</b></a></figcaption> </figure> <figure class="sidefigureright"> <img class="postimageactual" src="/writing/images/2016-4-15-a-brief-history-of-game-ai/6A-Bernstein.jpg" alt="bernstein_chessB" /> <figcaption>Bernstein playing his program. <a href="https://chessprogramming.wikispaces.com/The+Bernstein+Chess+Program"><b>(Source)</b></a></figcaption> </figure> <figure> <figure> <iframe src="https://www.youtube.com/embed/iT_Un3xo1qE" frameborder="0" allowfullscreen=""></iframe> </figure> <figcaption>Bernstein's Chess program starring in its very own TV report!</figcaption> </figure> <p>Still, it was the first fully functional Chess-playing program and demonstrated that even extremely limited Minimax search with a simple evaluation function and no learning can yield passable novice Chess play. And this was in 1957! So much more is yet to come in <a href="/writing/a-brief-history-of-game-ai-part-2">the coming decades</a>…</p> <h2 id="acknowledgements">Acknowledgements</h2> <p>Big thanks to <a href="http://cs.stanford.edu/people/abisee/">Abi See</a>, <a href="https://www.linkedin.com/in/pavel-komarov-a2834048">Pavel Komarov</a>, and Stefeno Fenu for helping to edit this.</p> <h2 id="references">References</h2> <div class="footnotes"> <ol> <li id="fn:AlphaGo"> <p>David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel &amp; Demis Hassabis (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484-503. <a href="#fnref:AlphaGo" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:ShannonChess"> <p>Shannon, C. E. (1988). <a href="http://vision.unipv.it/IA1/aa2009-2010/ProgrammingaComputerforPlayingChess.pdf">Programming a computer for playing chess</a> (pp. 2-13). Springer New York. <a href="#fnref:ShannonChess" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:short_chess_history"> <p>A Jenery (2008). A Short History of Computer Chess. chess.com <a href="https://www.chess.com/article/view/a-short-history-of-computer-chess">link</a> <a href="#fnref:short_chess_history" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:TuringChess"> <p>Alan Turing (1953). Chess. part of the collection Digital Computers Applied to Games. in Bertram Vivian Bowden (editor), Faster Than Thought, a symposium on digital computing machines, reprinted 1988 in Computer Chess Compendium <a href="#fnref:TuringChess" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:SamulCheckers"> <p>Samuel, A. L. (1959). <a href="](https://www.cs.virginia.edu/~evans/greatworks/samuel1959.pdf)">Some studies in machine learning using the game of checkers</a>. IBM Journal of research and development, 3(3), 210-229. <a href="#fnref:SamulCheckers" class="reversefootnote">&#8617;</a></p> </li> <li id="fn:BernsteinChess"> <p>Bernstein, A., &amp; Roberts, M. D. V. (1958). <a href="http://archive.computerhistory.org/projects/chess/related_materials/text/2-2.Computer_V_ChessPlayer.Bernstein_Roberts.Scientific_American.June-1958/Computer_V_ChessPlayer.Bernstein_Roberts.Scientific_American.June-1958.062303059.sm.pdf">Computer v chess-player</a>. Scientific American, 198(6), 96-105. <a href="#fnref:BernsteinChess" class="reversefootnote">&#8617;</a></p> </li> </ol> </div> <p><a href="/writing/a-brief-history-of-game-ai/">A 'Brief' History of Game AI Up To AlphaGo, Part 1</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on April 18, 2016.</p> <![CDATA[Fun Visualizations of the 2015 StackOverflow Developer Survey]]> /writing/fun-visualizations-of-stackoverflow 2016-02-12T19:19:34-07:00 2016-02-12T19:19:34-07:00 www.andreykurenkov.com contact@andreykurenkov.com <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/14-occ_lang_rB.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/14-occ_lang_rB.png" alt="Langs hmat" /> </a> <figcaption>Where this post is heading. The entirety of the code used for this post <b><a href="https://github.com/andreykurenkov/stackoverflow-R-dataviz">can be found here</a></b>. </figcaption> </figure> <h1 id="what">What?</h1> <p>What will follow are a selection of what I think are cool visualizations of data from the <a href="http://stackoverflow.com/research/developer-survey-2015">StackOverflow 2015 Developer Survey</a>. The survey asked software developers a bunch of questions concerning their background and work, and got an impressive 26086 responses. Due to there being multiple ‘Select up to’ questions, the data contains 222 observations for every response - lots of data to play with and visualize! The visualizations were made with R, as part of a project for the <a href="https://www.udacity.com/courses/ud651">Data Analysis with R</a> class.</p> <h1 id="fun-with-developers-personalprofessional-metrics">Fun With Developers’ Personal+Professional Metrics</h1> <p>Having loaded and cleaned up the data, the natural place to start is looking at who responded to the survey:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/1-gender_age.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/1-gender_age.png" alt="gender_age" /> </a> </figure> <p>Yep, big surprise, most respondents are male and younger than 35. Though, interestingly, females in the 20-24 range outnumber 25-29 range, which is not true for males. Also predictably, the graph shows that people with more experience tend to be older, though there are developers past their 40s with less than 10 years of experience which shows that it is possible to become a developer even at a later age. There is also the strong suggestion some respondents are jokesters, since there exist some people who are not yet 20 but have 11+ years of experience.</p> <p>The fact that there is a strong correlation between age and experience (0.6424202, in fact) is hardly surprising, but it begs a question: does the increased experience translate to better compensation? With this huge spreadsheet in hand, that question is easy to answer:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/2-comp_exp.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/2-comp_exp.png" alt="comp_exp" /> </a> </figure> <p>Yep, the world is (at least to some degree and in this context) just. It is also very unequal, though those who like me who have made it to the US can at least feel good about rather cushy average compensations. Besides being in the US, I also happen to be the sort of person to work on these sorts of blog posts in my free time, so a follow up question that makes sense is whether hours spent programming as a hobby also correlates positively with compensation. Perhaps lots of hobbyist programming indicates a more skilled programmer who might get better positions, or perhaps increased compensation translates to less time for hobby programming. Once again, let’s have a look at the data:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/3-comp_hours.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/3-comp_hours.png" alt="comp_hours" /> </a> </figure> <p>It seems the second is correct - more hours programming for fun does not equate to better compensation, though we can of course be optimistic and hope those with bigger compensations and less hobby programming relish their jobs and are in no need for extra projects. In fact, we can go ahead and look at the job satisfaction data and see:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/3-comp_sat.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/3-comp_sat.png" alt="comp_sat" /> </a> </figure> <p>So, seems having a higher paid job does somewhat correlate with having a more satisfying job - all those making big cash are not secretly miserable, good to know. Now then, all these line graphs are fun, but not very efficient, so using R magic we can make a graph that communicates quite a bit more about the variables we have been exploring thus far (click for larger version):</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/3-comp_satB.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/3-comp_satB.png" alt="comp_satB" /> </a> <figcaption>R makes it incredibely easy to do stuff like this! (not that you necessarily should make visualizations this packed with variables...)</figcaption> </figure> <p>Fancy graphs, right? The fun is just getting started, trust me, the best is yet ahead…</p> <h1 id="exploring-countries-industries-occupations">Exploring Countries, Industries, Occupations</h1> <p>Now, the ‘in US’/’not in US’ dichotomy is rather simplistic (and perhaps stereotypically American), so it’d be interesting to see what others countries programmers fare well in:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/4-country_comp.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/4-country_comp.png" alt="comp_country" /> </a> <figcaption>One could argue the use of (logarithmic) coloring is not needed here, but I for one like to know the relative number of instances from which averages are computed.</figcaption> </figure> <p>The results, after filtering to require at least twenty observations per country, are as America and Europe heavy as this posts’ likely readership - no surprise there. It is somewhat surprising that high tech countries such as South Korea and Germany are ranked relatively low in the list, though. But, as we said before money is not everything, so let’s go ahead and see how the countries fare by job satisfaction:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/4-country_sat.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/4-country_sat.png" alt="sat_country" /> </a> </figure> <p>Quite a different result, though seemingly one with less disparity than compensation averages. The US is merely 20th! It looks like Denmark and Israel really hit the sweet spot in terms of balance on both measures. Alas, I live in the US, and that will not change anytime soon. And as a software developer in the US, the next question for me to ask is what industries and occupations correlate with high compensations here:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/5-occ_comp.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/5-occ_comp.png" alt="occ_comp" /> </a> </figure> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/6-ind_comp.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/6-ind_comp.png" alt="ind_comp" /> </a> </figure> <p>Again, a few unexpected details here - data scientists are apparently not making as much as I thought they might - but most of this seems plausible. Seems it is a good thing I have no particular desire to work in gaming and non-profit industries or as a mobile dev - but then these are just averages, and money is much less important than job fulfillment anyway. So, once again, let’s have a look at that too:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/5-occ_sat.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/5-occ_sat.png" alt="occ_sat" /> </a> </figure> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/6-ind_sat.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/6-ind_sat.png" alt="ind_sat" /> </a> </figure> <p>Well, good news for the gaming developer - you may statistically be likely to earn less than most occupations, but also top the charts in terms of job satisfaction. As we saw before, it is possible to display the count and variance of such data fairly nicely with a combination of boxplot and translucent points, so may as well do just that:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/7-occ_comp_exp.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/7-occ_comp_exp.png" alt="occ_comp_exp" /> </a> </figure> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/8-ind_comp_exp.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/8-ind_comp_exp.png" alt="ind_comp_exp" /> </a> </figure> <p>The data is discrete, but R makes it very easy to introduce jitter to be able to roughtly have a sense of the distributions underling the means we’ve been looking at. There is quite a lot going on these graphs, but on the whole I think they do a good job of combining and conveying all involved information. Let’s skip the whole tradition of doing the same for satisfaction, as I am sure you are getting tired of all this satisfaction and money talk and boring bar graphs, and change things up.</p> <h1 id="turning-up-the-heat">Turning Up the Heat</h1> <p>Having played this much with occupations and industries, I naturally started to wonder which industries have the most of each occupation. The only sane way I could envision tackling the question is with heatmaps, and so after several fervent hours of massaging the data and plotting I got this:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/9-occ_ind.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/9-occ_ind.png" alt="occ_ind" /> </a> </figure> <p>I know, not fun - the full-stack web dev occupation and Software Products industry dwarf all the others in terms of counts and so make most of the heatmap a bland monotonous red. One option is to try to log scale the coloring, but I think that’d be sort of cheating in this case (my mind is built to assume heatmaps are linear), so instead we can produce a heatmap where the coloring is scaled within one row:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/10-occ_ind_r.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/10-occ_ind_r.png" alt="ind_comp_exp_r" /> </a> <figcaption>Notice that this in effect shows the breakdown of industries for each occupation separate from the rest</figcaption> </figure> <p>Not bad! The intersection of ‘Student’ and ‘I’m a student’ is bright yellow, so this is at least somewhat correct. There are lots of little neat nuggets of info here, such as the amount of mobile devs and designers in consulting or the prevalence of embedded developers in telecommunications. Admittedly, I produced this heatmap only after another one I knew I would want to make from the start - a heatmap showing the technologies developers in different occupations use:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/11-occ_lang.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/11-occ_lang.png" alt="occ_lang" /> </a> </figure> <p>Again with the huge majority of web devs! Well, no matter, we can do the same thing as before and view the data with per-row scaling:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/12-occ_lang_r.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/12-occ_lang_r.png" alt="occ_lang_r" /> </a> <figcaption>Now then we see the technologies developers in each occupation</figcaption> </figure> <p>Turns out executives rely mostly on JavaScript and the Cloud, who knew. As these technologies are sorted by overall usage and in fact only the top 20 are shown, I was surprised the niche ones such as LAMP and Redis still have more users than favorites of mine like Django or R. Overall though, this heatmap is nice for simply confirming common sense expectations about technology usage for each occupation. Still, I was not too big a fan of this look, so I also generated the same heatmap with a different:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/13-occ_langB.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/13-occ_langB.png" alt="occ_langB" /> </a> </figure> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/14-occ_lang_rB.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/14-occ_lang_rB.png" alt="occ_lang_rB" /> </a> <figcaption>Oh yeahhh you know those colors represent percentages per row now. I think this may be my favorite result from this project, to be honest.</figcaption> </figure> <h1 id="lastly---how-do-you-become-a-dev-anyway">Lastly - How Do You Become A Dev Anyway</h1> <p>Let’s finish up by looking at a topic this post is implicitly really about - how people learn. The survey helpfully asked what sort of training (education, online classes, even mentorships) each responder had. And so, again, we can make use of the information-dense wonder of a heatmap:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/15-occ_train.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/15-occ_train.png" alt="occ_train" /> </a> </figure> <p>So, most devs get at most a Bachelors degree, learn on the job, or have no formal training at all. However, online classes are also popular, no doubt in addition to these other forms of training. There are also exceptions in the fields of Machine Learning devs and data scientists, which I have some interest in. Just to take a break from all these heatmaps, we can take a closer look at these two with good ol’ bar graphs:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/16-occ_trainB.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/16-occ_trainB.png" alt="occ_trainB" /> </a> </figure> <p>Based on the low average salary of data scientists and the large numbers of them with non-formal or on the job training, I think it is likely the term has just gotten to be used very loosely to encompass many data-oriented positions. Machine learning developers, on the other hand, are still in a more exclusive club of more educated types who no doubt meddle with more math and algorithms. Regardless, the size of both of these occupations (that were not even really around a decade back) is as good a sign of our times as it gets.</p> <p>It would be tiresome to do this sort of visualization for all the jobs types, but why not go ahead and cap off this spade of visuals with something truly over the top:</p> <figure> <a href="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/17-occ_trainC.png"> <img class="postimage" src="/writing/images/2016-2-12-fun-visualizations-of-stackoverflow/17-occ_trainC.png" alt="occ_trainC" /> </a> <figcaption>Just because you can do something does not mean you should.</figcaption> </figure> <h1 id="why-do-all-this">Why Do All This?</h1> <p>For fun! But also, for learning. One of the great things about being a person who writes software for fun (and a living) is the lack of barriers to self-teaching new skills - all one needs is a laptop, an internet connection, and large quantities of time and perseverance to go ahead and learn something new. Though in theory just about anything can be self taught with a textbook and time, learning this way can be very difficult if the knowledge is not applied and tested along the way. Computer Science allows for the best of both worlds here in that there are massive amounts of tutorials and information for free online, and using that information requires no breadboards, no tools, no chemicals - just a computer and a mediocre internet connection.</p> <p>What barriers do exist have been further diminished by the rise of online classes in the vein of those on Udacity and Coursera, which are particularly well suited to teaching software skills - there are now dozens of such classes about algorithms, mobile dev, web dev, machine learning, and so on. Having taken and completed several of these classes I think they can very effectively help with learning through briskly paced lessons and high quality assignments. So, when I found out about the <a href="https://www.udacity.com/course/data-analyst-nanodegree--nd002">Udacity Data Analyst Nanodegree</a> I was naturally intrigued given my existing fondness for machine learning and lack of experience with data visualization - intrigued enough to sign up for the program (after confirming my company was willing to foot the bill here).</p> <p>‘Data Analysis with R’ is the fourth class in the program so far, and was my introduction to the language (a sort of mix between Python and SQL, known for being good for data analysis and visualization). The final project for the class called for using the R skills so far for self-led exploration of some data, with the option to choose among a few supplied datasets or find your own - precisely the sort of freeform class assignment I can get excited about. It took me some time to settle on a fun dataset to work with, but when I remembered about the StackOverflow survey data results I became sure there had to be many more opportunities for neat data analysis there - and I’d say I was right!</p> <p><a href="/writing/fun-visualizations-of-stackoverflow/">Fun Visualizations of the 2015 StackOverflow Developer Survey</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on February 12, 2016.</p> <![CDATA[What Brief Hacker News Fame Looks Like]]> /writing/what-brief-hacker-news-fame-looks-like 2016-01-22T19:19:34-07:00 2016-01-22T19:19:34-07:00 www.andreykurenkov.com contact@andreykurenkov.com <p>The most traffic this site has ever received in one hour is precisely 1,814 pageviews, at 11:00 AM on January 15th, 2016, when <a href="http://www.andreykurenkov.com/writing/a-brief-history-of-neural-nets-and-deep-learning/">A ‘Brief’ History of Neural Nets and Deep Learning’</a> hit the front page on Hacker News (a site very popular among programmers, researchers and basically all manner of technical people who can slack off at work by browsing the internet). As an extremely humble person (and fan of wry ironic wording), I debated whether to write about this for a while due to it perhaps seeming self congratulatory. But there are some fun graphs to share and it would be nice to write something short for a change, so why not.</p> <p>Let’s start back when I first released this ‘Brief’ history writeup.</p> <figure> <a href="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/1-traffic.png"> <img class="postimage" src="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/1-traffic.png" alt="Traffic 1" /> </a> </figure> <p>I finally finished an acceptable draft of the history on December 24th, after roughly a month and a half of working on it. Being quite ready to be done with this prolonged writing project and go on vacation, I went ahead and posted it, and shared the result on Facebook. As you can see, 7 people actually took a look at it on Christmas, surprisingly. But, as was typical then, the highest session counts per day did not crack the double digits. Fast forward two weeks, and the picture looks quite different:</p> <figure> <a href="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/2-traffic.png"><img class="postimage" src="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/2-traffic.png" alt="Traffic 2" /></a> </figure> <p>The reason for the views uptick? Reddit. On January 5th the good people over at /r/machinelearning graced my little history with 13 upvotes and a good deal of traffic, which for me was very exciting. I’ve never shared the writing outside of Facebook and LinkedIn before, but since this one was by far my largest writing effort I figured it was worth seeing if others would like it. And worth it it was - the sessions were now in the triple digits. Not only that, but I was getting a bit of positive feedback, which was very exciting - I mostly worked on the history for my own gratification, so seeing others read and compliment it felt great. And, well, can you guess what happened next?</p> <figure> <a href="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/3-traffic.png"><img class="postimage" src="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/3-traffic.png" alt="Traffic 3" /></a> </figure> <p>Yep, on Friday, January the 15th, my history made it to the front page of Hacker News after I resubmitted it (it having failed to make much of a splash on my first submission). Knowing the sort of traffic the site receives, I was admittedly a bit stunned - I even took a screenshot to commemorate the event:</p> <figure> <img class="postimagesmall" src="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/4-famous.png" alt="Traffic 4" /> <figcaption>This still sits in my Pictures folder, and is named famous.png</figcaption> </figure> <p>The traffic spike was huge. For a better perspective, have a look at the hourly breakdown of the sessions:</p> <figure> <a href="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/5-traffic.png"><img class="postimage" src="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/5-traffic.png" alt="Traffic 5" /></a> <a href="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/6-traffic.png"><img class="postimage" src="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/6-traffic.png" alt="Traffic 6" /></a> </figure> <p>Suffice it to say, I was happy. Predictabely the pageviews diminished very quickly, but not as quickly as I expected - the views to this day exceed my expectations. To be fair, I had to go back and clean up tons of typos and things, but i’d say that’s a fair deal. And for once, I actually had some reason to dig into the Google Analytics of the traffic. Especially neat is the behavior flow graph:</p> <figure> <a href="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/7-traffic.png"><img class="postimage" src="/writing/images/2016-1-21-what-brief-hacker-news-fame-looks-like/7-traffic.png" alt="Traffic 7" /></a> </figure> <p>Understandably, most people start at Part 1 and drop off there - only 12% make it to part 2. Still, hundreds if not more than a thousand people have made it all the way through the whole thing - exciting! It is a fairly long read at ~15000 words, so it’s good to see a fair deal of people enjoy these sorts of mini-treatises.</p> <p>Who knows if I’ll ever write something so ambitious, or so widely read, again. Either way, I expect to keep on writing, and I’ll always have famous.png.</p> <p><a href="/writing/what-brief-hacker-news-fame-looks-like/">What Brief Hacker News Fame Looks Like</a> was originally published by Andrey Kurenkov at <a href="">Andrey Kurenkov's Web World</a> on January 22, 2016.</p>