Jan 30, 2018

Tyler Hughes, Craftsman | Kingsmen Analytics™

If your holiday season is anything like mine, eating is a statistically significant part of it. To earn my quarry, every year it falls to me to cook a portion of the holiday meal. It sounds nerve-wracking, but I don’t mind - cooking is my favorite and most enduring hobby. My grandmother gave me my first cookbook when I was five years old, and I’ve loved cooking ever since. As I’ve grown, I’ve realized that many of the things that endear me to cooking also pull me towards data science. There’s a scientific discipline required to follow a recipe, but learning cooking techniques and flavor profiles allows a cook to experiment and push boundaries.

Hopefully that explains why, while cooking Christmas dinner, I was inspired by the data presentations in a Better Homes and Gardens cookbook. Before you call me crazy, it’s important that I recount some of the foundational principles of information design to lend some credence to my inspiration. Who better to lay that foundation than the “father of information design”, Edward Tufte? Tufte is a statistician, political scientist, and computer scientist whose books on presenting information are unique, informative, and visually stunning. His approach to presenting data is multi-layered, but one of the key concepts is the maximization of “data-ink”. “Data-ink” is his term for the ink in a graphic that actually presents data, and he argues that as much of a graphic’s ink as possible should be devoted to it. With this notion of data-ink in mind, one can understand why he rebuffs many modern visualization techniques. It’s his view that many of the graphs we’re all familiar with favor style over function and break what he sees as the cardinal rule of data visualization: “above all else, show the data”. In short, don’t get him started on pie charts. He often advocates the use of a data table over any other visual, but he has an expert eye for truly effective visualizations, and champions them as paragons of information design. His book, The Visual Display of Quantitative Information, is flush with examples of these from throughout history, and is a truly beautiful piece of work.

However, some find Tufte’s approaches to be old-fashioned, bordering on anachronistic. There’s a prevailing notion among many data scientists I’ve talked to that data needs to be fluffed and trimmed and shoved into a pretty picture in order to be understood by the businesspeople they’re presenting to. This perpetuates the idea that the data is secondary to the presentation, which only compounds the problem of data illiteracy among non-data scientists. We’re not gatekeepers; we’re locksmiths. We should be focused on unlocking secrets and advising the decision-makers in our organizations, not becoming another piece of red tape. I have definitive proof that anybody can interpret data when it’s presented well, and it comes in the form of the 1989 Better Homes and Gardens New Cookbook.

For anyone unfamiliar, Better Homes and Gardens releases a new edition of this book every few years, and has been doing so since 1930. They must be doing something right if these books have been selling for over eighty years, and don’t let the fact that my copy was two dollars at Goodwill color your opinion. The pages are contained in a binder – it’s designed to be used often and hold up over the years. Good thing too, as there’s precious information contained in these pages.

Let’s start at the beginning. Actually, let’s start before the beginning; let’s start with the inside cover. The first time an unsuspecting reader opens the 1989 Better Homes and Gardens New Cookbook, they’re greeted with table after table of useful information before they see any recipes. Just look at the density of information contained within the inside cover of the binder:

While not data per se, the authors have nonetheless managed to pack a great deal of reference information into a small amount of space, even if I question the choice of “heating pancake syrup” as a necessary microwave skill. In any case, the reader can learn the methods and cooking times for meats of various thicknesses cooked to various temperatures, the US and metric equivalents for many common measurements, and a variety of microwaving tasks, all within the front cover of the cookbook. The information is organized and readily available for quick reference, you know, just in case you forget how many milliliters are in a tablespoon.

The authors continue the precedent set by the inside cover on every divider tab. Each course (think meats, appetizers, desserts) is given its own divider tab in the book. This makes it easy to navigate to any section from anywhere in the book. Doing so brings the reader to one of the best tables of contents I’ve ever seen:

Look at the volume of information contained in a single page. For every recipe in this section, you can see the page it’s found on, the number of servings it makes, and nutritional facts per serving, reported in percent USDA daily value for necessary vitamins and minerals. On a single page. In the table of contents. This data table has 15 features and 51 entries. I consulted the abacus and calculated that there are 765 individual values (15 features times 51 entries) for the reader to consider on this single divider tab from this one chapter alone.
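For anyone who would rather not dust off an abacus, here is the same arithmetic as a minimal Python sketch. The names `N_RECIPES`, `N_FEATURES`, and `cell_count` are my own and purely illustrative. The counts are the ones quoted above, and the sketch assumes every cell in the table is filled in.

```python
# Shape of the divider-tab table as described in the text:
# 51 recipes (rows) by 15 features (page, servings, nutrition facts).
# These constant names are hypothetical, chosen for illustration.
N_RECIPES = 51
N_FEATURES = 15


def cell_count(n_rows: int, n_cols: int) -> int:
    """Return the number of individual values in a fully populated table."""
    return n_rows * n_cols


total_values = cell_count(N_RECIPES, N_FEATURES)
print(total_values)  # 765
```

That is the same rows-times-columns count you would get from the shape of any rectangular data table.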

These data tables, interspersed throughout the book before every chapter, detailing the recipes that follow and the nutritional details of each, inspired me to write this entry. They transform the cookbook from merely a compendium of recipes into something much more powerful. A cook is given the freedom to plan a menu from the ground up, with a single reference source for recipes, nutrition, and cooking techniques.

Let’s take a moment to consider the audience that Better Homes and Gardens is hoping to reach with this book. This book was not written for professional cooks – they would be more focused on either cooking a set menu or developing their own recipes. It’s certainly not written for data scientists. I couldn’t imagine a better way to ensure nobody’s buying your book (except for me!). No, this book is written for the home cook, someone who likely has a baseline understanding of data. The authors put the responsibility of understanding the data on the reader, rather than trying to gussy it up with a pretty picture. They are simultaneously respecting their reader’s intelligence and aiding their comprehension by presenting the data in as succinct and effective a manner as possible.

This is what we as data scientists seek to do when presenting data analyses to higher-ups and business partners. When performing our own analyses, our job is not to obfuscate the truth with complicated presentations – it’s to reveal the truth in a way that speaks to our audience. True data experts will present findings that can be understood by anyone who reads them carefully. Businesspeople are used to making decisions based on data, and the prevailing notion that they’re unable to understand data reflects a failure on our part to communicate adequately. To be fair, that goes both ways: businesspeople should also respect our work and take the time to understand our analyses. Addressing these issues will strengthen the modern company by bringing together business insights and technical know-how, a combo capable of solving any problem.

Thanks for taking the time to read this entry. I’m planning to start delving into machine learning and new things on the data science horizon. Look forward to learning more on those topics, and all other things analytics.


