Hi everyone! I'm Scott Berinato, senior editor at Harvard Business Review and author of Good Charts, a new book about dataviz for managers. As a senior editor at HBR, I write for the magazine and website, but also spend a lot of time editing big ideas from academics and others. When I'm not doing that, I'm probably in my garden getting my hands dirty.
While most of you on here are at the cutting edge of dataviz trends, there are countless managers who recognize the need to improve themselves beyond the typical 'click-and-dataviz-and-paste-into-powerpoint' approach that has dominated the business world for two decades. Good Charts is meant to help these folks get better at using dataviz, largely through a design-focused approach. Let's talk about what I'm hearing from executives and non-specialists about what they're excited about, what they're intimidated by, and why the first question is still, always, 'can I use a pie chart?'
I'll be back at 12pm ET to answer all of your questions. In the meantime, Ask Me Anything!
Okay I'm ready to get started. Some great questions already. Let's get going.
Hey, thanks for coming out for this. I'll check back here the rest of the day and for the rest of the week to answer more questions. Keep up the good and positive work r/DataIsBeautiful!
Can you remember a time where the use of statistics or a dataviz dramatically changed your opinion on something? A scenario where the stats disproved many of your preconceived notions about a topic?
I can think of several. Fallen.io changed my perception of the death toll of WWII (or at least made it more visceral). One of my favorites is a David McCandless viz that showed the amount of water different household tasks consumed, when taking into consideration the entire supply chain of getting that thing to your house. It showed that boiling an egg uses A LOT of water. Another I'm researching now is what Tesla's doing with data. They are seeing how people ACTUALLY drive, not how we think we do, and it's eye-opening.
Hi Scott! Thanks for stopping by on /r/DataIsBeautiful.
One big question I'm wondering about is: How do we -- as dataviz practitioners -- convince manager-types to adopt better dataviz practices? Our managers are too busy worrying about their own business, and the "old way" of doing things (e.g., 3D pie chart all the things!) works well enough for them. What can we do when our managers insist that we create that 3D pie chart, even when we know it's a terrible idea?
Three things: 1) once execs/managers see better viz, they tend to want better viz. So if someone in the org takes the initiative to do good viz, they'll get noticed for it and be asked to continue to produce that sort of thing. I heard this story over and over researching the book (“I didn’t mean to become the dataviz person here, it just sort of happened…”). Also, many more good examples exist out in the world now, so when they look at their Fitbit charts they start to get used to simplicity and better uses of color etc.
What is your favorite example of how data visualization has improved journalism or storytelling? What are some great examples of data visualization that newbies should look at?
The set of charts by Tynan deBold in the Wall Street Journal about vaccination were just insanely good. I’m sure you’ve all seen them. He managed to show how vaccinations make disease disappear in an unbelievably simple, clear, effective way.
One other is r2d3.us, a scrollytelling explainer of machine learning.
Do you have any "go to" online resources for choosing the right visualization for a particular data set?
This is a tough one. I wish there were one or two tools that were easy to use and did most things well, but I don’t think we’re there yet. I’ve been telling audiences at my talks that there are tons of great tools out there, but none do everything well and all do different things well. Some of my go-tos for prototyping: Plot.ly, Quadrigram, Datawrapper, Raw. I'm playing with more advanced tools like Exploratory.io but I'm not an R/ggplot expert so the learning curve is steep. Of course, Tableau.... but for me the most important tool for picking a visual approach is still pen and paper or whiteboard and marker.
Do you have a favorite post from r/dataisugly?
I try to stay positive with viz and I'm put off by the sort of public shaming that goes on a lot now with viz crit. I look at those forums more for examples of egregious lying with viz because I think that's a growing problem. The chart about abortions and cancer screenings at Planned Parenthood that was presented in Congress, for example, was just a horrible violation.
One of my favorite things that /r/DataIsBeautiful does is reworks of existing data visualizations. What advice do you have for approaching a data visualization when you want to constructively critique and rework it?
One of my pet peeves is the destructive criticism that goes on every day on Twitter w/r/t dataviz. It’s a real problem and I’ve had many managers tell me they want to get better but are basically intimidated by this attitude. Martin Wattenberg and Fernanda Viegas wrote an excellent essay about this on Medium that’s worth checking out. We lay out a whole system for visual crit in the book based on classic design crit. Most important is that it’s not broadcast. It’s local, with a few people (even just yourself) and it’s constructive. First you react to what you see. Then you ask yourself what you feel is missing or confusing. You find things that you want to preserve or that you think are good and then you try a new approach. But when you want to post some bad chart and say 'ha ha this is awful' remember, everyone is trying. No place for shame in all this.
Do you have any suggestions for how to encourage a culture of data-driven decision making? I'm a Business Intelligence manager and I find that it is difficult to get some people to step outside of their day-to-day process and critically examine the big picture.
The most successful BI/data scientists are moving their bosses to change the business based on what they find. The thing I see most of them having in common is a clear, crisp early win. They find some trend in the data that suggests an easy-to-implement change, which is made and makes a difference. I've seen this at Carlson Wagonlit Travel, Tesla and others.
My experience with business users is that many are extremely focused on the actual numbers, and therefore love their giant tables and can struggle to get value out of even a simple bar graph. How can we convince and train these people to see that there is more to their data than just the number?
Storytelling. But also, show them the visual first. Give them the tables when you're done. They may realize then, I don't need the number, (s)he already showed me what I need to know. I think some of their reliance on actual numbers is because they aren't confident in their visual literacy. The more they use charts, the better they'll get.
What's a clear cut tool/method to cut through stats to find out if they have been spun or are accurate?
Statistics courses and visual literacy. The more you know...
What is your favorite statistical anomaly?
a Trump nomination.
Do you see a place for Virtual reality and dataviz in the future? Maybe as a data discovery tool?
This is an interesting question! I think people will try to put us "in" the data but here's an interesting twist: While most of us have roughly equal capability to be visually literate with charts, our spatial skills as humans vary much more widely. I wonder if this difference in understanding spatial relationships will hold it back?
As someone who's just starting to learn to deploy effective data visualization, what are the tools/platforms that are most commonly used in industry and which ones would you recommend someone learn to use?
If you want to be a developer, get to know d3.js, highcharts and the like. If you want to just do better basic charts etc. use plot.ly, datawrapper, quadrigram and the like. If you want to do visual exploration try to learn Tableau, but most importantly, practice sketching.
Good data visualization already has a few very well known champions including Edward Tufte, Stephen Few, Nathan Yau, and Alberto Cairo. What do you feel is unique about your perspective?
(Or, to put it more bluntly, why should I buy your book when I already have a bookshelf full of books on data visualization?)
Great question. Basically, the target audience of my book is me: a manager who senses he needs to get better at visual communication but doesn't know how to start and is intimidated by it all. We purposefully avoided a rule-book approach. There's not a lot of "do this" and "don't do that" in our book. All of the people you mentioned have produced great stuff. Ours is just meant to be a bit more how-to, practical, and disarming.
What do you think of execs that say they are too busy to read more than a summary? I have heard a senior exec say as a rule they stop reading an email after the first paragraph. Simple, focused messages are important - but why is it so acceptable to ignore detailed evidence and require spoon feeding?
It depends on the setting. An exec who turns down important knowledge because (s)he's too busy is going to miss opportunities and risks. An exec who trusts those producing the data to bring the most important risks/opps to him/her in a simple way will be fine. The more critical the topic/data, the more detailed they should be willing to get. Also, I like the two-second version for presentations and the two-minute more complex version for one-on-one use.
What's your favourite example of how visuals communicate the meaning or purpose of some data?
I mentioned a couple already. But here's one we produced. We tried to visualize the global oligarchy by looking at how big companies share board members across the world. This was really useful analysis for our audience and changed my thinking. Turns out the oligarchy used to be a lot more tightly knit, which surprised me.
What do you think about D3.js? Do you like the idea of data-viz on the web?
I think d3.js is amazing and getting better (4.0 is going to be great). I taught myself to use it just a little (I’m no programmer) and it’s really a smart approach. I also think as long as it’s relegated to being a development library its power to move viz in the business world will be limited. That’s why I like what’s happening with tools that make it easier to use d3 by creating easy-to-use interfaces on top of it.
How much time do you spend on the different stages when creating a new visualization? For example, data gathering, cleansing, design and implementation?
We don't tackle the data gathering and cleansing in the book; there are great books out there on that. We tried to focus mostly on the viz--the last mile as it were. In the book we say you can get better with about 15 minutes of talking, 20-30 minutes of sketching and 30-45 minutes of prototyping.
What comes first, the viz or the idea?
I've always been an egg guy, not a chicken guy. I like developing several ideas and visual approaches. So it's 1. familiarity with data 2. generative sketching 3. choose a visual approach and iterate...
Hi Scott -
Thanks for doing this AMA. Where do you see a lot of these new technologies, especially d3js, fitting into the business world? Obviously they have a much steeper learning curve, but provide amazing flexibility and expressiveness. However, businesses might not need the level of expressiveness that a journalistic outlet might need, so an ugly bar chart in PowerPoint often is deemed good enough by management. Where do you see the dividing line between tools/technologies, expressiveness and time? How much expressiveness and flexibility do businesses need in their visualization software? Would love to hear your thoughts. Thanks again!
I see that dividing line moving. I don't think the ugly bar chart in PowerPoint is good enough anymore. And I think as management sees more and better visuals from those motivated to create them, they will continue to demand better viz. The key is the powerful tools need to become just slightly easier to use, and they're getting there. Thanks for asking.
Can you code in R (ggplot2) and make baller figures?
Also, opinions on Excel.
I can't. This doesn't make me a lesser person. :) Excel is ubiquitous. To dismiss it would be like to dismiss trees because you don't like them. It's also a damn fine data tool for everyday data in business. And it's easy enough to dump that data into viz tools that I don't think it's worth having religious arguments over.
Have you seen texts/training materials about data visualization requirements from the 30s/40s/50s?
Yes. Willard Brinton's "Graphic Methods for Presenting Facts" from 1914 is tremendous. Mary Eleanor Spear's "Charting Statistics" from the 50s is likewise really nice. Number one lesson from both is most of what you think is modern and new isn't.
Hey Scott! Thank you for your time. I'm currently pursuing an MIS degree from a large university. I only recently switched from Broadcast Journalism after a great experience with a startup left me wanting more. My question is: what resources are out there to help someone new to this field really get excited about it? Obviously this sub is great, but are there also some more academic sources that could help me get a leg up? (For example, I didn't even know the term "dataviz" was a thing.)
There's this great book out now called Good Charts. ;)
Sites like this are great. The answer to your question depends on how academic you're talking. A Google Scholar search on visualization will show you the vast and growing body of research on the topic. If you're looking for learning, check out local universities' extension courses (some have good basic viz courses) and there are tons of courses online as well.
How much work needs to be done scoping/designing data collection in the first place? Is that a major area of improvement in the business world?
This is crucial and businesses need to get better at it. When 'big data' came on, the money poured in and companies got good--real good--at collecting data, but as you know that's not the same as organizing and using it. That's where many businesses are now, dealing with a backlash, cleaning up data, and trying to find value in it.
How often are libraries like p5js or Processing used in the business world for generating data visualization? I've seen them used in academia, but haven't heard too much about them in the business world.
They aren't terribly well deployed. Some D3 is making its way in but if you don't have developers you won't see this level of deployment. I'm eager for some of these tools to get easy-to-use interfaces for the rest of us....
Can you recommend a good training to attend to build up a good base of skills? I've been doing this for about 5 years, but I feel like my fundamentals have some big gaps and I want to close them. I'm also fairly heavily invested in a narrow array of tools - Tableau and Excel. I'd like to broaden my toolset and have found that formal training is the way I learn best.
A bootcamp, perhaps?
Also, is there a 'hackathon' for data viz that you'd recommend?
There are some great d3.js tutorials out there and that's a great place to start. Don't know of any hackathons currently but if they're out there, people here will probably know about them. I'd like to do some workshops myself with folks who want to improve their basic viz skills--take their Excel charts and make something better.
When someone asks you "what do you do for a living?", how do you respond?
"i drink and i know things. that's what i do."
Does the simplicity of data visualized have a downside? I know research has shown when presented with charts in a paper or report, people gloss over the writing. Does that get abused? Does it over-simplify the debate sometimes?
Simplicity can be a problem if it's hiding important detail, but for the most part I believe in the business world, simplicity benefits communicating to execs etc., and it's not practiced enough or well enough. I like the idea of the two-second chart that's as simple as possible and presented to a group, and a two-minute version that's given to each person, on paper or screen, that's more detailed and that they can spend more time with on their own. Pro-tip: Don't give them the two-minute version until you've presented the two-second one. If you hand them paper or things to look at on screen they'll flip through that rather than pay attention to you. As the visualizer you better be able to talk through the more complex version even if you're showing the simpler because people will challenge your simple view.
One other thing on this: Many managers are afraid of simplicity because they either 1) aren't sure what they're trying to say so they put it all in there and hope the idea comes out somehow or 2) feel that busy charts reflect how much data they have and how busy they are...simplicity looks too easy.
What is the process for getting published in HBR or is it just staff writers? Is there paid content? If so, what percentage is paid content?
no paid content. it's an arduous process, even if you work here.
I'm a finance student going into the senior year of my degree. I have a choice between taking an interesting seminar economics course on recent research done in the field or a quantitative econometrics class. Would you say the econometrics class is worth the significant extra effort it will require, when I don't intend to pursue large-scale data analysis in my career? Reaping the secondary benefits such as a better understanding of statistics is the most compelling reason to take the course, as I see things.
I really wish I did more data/statistics in school. It's harder work but it's so universally important, I'd probably say take that path.
Hi Scott. I am about to start a career in big data analytics and business intelligence consultancy and wondered what resources you thought would provide good initial insight into data visualisation? Where do you get inspiration and ideas for data visualisation techniques/projects/theories?
This forum and dozens more like it. #dataviz on Twitter. Get to know the friendly people in this world like Randy!
What advice can you give to undergraduate students coming from the hard sciences like applied mathematics who want to move into the field of data sciences when they graduate? For context: I'm in my final year of undergraduate study in applied & computational maths, I picked up a few electives that I'm loving right now in data analysis, visualization & exploration and I see myself moving into that field professionally but I'm not too sure how to most efficiently do so with my background.
I would say find a person who can help you get started, a mentor, and a project that will start to build your data science portfolio. I talked to many people who got jobs after showing off their sports data analyses etc. that they had done for fun.
William Cleveland or Edward Tufte?
Willard Brinton and Mary Eleanor Spear and Jacques Bertin.
I'm a manager for a small tech firm who relies on data pretty heavily to drive decisions. What are the necessary tools for me to utilize to create a clean, sleek, and informative array of viz?
See some answers throughout this AMA. It really depends on the nature of the data and the type of viz you need to/want to create.
I bring up Excel because I am finishing my PhD this summer and plan on heading into data science. Most things I've read about Excel are that it's everywhere, but that in the age of big data, it isn't a good tool for analysis.
Thanks for the reply! Cheers.
That's right. If you're doing data science, you'll probably be using some heavier duty tools. But Excel has its place, and I don't think that's changing.
Assume someone has no experience with quant modeling at all, or a basic understanding of statistics, R, Python, SQL. What would you recommend as a starting point for someone to learn full spectrum dataviz from modeling to vizdesign?
You are describing me in many ways. :) We have a passionate debate here about the best way to teach all this and how stats curricula ought to be changed to accommodate viz in a more profound way. The best starting point is an intro to dataviz class that is integrated into the statistics curriculum. Jacques Bertin's Semiologie Graphique, if you can find an English translation, is also brilliant.
Wow, this really is a great AMA post! Lots of good stuff to think about.
Glad you like it!
I've been wondering about whether the annual performance review process is a waste of time. Nobody seems to like doing it, I'm not sure it helps companies get better, and it takes a huge amount of time. Are you aware of any strong data visualizations that speak to this issue?
HBR has published extensively on this topic. I don't know of a viz that convincingly shows this, but we have tons of text about it. :)
Data viz allows you to display particular insights very quickly.
In some cases, like the Best American Infographics Series, the visualization is created by an artist or designer. I love these infographics, but they generally serve to highlight trends that someone is already aware of. They may reveal more insight, but it's not as speculative as throwing numbers into a blocky PowerPoint graph.
From talking to people who deal with data and data viz, they often describe their jobs as playing around with numbers until a trend pops up and then investigating it :) or not really understanding the numbers or how to use their tools :(
Data viz can be artistic and/or utilitarian. In some cases, all that is needed is a simple graphic that illustrates a point so thought can happen more fluidly. How does data viz incorporate into your thought process or the thought processes of others around you?
What do you think about an AI that would troll through statistics to find high correlation for humans to then investigate?
Also how do I get a job at HBR? I think you guys are like who Stevo grows up to be in SLC punk. Rad!
I actually define four types of visual communication in the book and what you describe is one type: visual exploration. There are others, each useful in its own context. For example, a white board session when you try to redefine how you structure your org is a kind of viz work. In general, I think of viz as an abstraction layer, a way to cut through complexity.
I'm not part of HR, so I can't get you a job. Are you calling me a trendy-ass poseur?
Hi Scott, do you think visualization "standards", such as IBCS, will have much impact? IBCS in particular tends to suffer from people disliking it from an aesthetics point of view, but it does have its strong charm in the fact that every chart of a particular type has strict rules.
anytime standards intersect with aesthetics, it's going to be trouble. my hope is the standards encourage better default output than what is currently out there. Tableau is thinking about this all the time. trying to improve the initial output in terms of some basic visual grammar. strict rules won't solve grammar though. if they did we'd never get mark twain. there'll always be room for styling differences that help convey an idea better or take into account context. design is a necessary human endeavor.
Can we abolish the phrase "dataviz" and replace it with something that doesn't leave me expecting the person saying it to then tip their fedora and blow a sick vape cloud?
First off, I'll never forgive you for what you did to Han.
Second, in the book I actually say "the term 'data visualization' is a terrible one, it's like calling Moby Dick a 'word sequentialization' or Starry Night a 'pigment distribution'" It's too focused on mechanics and process and not focused on idea or outcome. So I'm with you.
Thanks for doing this AMA, Scott. I'm an academic in social sciences and would love to get away from bar graphs + error bars for presenting mean performance across different groups (say test scores between groups A B and C). Are there some specific ways of presenting means and variances that are more optimal, or does it mostly depend on the specific dataset?
I believe representing statistical ranges and uncertainty is one of the biggest, most important challenges in visualization. There are some good essays on this. I'll try to scare up links. Often one solution involves the use of gray ranges.
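For readers wondering what sits behind such a gray range, here is a minimal sketch (not from the book or this AMA) of computing a group mean and an approximate 95% interval that could be drawn as a shaded band rather than a bar with error bars. The scores, the group names, and the `mean_with_interval` helper are hypothetical, and the interval assumes a normal approximation (mean plus or minus 1.96 standard errors), which is rough for samples this small.

```python
import math
import statistics


def mean_with_interval(values, z=1.96):
    """Return (mean, low, high) using a normal-approximation interval.

    Assumption: mean +/- z * standard error; z=1.96 gives roughly 95%.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate spread")
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical test scores for three groups (illustrative only).
groups = {
    "A": [70, 75, 80, 85, 90],
    "B": [60, 62, 64, 66, 68],
    "C": [55, 70, 85, 90, 100],
}

for name, scores in groups.items():
    mean, low, high = mean_with_interval(scores)
    print(f"group {name}: mean={mean:.1f}, range=({low:.1f}, {high:.1f})")
```

The `low` and `high` values are what a plotting tool would fill in gray around each mean. Whether that reads better than error bars still depends on the dataset and the audience.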
How can someone transition into outputting better dataviz?
How should my data be warehoused with the idea of outputting viz?
Context matters. Are you presenting to a lay audience? Experts? Believe it or not my go-to approach is to work with pro designers. I think we've gotten away from the idea that visualization is a team effort and it's often best when it is a team effort.
Hey Scott, ever heard of Nassim Taleb? I'd love to hear your opinion on him.
He's thought provoking. See my comment above about uncertainty as one of the major dataviz challenges. I think some of his thinking reflects this.
Will you be attending any of Edward Tufte's upcoming events?
I've been in the past. None planned soon.
Is the potential for bad/misleading data something you discuss with the non-specialists you work with?
For a while I worked at a large software company, straddling the line between the people implementing data collection and those using it to make decisions ("I deal with the goddamn customers execs so the engineers don't have to!"). One thing I repeatedly ran into was that the data quality was terrible. The instrumentation around measuring how people used the software was often capturing data in a way that would be non-obvious to someone trying to report on it. In some cases it was straight-up broken. Avoiding drawing faulty conclusions from the data required you to be intimately familiar with how the measurements were taken. Naturally this was completely unrealistic when dealing with non-specialists.
I do discuss this. (Good Office Space reference). My three big concerns are:
1) Correlation != causation. On one side, data folks can find lots of meaningless correlations, and on the other, 'decision makers' can want to believe the correlations they're presented with are meaningful when they're not. ("It says people who are between 5'8" and 5'11" spend more so let's market to them!")
2) Representing uncertainty. Some model visualizations are useful for representing probability and possible futures, but they don't always represent the uncertainty about those possible futures well (think of pandemic models for Ebola, for example). The lay audience needs to understand that uncertainty or they gain an unwarranted confidence in the model they're looking at.
3) misleading charts and manipulations
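The first concern is easy to demonstrate. A minimal sketch, not taken from the book: generate series of pure random noise, correlate every pair, and count how many look "strongly" related. The number of series, their length, the 0.8 threshold, and the function names are all arbitrary choices made for illustration.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_strong_pairs(num_series=100, length=10, threshold=0.8, seed=0):
    """Count pairs of unrelated random series with |r| above the threshold."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(length)] for _ in range(num_series)]
    pairs = list(itertools.combinations(series, 2))
    strong = sum(1 for a, b in pairs if abs(pearson(a, b)) > threshold)
    return strong, len(pairs)


strong, total = count_strong_pairs()
print(f"{strong} of {total} pairs of pure noise have |r| > 0.8")
```

None of these series has anything to do with any other, yet with short series and many comparisons some pairs clear the bar by chance. That is the trap on both sides: analysts who search widely, and decision makers who want the result to be real.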
How do you recommend geographic data specialists make themselves competitive?
Geo is a hot area in viz. You are competitive by having that skill set. Combining it with good data/stats skills makes you quite marketable.
What are some of the most common errors you see when editing academic papers?
I don't edit many academic papers. I translate them for a broader audience!
Can you use a pie chart? ;-)
i love pie (charts)
This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.