The Dangers of Bling Data Visualizations

Here is a great article by Michael O’Connell and Eric Novik (originally published in Information Management) that discusses how poor visualization techniques can lead to a loss of data, insight and usability.

Given the volume of information pouring into the enterprise from so many disparate sources, knowledge workers need to be able to visualize information in order to analyze it and draw insights from it effectively.

When business users can visualize information, they’re able to process it more effectively and make faster and better decisions, according to Aberdeen research. Business users are constantly seeking the best ways to understand the data behind the data. If a monthly sales figure is low, why is the sales team underperforming? The most effective way to help business users understand the data behind the data is to make it visual. Data visualization has recently made its way into the mainstream by way of infographics, business intelligence dashboards and, in some cases, statistical graphics. However, data visualization today comes in many forms, and more often than not, too much “bling” is incorporated into these data representations, leaving the audience with nothing more than a pretty picture. In this article, we contrast some good and bad examples of visualizations by examining the salient features of the graphical displays. We will also demonstrate how poorly designed visualizations can lead to erroneous decisions.

Infographics: The Good, The Bad and The Ugly

Infographics are typically designed to grab your attention and tell a story that would otherwise have to be described in narrative form. They can be catchy, aesthetically pleasing, thought-provoking and sometimes puzzling in their method of presentation. The puzzle may be part of their appeal, but sometimes the data are presented so obscurely that the message is lost on most consumers. For example, consider the stream graph of movie box office receipts over time shown in Figure 1.

Figure 1: Movie Box Office Receipts Over Time. From The New York Times
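Stream graphs like this one are typically built by stacking one band per series on top of a baseline that moves over time. The sketch below assumes a symmetric ("silhouette") baseline that centres each week's stack on zero. The New York Times graphic may use a different baseline; many published stream graphs use a "wiggle-minimizing" one instead. The movie names and receipts are made up for illustration. The sketch shows that each band's thickness in a given week equals that week's receipts, and that the band's total area equals the movie's total receipts. Whether a band ends up above or below the zero line depends only on its position in the stacking order.

```python
# Hypothetical weekly box office receipts ($ millions) for three made-up movies.
weekly = {
    "Movie A": [30, 20, 12, 6, 2, 0],
    "Movie B": [0, 25, 18, 10, 5, 2],
    "Movie C": [0, 0, 8, 14, 9, 4],
}


def silhouette_layers(series):
    """Stack the series week by week around a baseline centred on zero.

    Returns {name: [(lower, upper), ...]}, one (lower, upper) pair per week.
    """
    names = list(series)
    n_weeks = len(series[names[0]])
    layers = {name: [] for name in names}
    for week in range(n_weeks):
        total = sum(series[name][week] for name in names)
        lower = -total / 2  # half of this week's stack sits below the zero line
        for name in names:
            upper = lower + series[name][week]
            layers[name].append((lower, upper))
            lower = upper  # the next band starts where this one ends
    return layers


layers = silhouette_layers(weekly)
for name, bands in layers.items():
    thickness = [upper - lower for lower, upper in bands]  # equals weekly receipts
    print(f"{name}: total receipts = {sum(thickness)} (area of its band)")
    print(f"  bands: {bands}")
```

Reordering the entries in `weekly` moves a movie above or below the zero line without changing any receipts. In a graphic built this way, a band's vertical position therefore carries no meaning on its own.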

This stream graph is catchy, but the intent of the graphic is not immediately obvious. The reader has to spend considerable time figuring out two things. First, the height of each movie’s band at a given point in time corresponds to that week’s sales. Second, the entire area of the band (reinforced by the color scheme) corresponds to the movie’s total receipts. Why certain movies sit above the zero line and others below it is not clear. It is also difficult to compare one movie with another. In short, it is a visually appealing graphic, but not a great decision-making tool.

In contrast, the infographic in Figure 2 shows the number of gun killings in the U.S. in 2010. Each thin curve corresponds to one person’s life. The victims’ actual life years are drawn in orange, and the years they could have been expected to live after the murder are shown in gray.

Figure 2: U.S. Gun Killings in 2010. From guns.periscopic.com

On the website, the graphic animates each life line. The viewer can also make some comparisons. For example, rifle killings, shown above the line, represent 4 percent of the total, and the amount and intensity of color capture the relative percentages. The overall shape of the resulting feather conveys the fragility of life and resonates with the numerical summary of the graphic: 9,595 people killed and an estimated 414,376 years lost. There may be a better way to represent these data if the goal is to make comparisons, but the design and messaging here are powerful enough for this infographic to stand on its own.

BI Dashboards: The What

In the less dramatic world of corporate information design, BI dashboards typically display business metrics such as sales over time or by geography, product performance and customer demographics. These dashboards are designed to help management understand the current state of the business at a high level, whereas other data discovery applications enable drill down, filtering and exploration of the underlying data. These graphics often use bar charts, line plots, scatter plots and tables to summarize data. The designers may be well-intentioned, but off-the-shelf graphics tools make data visualization atrocities like the following all too common.

Figure 3: Sales of Widgets Over Time in 3D

The graph in Figure 3 fails both as an infographic, because it is neither catchy nor informative, and as a decision-making tool, because it is impossible to make any kind of comparison along either the time dimension or the widget dimension. Visualization expert Edward Tufte aptly called these types of charts “chartjunk” in his classic 1983 book, “The Visual Display of Quantitative Information.” In contrast, the dashboard in Figure 4 shows the sales pipeline by quarter and individual contributor, making it much easier to derive information that is critical to future business decisions.

Figure 4: Typical Sales Dashboard

The summary bar charts show the breakdown among four different pipeline categories, and the detailed tables on the left show the individual contributors. The dashboard lets users select both detailed and summary records and see the corresponding records in the respective charts. Questions like “Who is the highest contributor in each quarter?” and “What is the total amount of sales closed across quarters?” are easy to answer.

Statistical Graphics: The Why

Well-designed BI dashboards can answer “what” questions. Answering “why” questions usually requires more advanced data exploration and visualization using statistical graphics techniques. Statistical graphics bring the most meaningful structure in the data to the forefront and facilitate comparisons among groups defined by combinations of the variables in the analysis. Data may enter the graphic in raw form, after some transformation or as the output of a modeling procedure. Such graphics often include scatter plots, line charts, box plots or some combination of these. But just because a graphic is “statistical” does not make it informative. For example, consider the scatter plot in Figure 5.

Figure 5: Relationship Between the Size of Employees’ Emails and Salaries

These hypothetical data were collected by counting the average number of words in each employee’s emails, pairing that count with the employee’s annual salary and fitting a linear model. There is a clear linear relationship between the two variables, and the naïve view would hold that this model explains approximately 37 percent of the total variance. Beta is estimated at 0.22, which means that each additional written word, on average, adds $2,200 to an employee’s paycheck! Given this information, one might conclude that the sensible thing to do is to start writing huge emails to everyone in the company and wait for the raise to appear in the next paycheck. A more sensible approach, however, is to look for other variables that might explain this relationship. Considering the same data, the graph in Figure 6 looks at senior managers and staff separately.

Figure 6: Email Size versus Salary by Employee Type

Now it is easy to see where the correlation between email size and salary comes from. In this particular company, senior managers tend to make more money and write longer emails. Once employee type is taken into account, the positive relationship between email size and salary disappears completely.

As an example of a more informative statistical graphic, Figure 7 shows the relationship between miles per gallon and engine displacement for four-, six- and eight-cylinder vehicles. A linear regression line is fit within each trellis panel, color distinguishes automatic from manual transmissions (coded 1 for manual), and point size is mapped to the weight of the car.

Figure 7: Miles Per Gallon versus Engine Displacement for 4, 6 and 8 Cylinder Vehicles

Heavier cars tend to have larger displacements and also lower mpg, although the latter relationship is less obvious in this presentation. Automatic transmissions are more likely to appear in six- and eight-cylinder cars and less likely to appear in four-cylinder cars. Lower displacement is clearly associated with higher mpg only for four-cylinder cars. The relationship is much weaker in eight-cylinder cars and almost nonexistent in six-cylinder cars. One useful conclusion may be that if you are buying a six-cylinder car, you might as well get an engine with a larger displacement, as a smaller engine is unlikely to improve mpg. This is a tentative conclusion, however, because we don’t have enough information to make statements about all six-cylinder cars. To do that, we would have to understand how these cars were sampled (selected) and might need to collect a larger sample as well. But the conclusion is valid for the cars in this data set.

More Than Just Bling

This article looked at three different types of visualization, each with potentially different objectives and each placing different demands on the consumer of information. When designing graphical displays, knowledge workers need to be clear about the intent of the graphic and cognizant of the message they are trying to convey. If the goal is to grab a person’s attention, a thoughtful infographic may be the right approach. For communicating the current state of the business, traditional BI dashboards may suffice, although these require data discovery features such as drill down, filtering, marking and brushing. Finally, if knowledge workers really want to extract and understand the information in their data, thoughtful statistical graphics provide the deepest insights. Regardless of which type of visualization best fits a particular business need, the visualization should be more than just bling.
The visualization needs to improve business users’ and everyday consumers’ understanding of the data and help them draw sound conclusions and avoid erroneous decisions. It is also worth noting that the most insightful data visualizations and visual analytics are, first of all, the result of behind-the-scenes data curation and analysis driven by the data need being addressed. Some visualizations clearly answer the what and why questions and elegantly present that information for line-of-business users to interact with. These require skilled data scientists to ensure that the correct data mining and modeling techniques have been applied to validate the data. In other words, data visualizations that are not supported by advanced analytic techniques, especially bling visualizations, can be downright dangerous for business. And that is not a pretty picture.