Here is a great article by Michael O’Connell and Eric Novik (originally published in Information Management) that discusses how poor visualization techniques lead to loss of data, insight, and usability.
This stream graph is catchy, but the intent of the graphic is not immediately obvious. The reader has to spend considerable time to figure out that the peaks of the curves correspond to weekly sales of each movie and the entire area under the curve (in addition to the color scheme) corresponds to the total receipts. Why certain movies are above the zero line versus below is not clear. Also, it is difficult to make comparisons among different movies. In short, it is a visually appealing graphic, but not a great decision-making tool.
In contrast, the infographic in Figure 2 shows the number of gun killings in the U.S. in 2010. Each thin curve corresponds to one person’s life. The victims’ actual life years are drawn in orange and expected years after the murder are shown in gray.
On the website, the graphic reveals an animation of each life line. The viewer can also make some comparisons. For example, rifle killings, above the line, represent 4 percent of the total. The amount and color intensity capture the relative percentages. The overall shape of the resulting feather conveys the fragility of life and resonates with the numerical summary of the graphic: 9,595 people killed and an estimated 414,376 years lost. There may be a better way to represent these data points if the goal is to make comparisons, but the design and messaging here are powerful enough for this infographic to stand on its own.
BI Dashboards: The What
In the less dramatic world of corporate information design, BI dashboards typically display some kind of business metrics, such as sales over time or by geography, product performance, customer demographics and so on. These dashboards are designed to help management understand the current state of the business from a high level, while other data discovery applications enable drill down, filtering and underlying data exploration. These graphics often use bar charts, line plots, scatter plots and tables to summarize data. The designers may be well-intentioned, but off-the-shelf graphics tools make data visualization atrocities like the following all too common.
The graph in Figure 3 fails both as an infographic, as it is neither catchy nor informative, and as a decision-making tool, as it is impossible to make any kind of comparison along either the time dimension or the widget dimension. Visualization expert Edward Tufte appropriately called these types of charts “chartjunk” in his classic 1983 book, “The Visual Display of Quantitative Information.”
In contrast, the dashboard in Figure 4 shows sales pipeline by quarter and individual contributor, making it much easier to derive information that is critical to business decisions moving forward.
The summary bar charts show the breakdown among four different pipeline categories, and detailed tables on the left show the individual contributors. This dashboard allows for selection of both detailed and summary records and provides insight into the corresponding records on the respective charts. Questions like, “Who is the highest contributor in each quarter?” and “What is the total amount of sales closed across quarters?” are easy to answer.
Statistical Graphics: The Why
Well-designed BI dashboards can answer the “what” types of questions. Answering “why” questions usually requires more advanced data exploration and visualization using statistical graphic techniques. Statistical graphics bring the most meaningful structure in data to the forefront and facilitate comparisons among groups defined by combinations of the variables in the analysis. Data may enter into the graphic in its raw form, after some transformation or as output of a modeling procedure. Statistical graphics often include scatter plots, line charts and box plots or some combination thereof. Calling a graphic “statistical,” however, does not make it informative. For example, consider the scatter plot in Figure 5.
These hypothetical data points were collected by counting the average number of words in employees’ emails. The averages are plotted against the employees’ annual salaries, and a linear model is fit through the points. There is a clear linear relationship between the two variables, and the naïve view would hold that this model explains approximately 37 percent of the total variance. Beta, the slope of the fitted line, is estimated at 0.22, which means that each additional written word, on average, adds $2,200 to employees’ paychecks! Given this information, one would conclude that the sensible thing to do would be to start writing huge emails to everyone in the company and wait for the raise to appear in the next paycheck. A more sensible approach, however, may be to look for other variables that may explain this relationship. Considering the same data, the graph in Figure 6 looks at senior managers and staff separately.
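As a rough illustration of how a lurking group variable can manufacture such a slope, the sketch below uses synthetic data rather than the data behind the figures. The group sizes, means and spreads are assumptions chosen purely for illustration, `fit_line` is a hypothetical helper, and the numbers it prints will not match the 37 percent or 0.22 quoted above.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (slope, r_squared)."""
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
    residuals = y - (intercept + slope * x)
    r_squared = 1.0 - residuals.var() / y.var()
    return slope, r_squared

rng = np.random.default_rng(0)
n = 200  # assumed number of employees per group

# Assumption: within each group, salary (in $1,000s) is drawn
# independently of email length, so there is no real word-to-pay effect.
staff_words = rng.normal(100, 20, n)
staff_salary = rng.normal(60, 5, n)
mgr_words = rng.normal(200, 20, n)
mgr_salary = rng.normal(120, 5, n)

# Pooled fit: ignores the group an employee belongs to.
words = np.concatenate([staff_words, mgr_words])
salary = np.concatenate([staff_salary, mgr_salary])
pooled_slope, pooled_r2 = fit_line(words, salary)

# Separate fits: one line per group.
staff_slope, _ = fit_line(staff_words, staff_salary)
mgr_slope, _ = fit_line(mgr_words, mgr_salary)

print(f"pooled slope:  {pooled_slope:.3f} (R^2 = {pooled_r2:.2f})")
print(f"staff slope:   {staff_slope:.3f}")
print(f"manager slope: {mgr_slope:.3f}")
```

In a run of this sketch, the pooled slope should come out clearly positive while the two within-group slopes stay close to zero, which is the pattern a grouped view is meant to expose.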
Now it is easy to see where the correlation between email size and salary comes from. In this particular company, senior managers tend to make more money and write longer emails. But once seniority is taken into account, the positive relationship between email size and salary completely disappears.
As an example of a more informative statistical graphic, Figure 7 shows the relationship between miles per gallon and engine displacement for four-, six- and eight-cylinder vehicles. A linear regression is fit within each trellis panel, color is assigned to automatic versus manual transmission (a value of one denotes manual), and the size of each point is mapped to the weight of the car.
Heavier cars tend to have larger displacements and also lower mpg, although the latter comparison is less obvious in this presentation. Automatic transmissions are more likely to appear in six- and eight-cylinder cars and less likely to appear in four-cylinder cars. Lower displacement tends to be associated with higher mpg, but mainly for four-cylinder cars. The relationship is much weaker in eight-cylinder cars and almost nonexistent in six-cylinder cars. One useful conclusion may be that if you are buying a six-cylinder car, you might as well get an engine with a larger displacement, as you are unlikely to get any improvement in mpg from a smaller one. This is a tentative conclusion, however, as we don’t have enough information to make statements about all six-cylinder cars. To do that we would have to understand how these units were sampled (selected) and may need to collect a larger sample as well. But the conclusion is valid for the cars in this data set.
More Than Just Bling
This article looked at three different types of visualization, each with potentially different objectives and different demands that are placed on the consumer of information. When designing graphical displays, knowledge workers need to be careful about the intent of the graphic and cognizant of the message they are trying to convey. If the goal is to grab a person’s attention, a thoughtful infographic may be the right approach. For communicating the current state of the business, traditional BI dashboards may suffice, although these require data discovery features such as drill down, filtering, marking and brushing. Finally, if knowledge workers really want to extract and understand the information in their data, thoughtful statistical graphics enable the deepest insights. Regardless of which visualization tool best fits a particular business need, the visualization should be more than just bling.
The visualization needs to improve the business user’s or everyday consumer’s understanding of the data, helping them draw sound conclusions and avoid erroneous decisions. It is also worth noting that the most insightful data visualizations and visual analytics are themselves the result of data curation and analysis behind the scenes, driven by the data need being addressed. Data visualizations that clearly answer the what and why questions, and that elegantly represent that information for line-of-business users to interact with, require skilled data scientists to ensure that the correct data mining and modeling techniques have been applied to validate the data in the first place. In other words, data visualizations, especially bling visualizations, that are not supported by advanced analytic techniques can be downright dangerous for business. And that is not a pretty picture.