Big Data Visualization: 3 Errors To Avoid

Here is a great article by Noah Iliinsky (originally published in InformationWeek).

Avoid common visualization mistakes. Here’s advice on how to clarify goals and get better results.

There has been a lot of talk about data visualization lately — almost as much as there has been about big data. We’re told that visualization is the best way (or the only way) to understand data, and that if we’re not visualizing it, we’re missing out.

Visualization is a great way to gain and share insight, but many big data teams are doing it the wrong way. How can it be done wrong? It turns out there are several ways to undermine data visualizations. Let’s look at a few of the most common mistakes.

Error 1: Displaying all the data
Despite what you were told in school, most people don’t care about seeing your work. They don’t care about how much data you can process every day or how big your Hadoop cluster is. Customers and internal users want specific, relevant answers, and the sooner they can get those answers, the better. The closer you can come to giving them exactly what they want, the less effort they have to expend looking for answers. Any irrelevant data on the page makes finding the relevant information more difficult; irrelevant data (no matter how valid) is noise.

Noise is particularly prevalent in dashboards, where the guiding philosophy is often “Show the status of everything.” But most performance measures are normal (and boring), not noteworthy. Showing all the normal conditions gives the abnormal measures a lot of places to hide.

A better dashboard approach is to show only what’s interesting or important. Prioritize what matters, what’s unexpected, and what’s actionable, and deemphasize everything else. Deep dives into data can be important, but dashboards aren’t the place for that. Broad overviews of non-actionable data are better handled as reports.
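
As a rough illustration of "show only what's interesting," the sketch below filters a set of measures down to those outside their normal range and puts the most severe first. The `Metric` type, the metric names, and the thresholds are all hypothetical; real normal ranges would come from your own monitoring history.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    value: float
    low: float   # lower bound of the assumed "normal" range
    high: float  # upper bound of the assumed "normal" range (must exceed low)


def severity(m: Metric) -> float:
    """How far outside the normal range a value is, as a fraction of the range width."""
    span = m.high - m.low
    if m.value < m.low:
        return (m.low - m.value) / span
    if m.value > m.high:
        return (m.value - m.high) / span
    return 0.0


def noteworthy(metrics):
    """Keep only out-of-range metrics, ordered from most to least severe."""
    flagged = [m for m in metrics if severity(m) > 0]
    return sorted(flagged, key=severity, reverse=True)


# Hypothetical sample data: one normal measure and three abnormal ones.
sample = [
    Metric("cpu_load", 0.55, 0.0, 0.8),
    Metric("error_rate", 0.09, 0.0, 0.02),
    Metric("checkout_latency_ms", 950, 100, 600),
    Metric("signups_per_hour", 40, 50, 200),
]

for m in noteworthy(sample):
    print(f"{m.name}: {m.value} (normal {m.low} to {m.high})")
```

Everything that is within range simply does not appear, so the abnormal measures have nowhere to hide. The full list can still go into a separate report.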

Error 2: Displaying the wrong data
This error is as dangerous as the first one. Showing subsets of information is fine, as long as the data relationships are relevant. If you care about sales, for example, you may also care about sales per region or sales over time. Consider how the data will be used to make decisions.

Showing several closely related graphs can be a nice compromise between showing too much in one graph and not showing enough overall. A few clean, clear graphs are usually better than a single complicated data visualization.

Error 3: Representing data poorly
Even when you’re graphing the right data, you can still get it wrong. Most exotic graph types are seldom seen, because they don’t work very well. The vast majority of visualization needs are well addressed with bar and line graphs, scatter plots, and (if done well) pie graphs.

Think about the key relationships among data fields, and consider putting those fields on the axes. Group by category, and then order the data by time or magnitude or importance. (Alphabetization is most useful when nothing else matters.) Use color for category, not magnitude; you can use brightness or saturation to illustrate magnitude. Use labels and other marks selectively to call attention without cluttering.
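
One possible way to apply these rules before handing data to a charting tool is sketched below: rows are grouped by category, ordered by magnitude, and given one hue per category with lightness varying by magnitude. The sales figures, the function names, and the specific lightness range are assumptions chosen for illustration, not part of the original advice.

```python
import colorsys
from collections import defaultdict


def group_and_order(rows):
    """Group (category, label, value) rows by category, then order by magnitude.

    Categories are ordered by their total value, largest first, and rows
    within each category are ordered by value, largest first.
    """
    groups = defaultdict(list)
    for category, label, value in rows:
        groups[category].append((label, value))

    def total(category):
        return sum(value for _, value in groups[category])

    ordered = []
    for category in sorted(groups, key=total, reverse=True):
        for label, value in sorted(groups[category], key=lambda lv: lv[1], reverse=True):
            ordered.append((category, label, value))
    return ordered


def encode_color(category_index, n_categories, value, max_value):
    """Hue encodes category; lightness encodes magnitude (larger is darker)."""
    hue = category_index / n_categories
    lightness = 0.85 - 0.5 * (value / max_value)  # assumed range: 0.35 to 0.85
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.7)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


# Hypothetical sales data: (category, label, value).
sales = [
    ("Hardware", "Laptops", 120),
    ("Software", "Licenses", 300),
    ("Hardware", "Monitors", 80),
    ("Software", "Support", 90),
]

ordered = group_and_order(sales)
categories = list(dict.fromkeys(category for category, _, _ in ordered))
max_value = max(value for _, _, value in ordered)

for category, label, value in ordered:
    color = encode_color(categories.index(category), len(categories), value, max_value)
    print(category, label, value, color)
```

The resulting order and colors can be passed to whatever bar or line chart you are drawing; the point is that the ordering and the color mapping are decided deliberately rather than left to a tool's defaults.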

Good design: Think and plan first
The best way to avoid all these errors is to focus on your goals first. Before considering how your visualizations should look, think about the following questions, in this order.

  1. What actions do you need to enable (or what do we care about)?
  2. What decisions do you need to inform (and what are we going to do about it)?
  3. What questions do you need to ask?
  4. What data do you need to see?
  5. What is the best structure for revealing the important relationships in the data?
  6. What data do you need to highlight?

As you answer these questions, you can begin to design and implement the right visualizations using the right data. It’s likely that you’ll have to make changes. This is a good thing. Iterate, test, try different approaches, test some more, and iterate again. A deliberate, user-oriented design approach will yield effective, efficient, and useful data visualizations.