Your Legend is a Chart

We all know that color is a very powerful part of the data communicator’s toolkit. As Maureen Stone once wrote, “[c]olor used well can enhance and clarify a presentation. Color used poorly will obscure, muddle and confuse.” There is a wealth of information and guidance about choosing colors and color palettes (I really like Lisa Charlotte Muth’s writings in this area), but the one color palette I see used incorrectly more often than any other is the diverging color palette.

I’m not sure why the diverging palette generates so many challenges, but it seems to be misused most often when it replaces what should be a sequential color palette, that is, a palette that uses a single hue and varies in lightness, for example, ranging from light blue for smaller numbers to dark blue for larger numbers. When used correctly, the midpoint of a diverging color palette should be labeled to let the reader know from what value the ranges are diverging, especially when that midpoint is not obvious. This advice harkens back to the title of this post: your legend is a chart and should be carefully labeled and annotated.

Let’s look at three definitions of a diverging color palette:

Lisa Charlotte Muth, Which color scale to use when visualizing data (blog post)

Diverging (also called bipolar or double-ended) color scales are the same as sequential color scales – but instead of just going from low to high, they have a bright middle value and then go darker to both ends of the scale in different hues. Diverging color scales are often used to visualize negative and positive values, election results, or Likert scales (“strongly agree, agree, neutral, disagree, strongly disagree”).

Kenneth Field, Cartography (book)

Diverging schemes are appropriately used where data has different extremes that might be best represented with different hues. A diverging scheme emphasizes the midpoint critical class with a light colour and then the two extremes with two diverging hues.

Jonathan Schwabish, Better Data Visualizations (book)

Diverging. In this scheme, the colors progress outward, growing darker from a central midpoint. A diverging color palette will share sequential schemes on two different colors and diverge from a shared, lighter color, for example, deviations from zero or a central number.

What is common across these definitions is the use of the word “midpoint” or “middle,” though none is specific as to what that midpoint should be. In data sets that have both positive and negative values, zero is an obvious midpoint; it’s less clear, however, what to choose when the data are all positive or all negative. Whatever choice the creator makes, what I think is too often missing when using diverging color palettes is a clear definition of the midpoint. Omitting that definition can obscure why a diverging palette is being used in the first place.

How Data Visualization Tools Handle Diverging Color Palettes

Part of the challenge may be due to how some data visualization tools apply diverging color palettes to maps and other visualizations. Tools like Tableau, Excel, Flourish, and Datawrapper (see this review of these and other data visualization tools) make it easy to toggle between different color palettes, but that ease may leave users unaware of what value is being set as the midpoint of the distribution.

In cases where a dataset has both positive and negative numbers, most tools will center the distribution around zero no matter the underlying shape of the distribution. In a dataset that consists of only positive or only negative numbers, both Excel and Tableau calculate the “middle” of the data in a diverging palette as the average of the minimum and maximum values. Users can manually change this default by calculating and then entering the new parameters, though neither tool permits calculations directly in the visualization menu options.

Some people have told me they think this “midpoint-as-average” approach is the correct default option because (a) it halves the range and (b) it is intuitive for most creators and users. It is worth noting that the calculation halves the range of the data values, not the number of observations (more on this below). I understand this argument, but that choice of midpoint still feels odd to me. Why not tie it to the actual distribution, for example, the mean or median? Even if the creator can manually change the midpoint value, I worry that many (most?) will not do so and will not recognize what value the tool is selecting as the midpoint. I’m no tool builder, but perhaps requiring the user to actively select the midpoint would be an improvement?

Screenshot of the mapping tool in Excel showing a map of US state-level populations. Notice the nonsensical way Excel applies the default diverging color palette, running from light blue to orange to dark blue instead of placing the lightest color in the middle.

In roughly uniform distributions, this midpoint decision isn’t that important. Take state-level unemployment rates as an example. In November 2021, the unemployment rate in the United States ranged from 1.8% in Nebraska to 6.9% in California. The median value across states is 4.15% and the Tableau/Excel calculated midpoint is 4.35%; the two are close, and switching between them changes the number of states above or below the midpoint by only one.

But take a more skewed distribution, like state-level population estimates. The US Census Bureau estimates that in 2020, state populations ranged from 582,328 people in Wyoming to 39,368,078 people in California. The midpoint of that range (the average of these two states) is 19,975,203; again, keep in mind that this midpoint is calculated on the data values, not on the number of observations, so it does not divide the states into two equal halves. Only three states have populations larger than that calculated midpoint (Florida, Texas, and California). By comparison, the median state population is 4,561,285, which by definition cuts the sample into two halves of 25 states each; the average state population is 6,575,426, which cuts the sample into two groups of 33 states below that value and 17 states above that value.

Screenshot of Tableau dashboard showing a map of US state-level populations

Bar chart showing the US state-level population with a diverging color palette
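The arithmetic behind these candidate midpoints is simple enough to check by hand or in a few lines of code. The following is a minimal sketch, not how any of the tools above are implemented: the function names `midrange` and `split_counts` are made up for illustration, the two pairs of endpoints come from the figures quoted in the text, and the `skewed` list is hypothetical data used only to show how the split changes with the choice of center.

```python
from statistics import mean, median


def midrange(values):
    """Average of the minimum and maximum: the min-max midpoint described in the text."""
    return (min(values) + max(values)) / 2


def split_counts(values, midpoint):
    """Count observations strictly below and strictly above a candidate midpoint."""
    below = sum(1 for v in values if v < midpoint)
    above = sum(1 for v in values if v > midpoint)
    return below, above


# Only the endpoints are needed for the midrange, so the values quoted
# in the text are enough to reproduce the two calculated midpoints.
unemployment_midpoint = midrange([1.8, 6.9])            # percent
population_midpoint = midrange([582_328, 39_368_078])   # people

# Hypothetical right-skewed values, for illustration only (not real state data).
skewed = [1, 2, 2, 3, 3, 4, 5, 6, 20, 40]

for name, center in [
    ("midrange", midrange(skewed)),
    ("median", median(skewed)),
    ("mean", mean(skewed)),
]:
    below, above = split_counts(skewed, center)
    print(f"{name:8s} center={center:6.2f} below={below} above={above}")
```

Running a comparison like this on your own data before picking a palette is a quick way to see how many observations land on each side of the default midpoint.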

Of the two browser-based tools I use most frequently, Flourish and Datawrapper, Flourish follows the same min-max averaging technique to assign the midpoint of the diverging palette. I’m 99% sure Datawrapper selects the median value, though I can’t quite tell because the number is cut off at “456,128” in the webpage box (see the red square in the second image).

Screenshot of Flourish showing a map of US state-level populations

Screenshot of Datawrapper showing a map of US state-level populations

In any programming language, of course, the creator needs to calculate and select the midpoint of the range. Personally, I prefer to build data maps in the R programming language, especially with this awesome guide from my Urban Institute colleagues.
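To make that concrete, here is a minimal sketch in Python of what "selecting the midpoint" means when you build the color mapping yourself. It is an illustration under stated assumptions rather than a recommended implementation: the function names are hypothetical, the three hex colors are illustrative choices, and the interpolation is plain linear RGB (a perceptually uniform color space would be a better choice for real work). Each side of the midpoint is scaled separately, so the lightest color always sits exactly at the value you pass in.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of integers to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def lerp_color(start, end, t):
    """Linear interpolation between two RGB tuples, with t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def diverging_color(value, vmin, midpoint, vmax,
                    low="#2166ac", mid="#f7f7f7", high="#b2182b"):
    """Map a value to a color that diverges from an explicit midpoint.

    The colors are illustrative defaults; the midpoint is a required
    argument so that it is always a deliberate choice.
    """
    if not vmin < midpoint < vmax:
        raise ValueError("midpoint must lie strictly between vmin and vmax")
    value = min(max(value, vmin), vmax)  # clamp to the data range
    if value <= midpoint:
        t = (midpoint - value) / (midpoint - vmin)
        return rgb_to_hex(lerp_color(hex_to_rgb(mid), hex_to_rgb(low), t))
    t = (value - midpoint) / (vmax - midpoint)
    return rgb_to_hex(lerp_color(hex_to_rgb(mid), hex_to_rgb(high), t))


# Example: center the palette on the median state population quoted above
# instead of the min-max average.
median_color = diverging_color(4_561_285, 582_328, 4_561_285, 39_368_078)
```

Because `midpoint` has no default, the same value can be reused when drawing the legend, which keeps the label and the color mapping in agreement.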

Examples of Diverging Color Palettes

I’ve collected several examples of diverging color palettes being used effectively and ineffectively. Examining how others approach these design and communication challenges is a great way to help my thinking evolve and potentially adjust my approach. This is one reason why I keep updating my data visualization catalog.

Let’s look at a few examples.

Exploring local income deprivation, Office for National Statistics (ONS) (May 2021)

Overall, I like this scrollytelling piece from ONS, but the diverging palette in the map needs some additional clarification. They define income deprivation as the share of “people in an area who are out of work or on low earnings.” You can follow the story for various neighborhoods around London or explore the interactive map on your own.

Check out the histogram at the top-right of this map for the Kensington and Chelsea area, which doubles as the legend, a technique I think people should use more and something my Urban Institute colleague Aaron Williams and I wrote about last year. The diverging palette is centered in the middle of the 10 bars, but what is the center of the distribution? Is it the overall average or median value for the city? Is it the average or median for that specific area? It’s not that the diverging palette is incorrect here, but more annotation would help us understand what the two palettes are diverging from.

Screenshot of ONS map on local income deprivation for the Kensington and Chelsea area of London
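The histogram-as-legend idea boils down to counting how many observations fall into each legend class and then marking the class that holds the midpoint. The sketch below shows that counting step in Python. It is an assumption-laden illustration, not how ONS built their graphic: the function names, class breaks, labels, and values are all hypothetical.

```python
from bisect import bisect_right


def class_counts(values, breaks):
    """Count values per legend class.

    `breaks` holds the interior class boundaries in ascending order;
    a value equal to a boundary is counted in the class above it.
    """
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_right(breaks, v)] += 1
    return counts


def histogram_legend(values, breaks, labels, midpoint):
    """Return text rows: one bar per class, with the midpoint class annotated."""
    counts = class_counts(values, breaks)
    mid_class = bisect_right(breaks, midpoint)
    rows = []
    for i, (label, n) in enumerate(zip(labels, counts)):
        note = f"  <- midpoint ({midpoint})" if i == mid_class else ""
        rows.append(f"{label:>10} | {'#' * n} ({n}){note}")
    return rows


# Hypothetical values and class breaks, for illustration only.
values = [3, 7, 8, 12, 15, 15, 22, 30]
rows = histogram_legend(values, breaks=[10, 20],
                        labels=["under 10", "10 to <20", "20 or more"],
                        midpoint=15)
print("\n".join(rows))
```

The annotation on the midpoint row is the part that answers the question raised above: what value are the two halves of the palette diverging from?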

COVID-19 Federal financial assistance to US farms varied by State, USDA Economic Research Service (ERS) (May 2021)

This choropleth map from ERS shows each state’s total loans and payments from the Paycheck Protection Program (PPP) and the Coronavirus Food Assistance Program (CFAP) in 2020 as a share of that state’s value of production in 2019. The values range from 1.91% in Delaware to 12.20% in Massachusetts. I would interpret the red-to-yellow-to-green color palette as a diverging palette (or what I believe Tableau would call a “Temperature Diverging” palette) even though a sequential palette is probably a better choice here. A diverging palette could work here if the midpoint made sense, but that point is not labeled or marked, so I’m not sure what to make of it.

USDA map of loan payments in US states in 2020

Share of households with broadband use, Eurostat (April 2020)

Here, Eurostat plots the share of households with access to broadband internet across Europe. The color palette is seemingly a diverging palette with shades of pink for the first two categories (<80% and 80-85%) and then shades of blue for the three categories of higher shares. (Apparently, Eurostat likes to make these kinds of maps, regularly using similar diverging palettes; see here, for example.) Why should these shares diverge and why are the categories unbalanced? One of the things people probably like about the Tableau/Excel/Flourish midpoint calculation is that it divides the range of the data in half, but this palette does not do that. Perhaps average broadband access for all of Europe is around 85% but, again, that should be labeled in the legend.

Eurostat map of the share of households with broadband use

COVID-19 Data Dashboard, Washington State Department of Health (May 2021; December 2021)

I took the following screenshot of the COVID-19 Data Dashboard from the Washington State Department of Health last May. I would argue this is a classic case of using a diverging color palette incorrectly because the data are not diverging from any specific point (at least not one that is obvious). The midpoint/range appears to be the “10 to <25” category in light gray, but there is no label to make that clear or to confirm that it is the case.

Screenshot of legend from a Washington State dashboard of the number of new COVID cases per 100,000 by county

I went back in December to take another look and noticed that both the dashboard and the color palette had changed. Now, the color palette is a single, sequential color palette of blue shades. I didn’t find an obvious explanation for why they chose the breaks they did, but it’s probably something like quintiles or percentages (yet another reason why I like the technique of turning map legends into histograms).

Map of Washington State COVID cases by county taken in December 2021

How Does Your State Compare?, US Census Bureau (December 2021)

The US Census Bureau has a collection of graphics and dashboards on their site. This map shows the changes in state-level population between July 2020 and July 2021. Notice the diverging color palette with the light gray in the middle and ranging from dark purple to dark orange. The middle of the color palette, here presented as a range (see the Axios example below for more on this), sits in the positive range of growth (+0.01% to +0.50%), which seems odd to me. Perhaps that range is not statistically meaningful or is essentially zero, but that’s not noted anywhere. Notice also that the colors are not balanced: the darkest purple and darkest orange represent ranges that are not mirror images of each other. I would consider adjusting this in two ways: first, use a well-defined midpoint (or define a mid-range, for example, from -0.25% to +0.25%), and second, use equal increments on either side of that midpoint/mid-range.

Map of the change in state-level population from the US Census Bureau
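The two adjustments suggested above (a defined mid-range and equal increments on either side) can be read as a rule for generating class breaks. Here is a minimal sketch in Python, assuming the -0.25% to +0.25% mid-range mentioned above and a hypothetical step of 0.5 percentage points; the function names and the number of classes per side are illustrative choices, not anything the Census Bureau uses.

```python
def symmetric_breaks(mid_half_width, step, classes_per_side, center=0.0):
    """Class boundaries for a neutral band of center +/- mid_half_width,
    with `classes_per_side` equal-width classes on each side of it."""
    upper = [center + mid_half_width + step * i
             for i in range(classes_per_side + 1)]
    lower = [center - mid_half_width - step * i
             for i in range(classes_per_side + 1)]
    # Round to suppress floating-point noise, then drop duplicates
    # (upper and lower coincide when mid_half_width is zero).
    return sorted({round(b, 10) for b in lower + upper})


def legend_labels(breaks, unit="%"):
    """Signed labels for each class so the mid-range is explicit in the legend."""
    return [f"{lo:+.2f}{unit} to {hi:+.2f}{unit}"
            for lo, hi in zip(breaks, breaks[1:])]


# Assumed settings: mid-range of -0.25% to +0.25%, 0.5-point steps,
# two classes on each side.
breaks = symmetric_breaks(mid_half_width=0.25, step=0.5, classes_per_side=2)
labels = legend_labels(breaks)
for label in labels:
    print(label)
```

With breaks built this way, the darkest color on each side covers the same distance from the mid-range, and the middle label states exactly which values are treated as "no change."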

Cold, heat, fires, hurricanes and tornadoes: The year in weather disasters, Washington Post (December 2021)

This screenshot is from a lovely piece by Zach Levitt and Bonnie Berkowitz at the Washington Post. There are a couple of lovely animated maps at the top, but this static image shows extreme temperatures across the United States last February. Notice the unbalanced diverging palette at the top that ranges from -25°F to +20°F, with the Average placed and annotated near the center. That label is important because, without it, we would likely set that point in our minds at 0°F.

Map of changes in temperature around the United States from the Washington Post

Small Businesses Have Surged in Black Communities. Was it the Stimulus?, New York Times (May 2021)

Quoctrung Bui at The New York Times always puts out great content and this article is no exception. The article contains a nice example of how to do small multiples well, but let’s focus on this choropleth map of the change in new businesses in New York City between 2019 and 2020. Notice the unbalanced diverging color palette here with negative values presented in the purple areas, areas with no change in the light gray, and positive changes in the shades of green. While some people might like the Tableau/Excel/Flourish midpoint model, a diverging palette does not necessarily have to be balanced around the middle. Thus, my critique of the palette in the previous graph is not universal, but specific to that visualization. (One other note about this NYT graph: Because the bottom and top of each color are not explicitly labeled, I did need to take a little bit of time to think about what each range was showing.)

Map of the change in new businesses in New York between 2019 and 2020 from the New York Times

Most Outdoor-Friendly States In 2021, InMyArea.com (June 2021)

I’m sure you’re thinking this is sort of a weird reference to include in this list, but check out this table that uses a diverging color palette! They split the 50 states in the US into the top and bottom 25 states for outdoor-friendliness. The palette starts from that light blue/pink color at the midpoint (25th state) and diverges to dark blue and dark pink on either side. Really nice touch.

Table showing the top and bottom 25 outdoor friendliest states

COVID-19 cases hit lowest point in U.S. since pandemic began, Axios (June 2021)

We observed that the definitions above note a “midpoint” or “middle,” but that doesn’t necessarily have to be a single, specific point. In this map from Sam Baker and Andrew Witherspoon at Axios, they set the middle of the diverging color palette to be a range between -10% and +10%, in the light gray color. Again, labeling those points or that range is important to let people know from where the color palettes (here, green and brown) are diverging.

Map of COVID cases on June 1, 2021 from Axios

Jewish East London, 1900, George Arkell, shared by Laura Vaughan (June 2021) (original at Cornell University Library)

In a little discussion on Twitter earlier last year, Laura Vaughan, a Professor of Urban Form and Society at the Bartlett, University College London, shared this map and legend of East London in 1900. I’m sure this isn’t the first time the diverging color palette was used in a map but notice the switch from the light red color to the light blue color right in the middle. One might argue that exact point should be labeled but in this case, it seems clear that the diverging palette is differentiating between values above and below 50%. Again, as long as it’s clear how the diverging palette is dividing the distribution, I’m arguing that it’s generally okay to use.

Legend corresponding to the map of East London in 1900

Wrapping Up

My point here is not that Tableau, Excel, Flourish, Datawrapper, and other tools have incorrect default choices, but that you should be mindful of the effects those default options have on your visualization. As data visualization creators, we should always be careful with the tools we use. If you want to use a diverging color palette in your chart or map, check a few things before you publish:

Make sure there is a color legend

Be intentional about the midpoint/midrange and how it’s being selected

Make sure your legend is correctly labeled, including the midpoint—especially if that midpoint isn’t zero

Make sure the legend scale is accurate

Big thanks to Alan Wilson and Maxene Graze for reading a draft of this post.

February 7, 2022

2 comments


Matej

February 9, 2022 at 5:52 am

Using palettes that combine several colors and “utilize the same hue and range in lightness” results in visualizations that are potentially inaccessible to colorblind people. If a person cannot distinguish between two hues, lightness is the only way to tell them apart – which cannot be done if the lightness values for both hues are the same. Take a look at the examples when transformed into greyscale (Mac can do it natively, check out Accessibility settings). From the above example, only COVID-19 Data Dashboard by Washington State Department of Health is easily comprehensible in greyscale.

Kavita

May 16, 2022 at 7:04 am

Absolutely agree. All the points you’ve mentioned about the diverging palette and how it should be implemented make sense and will help make the visual more readable. Thanks for the article.
