Recently I came across this very nice-looking video on LinkedIn. Its purpose is to show the scale of global warming since 1900. It is based on data published by NASA, and it looked very impressive, until I looked closer:
- The bars are confusing. Since they display a temperature deviation from some kind of norm, each bar should start at 0.
- The top-right inset does not have any scales.
- There is no definition of the temperature anomaly.
- There is no information on the data analysis steps.
My goals for this post:
- Identify the data sources used to create the graphs (done).
- Identify the analysis steps used to produce the graph.
- Create a Jupyter notebook to
  - download the data (done),
  - import the data into Python and do some simple post-processing (done), and
  - visualize the results (done).
- Make the notebook and the data publicly available (done).
- The plots and data shown in this post are static; the notebook is hyperlinked to the Bitbucket repository and periodically updated.
- The notebook tries to download the data from NASA, so the graphs should update to reflect the latest data.
- A more detailed analysis will follow in a future post.
The NASA GISTEMP Data
The data referenced in the LinkedIn video come from a specific dataset provided by NASA: GISTEMP. This data is available in various formats and contexts:
- Global data:
Temperature variations averaged over the entire globe, as well as over the northern and southern hemispheres separately. This data is already heavily post-processed.
- Local data:
Local temperature variations, available in various formats. We will restrict ourselves to the NetCDF file, specifically the newest file available, gistemp1200_ERSSTv5.nc.gz. Note that this is not raw observation data either; it is also heavily post-processed.
Global LOTI Data
The LOTI (Land-Ocean Temperature Index) uses a more complicated averaging scheme between the ocean and land data to account for the higher heat capacity of water. According to NASA, this leads to a slight underestimation of cooling and warming trends due to the dampening effect of the oceans. The two input datasets for the LOTI calculations are:
- Surface Air Temperatures (SATs) measured by weather stations worldwide, and
- Sea Surface Temperatures (SSTs) measured by ships, buoys and, more recently, satellites.
LOTI values are given in °C and denote the temperature deviation with respect to a reference temperature. Currently the reference temperature is the mean temperature over the period 1951 to 1980. NASA calls this quantity the temperature anomaly.
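The following small sketch makes the anomaly definition concrete by computing anomalies from absolute temperatures. It assumes that every calendar month has its own reference value, namely its mean over 1951 to 1980. This is common for monthly anomalies, but it is my assumption here. The function name `monthly_anomalies` and the toy numbers are made up for illustration. NASA's actual procedure works on station and grid data before any averaging.

```python
import pandas as pd


def monthly_anomalies(temps: pd.DataFrame, base_start: int = 1951,
                      base_end: int = 1980) -> pd.DataFrame:
    """Turn absolute monthly temperatures into anomalies.

    `temps` is indexed by year and has one column per calendar month.
    Assumption: every calendar month has its own reference value, namely
    its mean over the base period (inclusive, missing values skipped).
    """
    baseline = temps.loc[base_start:base_end].mean()  # one value per month column
    return temps - baseline  # subtract each month's baseline from every year


# Toy absolute temperatures in °C (made-up numbers, not GISTEMP data).
toy = pd.DataFrame({"Jan": 5.0, "Jul": 20.0}, index=range(1950, 1983))
toy.loc[1982, "Jan"] = 6.0
anomalies = monthly_anomalies(toy)
print(anomalies.loc[1982])  # Jan 1.0, Jul 0.0
```

By construction, the anomalies average to zero over the reference period. Changing the reference period therefore shifts the whole curve without changing its shape.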
NASA repeats the averaging procedure every month and makes the resulting data files available. For this section, three files were analyzed:
- The global mean for each month since 1880: GLB.Ts+dSST.csv
- The northern hemisphere mean for each month since 1880: NH.Ts+dSST.csv
- The southern hemisphere mean for each month since 1880: SH.Ts+dSST.csv
CSV File Content
Simple Post Processing
At first glance, the data is organized like an Excel table: rows are years, and the columns hold the monthly temperature averages as well as mean values calculated over specific month intervals.
I already have a slight problem with this: the file contains raw data (the monthly temperatures) as well as post-processed data (the mean calculations), without marking the difference in the meta information. Also, since the raw data is already the result of a mean operation, the standard deviation, minimum and maximum values would have been nice. Finally, the D-N column is confusing: I think it averages from December of the previous year through November, but for the first year that December is not included in the data.
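One way to test my reading of the D-N column is to recompute it from the monthly columns and compare. The sketch below encodes that hypothesis: the previous December plus January through November of the current year. It uses a made-up toy table. The function name is hypothetical, and the whole thing is an interpretation, not NASA's documented definition.

```python
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def december_to_november_mean(df: pd.DataFrame) -> pd.Series:
    """Recompute a D-N style mean: December of the previous year plus Jan..Nov.

    `df` is indexed by consecutive years with one column per month.
    The first year has no previous December, so its result is NaN.
    """
    prev_dec = df["Dec"].shift(1)  # December of year-1, aligned to year
    parts = pd.concat([prev_dec.rename("prevDec"), df[MONTHS[:-1]]], axis=1)
    # skipna=False: any missing month makes the whole mean missing
    return parts.mean(axis=1, skipna=False)


# Made-up toy table: every month of a year holds the same value.
toy = pd.DataFrame({m: [1.0, 2.0, 3.0] for m in MONTHS}, index=[2000, 2001, 2002])
dn = december_to_november_mean(toy)
print(dn)  # 2000 is NaN, 2001 = (1 + 11*2)/12, 2002 = (2 + 11*3)/12
```

Applied to the real table, `(december_to_november_mean(df) - df["D-N"]).abs().max()` should only show differences at the level of the rounding of the published values, provided the interpretation is right.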
So, to get our first simple plots, we store the CSV data in a pandas DataFrame. The DataFrame allows for a wide range of statistical operations, and matplotlib can create some nice graphs from it.
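Here is a minimal sketch of the loading step, assuming the published CSV layout: a one-line title, a header row starting with "Year", and `***` for missing values. The URL is an assumption based on the current GISTEMP v4 table location and may differ from the one used in the notebook. The inline sample is made up, so the example runs offline.

```python
import io

import pandas as pd

# Assumed URL of the global table (GISTEMP v4 layout); the NH and SH files
# follow the same naming. Check the GISTEMP website if it has moved.
GLB_URL = "https://data.giss.nasa.gov/gistemp/tabledata_v4/GLB.Ts+dSST.csv"


def read_gistemp_table(source) -> pd.DataFrame:
    """Read a GISTEMP CSV table (URL, path or file object) into a DataFrame.

    Assumes the published layout: a one-line title, then a header row that
    starts with "Year", and "***" wherever a value is missing.
    """
    df = pd.read_csv(source, skiprows=1, na_values="***")
    return df.set_index("Year")


# Made-up excerpt with the same structure (not real GISTEMP values).
sample = io.StringIO(
    "Land-Ocean: Global Means\n"
    "Year,Jan,Feb,Dec,J-D\n"
    "1880,0.10,0.20,0.30,0.20\n"
    "1881,0.40,0.50,***,***\n"
)
df = read_gistemp_table(sample)
print(df)

# With network access:
# glb = read_gistemp_table(GLB_URL)
# glb["J-D"].plot(title="Global annual mean anomaly")
```

Indexing by year keeps later selections such as `df.loc[1951:1980]` readable. Converting `***` to NaN lets pandas skip missing values in its statistics.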
The data looks quite noisy, with some kind of higher-frequency noise obscuring the general trend. We have several options:
- Average the data over several years (the simplest approach, adopted in this post), or
- Analyze the noise and remove the identified components (we will leave this for a follow-up post).
Now we will look at the multi-year trend. As a baseline, we calculate the annual averages and store them in a pandas DataFrame. Then we use the rolling-window routines to average over multi-year spans.
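The following sketch shows one way to do the annual and windowed averaging with pandas. It assumes one column per month, named as in the GISTEMP files. The window lengths (5 and 10 years) and the choice of centred windows are my assumptions, not necessarily the notebook's settings. The toy table is made up.

```python
import numpy as np
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def annual_and_windowed(df: pd.DataFrame, windows=(5, 10)) -> pd.DataFrame:
    """Annual means of the monthly columns plus centred moving averages.

    Years with any missing month (e.g. the current, incomplete year) get NaN
    instead of a biased partial average.
    """
    months = df[MONTHS]
    annual = months.mean(axis=1).where(months.notna().all(axis=1))
    out = pd.DataFrame({"annual": annual})
    for w in windows:
        # Centred window: the first and last few years have no full window
        # and therefore stay NaN.
        out[f"{w}yr"] = annual.rolling(window=w, center=True).mean()
    return out


# Toy data (made up): every month of year i holds the value i.
toy = pd.DataFrame({m: np.arange(10, dtype=float) for m in MONTHS},
                   index=range(2000, 2010))
toy.loc[2009, "Dec"] = np.nan  # pretend 2009 is incomplete
smoothed = annual_and_windowed(toy)
print(smoothed)
# In the notebook: smoothed.plot() draws all curves in one figure.
```

A centred window keeps the smoothed curve aligned with the years it summarizes. A trailing window would shift features later in time.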
As expected from an averaging operation, the windowed datasets look smoother. However, apart from a general visual trend, it is hard to see how to extrapolate this data. A follow-up post will delve into the details of the data analysis.
I have not yet worked on proper plot formatting (an item on the to-do list).
Preliminary Observations and Hasty Conclusions
- The simple averaging plots look very noisy, suggesting that some averaging or noise removal scheme should be applied. However, there seems to be a significant and consistent increase since the 1980s. The last decade also looks really worrying.
- Simple windowed averaging of this data confirms this worrying trend. We also seem to see that the effect is more pronounced for the northern hemisphere than for the southern hemisphere. This hints at more local patterns, which need to be investigated.
- The heatmaps did not show as much as I had hoped. Either there is nothing to see, or it is obscured by the colour scheme. Both are possible.
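To test the colour-scheme explanation, one option is a diverging colour map centred on zero, so that negative and positive anomalies get clearly different hues. The sketch below is one such year-by-month heatmap. It assumes matplotlib 3.2 or newer (for `TwoSlopeNorm`) and uses made-up anomalies. `cmocean.cm.balance` from the cmocean package would be a drop-in alternative to `"RdBu_r"`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import TwoSlopeNorm

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def anomaly_heatmap(df: pd.DataFrame, label_every: int = 20):
    """Year-by-month heatmap with a diverging colour map centred on 0 °C."""
    data = df[MONTHS].to_numpy(dtype=float)
    limit = float(np.nanmax(np.abs(data)))  # symmetric colour range
    norm = TwoSlopeNorm(vmin=-limit, vcenter=0.0, vmax=limit)
    fig, ax = plt.subplots(figsize=(6, 8))
    image = ax.imshow(data, aspect="auto", cmap="RdBu_r", norm=norm,
                      interpolation="nearest")
    ax.set_xticks(range(len(MONTHS)))
    ax.set_xticklabels(MONTHS, rotation=90)
    rows = range(0, len(df.index), label_every)  # label only every n-th year
    ax.set_yticks(list(rows))
    ax.set_yticklabels([str(df.index[i]) for i in rows])
    fig.colorbar(image, ax=ax, label="Temperature anomaly [°C]")
    return fig, ax


# Made-up anomalies, only to exercise the function.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(0.0, 0.5, size=(50, 12)), columns=MONTHS,
                   index=range(1950, 2000))
fig, ax = anomaly_heatmap(toy)
```

A symmetric range around zero means white always means "no anomaly". A sequential colour map can hide the sign change.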
Things to do
- Store the dataset as a proper time series. Somehow I have the feeling this is the way the data should have been presented in the first place.
- Do a proper signal analysis to spot noise and patterns:
  - We should spot the seasonal periodicity of the data. Fourier-transform-based methods should pick this up.
  - We should find the characteristic noise.
- Apply proper filtering and noise removal techniques.
- Identify trends and anomalies in the filtered data. Some of the spikes in the data survive smoothing over 10 years, suggesting the presence of significant events. We should be able to find these events.
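The first two to-do items can be sketched together. First flatten the year-by-month table into one monthly time series, then look for the dominant period in its Fourier spectrum. The example below uses synthetic data with a known 12-month cycle, so it only checks that the method finds a period that is actually there. Because the LOTI values are deviations from a reference, a seasonal cycle may be much weaker in the real data. Finding out is the point of the follow-up. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_monthly_series(df: pd.DataFrame) -> pd.Series:
    """Flatten the year x month table into one monthly time series.

    Assumes consecutive years in the index; missing months stay NaN.
    """
    values = df[MONTHS].to_numpy(dtype=float).ravel()  # year by year, Jan..Dec
    index = pd.date_range(start=f"{df.index[0]}-01-01",
                          periods=values.size, freq="MS")
    return pd.Series(values, index=index, name="anomaly")


def dominant_period(x, dt: float = 1.0) -> float:
    """Period (in units of dt) of the strongest peak in the amplitude spectrum.

    A linear trend is removed first, otherwise it leaks into the lowest
    frequencies. The input must not contain NaN.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    x = x - np.polyval(np.polyfit(t, x, 1), t)  # remove the linear trend
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=dt)
    k = int(np.argmax(spectrum[1:])) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[k]


# Synthetic check: 50 years of monthly values with a 12-month cycle,
# a slow trend and noise (made-up numbers, not GISTEMP data).
rng = np.random.default_rng(0)
t = np.arange(600)
signal = (0.5 * np.sin(2 * np.pi * t / 12) + 0.001 * t
          + 0.2 * rng.standard_normal(t.size))
print(dominant_period(signal))  # close to 12 (months)
```

For the real series, gaps would have to be dropped or filled first. Gaps only at the ends are harmless; gaps in the middle would break the regular sampling the FFT relies on.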
Local LOTI Data
In addition to the globally averaged LOTI data, NASA also provides a locally resolved dataset, retrievable at https://data.giss.nasa.gov/pub/gistemp/gistemp1200_ERSSTv5.nc.gz.
The dataset is stored as a NetCDF4 file containing the following datasets:
- time_bnds, and
To visualize this dataset within this notebook, I relied on several websites:
- How to load NetCDF files with Python: http://www.hydro.washington.edu/~jhamman/hydro-logic/blog/2013/10/12/plot-netcdf-data/
- Projections of earth: http://scitools.org.uk/cartopy/docs/latest/crs/projections.html
- Color schemes: http://matplotlib.org/cmocean/
Visualizing this dataset is a little tricky. First, Python needs to be able to import the NetCDF file format. Luckily, Python has a module for just this purpose: netCDF4. With the import problem solved, we now have a data visualization problem: the data lies on a regular longitude/latitude grid, i.e. in a spherical coordinate system. To plot it in two dimensions, we need to project it onto a plane. In addition, some geographical references would be nice (oceans, continents, countries, …). Again, a Python module comes to the rescue: basemap and its successor package cartopy. I have used the legacy package basemap for these figures, but the next posts will use cartopy. Both packages provide projection schemes that integrate easily with matplotlib, as well as plotting utilities for geographic data. All in all, creating the figures below only takes a couple of lines of code.
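For a quick look without basemap or cartopy, the grid can be plotted as is. Drawing longitude against latitude directly is the equirectangular (Plate Carree) projection. The sketch below assumes a 2° x 2° grid of cell centres, with missing cells stored as NaN, and uses a made-up field. It needs matplotlib 3.3 or newer for `shading="auto"`.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm


def plot_lonlat(lon, lat, field, title=""):
    """Plot a (lat, lon) field on a regular grid without a map projection package.

    Drawing longitude against latitude directly gives the equirectangular
    (Plate Carree) projection; basemap and cartopy offer other projections
    and add coastlines on top of the same kind of plot.
    """
    data = np.ma.masked_invalid(field)  # grid cells without data stay blank
    limit = float(np.abs(data).max())  # symmetric colour range around 0
    fig, ax = plt.subplots(figsize=(8, 4))
    mesh = ax.pcolormesh(lon, lat, data, cmap="RdBu_r", shading="auto",
                         norm=TwoSlopeNorm(vmin=-limit, vcenter=0.0, vmax=limit))
    ax.set_xlabel("Longitude [°]")
    ax.set_ylabel("Latitude [°]")
    ax.set_aspect("equal")
    ax.set_title(title)
    fig.colorbar(mesh, ax=ax, label="Temperature anomaly [°C]")
    return fig, ax


# Assumed 2° x 2° grid of cell centres; the made-up field is warmer towards
# the north pole and has a block of missing cells.
lon = np.arange(-179.0, 180.0, 2.0)
lat = np.arange(-89.0, 90.0, 2.0)
lon2d, lat2d = np.meshgrid(lon, lat)
field = np.sin(np.deg2rad(lat2d)) + 0.3 * np.cos(np.deg2rad(lon2d))
field[40:45, 10:20] = np.nan
fig, ax = plot_lonlat(lon, lat, field, title="Synthetic anomaly field")
```

With cartopy, the same data would be drawn on axes created with `plt.axes(projection=ccrs.Robinson())` (where `ccrs` is `cartopy.crs`). The plot call would be `ax.pcolormesh(lon, lat, data, transform=ccrs.PlateCarree())`, and `ax.coastlines()` adds the continents.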
Projection of a single dataset
The GISTEMP NetCDF file contains monthly temperature data since 1880. Here are three examples:
Projection of annual averages
Now we can easily calculate the average of the local temperatures over the span of a year. Pandas takes care of the missing values in the averaging routine.
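The post does this step with pandas. For comparison, here is a numpy sketch of the same idea on a (time, lat, lon) array. It assumes the time axis starts in January and that missing values are NaN; a netCDF4 masked array can be converted with `.filled(np.nan)`. The `min_months` threshold is a hypothetical parameter I added to make the missing-value policy explicit. The array contents are made up.

```python
import warnings

import numpy as np


def annual_means(monthly: np.ndarray, min_months: int = 1) -> np.ndarray:
    """Average a (time, lat, lon) array of monthly values into (year, lat, lon).

    Assumes the time axis starts in January; months of a trailing incomplete
    year are dropped. Missing values must be NaN. A cell gets a mean only if
    it has at least `min_months` valid months in that year.
    """
    n_years = monthly.shape[0] // 12
    cube = monthly[: n_years * 12].reshape(n_years, 12, *monthly.shape[1:])
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)  # all-NaN cells
        means = np.nanmean(cube, axis=1)
    valid_months = np.count_nonzero(~np.isnan(cube), axis=1)
    return np.where(valid_months >= min_months, means, np.nan)


# Made-up data: two full years plus one extra month on a 2 x 3 grid.
monthly = np.ones((25, 2, 3))
monthly[12:24] = 2.0
monthly[0:12, 0, 0] = np.nan  # a cell with no data in the first year
monthly[12, 1, 1] = np.nan    # one missing month in the second year
print(annual_means(monthly)[:, 1, 1])                 # year 0: 1.0, year 1: 2.0
print(annual_means(monthly, min_months=12)[:, 1, 1])  # year 1 becomes NaN
```

With the default `min_months=1`, a cell behaves like pandas' default NaN skipping. Raising the threshold avoids annual values that rest on only a few months.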
Average temperature anomaly resolved by country
In the previous subsection we finally managed to plot 2D projections of the local temperature anomalies and their annual averages. The final step would be to use this data to calculate the annual average per country. The steps to do this would be as follows:
- Create an averaging scheme that integrates the data over a specific country and divides the result by the area of the country. The easiest way would be to multiply the data by the characteristic function of each country (1 inside the country, 0 outside) before integrating. This should be possible with basemap and its successor, but I have not worked this out yet.
- Plot this in a nice radial plot similar to the one in the video.
It seems that matplotlib can do something similar, but it will take me a little while to figure this out.
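The averaging step described above amounts to an area-weighted mean restricted by a mask. On a regular lon/lat grid, the area of a cell is approximately proportional to the cosine of its latitude. The sketch below assumes a (lat, lon) field with NaN for missing cells. The `mask` is a hypothetical boolean array standing in for a country's characteristic function; building a real country mask from shapefiles is not shown. Without a mask, the function gives a crude global mean. I do not claim this is how NASA computes its global series.

```python
import numpy as np


def area_weighted_mean(field, lat, mask=None):
    """Area-weighted mean of a (lat, lon) field on a regular lon/lat grid.

    Each cell is weighted by cos(latitude), which is approximately
    proportional to its area. `mask` is an optional boolean (lat, lon) array
    that is True inside the region of interest (e.g. a country). Cells with
    NaN are excluded from both numerator and denominator.
    """
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[:, np.newaxis]
    weights = np.broadcast_to(weights, field.shape)
    valid = ~np.isnan(field)
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool)
    total_weight = weights[valid].sum()
    if total_weight == 0:
        return float("nan")  # region contains no valid cells
    return float((field[valid] * weights[valid]).sum() / total_weight)


lat = np.array([0.0, 60.0])
field = np.array([[1.0, 1.0],
                  [3.0, 3.0]])
print(area_weighted_mean(field, lat))  # about 5/3: (2*1 + 2*0.5*3) / 3
country = np.array([[False, False],
                    [True, True]])  # hypothetical region mask
print(area_weighted_mean(field, lat, mask=country))  # about 3.0
```

Dividing by the summed weights of the valid cells, instead of by the nominal country area, keeps missing cells from pulling the average towards zero.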
In short, this is
Conclusions and additional material
Well, we have not quite reached our original goal of reproducing and improving the graph. However, we have made some progress:
- We found the data
- We could interpret the data
- We could plot the data in various forms
- We at least tentatively know how the original video was made.
We also encountered some difficulties:
- It is not clear how the global data was obtained from the local LOTI data.
- It is not clear how the country-wide averaging was done.
- Reproducing precisely the same graph is doable, but will take some effort. Perhaps we can do something equally cool with a little less effort.
- Something interactive would be nice …
The Bitbucket repository for this worksheet can be found here.
It can be viewed below or in a full screen Jupyter-Notebook window.
I am doing this as a hobby, so take all of this with caution. Or, better still, reproduce and improve it.
License to graphs and photos
Unless otherwise stated, all figures are published under the least restrictive Creative Commons license:
To the extent possible under law, Andreas Putz has waived all copyright and related or neighbouring rights to LOTI Temperature Data.
(CC0: free for commercial use, no attribution required)