Summary

This exercise investigates measurements of carbon dioxide (CO_2) concentration in the Earth’s atmosphere and uses time series methods to forecast future levels. Unlike most research into the Earth’s atmosphere and climate, this project uses only statistical methods to generate predictions. The natural sciences, on the other hand, aim to deduce the mechanisms of the environment first in order to reason about what may occur in the future. That approach can involve just as much math, statistics, and model building as a time series approach, but it focuses on understanding physical mechanisms before extrapolating.

The key difference is that many time series methods use only the previously recorded values of a variable to predict its future levels. An example would be predicting a stock’s future price based only on its past prices, with no other knowledge or data. In this project, time series methods are used to predict future CO_2 concentrations in the Earth’s atmosphere from past measurements alone. One advantage of this approach is that the true mechanism driving CO_2 to change over time is not needed to make a prediction. This applies both to what is constant over time, such as the physics of molecular interactions involving CO_2, and to any other processes that change over time and affect the variable being modeled. This is also where much of the ambiguity in this subject lies, since predicting future CO_2 levels from mechanisms would also require predicting how every other time-varying variable will change.
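As a minimal sketch of what "using only past values" means, the snippet below fits a first-order autoregression, y[t] = a + b * y[t-1], by ordinary least squares and rolls it forward. It is not the model used in this project. The function names and the toy numbers are assumptions made up for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x = series[:-1]  # lagged values
    y = series[1:]   # values one step later
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast_ar1(series, steps):
    """Roll the fitted AR(1) forward, feeding each forecast back in."""
    a, b = fit_ar1(series)
    last = series[-1]
    forecasts = []
    for _ in range(steps):
        last = a + b * last
        forecasts.append(last)
    return forecasts


# Made-up illustrative values, not real measurements.
toy_series = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
toy_forecast = forecast_ar1(toy_series, steps=3)
print(toy_forecast)
```

Nothing in the fit refers to why the series changes. The only input is the series itself.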

For example, it is known that at atmospheric pressure CO_2 becomes less soluble in sea water as temperature rises^{[1]}, so a sea that is warming over a given period would act as a source of CO_2. The opposite would be true of a cooling sea, which would act as a sink. If this relationship were included in a model, an estimate of CO_2 concentrations over time would rely on an estimate of that sea’s temperature over time as well. But the temperature of that sea may in turn depend on CO_2 concentrations in the atmosphere, since CO_2 can trap heat, which could result in a set of equations without an analytical solution. To get past this impasse, as in similar scenarios, optimization techniques are used to converge on a best solution.
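To illustrate how two mutually dependent relationships can be solved numerically, the sketch below uses simple fixed-point iteration, which is a stand-in for a full optimizer. The two equations, their coefficients, and the function names are invented purely for illustration and carry no physical meaning.

```python
import math


def co2_given_temp(temp_c):
    """Hypothetical toy relation: warmer sea releases more CO2 (ppm)."""
    return 400.0 + 10.0 * (temp_c - 15.0)


def temp_given_co2(co2_ppm):
    """Hypothetical toy relation: more CO2 traps more heat (degrees C)."""
    return 15.5 + 3.0 * math.log(co2_ppm / 400.0)


def solve_coupled(max_iter=100, tol=1e-10):
    """Alternate between the two relations until both values stop changing."""
    co2 = 400.0  # starting guess, ppm (arbitrary)
    temp = 15.0  # starting guess, degrees C (arbitrary)
    for _ in range(max_iter):
        new_co2 = co2_given_temp(temp)
        new_temp = temp_given_co2(new_co2)
        if abs(new_co2 - co2) < tol and abs(new_temp - temp) < tol:
            return new_co2, new_temp
        co2, temp = new_co2, new_temp
    raise RuntimeError("fixed-point iteration did not converge")


co2_star, temp_star = solve_coupled()
print(co2_star, temp_star)
```

The iteration converges here only because each toy relation responds weakly to the other. Strongly coupled systems generally need a more robust solver.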

For many applications, time series methods can produce useful and accurate results without specific knowledge of what drives the changes over time. This is beneficial in that no mechanisms between variables need to be assumed in order to generate an accurate forecast. The disadvantage, of course, is that a variable may change in the real world in a way the model does not account for, resulting in an incorrect forecast. The same risk exists in a mechanism-based model, but it can be better handled there by explicitly accounting for the relationships within the system. For this subject, those relationships would include the potential sources and sinks of CO_2 on the planet.

Two datasets of measured CO_2 concentrations were used in this exercise. The first is from Charles Keeling, a scientist who began recording atmospheric data at Mauna Loa Observatory in 1958. Keeling’s observations are the earliest continuous measurements of the gas in the atmosphere, and they showed that CO_2 concentrations follow a seasonal cycle and that levels were increasing over time. This record is now famously known as the “Keeling Curve”. The dataset used here ends in 1997 and is used for examination and preliminary model building. The second comes from the National Oceanic and Atmospheric Administration (NOAA), which officially continued Keeling’s measurements at the same site. This dataset covers the period from 1974 to 2020 and is used for final modeling and forecasting.

Using the Keeling data, both a polynomial time trend model and a SARIMA model were created. With the data ending in 1997, the models predicted December 2017 CO_2 concentrations of 382 ppm and 396 ppm, respectively. The SARIMA forecast was therefore closer to the true value of about 407 ppm, although both were underestimates. For this reason, only a SARIMA model was considered for the NOAA data. The new model was fit on data from 1974 through June 2019. When forecasting to June 2020, the prediction almost exactly matched the true value of about 416 ppm that NOAA later published.^{[3]} This increased accuracy is expected for a much shorter-term forecast.
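A full SARIMA fit needs a statistics library, so the sketch below shows only the seasonal differencing idea that SARIMA builds on. Each forecast is the value from one season earlier plus the average year-over-year change. This is a much simpler stand-in, not the model fit in this project. The function name, the seasonal pattern, and the synthetic series are assumptions for illustration, not real measurements.

```python
def seasonal_drift_forecast(series, steps, period=12):
    """Forecast as 'same month last year' plus the mean seasonal difference."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    # Seasonal differences remove the repeating within-year pattern.
    diffs = [series[t] - series[t - period] for t in range(period, len(series))]
    drift = sum(diffs) / len(diffs)  # average change per season
    history = list(series)
    for _ in range(steps):
        history.append(history[-period] + drift)
    return history[len(series):]


# Synthetic monthly series: linear trend plus a fixed seasonal pattern.
# All numbers are hypothetical.
season = [0.5, 1.0, 1.5, 2.5, 3.0, 2.0, 0.5, -1.5, -3.0, -3.0, -2.0, -0.5]
synthetic = [330.0 + 0.125 * t + season[t % 12] for t in range(48)]

forecast = seasonal_drift_forecast(synthetic, steps=12)
print(forecast)
```

On this synthetic series the method recovers the next year because the trend is perfectly linear. Real data with changing growth would require the autoregressive and moving-average terms that SARIMA adds.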

For a longer-term forecast, the final model predicted that a CO_2 concentration of 450 ppm would be reached in March 2035. This level corresponds to RCP2.6^{[2]}, the mildest climate change scenario described by researchers, in which CO_2 concentrations are assumed to peak at 450 ppm between 2040 and 2050. Because the model predicts that this level will be reached earlier, in 2035, and that previous growth will continue, it is likely that the RCP2.6 scenario underestimates how Earth’s climate may change over the next several decades. RCP2.6 is known for being associated with limiting the increase in global mean temperature to 2°C. By the year 2100, the model forecasts CO_2 concentrations of about 600 ppm. However, a forecast this long should be taken only as a rough estimate and could easily prove incorrect by that time.
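As a rough consistency check on these figures, the average growth rate implied between the quoted points can be computed directly. The sketch uses only the rounded values stated in the text (about 416 ppm in June 2020, 450 ppm in March 2035, about 600 ppm by 2100). The mid-month date conversion and the choice of the start of 2100 as the endpoint are assumptions, as is the helper name.

```python
def implied_growth(ppm_start, year_start, ppm_end, year_end):
    """Average growth in ppm per year between two (year, ppm) points."""
    return (ppm_end - ppm_start) / (year_end - year_start)


# Months converted to fractional years at mid-month (an assumption).
june_2020 = 2020 + 5.5 / 12
march_2035 = 2035 + 2.5 / 12
start_2100 = 2100.0  # "by the year 2100" read as the start of 2100 (an assumption)

rate_to_2035 = implied_growth(416.0, june_2020, 450.0, march_2035)
rate_2035_to_2100 = implied_growth(450.0, march_2035, 600.0, start_2100)

print(round(rate_to_2035, 2), round(rate_2035_to_2100, 2))
```

Both intervals work out to roughly 2.3 ppm per year, which suggests the long-range forecast path is close to a straight-line continuation of recent growth.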

This work was originally completed as part of the W271 Statistical Methods for Discrete Response, Time Series, and Panel Data course in the Master of Information and Data Science program at the University of California, Berkeley.

Citations

[1] Wiesenburg, D.A. An Evaluation of the Solubility of Carbon Dioxide in Sea Water at Partial Pressure of 101.3 kPa. Center for Marine Sciences, University of Southern Mississippi, Stennis Space Center, MS 39529, USA (January 1995).

[2] van Vuuren, D.P., Stehfest, E., den Elzen, M.G.J. et al. RCP2.6: exploring the possibility to keep global mean temperature increase below 2°C. Climatic Change 109, 95 (2011). https://doi.org/10.1007/s10584-011-0152-3

[3] NOAA Global Monitoring Laboratory. Mauna Loa monthly mean CO_2 data. https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt