Machine Learning

The TOMCAT chemical transport model (CTM) is being used to create novel long-term datasets of atmospheric trace gases, including ozone, methane, and nitrous oxide, by combining model simulations with satellite observations. These datasets are crucial for understanding the long-term changes and variability of these gases in the stratosphere. The primary goal of this work is to produce consistent, gap-free datasets that overcome the limitations of individual satellite records. Such records often cover shorter periods and may contain biases due to differences in measurement methods, sampling patterns, and retrieval algorithms. The datasets can be used to evaluate models and to provide reference fields for satellite retrieval algorithms.

To address these limitations, a machine-learning (ML) approach is employed. Random forest (RF) or XGBoost algorithms are used to correct biases in the TOMCAT model output using available satellite measurements. For example, to create the ML-TOMCAT ozone dataset, the RF method is used to merge satellite data with the TOMCAT model output. The ML algorithm is trained on the differences between the satellite data and the TOMCAT output during periods of good temporal sampling. The trained model is then used to predict these differences for the entire time period, and the predicted differences are applied as corrections to the TOMCAT output. This approach allows long-term vertical profile data to be reconstructed for key stratospheric species.
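The correction step described above can be illustrated with a minimal, self-contained Python sketch. It is not the ML-TOMCAT or TCOM code. The toy data, the predictor variables (latitude, pressure, month), the 2004-2006 "well-sampled" training window and all hyperparameters are hypothetical. The small random forest written out here is a simplified stand-in for a library implementation such as scikit-learn's `RandomForestRegressor` or XGBoost. The forest is trained on satellite-minus-model differences. Its predicted differences are then added to the model output for every sample, including years outside the training window.

```python
import random


def avg(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_depth, min_leaf, n_feat, rng):
    """Grow a regression tree; each leaf stores the mean target of its samples."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return avg(y)
    best = None  # (score, feature, threshold, left indices, right indices)
    # A random forest considers only a random subset of features at each split.
    for f in rng.sample(range(len(X[0])), n_feat):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:  # no admissible split: make a leaf
        return avg(y)
    _, f, thr, left, right = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, min_leaf, n_feat, rng)

    return (f, thr, child(left), child(right))


def predict_tree(node, x):
    """Follow the split rules from the root down to a leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=20, max_depth=4, min_leaf=2, n_feat=2, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, min_leaf, n_feat, rng))
    return trees


def predict_forest(trees, x):
    """The forest prediction is the average of the individual trees."""
    return avg(predict_tree(t, x) for t in trees)


# Hypothetical toy data (NOT real TOMCAT or satellite values): a grid of
# (year, latitude, pressure, month) samples with a latitude-dependent model bias.
rng = random.Random(1)
rows = []  # (year, features, model, satellite, truth)
for year in range(2000, 2010):
    for lat in (-60, -30, 30, 60):
        for pres in (1.0, 10.0, 50.0):
            for month in (1, 4, 7, 10):
                truth = 5.0 + 0.01 * lat + 0.05 * month     # arbitrary "true" field
                model = truth + (0.5 if lat > 0 else -0.5)  # biased "model" output
                obs = truth + rng.gauss(0.0, 0.05)          # noisy "satellite" value
                rows.append((year, (lat, pres, month), model, obs, truth))

# 1) Train on satellite-minus-model differences in a "well-sampled" period.
train = [r for r in rows if 2004 <= r[0] <= 2006]
forest = fit_forest([r[1] for r in train], [r[3] - r[2] for r in train])

# 2) Predict the difference for every sample (all years) and add it to the model.
corrected = [r[2] + predict_forest(forest, r[1]) for r in rows]

raw_mae = avg(abs(r[2] - r[4]) for r in rows)
corrected_mae = avg(abs(c - r[4]) for c, r in zip(corrected, rows))
print(f"MAE vs truth: raw model {raw_mae:.3f}, corrected {corrected_mae:.3f}")
```

On this toy problem the corrected field should sit much closer to the synthetic truth than the raw model does. This is because the imposed bias depends only on latitude, which the trees can split on exactly. Real model-satellite differences depend on many more factors, so the choice of predictors and training periods is a scientific decision described in the publications.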

In effect, the ML methods are used as a computationally cheap and easy-to-apply form of data assimilation. The following publications describe the ML-TOMCAT methodology and its use in constructing long-term datasets:

  • Dhomse et al. (2021) (ML-TOMCAT): This paper describes the creation of a long-term ozone profile dataset by combining the TOMCAT CTM with a random forest ensemble learning method, using the Stratospheric Water and OzOne Satellite Homogenized (SWOOSH) dataset for training.
  • Dhomse and Chipperfield (2023) (TCOM): This paper describes the methodology for creating long-term methane and nitrous oxide profile datasets by applying corrections to TOMCAT output based on profile measurements from the Halogen Occultation Experiment (HALOE) and the Atmospheric Chemistry Experiment Fourier Transform Spectrometer (ACE-FTS).

The datasets created using the TOMCAT model and ML techniques are publicly available; links to them are provided in the publications.

Our aim is to provide resources that help the scientific community improve its understanding of stratospheric processes and climate change.

Publications

Dhomse, S.S., C. Arosio, W. Feng, A. Rozanov, M. Weber and M.P. Chipperfield, ML-TOMCAT: machine-learning-based satellite-corrected global stratospheric ozone profile data set from a chemical transport model, Earth Syst. Sci. Data, 13, 5711-5729, doi:10.5194/essd-13-5711-2021, 2021.

Dhomse, S.S., and M.P. Chipperfield, Using machine-learning to construct TOMCAT model and occultation measurement-based stratospheric methane (TCOM-CH4) and nitrous oxide (TCOM-N2O) profile data sets, Earth Syst. Sci. Data, 15, 5105-5120, doi:10.5194/essd-15-5105-2023, 2023.