Accessing model data using intake-esm

To access the CMIP6 model data and the CESM-PPE data, we have created intake-esm catalogs that make it easy to browse and subset the data. The catalogs are stored in the shared folder /mnt/craas1-ns9989k-geo4992/data/intake-catalogs/

# Importing the required packages
import intake
import xarray as xr
import intake_esm
import numpy as np
import matplotlib.pyplot as plt

Reading and browsing the catalog

# Open the catalog

# col = intake.open_esm_datastore('https://storage.googleapis.com/cmip6/pangeo-cmip6.json') # Remote Pangeo

col = intake.open_esm_datastore('/mnt/craas1-ns9989k-geo4992/data/catalogs/cmip6.json') # Local data stored on NIRD
col

cmip6 catalog with 155 dataset(s) from 536945 asset(s):

                    unique
variable_id            583
table_id                24
source_id               75
experiment_id           94
member_id              190
grid_label              11
time_range            9100
activity_id             18
institution_id          35
version                577
path                536945
dcpp_init_year           0
derived_variable_id      0

Under the hood, intake-esm uses a large CSV table that holds metadata for every asset (file) together with the path where it is stored. The data can be stored either locally or in remote cloud storage, e.g. on the Pangeo cloud.
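As an illustration of what that table looks like, here is a minimal pandas sketch with a handful of hypothetical rows (the model names are real, but the paths and file splits are made up). Each row is one asset, i.e. one file, and counting the distinct values per column roughly reproduces the `unique` summary shown when `col` is displayed. On a real datastore the table itself should be available as `col.df`.

```python
import pandas as pd

# Hypothetical miniature catalog table: one row per file (asset).
# Real intake-esm catalogs have more columns (table_id, grid_label, version, ...).
df = pd.DataFrame(
    {
        "variable_id": ["od550aer", "od550aer", "od550abs", "od550aer"],
        "experiment_id": ["histSST", "histSST", "histSST", "histSST-piaer"],
        "source_id": ["MIROC6", "MIROC6", "MIROC6", "UKESM1-0-LL"],
        "time_range": ["185001-194912", "195001-201412", "185001-201412", "185001-201412"],
        "path": [
            "/hypothetical/od550aer_a.nc",
            "/hypothetical/od550aer_b.nc",
            "/hypothetical/od550abs.nc",
            "/hypothetical/od550aer_c.nc",
        ],
    }
)

# Number of distinct values per column, similar to the "unique" summary printed for `col`
summary = df.nunique().rename("unique")
print(summary)
```

Since every file has its own path, the `path` count equals the number of assets, just as 536945 paths match the 536945 assets reported above.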

Note

Since the paths listed in the CSV table are absolute, notebooks that start from the same catalog can be run from any directory without changing any paths inside the notebook.
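If you assemble or edit a catalog yourself, it can be worth checking that no path depends on the working directory. Below is a minimal standard-library sketch; the helper names and paths are hypothetical, URLs such as `gs://...` (used by the remote Pangeo catalog) are treated as absolute, and with a real datastore you would pass `col.df['path']`.

```python
import posixpath


def is_absolute(path):
    """True for absolute POSIX paths and for URLs such as gs://... or https://..."""
    return "://" in path or posixpath.isabs(path)


def all_paths_absolute(paths):
    """Return True if no path in `paths` depends on the current working directory."""
    return all(is_absolute(p) for p in paths)


print(all_paths_absolute(["/mnt/hypothetical/a.nc", "/mnt/hypothetical/b.nc"]))  # True
print(all_paths_absolute(["/mnt/hypothetical/a.nc", "data/b.nc"]))  # False
```

`posixpath` is used instead of `os.path` so the check behaves the same on every platform, which matches the POSIX-style paths on NIRD.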

Browsing the catalog: Comparing the change in AOD over the historical period across CMIP6 models

The keywords used for searching are the columns of the table, e.g. variable_id, source_id, etc. To list all values available for a given keyword, use col.unique()['<keyword>'].

col = col.search(
    variable_id='od550aer',
    experiment_id='histSST'
)
col

cmip6 catalog with 9 dataset(s) from 190 asset(s):

                    unique
variable_id              1
table_id                 1
source_id                9
experiment_id            1
member_id                4
grid_label               3
time_range             185
activity_id              2
institution_id           9
version                  9
path                   190
dcpp_init_year           0
derived_variable_id      0

Models available for this request:

col.unique()['source_id']
['CNRM-ESM2-1',
 'MPI-ESM-1-2-HAM',
 'MIROC6',
 'UKESM1-0-LL',
 'MRI-ESM2-0',
 'GISS-E2-1-G',
 'CESM2-WACCM',
 'GFDL-ESM4',
 'EC-Earth3-AerChem']
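Conceptually, `col.search()` filters the rows of the catalog table and `col.unique()` collects the distinct values in each column. The pandas sketch below mimics that on a small hypothetical table: keywords are combined with AND, and a list of values means "any of these". The `search` helper is hypothetical and only illustrates the idea; it is not intake-esm's implementation, which offers more options.

```python
import pandas as pd

# Hypothetical miniature catalog table (one row per file)
df = pd.DataFrame(
    {
        "variable_id": ["od550aer", "od550abs", "od550aer", "tas"],
        "experiment_id": ["histSST", "histSST", "histSST-piaer", "histSST"],
        "source_id": ["MIROC6", "MIROC6", "UKESM1-0-LL", "GFDL-ESM4"],
    }
)


def search(table, **query):
    """Keep rows matching every keyword; a list means 'any of these values'."""
    mask = pd.Series(True, index=table.index)
    for column, values in query.items():
        if isinstance(values, str):
            values = [values]
        mask &= table[column].isin(values)
    return table[mask]


subset = search(df, variable_id="od550aer", experiment_id="histSST")
print(sorted(subset["source_id"].unique()))  # ['MIROC6']

# Analogue of col.unique()['source_id'] on the unfiltered table
print(sorted(df["source_id"].unique()))
```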

Loading the data and plotting

Warning

The catalog can be huge: always query and subset the catalog before loading any data, preferably down to the level of experiment and variable.
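A back-of-the-envelope calculation shows why. A monthly `float32` field needs 4 bytes per grid point and time step, so one `od550aer` dataset occupies roughly n_lat × n_lon × n_time × 4 bytes in memory. The sketch below uses grid sizes that appear in the dataset outputs further down (1980 monthly steps for 1850-2014); it ignores coordinates, bounds and on-disk compression.

```python
# Uncompressed size of one monthly float32 variable: n_lat * n_lon * n_time * 4 bytes
BYTES_PER_FLOAT32 = 4
N_MONTHS = (2014 - 1850 + 1) * 12  # 1980 monthly time steps


def dataset_size_mb(n_lat, n_lon, n_time=N_MONTHS):
    """In-memory size in megabytes (10**6 bytes)."""
    return n_lat * n_lon * n_time * BYTES_PER_FLOAT32 / 1e6


grids = {"CNRM-ESM2-1": (128, 256), "CESM2-WACCM": (192, 288), "GFDL-ESM4": (180, 288)}
for model, (n_lat, n_lon) in grids.items():
    print(f"{model}: {dataset_size_mb(n_lat, n_lon):.0f} MB")
# CNRM-ESM2-1: 260 MB
# CESM2-WACCM: 438 MB
# GFDL-ESM4: 411 MB
```

These match the `Size:` values reported by xarray below. A few hundred MB per model for a single variable and experiment adds up quickly when many variables, experiments or members are loaded without filtering.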

The .to_dataset_dict() method accepts an optional preprocess function, which is applied to each dataset before the datasets are combined. It can be used to harmonize the datasets, resample in time, or slice them. Below, we define a preprocessing function that resamples the data to annual means and calculates the global average, so that the time series can easily be plotted.

def resample_calculate_and_global_avg(ds):
    """
    Resample the data to annual means and calculate the area-weighted global average.
    """
    ds = ds.resample(time='YE').mean()  # Annual means of the monthly values ('YE' = year-end frequency)
    # Drop lat/lon bounds: not needed for the global mean, and they can cause conflicts when averaging
    ds = ds.drop_vars(['lat_bnds', 'lon_bnds'], errors='ignore')
    # Grid-cell area on a regular lat-lon grid scales with cos(latitude)
    weights = np.cos(np.deg2rad(ds.lat))
    ds_weighted = ds.weighted(weights)
    weighted_mean = ds_weighted.mean(dim=['lon', 'lat'])

    return weighted_mean

# dataset_dict = col.to_dataset_dict()
dataset_dict = col.to_dataset_dict(
    xarray_open_kwargs={'use_cftime': True, 'chunks': {'time': 48}},
    aggregate=True,
    preprocess=resample_calculate_and_global_avg,
    skip_on_error=True,
    xarray_combine_by_coords_kwargs={'combine_attrs': 'override'},
)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id'
100.00% [9/9 00:13<00:00]
/opt/conda/envs/pangeo-notebook/lib/python3.11/site-packages/xarray/conventions.py:286: SerializationWarning: variable 'od550aer' has multiple fill values {1e+20, 1e+20} defined, decoding all values to NaN.
  var = coder.decode(var, name=name)
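The `np.cos(np.deg2rad(ds.lat))` weights in the preprocessing function account for grid cells shrinking towards the poles: on a regular latitude-longitude grid, a cell's area is proportional to the cosine of its latitude. The NumPy sketch below checks this on a synthetic field, not model output. For f = sin²(latitude), the true area-weighted global mean is 1/3; the cos-weighted mean recovers it, while a plain average over grid cells gives 1/2. Like `np.average`, xarray's weighted mean divides by the sum of the weights (and, in xarray's case, leaves out weights where the data are missing).

```python
import numpy as np

# Synthetic regular 1-degree grid (cell centres); not model data
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(0.5, 360.0, 1.0)
field = np.sin(np.deg2rad(lat))[:, None] ** 2 * np.ones((lat.size, lon.size))

# Weight every grid cell by cos(latitude), as in the preprocessing function
weights = np.cos(np.deg2rad(lat))[:, None] * np.ones_like(field)

weighted_mean = np.average(field, weights=weights)
unweighted_mean = field.mean()

print(f"cos(lat)-weighted mean: {weighted_mean:.4f}")  # close to 1/3
print(f"unweighted mean:        {unweighted_mean:.4f}")  # close to 1/2
```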

Without a preprocess function, to_dataset_dict() returns the full datasets, with the files of each group combined along time, in a dictionary. As the printed message shows, the dictionary keys for this catalog are constructed from the activity_id and institution_id columns ('activity_id.institution_id').

col.to_dataset_dict(
    xarray_open_kwargs={'use_cftime': True, 'chunks': {'time': 48}},
    aggregate=True,
    skip_on_error=True,
    xarray_combine_by_coords_kwargs={'combine_attrs': 'override'},
)
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id'
100.00% [9/9 00:06<00:00]
/opt/conda/envs/pangeo-notebook/lib/python3.11/site-packages/xarray/conventions.py:286: SerializationWarning: variable 'od550aer' has multiple fill values {1e+20, 1e+20} defined, decoding all values to NaN.
  var = coder.decode(var, name=name)
{'AerChemMIP.CNRM-CERFACS': <xarray.Dataset> Size: 260MB
 Dimensions:      (lat: 128, lon: 256, time: 1980, axis_nbounds: 2, member_id: 1)
 Coordinates:
   * lat          (lat) float64 1kB -88.93 -87.54 -86.14 ... 86.14 87.54 88.93
   * lon          (lon) float64 2kB 0.0 1.406 2.812 4.219 ... 355.8 357.2 358.6
   * time         (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
   * member_id    (member_id) object 8B 'r1i1p1f2'
 Dimensions without coordinates: axis_nbounds
 Data variables:
     time_bounds  (time, axis_nbounds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
     od550aer     (member_id, time, lat, lon) float32 260MB dask.array<chunksize=(1, 48, 128, 256), meta=np.ndarray>
 Attributes: (12/65)
     Conventions:                      CF-1.7 CMIP-6.2
     creation_date:                    2018-08-08T14:22:32Z
     description:                      Historical transient with SSTs prescrib...
     title:                            CNRM-ESM2-1 model output prepared for C...
     activity_id:                      AerChemMIP
     contact:                          contact.cmip@meteo.fr
     ...                               ...
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  CNRM-CERFACS
     intake_esm_attrs:version:         v20190621
     intake_esm_attrs:path:            /mnt/craas1-ns9989k-ns9560k/ESGF/CMIP6/...
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.CNRM-CERFACS,
 'AerChemMIP.NOAA-GFDL': <xarray.Dataset> Size: 411MB
 Dimensions:    (bnds: 2, lat: 180, lon: 288, member_id: 1, time: 1980)
 Coordinates:
   * bnds       (bnds) float64 16B 1.0 2.0
   * lat        (lat) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
     lat_bnds   (lat, bnds) float64 3kB dask.array<chunksize=(180, 2), meta=np.ndarray>
   * lon        (lon) float64 2kB 0.625 1.875 3.125 4.375 ... 356.9 358.1 359.4
     lon_bnds   (lon, bnds) float64 5kB dask.array<chunksize=(288, 2), meta=np.ndarray>
   * time       (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
     time_bnds  (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
   * member_id  (member_id) object 8B 'r1i1p1f1'
 Data variables:
     od550aer   (member_id, time, lat, lon) float32 411MB dask.array<chunksize=(1, 48, 180, 288), meta=np.ndarray>
 Attributes: (12/56)
     external_variables:               areacella
     history:                          File was processed by fremetar (GFDL an...
     table_id:                         AERmon
     activity_id:                      AerChemMIP
     branch_method:                    atmospheric and land state taken from p...
     branch_time_in_child:             0.0
     ...                               ...
     intake_esm_attrs:grid_label:      gr1
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  NOAA-GFDL
     intake_esm_attrs:version:         v20180701
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.NOAA-GFDL,
 'AerChemMIP.NCAR': <xarray.Dataset> Size: 438MB
 Dimensions:    (member_id: 1, time: 1980, lat: 192, lon: 288, nbnd: 2)
 Coordinates:
   * lat        (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
   * lon        (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
   * time       (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
     time_bnds  (time, nbnd) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
     lat_bnds   (lat, nbnd) float32 2kB dask.array<chunksize=(192, 2), meta=np.ndarray>
     lon_bnds   (lon, nbnd) float32 2kB dask.array<chunksize=(288, 2), meta=np.ndarray>
   * member_id  (member_id) object 8B 'r1i2p1f1'
 Dimensions without coordinates: nbnd
 Data variables:
     od550aer   (member_id, time, lat, lon) float32 438MB dask.array<chunksize=(1, 48, 192, 288), meta=np.ndarray>
 Attributes: (12/56)
     Conventions:                      CF-1.7 CMIP-6.2
     activity_id:                      AerChemMIP
     branch_method:                    no parent
     branch_time_in_child:             674885.0
     branch_time_in_parent:            0.0
     case_id:                          47
     ...                               ...
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  NCAR
     intake_esm_attrs:version:         v20190531
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.NCAR,
 'AerChemMIP.NASA-GISS': <xarray.Dataset> Size: 103MB
 Dimensions:    (time: 1980, bnds: 2, lat: 90, lon: 144, member_id: 1)
 Coordinates:
   * time       (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
     time_bnds  (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
   * lat        (lat) float64 720B -89.0 -87.0 -85.0 -83.0 ... 85.0 87.0 89.0
     lat_bnds   (lat, bnds) float64 1kB dask.array<chunksize=(90, 2), meta=np.ndarray>
   * lon        (lon) float64 1kB 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
     lon_bnds   (lon, bnds) float64 2kB dask.array<chunksize=(144, 2), meta=np.ndarray>
   * member_id  (member_id) object 8B 'r1i1p3f1'
 Dimensions without coordinates: bnds
 Data variables:
     od550aer   (member_id, time, lat, lon) float32 103MB dask.array<chunksize=(1, 48, 90, 144), meta=np.ndarray>
 Attributes: (12/56)
     Conventions:                      CF-1.7 CMIP-6.2
     activity_id:                      AerChemMIP
     branch_method:                    standard
     branch_time_in_child:             0.0
     branch_time_in_parent:            0.0
     contact:                          Kenneth Lo (cdkkl@giss.nasa.gov)
     ...                               ...
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  NASA-GISS
     intake_esm_attrs:version:         v20191120
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.NASA-GISS,
 'AerChemMIP.MIROC': <xarray.Dataset> Size: 260MB
 Dimensions:     (time: 1980, bnds: 2, lat: 128, lon: 256, member_id: 1)
 Coordinates:
   * time        (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
     time_bnds   (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
   * lat         (lat) float64 1kB -88.93 -87.54 -86.14 ... 86.14 87.54 88.93
     lat_bnds    (lat, bnds) float64 2kB dask.array<chunksize=(128, 2), meta=np.ndarray>
   * lon         (lon) float64 2kB 0.0 1.406 2.812 4.219 ... 355.8 357.2 358.6
     lon_bnds    (lon, bnds) float64 4kB dask.array<chunksize=(256, 2), meta=np.ndarray>
     wavelength  float64 8B 550.0
   * member_id   (member_id) object 8B 'r1i1p1f1'
 Dimensions without coordinates: bnds
 Data variables:
     od550aer    (member_id, time, lat, lon) float32 260MB dask.array<chunksize=(1, 48, 128, 256), meta=np.ndarray>
 Attributes: (12/53)
     Conventions:                      CF-1.7 CMIP-6.2
     activity_id:                      AerChemMIP
     branch_method:                    standard
     branch_time_in_child:             0.0
     branch_time_in_parent:            0.0
     data_specs_version:               01.00.31
     ...                               ...
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  MIROC
     intake_esm_attrs:version:         v20190828
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.MIROC,
 'AerChemMIP.MOHC': <xarray.Dataset> Size: 219MB
 Dimensions:     (time: 1980, bnds: 2, lat: 144, lon: 192, member_id: 1)
 Coordinates:
   * time        (time) object 16kB 1850-01-16 00:00:00 ... 2014-12-16 00:00:00
     time_bnds   (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
   * lat         (lat) float64 1kB -89.38 -88.12 -86.88 ... 86.88 88.12 89.38
     lat_bnds    (lat, bnds) float64 2kB dask.array<chunksize=(144, 2), meta=np.ndarray>
   * lon         (lon) float64 2kB 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1
     lon_bnds    (lon, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
     wavelength  float64 8B 550.0
   * member_id   (member_id) object 8B 'r1i1p1f2'
 Dimensions without coordinates: bnds
 Data variables:
     od550aer    (member_id, time, lat, lon) float32 219MB dask.array<chunksize=(1, 48, 144, 192), meta=np.ndarray>
 Attributes: (12/56)
     Conventions:                      CF-1.7 CMIP-6.2
     activity_id:                      AerChemMIP
     branch_time_in_child:             0.0
     data_specs_version:               01.00.29
     experiment:                       historical prescribed SSTs and historic...
     experiment_id:                    histSST
     ...                               ...
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  MOHC
     intake_esm_attrs:version:         v20190902
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.MOHC,
 'AerChemMIP.MRI': <xarray.Dataset> Size: 406MB
 Dimensions:     (time: 1980, bnds: 2, lat: 160, lon: 320, member_id: 1)
 Coordinates:
   * time        (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
   * lat         (lat) float64 1kB -89.14 -88.03 -86.91 ... 86.91 88.03 89.14
   * lon         (lon) float64 3kB 0.0 1.125 2.25 3.375 ... 356.6 357.8 358.9
     wavelength  float64 8B ...
   * member_id   (member_id) object 8B 'r1i1p1f1'
 Dimensions without coordinates: bnds
 Data variables:
     time_bnds   (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
     lat_bnds    (lat, bnds) float64 3kB dask.array<chunksize=(160, 2), meta=np.ndarray>
     lon_bnds    (lon, bnds) float64 5kB dask.array<chunksize=(320, 2), meta=np.ndarray>
     od550aer    (member_id, time, lat, lon) float32 406MB dask.array<chunksize=(1, 48, 160, 320), meta=np.ndarray>
 Attributes: (12/59)
     Conventions:                      CF-1.7 CMIP-6.2
     activity_id:                      AerChemMIP
     branch_method:                    standard (the actual parent run: RFMIP ...
     branch_time_in_child:             0.0
     branch_time_in_parent:            0.0
     comment:                          This od550aer includes AOD from stratos...
     ...                               ...
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  MRI
     intake_esm_attrs:version:         v20200207
     intake_esm_attrs:path:            /mnt/craas1-ns9989k-ns9560k/ESGF/CMIP6/...
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.MRI,
 'AerChemMIP.HAMMOZ-Consortium': <xarray.Dataset> Size: 146MB
 Dimensions:     (time: 1980, bnds: 2, lat: 96, lon: 192, member_id: 1)
 Coordinates:
   * time        (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
     time_bnds   (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
   * lat         (lat) float64 768B -88.57 -86.72 -84.86 ... 84.86 86.72 88.57
     lat_bnds    (lat, bnds) float64 2kB dask.array<chunksize=(96, 2), meta=np.ndarray>
   * lon         (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
     lon_bnds    (lon, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
     wavelength  float64 8B 550.0
   * member_id   (member_id) object 8B 'r1i1p1f1'
 Dimensions without coordinates: bnds
 Data variables:
     od550aer    (member_id, time, lat, lon) float32 146MB dask.array<chunksize=(1, 48, 96, 192), meta=np.ndarray>
 Attributes: (12/57)
     CDO:                              Climate Data Operators version 1.9.9rc8...
     Conventions:                      CF-1.7 CMIP-6.2
     activity_id:                      AerChemMIP
     branch_method:                    standard
     branch_time_in_child:             0.0
     branch_time_in_parent:            0.0
     ...                               ...
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  HAMMOZ-Consortium
     intake_esm_attrs:version:         v20190628
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.HAMMOZ-Consortium,
 'CMIP.EC-Earth-Consortium': <xarray.Dataset> Size: 86MB
 Dimensions:     (time: 1980, bnds: 2, lat: 90, lon: 120, member_id: 1)
 Coordinates:
   * time        (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
     time_bnds   (time, bnds) object 32kB dask.array<chunksize=(12, 2), meta=np.ndarray>
   * lat         (lat) float64 720B -89.0 -87.0 -85.0 -83.0 ... 85.0 87.0 89.0
     lat_bnds    (lat, bnds) float64 1kB dask.array<chunksize=(90, 2), meta=np.ndarray>
   * lon         (lon) float64 960B 1.5 4.5 7.5 10.5 ... 349.5 352.5 355.5 358.5
     lon_bnds    (lon, bnds) float64 2kB dask.array<chunksize=(120, 2), meta=np.ndarray>
     wavelength  float64 8B 550.0
   * member_id   (member_id) object 8B 'r1i1p1f1'
 Dimensions without coordinates: bnds
 Data variables:
     od550aer    (member_id, time, lat, lon) float32 86MB dask.array<chunksize=(1, 12, 90, 120), meta=np.ndarray>
 Attributes: (12/57)
     Conventions:                        CF-1.7 CMIP-6.2
     activity_id:                        AerChemMIP
     branch_method:                      standard
     branch_time_in_child:               0.0
     branch_time_in_parent:              0.0
     contact:                            cmip6-data@ec-earth.org
     ...                                 ...
     intake_esm_attrs:grid_label:        gn
     intake_esm_attrs:activity_id:       CMIP
     intake_esm_attrs:institution_id:    EC-Earth-Consortium
     intake_esm_attrs:version:           v20210309
     intake_esm_attrs:_data_format_:     netcdf
     intake_esm_dataset_key:             CMIP.EC-Earth-Consortium}
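The keys in this dictionary come from the catalog's groupby attributes: assets that share the same values in those columns are combined into one dataset, and the values are joined with dots to form the key. The pandas sketch below reproduces that key construction on a hypothetical table, using the `activity_id.institution_id` template reported above; it illustrates the grouping and does not open any files.

```python
import pandas as pd

groupby_attrs = ["activity_id", "institution_id"]  # key template reported for this catalog

# Hypothetical catalog rows (one row per file)
df = pd.DataFrame(
    {
        "activity_id": ["AerChemMIP", "AerChemMIP", "AerChemMIP", "CMIP"],
        "institution_id": ["MIROC", "MIROC", "NCAR", "EC-Earth-Consortium"],
        "time_range": ["185001-194912", "195001-201412", "185001-201412", "185001-201412"],
        "path": ["/hypothetical/a.nc", "/hypothetical/b.nc", "/hypothetical/c.nc", "/hypothetical/d.nc"],
    }
)

# One dataset per unique combination of the groupby attributes
keys = {".".join(group): list(rows["path"]) for group, rows in df.groupby(groupby_attrs)}
for key, paths in keys.items():
    print(key, "->", len(paths), "file(s)")
```

One consequence of a key built from only these two columns: two different models from the same institution and activity would fall into the same group.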

With the preprocess function applied, each dataset in the dictionary contains only the annual global-mean time series, which makes it easy to loop over the models and plot them.

dataset_dict
{'AerChemMIP.NASA-GISS': <xarray.Dataset> Size: 3kB
 Dimensions:    (time: 165, member_id: 1)
 Coordinates:
   * time       (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
   * member_id  (member_id) object 8B 'r1i1p3f1'
 Data variables:
     od550aer   (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
 Attributes:
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       GISS-E2-1-G
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i1p3f1
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  NASA-GISS
     intake_esm_attrs:version:         v20191120
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.NASA-GISS,
 'AerChemMIP.CNRM-CERFACS': <xarray.Dataset> Size: 3kB
 Dimensions:    (time: 165, member_id: 1)
 Coordinates:
   * time       (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
   * member_id  (member_id) object 8B 'r1i1p1f2'
 Data variables:
     od550aer   (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
 Attributes: (12/14)
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       CNRM-ESM2-1
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i1p1f2
     ...                               ...
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  CNRM-CERFACS
     intake_esm_attrs:version:         v20190621
     intake_esm_attrs:path:            /mnt/craas1-ns9989k-ns9560k/ESGF/CMIP6/...
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.CNRM-CERFACS,
 'AerChemMIP.NCAR': <xarray.Dataset> Size: 3kB
 Dimensions:    (time: 165, member_id: 1)
 Coordinates:
   * time       (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
   * member_id  (member_id) object 8B 'r1i2p1f1'
 Data variables:
     od550aer   (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
 Attributes:
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       CESM2-WACCM
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i2p1f1
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  NCAR
     intake_esm_attrs:version:         v20190531
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.NCAR,
 'AerChemMIP.NOAA-GFDL': <xarray.Dataset> Size: 3kB
 Dimensions:    (time: 165, member_id: 1)
 Coordinates:
   * time       (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
   * member_id  (member_id) object 8B 'r1i1p1f1'
 Data variables:
     od550aer   (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
 Attributes:
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       GFDL-ESM4
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i1p1f1
     intake_esm_attrs:grid_label:      gr1
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  NOAA-GFDL
     intake_esm_attrs:version:         v20180701
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.NOAA-GFDL,
 'AerChemMIP.MIROC': <xarray.Dataset> Size: 3kB
 Dimensions:     (time: 165, member_id: 1)
 Coordinates:
     wavelength  float64 8B 550.0
   * time        (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
   * member_id   (member_id) object 8B 'r1i1p1f1'
 Data variables:
     od550aer    (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
 Attributes:
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       MIROC6
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i1p1f1
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  MIROC
     intake_esm_attrs:version:         v20190828
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.MIROC,
 'AerChemMIP.MOHC': <xarray.Dataset> Size: 3kB
 Dimensions:     (time: 165, member_id: 1)
 Coordinates:
     wavelength  float64 8B 550.0
   * time        (time) object 1kB 1850-12-30 00:00:00 ... 2014-12-30 00:00:00
   * member_id   (member_id) object 8B 'r1i1p1f2'
 Data variables:
     od550aer    (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
 Attributes:
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       UKESM1-0-LL
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i1p1f2
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  MOHC
     intake_esm_attrs:version:         v20190902
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.MOHC,
 'AerChemMIP.HAMMOZ-Consortium': <xarray.Dataset> Size: 3kB
 Dimensions:     (time: 165, member_id: 1)
 Coordinates:
     wavelength  float64 8B 550.0
   * time        (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
   * member_id   (member_id) object 8B 'r1i1p1f1'
 Data variables:
     od550aer    (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
 Attributes:
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       MPI-ESM-1-2-HAM
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i1p1f1
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  HAMMOZ-Consortium
     intake_esm_attrs:version:         v20190628
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.HAMMOZ-Consortium,
 'AerChemMIP.MRI': <xarray.Dataset> Size: 3kB
 Dimensions:     (time: 165, member_id: 1)
 Coordinates:
     wavelength  float64 8B 550.0
   * time        (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
   * member_id   (member_id) object 8B 'r1i1p1f1'
 Data variables:
     od550aer    (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
 Attributes: (12/14)
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       MRI-ESM2-0
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i1p1f1
     ...                               ...
     intake_esm_attrs:activity_id:     AerChemMIP
     intake_esm_attrs:institution_id:  MRI
     intake_esm_attrs:version:         v20200207
     intake_esm_attrs:path:            /mnt/craas1-ns9989k-ns9560k/ESGF/CMIP6/...
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           AerChemMIP.MRI,
 'CMIP.EC-Earth-Consortium': <xarray.Dataset> Size: 3kB
 Dimensions:     (time: 165, member_id: 1)
 Coordinates:
     wavelength  float64 8B 550.0
   * time        (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
   * member_id   (member_id) object 8B 'r1i1p1f1'
 Data variables:
     od550aer    (member_id, time) float64 1kB dask.array<chunksize=(1, 1), meta=np.ndarray>
 Attributes:
     intake_esm_vars:                  ['od550aer']
     intake_esm_attrs:variable_id:     od550aer
     intake_esm_attrs:table_id:        AERmon
     intake_esm_attrs:source_id:       EC-Earth3-AerChem
     intake_esm_attrs:experiment_id:   histSST
     intake_esm_attrs:member_id:       r1i1p1f1
     intake_esm_attrs:grid_label:      gn
     intake_esm_attrs:activity_id:     CMIP
     intake_esm_attrs:institution_id:  EC-Earth-Consortium
     intake_esm_attrs:version:         v20210309
     intake_esm_attrs:_data_format_:   netcdf
     intake_esm_dataset_key:           CMIP.EC-Earth-Consortium}
fig, ax = plt.subplots(figsize=(10, 6))
for datakey, data in dataset_dict.items():  # Loop over every dataset in the dictionary
    source_id = data.attrs['intake_esm_attrs:source_id']
    member_id = data.attrs['intake_esm_attrs:member_id']
    data['od550aer'].isel(member_id=0).plot(label=f'{source_id} {member_id}', ax=ax)

ax.legend()
<matplotlib.legend.Legend at 0x7fa1fc40f110>
[Figure: annual global-mean od550aer in the histSST experiment, 1850-2014, one line per model]

Exporting a subset of the catalog

Most likely you will only analyze a small subset of the model experiments, and it can be beneficial to work with a reduced catalog. Below we make a subset that only contains the entries for histSST and histSST-piaer, and only the aerosol absorption optical depth (od550abs) and the total aerosol optical depth (od550aer).

col = intake.open_esm_datastore('/mnt/craas1-ns9989k-geo4992/data/cmip6.json') # Local data stored on NIRD
col = col.search(
    experiment_id=['histSST', 'histSST-piaer'],
    variable_id=['od550abs', 'od550aer']
)
col

cmip6 catalog with 9 dataset(s) from 190 asset(s):

                    unique
variable_id              1
table_id                 1
source_id                9
experiment_id            1
member_id                4
grid_label               3
time_range             185
activity_id              2
institution_id           9
version                  9
path                   190
dcpp_init_year           0
derived_variable_id      0
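Passing lists to `search` selects rows matching any of the listed values, so a model that provides only one of the experiments or variables is still included. intake-esm's `search` also accepts a `require_all_on` argument (for example `require_all_on=['source_id']`) to keep only models that provide every requested combination; check the intake-esm documentation for the exact behaviour of your installed version. The pandas sketch below illustrates that filtering on a hypothetical table with made-up models "A" and "B"; it is not the library's implementation.

```python
import itertools

import pandas as pd

# Hypothetical table: model A has everything, model B lacks histSST-piaer
df = pd.DataFrame(
    {
        "source_id": ["A", "A", "A", "A", "B", "B"],
        "experiment_id": ["histSST", "histSST", "histSST-piaer", "histSST-piaer", "histSST", "histSST"],
        "variable_id": ["od550aer", "od550abs", "od550aer", "od550abs", "od550aer", "od550abs"],
    }
)

query = {"experiment_id": ["histSST", "histSST-piaer"], "variable_id": ["od550abs", "od550aer"]}
required = set(itertools.product(*query.values()))  # every (experiment, variable) pair

# Rows matching any requested value ("OR" within a column, "AND" across columns)
subset = df[df["experiment_id"].isin(query["experiment_id"]) & df["variable_id"].isin(query["variable_id"])]

# Keep only models that provide every required combination
complete = [
    model
    for model, rows in subset.groupby("source_id")
    if required <= set(zip(rows["experiment_id"], rows["variable_id"]))
]
print(complete)  # ['A']
```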

Once you are happy with the selection, the catalog can be exported as follows:

col.serialize(name='histSST-AerChemMIP',catalog_type='file',directory='~/')
Successfully wrote ESM catalog json file to: file:///home/fc-3auid-3a9fdc0c87-2d7836-2d4bdc-2db802-2d9a250c322e3b//histSST-AerChemMIP.json
col = intake.open_esm_datastore('~/histSST-AerChemMIP.json')
col

histSST-AerChemMIP catalog with 9 dataset(s) from 190 asset(s):

                    unique
variable_id              1
table_id                 1
source_id                9
experiment_id            1
member_id                4
grid_label               3
time_range             185
activity_id              2
institution_id           9
version                  9
path                   190
dcpp_init_year           0
derived_variable_id      0
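With `catalog_type='file'`, the exported catalog should consist of a JSON description plus a separate CSV table, with the JSON pointing to the table via a `catalog_file` field; with `catalog_type='dict'` the rows are embedded in the JSON instead. To inspect an exported catalog without intake-esm, the JSON can be read with the standard library. The sketch below assumes field names from the ESM collection specification (`catalog_file`, `assets`, `aggregation_control`); the helper `summarize_catalog` and the demo content are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def summarize_catalog(json_path):
    """Return a few key fields from an ESM catalog JSON file (missing fields become None)."""
    spec = json.loads(Path(json_path).expanduser().read_text())
    aggregation = spec.get("aggregation_control", {})
    return {
        "id": spec.get("id"),
        "catalog_file": spec.get("catalog_file"),  # None if the rows are embedded in the JSON
        "path_column": spec.get("assets", {}).get("column_name"),
        "groupby_attrs": aggregation.get("groupby_attrs"),
    }


# Demo on a hypothetical, minimal catalog written to a temporary directory
example = {
    "esmcat_version": "0.1.0",
    "id": "histSST-AerChemMIP",
    "catalog_file": "histSST-AerChemMIP.csv",
    "assets": {"column_name": "path", "format": "netcdf"},
    "aggregation_control": {
        "variable_column_name": "variable_id",
        "groupby_attrs": ["activity_id", "institution_id"],
    },
}
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "histSST-AerChemMIP.json"
    json_path.write_text(json.dumps(example))
    summary = summarize_catalog(json_path)
print(summary)
```

On NIRD, the same helper could be pointed at the exported file, e.g. `summarize_catalog('~/histSST-AerChemMIP.json')`.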