Accessing model data using intake-esm#
To access the CMIP6 model data and the CESM-PPE data, we created intake catalogs that help you browse and subset the data. The catalogs are stored in the shared folder /mnt/craas1-ns9989k-geo4992/data/intake-catalogs/
# Importing the required packages
import intake
import xarray as xr
import intake_esm
import numpy as np
import matplotlib.pyplot as plt
Reading and browsing the catalog#
# Open the catalog
# col = intake.open_esm_datastore('https://storage.googleapis.com/cmip6/pangeo-cmip6.json') # Remote Pangeo
col = intake.open_esm_datastore('/mnt/craas1-ns9989k-geo4992/data/catalogs/cmip6.json') # Local data stored on NIRD
col
cmip6 catalog with 155 dataset(s) from 536945 asset(s):
unique | |
---|---|
variable_id | 583 |
table_id | 24 |
source_id | 75 |
experiment_id | 94 |
member_id | 190 |
grid_label | 11 |
time_range | 9100 |
activity_id | 18 |
institution_id | 35 |
version | 577 |
path | 536945 |
dcpp_init_year | 0 |
derived_variable_id | 0 |
Under the hood, intake-esm uses a large CSV table that contains metadata for each file and the path where the file can be found. The data can be stored either locally or in remote cloud storage, e.g. pangeo-cloud.
Note
Since the paths listed in the CSV table are absolute, a notebook that starts from the same catalog can be run from any directory without changing the paths inside the notebook.
Browsing the catalog: Comparing the change in AOD over the historical period across CMIP6 models#
The keywords used when browsing are the columns of the table, e.g. variable_id, source_id, etc. To list all the values available for a given keyword, use col.unique()['<keyword>'].
col = col.search(
variable_id='od550aer',
experiment_id='histSST'
)
col
cmip6 catalog with 9 dataset(s) from 190 asset(s):
unique | |
---|---|
variable_id | 1 |
table_id | 1 |
source_id | 9 |
experiment_id | 1 |
member_id | 4 |
grid_label | 3 |
time_range | 185 |
activity_id | 2 |
institution_id | 9 |
version | 9 |
path | 190 |
dcpp_init_year | 0 |
derived_variable_id | 0 |
Models available for this request:
col.unique()['source_id']
['CNRM-ESM2-1',
'MPI-ESM-1-2-HAM',
'MIROC6',
'UKESM1-0-LL',
'MRI-ESM2-0',
'GISS-E2-1-G',
'CESM2-WACCM',
'GFDL-ESM4',
'EC-Earth3-AerChem']
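The search and unique steps above can be pictured as ordinary operations on the catalog table. The block below is a minimal plain-Python sketch of that idea, not intake-esm's implementation: the table contents, the paths, and the helper names search and unique are hypothetical and only mimic the behaviour of col.search and col.unique.

```python
import csv
import io

# Hypothetical miniature catalog table; the real CSV has one row per file.
CATALOG_CSV = """variable_id,experiment_id,source_id,path
od550aer,histSST,MIROC6,/hypothetical/path/a.nc
od550aer,histSST,GFDL-ESM4,/hypothetical/path/b.nc
od550aer,historical,MIROC6,/hypothetical/path/c.nc
tas,histSST,MIROC6,/hypothetical/path/d.nc
"""

rows = list(csv.DictReader(io.StringIO(CATALOG_CSV)))


def search(rows, **query):
    """Keep the rows whose columns match every keyword in the query."""
    return [row for row in rows
            if all(row[key] == value for key, value in query.items())]


def unique(rows, column):
    """Return the sorted distinct values found in one column."""
    return sorted({row[column] for row in rows})


subset = search(rows, variable_id="od550aer", experiment_id="histSST")
models = unique(subset, "source_id")
print(len(subset), models)
```

Each keyword narrows the table to the matching rows, and the remaining rows still carry the path column, which is what the loading step later reads.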
Loading the data and plotting#
Warning
The catalog can be huge, so always query and subset the catalog before loading the data, preferably down to the level of experiment and variable.
The .to_dataset_dict function accepts an optional preprocessing function, which can be used to harmonize the datasets, resample them in time, or slice them.
Below we define a preprocessing function that resamples the data to annual means and calculates the global average, so that the time series can be plotted easily.
def resample_calculate_and_global_avg(ds):
    """
    Resample the data to annual means and calculate the global average.
    """
    ds = ds.resample(time='YE').mean()  # Resample to annual mean
    ds = ds.drop_vars(['lat_bnds', 'lon_bnds'], errors='ignore')  # Bounds variables create conflicts when calculating the global average
    weights = np.cos(np.deg2rad(ds.lat))  # Latitude weights for the global average
    ds_weighted = ds.weighted(weights)
    weighted_mean = ds_weighted.mean(dim=['lon', 'lat'])
    return weighted_mean
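The cos(latitude) weights in the function above account for the fact that grid cells on a regular latitude-longitude grid cover less area toward the poles. The block below is a small worked example of that weighting in plain Python rather than xarray; the three latitude bands and their values are made up for illustration.

```python
import math

# Hypothetical zonal-mean values on three latitude bands (degrees north).
lats = [-60.0, 0.0, 60.0]
values = [4.0, 1.0, 4.0]

# Weight each band by cos(latitude), as in the preprocessing function.
weights = [math.cos(math.radians(lat)) for lat in lats]

plain_mean = sum(values) / len(values)
weighted_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)

print(f"plain mean:    {plain_mean:.2f}")
print(f"weighted mean: {weighted_mean:.2f}")
```

The plain mean treats the two high-latitude bands as if they covered as much area as the equatorial band, so it overstates their contribution; the weighted mean scales them down by cos(60°) = 0.5.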
Using dask
The processing can be done in parallel using dask, which can speed things up a bit.
The dask cluster below is constrained by the resources each user has available.
The corresponding dask cluster panel can be viewed by clicking on the dask icon on the left-side panel.
from dask.distributed import Client, LocalCluster
client = Client(LocalCluster(n_workers=4, threads_per_worker=1, memory_limit='16GB'))
# dataset_dict = col.to_dataset_dict()
dataset_dict = col.to_dataset_dict(xarray_open_kwargs={'use_cftime':True, 'chunks':{'time':48}},
aggregate=True,
preprocess=resample_calculate_and_global_avg,
skip_on_error=True,
xarray_combine_by_coords={'combine_attrs': 'override'}
)
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id'
/opt/conda/envs/pangeo-notebook/lib/python3.11/site-packages/xarray/conventions.py:286: SerializationWarning: variable 'od550aer' has multiple fill values {1e+20, 1e+20} defined, decoding all values to NaN.
var = coder.decode(var, name=name)
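The message printed above says that the dictionary keys are built from the grouping columns activity_id and institution_id. The block below sketches how such keys can be formed by joining column values with a dot and collecting the files that share a key. It is a plain-Python illustration under that assumption, not intake-esm's own code, and the rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog rows; only the columns needed for grouping are shown.
assets = [
    {"activity_id": "AerChemMIP", "institution_id": "MIROC",
     "time_range": "185001-189912"},
    {"activity_id": "AerChemMIP", "institution_id": "MIROC",
     "time_range": "190001-194912"},
    {"activity_id": "CMIP", "institution_id": "EC-Earth-Consortium",
     "time_range": "185001-185012"},
]

# Assumption: these are the grouping columns named in the printed message.
group_columns = ["activity_id", "institution_id"]

groups = defaultdict(list)
for asset in assets:
    # Join the grouping values with "." to form the dataset key.
    key = ".".join(asset[column] for column in group_columns)
    groups[key].append(asset["time_range"])

for key, time_ranges in groups.items():
    print(key, time_ranges)
```

Files that end up under the same key are the ones that get combined into a single dataset, which is why one key can cover many time ranges.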
Without a preprocess function, col.to_dataset_dict returns a dictionary of time-aggregated datasets, grouped by the catalog's aggregation keys (for example "activity_id", "institution_id", "source_id", "experiment_id", "table_id", "grid_label"). The keys actually used are printed when the function runs.
col.to_dataset_dict(xarray_open_kwargs={'use_cftime':True, 'chunks':{'time':48}},
aggregate=True,
skip_on_error=True,
xarray_combine_by_coords={'combine_attrs': 'override'}
)
--> The keys in the returned dictionary of datasets are constructed as follows:
'activity_id.institution_id'
/opt/conda/envs/pangeo-notebook/lib/python3.11/site-packages/xarray/conventions.py:286: SerializationWarning: variable 'od550aer' has multiple fill values {1e+20, 1e+20} defined, decoding all values to NaN.
var = coder.decode(var, name=name)
{'AerChemMIP.CNRM-CERFACS': <xarray.Dataset> Size: 260MB
Dimensions: (lat: 128, lon: 256, time: 1980, axis_nbounds: 2, member_id: 1)
Coordinates:
* lat (lat) float64 1kB -88.93 -87.54 -86.14 ... 86.14 87.54 88.93
* lon (lon) float64 2kB 0.0 1.406 2.812 4.219 ... 355.8 357.2 358.6
* time (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* member_id (member_id) object 8B 'r1i1p1f2'
Dimensions without coordinates: axis_nbounds
Data variables:
time_bounds (time, axis_nbounds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
od550aer (member_id, time, lat, lon) float32 260MB dask.array<chunksize=(1, 48, 128, 256), meta=np.ndarray>
Attributes: (12/65)
Conventions: CF-1.7 CMIP-6.2
creation_date: 2018-08-08T14:22:32Z
description: Historical transient with SSTs prescrib...
title: CNRM-ESM2-1 model output prepared for C...
activity_id: AerChemMIP
contact: contact.cmip@meteo.fr
... ...
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: CNRM-CERFACS
intake_esm_attrs:version: v20190621
intake_esm_attrs:path: /mnt/craas1-ns9989k-ns9560k/ESGF/CMIP6/...
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.CNRM-CERFACS,
'AerChemMIP.NOAA-GFDL': <xarray.Dataset> Size: 411MB
Dimensions: (bnds: 2, lat: 180, lon: 288, member_id: 1, time: 1980)
Coordinates:
* bnds (bnds) float64 16B 1.0 2.0
* lat (lat) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
lat_bnds (lat, bnds) float64 3kB dask.array<chunksize=(180, 2), meta=np.ndarray>
* lon (lon) float64 2kB 0.625 1.875 3.125 4.375 ... 356.9 358.1 359.4
lon_bnds (lon, bnds) float64 5kB dask.array<chunksize=(288, 2), meta=np.ndarray>
* time (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
time_bnds (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
* member_id (member_id) object 8B 'r1i1p1f1'
Data variables:
od550aer (member_id, time, lat, lon) float32 411MB dask.array<chunksize=(1, 48, 180, 288), meta=np.ndarray>
Attributes: (12/56)
external_variables: areacella
history: File was processed by fremetar (GFDL an...
table_id: AERmon
activity_id: AerChemMIP
branch_method: atmospheric and land state taken from p...
branch_time_in_child: 0.0
... ...
intake_esm_attrs:grid_label: gr1
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: NOAA-GFDL
intake_esm_attrs:version: v20180701
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.NOAA-GFDL,
'AerChemMIP.NCAR': <xarray.Dataset> Size: 438MB
Dimensions: (member_id: 1, time: 1980, lat: 192, lon: 288, nbnd: 2)
Coordinates:
* lat (lat) float64 2kB -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
* lon (lon) float64 2kB 0.0 1.25 2.5 3.75 ... 355.0 356.2 357.5 358.8
* time (time) object 16kB 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
time_bnds (time, nbnd) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
lat_bnds (lat, nbnd) float32 2kB dask.array<chunksize=(192, 2), meta=np.ndarray>
lon_bnds (lon, nbnd) float32 2kB dask.array<chunksize=(288, 2), meta=np.ndarray>
* member_id (member_id) object 8B 'r1i2p1f1'
Dimensions without coordinates: nbnd
Data variables:
od550aer (member_id, time, lat, lon) float32 438MB dask.array<chunksize=(1, 48, 192, 288), meta=np.ndarray>
Attributes: (12/56)
Conventions: CF-1.7 CMIP-6.2
activity_id: AerChemMIP
branch_method: no parent
branch_time_in_child: 674885.0
branch_time_in_parent: 0.0
case_id: 47
... ...
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: NCAR
intake_esm_attrs:version: v20190531
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.NCAR,
'AerChemMIP.NASA-GISS': <xarray.Dataset> Size: 103MB
Dimensions: (time: 1980, bnds: 2, lat: 90, lon: 144, member_id: 1)
Coordinates:
* time (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
time_bnds (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
* lat (lat) float64 720B -89.0 -87.0 -85.0 -83.0 ... 85.0 87.0 89.0
lat_bnds (lat, bnds) float64 1kB dask.array<chunksize=(90, 2), meta=np.ndarray>
* lon (lon) float64 1kB 1.25 3.75 6.25 8.75 ... 351.2 353.8 356.2 358.8
lon_bnds (lon, bnds) float64 2kB dask.array<chunksize=(144, 2), meta=np.ndarray>
* member_id (member_id) object 8B 'r1i1p3f1'
Dimensions without coordinates: bnds
Data variables:
od550aer (member_id, time, lat, lon) float32 103MB dask.array<chunksize=(1, 48, 90, 144), meta=np.ndarray>
Attributes: (12/56)
Conventions: CF-1.7 CMIP-6.2
activity_id: AerChemMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 0.0
contact: Kenneth Lo (cdkkl@giss.nasa.gov)
... ...
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: NASA-GISS
intake_esm_attrs:version: v20191120
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.NASA-GISS,
'AerChemMIP.MIROC': <xarray.Dataset> Size: 260MB
Dimensions: (time: 1980, bnds: 2, lat: 128, lon: 256, member_id: 1)
Coordinates:
* time (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
time_bnds (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
* lat (lat) float64 1kB -88.93 -87.54 -86.14 ... 86.14 87.54 88.93
lat_bnds (lat, bnds) float64 2kB dask.array<chunksize=(128, 2), meta=np.ndarray>
* lon (lon) float64 2kB 0.0 1.406 2.812 4.219 ... 355.8 357.2 358.6
lon_bnds (lon, bnds) float64 4kB dask.array<chunksize=(256, 2), meta=np.ndarray>
wavelength float64 8B 550.0
* member_id (member_id) object 8B 'r1i1p1f1'
Dimensions without coordinates: bnds
Data variables:
od550aer (member_id, time, lat, lon) float32 260MB dask.array<chunksize=(1, 48, 128, 256), meta=np.ndarray>
Attributes: (12/53)
Conventions: CF-1.7 CMIP-6.2
activity_id: AerChemMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 0.0
data_specs_version: 01.00.31
... ...
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: MIROC
intake_esm_attrs:version: v20190828
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.MIROC,
'AerChemMIP.MOHC': <xarray.Dataset> Size: 219MB
Dimensions: (time: 1980, bnds: 2, lat: 144, lon: 192, member_id: 1)
Coordinates:
* time (time) object 16kB 1850-01-16 00:00:00 ... 2014-12-16 00:00:00
time_bnds (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
* lat (lat) float64 1kB -89.38 -88.12 -86.88 ... 86.88 88.12 89.38
lat_bnds (lat, bnds) float64 2kB dask.array<chunksize=(144, 2), meta=np.ndarray>
* lon (lon) float64 2kB 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1
lon_bnds (lon, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
wavelength float64 8B 550.0
* member_id (member_id) object 8B 'r1i1p1f2'
Dimensions without coordinates: bnds
Data variables:
od550aer (member_id, time, lat, lon) float32 219MB dask.array<chunksize=(1, 48, 144, 192), meta=np.ndarray>
Attributes: (12/56)
Conventions: CF-1.7 CMIP-6.2
activity_id: AerChemMIP
branch_time_in_child: 0.0
data_specs_version: 01.00.29
experiment: historical prescribed SSTs and historic...
experiment_id: histSST
... ...
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: MOHC
intake_esm_attrs:version: v20190902
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.MOHC,
'AerChemMIP.MRI': <xarray.Dataset> Size: 406MB
Dimensions: (time: 1980, bnds: 2, lat: 160, lon: 320, member_id: 1)
Coordinates:
* time (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
* lat (lat) float64 1kB -89.14 -88.03 -86.91 ... 86.91 88.03 89.14
* lon (lon) float64 3kB 0.0 1.125 2.25 3.375 ... 356.6 357.8 358.9
wavelength float64 8B ...
* member_id (member_id) object 8B 'r1i1p1f1'
Dimensions without coordinates: bnds
Data variables:
time_bnds (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
lat_bnds (lat, bnds) float64 3kB dask.array<chunksize=(160, 2), meta=np.ndarray>
lon_bnds (lon, bnds) float64 5kB dask.array<chunksize=(320, 2), meta=np.ndarray>
od550aer (member_id, time, lat, lon) float32 406MB dask.array<chunksize=(1, 48, 160, 320), meta=np.ndarray>
Attributes: (12/59)
Conventions: CF-1.7 CMIP-6.2
activity_id: AerChemMIP
branch_method: standard (the actual parent run: RFMIP ...
branch_time_in_child: 0.0
branch_time_in_parent: 0.0
comment: This od550aer includes AOD from stratos...
... ...
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: MRI
intake_esm_attrs:version: v20200207
intake_esm_attrs:path: /mnt/craas1-ns9989k-ns9560k/ESGF/CMIP6/...
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.MRI,
'AerChemMIP.HAMMOZ-Consortium': <xarray.Dataset> Size: 146MB
Dimensions: (time: 1980, bnds: 2, lat: 96, lon: 192, member_id: 1)
Coordinates:
* time (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
time_bnds (time, bnds) object 32kB dask.array<chunksize=(48, 2), meta=np.ndarray>
* lat (lat) float64 768B -88.57 -86.72 -84.86 ... 84.86 86.72 88.57
lat_bnds (lat, bnds) float64 2kB dask.array<chunksize=(96, 2), meta=np.ndarray>
* lon (lon) float64 2kB 0.0 1.875 3.75 5.625 ... 354.4 356.2 358.1
lon_bnds (lon, bnds) float64 3kB dask.array<chunksize=(192, 2), meta=np.ndarray>
wavelength float64 8B 550.0
* member_id (member_id) object 8B 'r1i1p1f1'
Dimensions without coordinates: bnds
Data variables:
od550aer (member_id, time, lat, lon) float32 146MB dask.array<chunksize=(1, 48, 96, 192), meta=np.ndarray>
Attributes: (12/57)
CDO: Climate Data Operators version 1.9.9rc8...
Conventions: CF-1.7 CMIP-6.2
activity_id: AerChemMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 0.0
... ...
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: HAMMOZ-Consortium
intake_esm_attrs:version: v20190628
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.HAMMOZ-Consortium,
'CMIP.EC-Earth-Consortium': <xarray.Dataset> Size: 86MB
Dimensions: (time: 1980, bnds: 2, lat: 90, lon: 120, member_id: 1)
Coordinates:
* time (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
time_bnds (time, bnds) object 32kB dask.array<chunksize=(12, 2), meta=np.ndarray>
* lat (lat) float64 720B -89.0 -87.0 -85.0 -83.0 ... 85.0 87.0 89.0
lat_bnds (lat, bnds) float64 1kB dask.array<chunksize=(90, 2), meta=np.ndarray>
* lon (lon) float64 960B 1.5 4.5 7.5 10.5 ... 349.5 352.5 355.5 358.5
lon_bnds (lon, bnds) float64 2kB dask.array<chunksize=(120, 2), meta=np.ndarray>
wavelength float64 8B 550.0
* member_id (member_id) object 8B 'r1i1p1f1'
Dimensions without coordinates: bnds
Data variables:
od550aer (member_id, time, lat, lon) float32 86MB dask.array<chunksize=(1, 12, 90, 120), meta=np.ndarray>
Attributes: (12/57)
Conventions: CF-1.7 CMIP-6.2
activity_id: AerChemMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 0.0
contact: cmip6-data@ec-earth.org
... ...
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: CMIP
intake_esm_attrs:institution_id: EC-Earth-Consortium
intake_esm_attrs:version: v20210309
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: CMIP.EC-Earth-Consortium}
The preprocessed dictionary contains only the time series, which makes it easy to loop over the models and plot each one.
dataset_dict
{'AerChemMIP.NASA-GISS': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
* time (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
* member_id (member_id) object 8B 'r1i1p3f1'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
Attributes:
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: GISS-E2-1-G
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i1p3f1
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: NASA-GISS
intake_esm_attrs:version: v20191120
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.NASA-GISS,
'AerChemMIP.CNRM-CERFACS': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
* time (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
* member_id (member_id) object 8B 'r1i1p1f2'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
Attributes: (12/14)
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: CNRM-ESM2-1
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i1p1f2
... ...
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: CNRM-CERFACS
intake_esm_attrs:version: v20190621
intake_esm_attrs:path: /mnt/craas1-ns9989k-ns9560k/ESGF/CMIP6/...
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.CNRM-CERFACS,
'AerChemMIP.NCAR': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
* time (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
* member_id (member_id) object 8B 'r1i2p1f1'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
Attributes:
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: CESM2-WACCM
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i2p1f1
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: NCAR
intake_esm_attrs:version: v20190531
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.NCAR,
'AerChemMIP.NOAA-GFDL': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
* time (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
* member_id (member_id) object 8B 'r1i1p1f1'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
Attributes:
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: GFDL-ESM4
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i1p1f1
intake_esm_attrs:grid_label: gr1
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: NOAA-GFDL
intake_esm_attrs:version: v20180701
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.NOAA-GFDL,
'AerChemMIP.MIROC': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
wavelength float64 8B 550.0
* time (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
* member_id (member_id) object 8B 'r1i1p1f1'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
Attributes:
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: MIROC6
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i1p1f1
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: MIROC
intake_esm_attrs:version: v20190828
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.MIROC,
'AerChemMIP.MOHC': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
wavelength float64 8B 550.0
* time (time) object 1kB 1850-12-30 00:00:00 ... 2014-12-30 00:00:00
* member_id (member_id) object 8B 'r1i1p1f2'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
Attributes:
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: UKESM1-0-LL
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i1p1f2
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: MOHC
intake_esm_attrs:version: v20190902
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.MOHC,
'AerChemMIP.HAMMOZ-Consortium': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
wavelength float64 8B 550.0
* time (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
* member_id (member_id) object 8B 'r1i1p1f1'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
Attributes:
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: MPI-ESM-1-2-HAM
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i1p1f1
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: HAMMOZ-Consortium
intake_esm_attrs:version: v20190628
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.HAMMOZ-Consortium,
'AerChemMIP.MRI': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
wavelength float64 8B 550.0
* time (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
* member_id (member_id) object 8B 'r1i1p1f1'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 4), meta=np.ndarray>
Attributes: (12/14)
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: MRI-ESM2-0
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i1p1f1
... ...
intake_esm_attrs:activity_id: AerChemMIP
intake_esm_attrs:institution_id: MRI
intake_esm_attrs:version: v20200207
intake_esm_attrs:path: /mnt/craas1-ns9989k-ns9560k/ESGF/CMIP6/...
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: AerChemMIP.MRI,
'CMIP.EC-Earth-Consortium': <xarray.Dataset> Size: 3kB
Dimensions: (time: 165, member_id: 1)
Coordinates:
wavelength float64 8B 550.0
* time (time) object 1kB 1850-12-31 00:00:00 ... 2014-12-31 00:00:00
* member_id (member_id) object 8B 'r1i1p1f1'
Data variables:
od550aer (member_id, time) float64 1kB dask.array<chunksize=(1, 1), meta=np.ndarray>
Attributes:
intake_esm_vars: ['od550aer']
intake_esm_attrs:variable_id: od550aer
intake_esm_attrs:table_id: AERmon
intake_esm_attrs:source_id: EC-Earth3-AerChem
intake_esm_attrs:experiment_id: histSST
intake_esm_attrs:member_id: r1i1p1f1
intake_esm_attrs:grid_label: gn
intake_esm_attrs:activity_id: CMIP
intake_esm_attrs:institution_id: EC-Earth-Consortium
intake_esm_attrs:version: v20210309
intake_esm_attrs:_data_format_: netcdf
intake_esm_dataset_key: CMIP.EC-Earth-Consortium}
fig, ax = plt.subplots(figsize=(10, 6))
for datakey, data in dataset_dict.items():  # Loop over every dataset in the dictionary
    source_id = data.attrs['intake_esm_attrs:source_id']
    member_id = data.attrs['intake_esm_attrs:member_id']
    data['od550aer'].isel(member_id=0).plot(label=f'{source_id} {member_id}', ax=ax)
ax.legend()
<matplotlib.legend.Legend at 0x7fa1fc40f110>
Exporting a subset of the catalog#
Most likely you will only analyze a small subset of the model experiments, so it can be beneficial to work with a reduced catalog.
Below we make a subset that contains only the information related to histSST and histSST-piaer, and only the absorption optical depth (od550abs) and total optical depth (od550aer).
col = intake.open_esm_datastore('/mnt/craas1-ns9989k-geo4992/data/cmip6.json') # Local data stored on NIRD
col = col.search(
experiment_id=['histSST', 'histSST-piaer'],
variable_id = ['od550abs', 'od550aer']
)
col
cmip6 catalog with 9 dataset(s) from 190 asset(s):
unique | |
---|---|
variable_id | 1 |
table_id | 1 |
source_id | 9 |
experiment_id | 1 |
member_id | 4 |
grid_label | 3 |
time_range | 185 |
activity_id | 2 |
institution_id | 9 |
version | 9 |
path | 190 |
dcpp_init_year | 0 |
derived_variable_id | 0 |
Once we are happy with the selection, the catalog can be exported as follows:
col.serialize(name='histSST-AerChemMIP',catalog_type='file',directory='~/')
Successfully wrote ESM catalog json file to: file:///home/fc-3auid-3a9fdc0c87-2d7836-2d4bdc-2db802-2d9a250c322e3b//histSST-AerChemMIP.json
col = intake.open_esm_datastore('~/histSST-AerChemMIP.json')
col
histSST-AerChemMIP catalog with 9 dataset(s) from 190 asset(s):
unique | |
---|---|
variable_id | 1 |
table_id | 1 |
source_id | 9 |
experiment_id | 1 |
member_id | 4 |
grid_label | 3 |
time_range | 185 |
activity_id | 2 |
institution_id | 9 |
version | 9 |
path | 190 |
dcpp_init_year | 0 |
derived_variable_id | 0 |