Regridding model data with xESMF#

Import python packages#

# supress warnings
import warnings
warnings.filterwarnings('ignore') # don't output warnings

import os
# import packages
import xarray as xr
xr.set_options(display_style='html')
import intake
import cftime
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import matplotlib.path as mpath
import numpy as np
import xesmf as xe
from pathlib import Path
%matplotlib inline

Set path to save data:#

Further down in this notebook, an intermediate data product is saved. It is often smart to put this in another directory than your code directory. Below we set the path where this is saved:

!pwd
/home/fc-3auid-3aeb54868a-2dc3bf-2d4cf2-2d9a1a-2d7ab42345078a/eScience2024/docs/learning/notebooks

You can copy the home path from above, or you can make a relative path.

path_to_save_data = '../../../../Data_processed'
path_to_save_data = Path(path_to_save_data)

path_to_save_data.mkdir(exist_ok=True)

Open CMIP6 online catalog#

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col

pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):

unique
activity_id 18
institution_id 36
source_id 88
experiment_id 170
member_id 657
table_id 37
variable_id 700
grid_label 10
zstore 514818
dcpp_init_year 60
version 736
derived_variable_id 0

Get data in xarray#

Search od550aer variable dataset#

cat = col.search(experiment_id=['historical'], variable_id='od550aer', member_id=['r1i1p1f1'], grid_label='gn')
cat.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP NCAR CESM2-WACCM historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/NCAR/CESM2-WACCM/histori... NaN 20190227
1 CMIP NCAR CESM2 historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r1... NaN 20190308
2 CMIP CCCma CanESM5 historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical... NaN 20190429
3 CMIP HAMMOZ-Consortium MPI-ESM-1-2-HAM historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/HAMMOZ-Consortium/MPI-ES... NaN 20190627
4 CMIP MPI-M MPI-ESM1-2-LR historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/hist... NaN 20190710
5 CMIP MPI-M MPI-ESM1-2-HR historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-HR/hist... NaN 20190710
6 CMIP NCC NorESM2-LM historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/historica... NaN 20190815
7 CMIP BCC BCC-ESM1 historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/BCC/BCC-ESM1/historical/... NaN 20190918
8 CMIP NCAR CESM2-WACCM-FV2 historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/NCAR/CESM2-WACCM-FV2/his... NaN 20191120
9 CMIP NCAR CESM2-FV2 historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historica... NaN 20191120
10 CMIP MRI MRI-ESM2-0 historical r1i1p1f1 AERmon od550aer gn gs://cmip6/CMIP6/CMIP/MRI/MRI-ESM2-0/historica... NaN 20200207
cat.df['source_id'].unique()
array(['CESM2-WACCM', 'CESM2', 'CanESM5', 'MPI-ESM-1-2-HAM',
       'MPI-ESM1-2-LR', 'MPI-ESM1-2-HR', 'NorESM2-LM', 'BCC-ESM1',
       'CESM2-WACCM-FV2', 'CESM2-FV2', 'MRI-ESM2-0'], dtype=object)

Create dictionary from the list of datasets we found#

  • This step may take several minutes so be patient!

dset_dict = cat.to_dataset_dict(zarr_kwargs={'use_cftime':True})
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
100.00% [11/11 00:19<00:00]
list(dset_dict.keys())
['CMIP.NCC.NorESM2-LM.historical.AERmon.gn',
 'CMIP.MRI.MRI-ESM2-0.historical.AERmon.gn',
 'CMIP.CCCma.CanESM5.historical.AERmon.gn',
 'CMIP.NCAR.CESM2.historical.AERmon.gn',
 'CMIP.BCC.BCC-ESM1.historical.AERmon.gn',
 'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.AERmon.gn',
 'CMIP.NCAR.CESM2-WACCM-FV2.historical.AERmon.gn',
 'CMIP.NCAR.CESM2-FV2.historical.AERmon.gn',
 'CMIP.MPI-M.MPI-ESM1-2-HR.historical.AERmon.gn',
 'CMIP.MPI-M.MPI-ESM1-2-LR.historical.AERmon.gn',
 'CMIP.NCAR.CESM2-WACCM.historical.AERmon.gn']

Select model and visualize a single date#

  • Use data as xarray to make a simple plot

ds = dset_dict['CMIP.NCC.NorESM2-LM.historical.AERmon.gn']
ds
<xarray.Dataset> Size: 110MB
Dimensions:         (lat: 96, bnds: 2, lon: 144, member_id: 1,
                     dcpp_init_year: 1, time: 1980)
Coordinates:
  * lat             (lat) float64 768B -90.0 -88.11 -86.21 ... 86.21 88.11 90.0
    lat_bnds        (lat, bnds) float64 2kB dask.array<chunksize=(96, 2), meta=np.ndarray>
  * lon             (lon) float64 1kB 0.0 2.5 5.0 7.5 ... 352.5 355.0 357.5
    lon_bnds        (lon, bnds) float64 2kB dask.array<chunksize=(144, 2), meta=np.ndarray>
  * time            (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:...
    time_bnds       (time, bnds) object 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    wavelength      float64 8B ...
  * member_id       (member_id) object 8B 'r1i1p1f1'
  * dcpp_init_year  (dcpp_init_year) float64 8B nan
Dimensions without coordinates: bnds
Data variables:
    od550aer        (member_id, dcpp_init_year, time, lat, lon) float32 109MB dask.array<chunksize=(1, 1, 990, 96, 144), meta=np.ndarray>
Attributes: (12/65)
    Conventions:                      CF-1.7 CMIP-6.2
    activity_id:                      CMIP
    branch_method:                    Hybrid-restart from year 1600-01-01 of ...
    branch_time:                      0.0
    branch_time_in_child:             0.0
    branch_time_in_parent:            430335.0
    ...                               ...
    intake_esm_attrs:variable_id:     od550aer
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:zstore:          gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/hi...
    intake_esm_attrs:version:         20190815
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           CMIP.NCC.NorESM2-LM.historical.AERmon.gn

Plot on NorthPolarStereo and set the latitude limit#

def polarCentral_set_latlim(lat_lims, ax):
    ax.set_extent([-180, 180, lat_lims[0], lat_lims[1]], ccrs.PlateCarree())
    # Compute a circle in axes coordinates, which we can use as a boundary
    # for the map. We can pan/zoom as much as we like - the boundary will be
    # permanently circular.
    theta = np.linspace(0, 2*np.pi, 100)
    center, radius = [0.5, 0.5], 0.5
    verts = np.vstack([np.sin(theta), np.cos(theta)]).T
    circle = mpath.Path(verts * radius + center)

    ax.set_boundary(circle, transform=ax.transAxes)
fig = plt.figure(1, figsize=[10,10])

# Fix extent
minval = 0
maxval = 0.3

ax = plt.subplot(1, 1, 1, projection=ccrs.NorthPolarStereo())
ax.coastlines()
ax.gridlines()
polarCentral_set_latlim([50,90], ax)
(ds['od550aer']
 .sel(time=cftime.DatetimeNoLeap(1985, 1, 16, 12, 0, 0, 0))
 .plot(ax=ax, vmin=minval, vmax=maxval, transform=ccrs.PlateCarree(), cmap='Reds')
)
<cartopy.mpl.geocollection.GeoQuadMesh at 0x7f8eb5261650>
../../../_images/9516160a25fe41c38685ac07a67cb3e9ba2e920f79fbf4a342438849c564dc5a.png

Get attributes (unique identifier)#

ds.attrs['tracking_id']
'hdl:21.14100/efd7a56e-94a8-47f5-b3d8-06ae02268192\nhdl:21.14100/0c3683e7-1c3f-45d6-bbc7-414f68e7a801\nhdl:21.14100/a83a3a96-0d16-4f3e-aa88-a68f00e1ce2e\nhdl:21.14100/7a629db0-dda1-445d-a496-2e77c9c7c20a\nhdl:21.14100/388888f8-7ee4-467f-aca9-8a6e45361f55\nhdl:21.14100/7aa3797a-b0e7-427d-a209-efed00cd1724\nhdl:21.14100/33d3ae45-cb42-47e5-8c08-76df86d298a8\nhdl:21.14100/10f9c9e3-3d54-494a-b153-ea63c5b584c7\nhdl:21.14100/2a3a1f67-8890-4e89-b6de-84b2b13cf70e\nhdl:21.14100/447d4151-8161-461a-a66e-f21144baabf6\nhdl:21.14100/c1bd13af-f8b2-4b18-bd2c-34b13a4921dc\nhdl:21.14100/eee79e49-dd70-4d8f-b195-c885aca26e3a\nhdl:21.14100/7cd2e526-a94c-4004-bfe8-32832a9df6d7\nhdl:21.14100/b2421ba2-c8c9-44c0-84af-f9b09cb87759\nhdl:21.14100/f87b3cf5-e68d-40f2-a289-667b4cf7d15f\nhdl:21.14100/a6542f1a-0567-4e7a-b063-f5609f017d69\nhdl:21.14100/01e4e8d6-b84e-4d25-9f50-4a9ea5588f07'

Regrid CMIP6 data to common NorESM2-LM grid#

  • Select a time range

  • we use squeeze to remove dimension with one element only e.g. member_id=’r1i1p1f1’

starty = '1985-01'
endy = '1986-12'
# Read in the output grid from NorESM
ds_out = ds.sel(time = slice(starty, endy)).squeeze() 
ds_out
<xarray.Dataset> Size: 1MB
Dimensions:         (lat: 96, bnds: 2, lon: 144, time: 24)
Coordinates:
  * lat             (lat) float64 768B -90.0 -88.11 -86.21 ... 86.21 88.11 90.0
    lat_bnds        (lat, bnds) float64 2kB dask.array<chunksize=(96, 2), meta=np.ndarray>
  * lon             (lon) float64 1kB 0.0 2.5 5.0 7.5 ... 352.5 355.0 357.5
    lon_bnds        (lon, bnds) float64 2kB dask.array<chunksize=(144, 2), meta=np.ndarray>
  * time            (time) object 192B 1985-01-16 12:00:00 ... 1986-12-16 12:...
    time_bnds       (time, bnds) object 384B dask.array<chunksize=(24, 2), meta=np.ndarray>
    wavelength      float64 8B ...
    member_id       <U8 32B 'r1i1p1f1'
    dcpp_init_year  float64 8B nan
Dimensions without coordinates: bnds
Data variables:
    od550aer        (time, lat, lon) float32 1MB dask.array<chunksize=(24, 96, 144), meta=np.ndarray>
Attributes: (12/65)
    Conventions:                      CF-1.7 CMIP-6.2
    activity_id:                      CMIP
    branch_method:                    Hybrid-restart from year 1600-01-01 of ...
    branch_time:                      0.0
    branch_time_in_child:             0.0
    branch_time_in_parent:            430335.0
    ...                               ...
    intake_esm_attrs:variable_id:     od550aer
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:zstore:          gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/hi...
    intake_esm_attrs:version:         20190815
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           CMIP.NCC.NorESM2-LM.historical.AERmon.gn

Save regridded data intermediately#

# create dictionary for reggridded data
ds_regrid_dict = dict()
for key in dset_dict.keys():
    print(key)
    ds_in = dset_dict[key]
    ds_in = ds_in.sel(time=slice(starty, endy)).squeeze()
    regridder = xe.Regridder(ds_in, ds_out, 'bilinear')
    # Apply regridder to data
    # the entire dataset can be processed at once
    ds_in_regrid = regridder(ds_in, keep_attrs=True)
    # Save to netcdf file
    model = key.split('.')[2]
    filename = 'od550aer_AERmon.nc'
    savepath = path_to_save_data / 'CMIP6_hist' / model
    savepath.mkdir(parents=True, exist_ok=True)
    filename_out = savepath / filename 
    ds_in_regrid.to_netcdf(filename_out)
    # create dataset with all models
    ds_regrid_dict[model] = ds_in_regrid
    print(f'file written: {filename_out}')
CMIP.NCC.NorESM2-LM.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/NorESM2-LM/od550aer_AERmon.nc
CMIP.MRI.MRI-ESM2-0.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/MRI-ESM2-0/od550aer_AERmon.nc
CMIP.CCCma.CanESM5.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/CanESM5/od550aer_AERmon.nc
CMIP.NCAR.CESM2.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/CESM2/od550aer_AERmon.nc
CMIP.BCC.BCC-ESM1.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/BCC-ESM1/od550aer_AERmon.nc
CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/MPI-ESM-1-2-HAM/od550aer_AERmon.nc
CMIP.NCAR.CESM2-WACCM-FV2.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/CESM2-WACCM-FV2/od550aer_AERmon.nc
CMIP.NCAR.CESM2-FV2.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/CESM2-FV2/od550aer_AERmon.nc
CMIP.MPI-M.MPI-ESM1-2-HR.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/MPI-ESM1-2-HR/od550aer_AERmon.nc
CMIP.MPI-M.MPI-ESM1-2-LR.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/MPI-ESM1-2-LR/od550aer_AERmon.nc
CMIP.NCAR.CESM2-WACCM.historical.AERmon.gn
file written: ../../../../Data_processed/CMIP6_hist/CESM2-WACCM/od550aer_AERmon.nc

Concatenate all models#

import cftime
for k in ds_regrid_dict.keys():
    _ds = ds_regrid_dict[k]

    _ds= _ds.resample(time='1M').mean('time')
    # change calendar 
    units = 'days since 1900-01-01'
    _t = _ds.time
    _ds['time'] = cftime.date2num(_t.values, 'days since 1900-01-01')
    _ds['time'].attrs['units'] = units
    _ds = xr.decode_cf(_ds)
    ds_regrid_dict[k] = _ds
_ds = list(ds_regrid_dict.values())
_coord = list(ds_regrid_dict.keys())
ds_out_regrid = xr.concat(objs=_ds, dim=_coord, coords="all").rename({'concat_dim':'model'})
ds_out_regrid
<xarray.Dataset> Size: 29MB
Dimensions:         (lat: 96, lon: 144, time: 48, model: 11)
Coordinates:
  * lat             (lat) float64 768B -90.0 -88.11 -86.21 ... 86.21 88.11 90.0
  * lon             (lon) float64 1kB 0.0 2.5 5.0 7.5 ... 352.5 355.0 357.5
  * time            (time) datetime64[ns] 384B 1985-01-10 ... 1986-12-31
    wavelength      (model) float64 88B 550.0 550.0 550.0 ... 550.0 550.0 550.0
    member_id       (model) <U8 352B 'r1i1p1f1' 'r1i1p1f1' ... 'r1i1p1f1'
    dcpp_init_year  (model) float64 88B nan nan nan nan nan ... nan nan nan nan
  * model           (model) object 88B 'NorESM2-LM' ... 'CESM2-WACCM'
Data variables:
    od550aer        (model, time, lat, lon) float32 29MB dask.array<chunksize=(1, 48, 64, 128), meta=np.ndarray>
Attributes: (12/66)
    Conventions:                      CF-1.7 CMIP-6.2
    activity_id:                      CMIP
    branch_method:                    Hybrid-restart from year 1600-01-01 of ...
    branch_time:                      0.0
    branch_time_in_child:             0.0
    branch_time_in_parent:            430335.0
    ...                               ...
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:zstore:          gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/hi...
    intake_esm_attrs:version:         20190815
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           CMIP.NCC.NorESM2-LM.historical.AERmon.gn
    regrid_method:                    bilinear

Compute seasonal mean of all regridded models#

ds_seas = ds_out_regrid.mean('model', keep_attrs=True, skipna = True).groupby('time.season').mean('time', keep_attrs=True, skipna = True)
ds_seas
<xarray.Dataset> Size: 223kB
Dimensions:   (season: 4, lat: 96, lon: 144)
Coordinates:
  * lat       (lat) float64 768B -90.0 -88.11 -86.21 -84.32 ... 86.21 88.11 90.0
  * lon       (lon) float64 1kB 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * season    (season) object 32B 'DJF' 'JJA' 'MAM' 'SON'
Data variables:
    od550aer  (season, lat, lon) float32 221kB dask.array<chunksize=(4, 64, 128), meta=np.ndarray>
Attributes: (12/66)
    Conventions:                      CF-1.7 CMIP-6.2
    activity_id:                      CMIP
    branch_method:                    Hybrid-restart from year 1600-01-01 of ...
    branch_time:                      0.0
    branch_time_in_child:             0.0
    branch_time_in_parent:            430335.0
    ...                               ...
    intake_esm_attrs:grid_label:      gn
    intake_esm_attrs:zstore:          gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/hi...
    intake_esm_attrs:version:         20190815
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           CMIP.NCC.NorESM2-LM.historical.AERmon.gn
    regrid_method:                    bilinear
ds_seas['od550aer'].min().compute(), ds_seas['od550aer'].max().compute()
(<xarray.DataArray 'od550aer' ()> Size: 4B
 array(0.00279297, dtype=float32),
 <xarray.DataArray 'od550aer' ()> Size: 4B
 array(1.4087572, dtype=float32))

Save seasonal mean in a new netCDF file#

path_to_save_data
PosixPath('../../../../Data_processed')
filename = path_to_save_data / 'CMIP6_hist'/'od550aer_seasonal.nc'
ds_seas.to_netcdf(filename)

Visualize final results (seasonal mean for all models)#

import matplotlib
proj_plot = ccrs.Mercator()

p = ds_seas['od550aer'].plot(x='lon', y='lat', 
                             transform=ccrs.PlateCarree(),
                             aspect=ds_seas.dims["lon"] / ds_seas.dims["lat"],  # for a sensible figsize
                             subplot_kws={"projection": proj_plot},
                             col='season', col_wrap=2, robust=True, 
                             cmap='Reds')
# We have to set the map's options on all four axes
for ax,i in zip(p.axes.flat, ds_seas.season.values):
    ax.coastlines()
    ax.set_title('Season '+i, fontsize=18)
fig = matplotlib.pyplot.gcf()
fig.set_size_inches(18.5, 10.5)
fig.savefig('od550aer_seasonal_mean.png', dpi=100)
../../../_images/ff253ea2168528b770c9972440f2be87c79a22a443f0bffa0f94d4561feef398.png