Conversion to time series format¶
For a lot of applications it is favorable to convert the image based format into a format which is optimized for fast time series retrieval. This is what we often need for e.g. validation studies. This can be done by stacking the images into a netCDF file and choosing the correct chunk sizes or a lot of other methods. We have chosen to do it in the following way:
- Store only the reduced gaußian grid points since that saves space.
- Further reduction the amount of stored data by saving only land points if selected.
- Store the time series in netCDF4 in the Climate and Forecast convention Orthogonal multidimensional array representation
- Store the time series in 5x5 degree cells. This means there will be 2566 cell
files (without reduction to land points) and a file called
grid.nc
which contains the information about which grid point is stored in which file. This allows us to read a whole 5x5 degree area into memory and iterate over the time series quickly.
This conversion can be performed using the merra_repurpose
command line
program. An example would be:
merra_repurpose /merra2_data /timeseries/data -s 2000-01-01 -e 2018-11-30 --parameters SFMC RZMC --temporal_sampling 6
Which would take MERRA-2 data stored in /merra2_data
from January 1st 2000
to November 30th 2018 and store the parameters for 6-hourly sampled surface
(SFMC) and root zone soil moisture (RZMC) as time series
in the folder /timeseries/data
.
Conversion to time series is performed by the repurpose package in the background. For custom settings
or other options see the repurpose documentation and the code in
merra.reshuffle
.
Note: If a RuntimeError: NetCDF: Bad chunk sizes.
appears during reshuffling, consider downgrading the
netcdf4 library via:
conda install -c conda-forge netcdf4=1.2.2
if you are on Python 2.* and
conda install -c conda-forge netcdf4=1.2.8
if you are using Python 3.*.
Reading converted time series data¶
For reading the data the merra_repurpose
command produces the class
MerraTs
:
from merra.interface import MerraTs
# specify path to data folder
path = '../timeseries/data'
# specify location lon and lat
lon, lat = (16.375, 48.125)
# initialize the time series class
merra_reader = MerraTs(ts_path=path,
ioclass_kws={'read_bulk':True},
parameters=['SFMC'])
# read SFMC time series at the location
ts = merra_reader.read(lon, lat)