Use pyhomogenize to check the time axis of netCDF file(s): time_control

Now, we want to use pyhomogenize’s time_control class. We open a test netCDF file; this is done automatically when the class is instantiated.

[1]:
import pyhomogenize as pyh
[2]:
time_control = pyh.time_control(pyh.test_netcdf[0])
time_control.ds
[2]:
<xarray.Dataset>
Dimensions:       (time: 7, bnds: 2, rlat: 412, rlon: 424, vertices: 4)
Coordinates:
  * time          (time) object 2007-01-16 12:00:00 ... 2007-07-16 12:00:00
    lon           (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
    lat           (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
  * rlon          (rlon) float64 -28.38 -28.26 -28.16 ... 17.93 18.05 18.16
  * rlat          (rlat) float64 -23.38 -23.26 -23.16 ... 21.61 21.73 21.83
    height        float64 ...
Dimensions without coordinates: bnds, vertices
Data variables:
    time_bnds     (time, bnds) object dask.array<chunksize=(1, 2), meta=np.ndarray>
    lon_bnds      (rlat, rlon, vertices) float64 dask.array<chunksize=(412, 424, 4), meta=np.ndarray>
    lat_bnds      (rlat, rlon, vertices) float64 dask.array<chunksize=(412, 424, 4), meta=np.ndarray>
    rotated_pole  int32 ...
    tas           (time, rlat, rlon) float32 dask.array<chunksize=(1, 412, 424), meta=np.ndarray>
Attributes: (12/26)
    CDI:                            Climate Data Interface version ?? (http:/...
    history:                        Fri Mar 25 10:44:26 2022: cdo seldate,200...
    source:                         CLMcom-CCLM4-8-17
    institution:                    CLMcom, Climate Limited-area Modelling Co...
    Conventions:                    CF-1.4
    contact:                        klima.projektionen@dwd.de
    ...                             ...
    project_id:                     CORDEX
    product:                        output
    frequency:                      mon
    tracking_id:                    490ab140-e096-11e7-b22c-81c28a935756
    creation_date:                  2017-12-14T07:16:13Z
    CDO:                            Climate Data Operators version 1.9.3 (htt...

Let’s have a look at the dataset’s time axis.

[3]:
time_control.time
[3]:
CFTimeIndex([2007-01-16 12:00:00, 2007-02-15 00:00:00, 2007-03-16 12:00:00,
             2007-04-16 00:00:00, 2007-05-16 12:00:00, 2007-06-16 00:00:00,
             2007-07-16 12:00:00],
            dtype='object', length=7, calendar='noleap', freq='None')

We can check whether the time axis contains duplicated, missing or redundant time steps. A redundant time step is a time step that does not match the dataset’s calendar and/or frequency.

[4]:
duplicates = time_control.get_duplicates()
redundants = time_control.get_redundants()
missings = time_control.get_missings()
[5]:
duplicates, redundants, missings
[5]:
('', '', '')

We see that the time axis contains no incorrect time steps and that no time steps are missing. Not a very instructive example. We can combine the three requests above by using the function check_timestamps.

[6]:
timechecker1 = time_control.check_timestamps()
timechecker1
[6]:
<pyhomogenize._time_control.time_control at 0x7f7dca066790>

As we can see, the function returns a time_control object again, but with three new attributes.

[7]:
timechecker1.duplicated_timesteps, timechecker1.missing_timesteps, timechecker1.redundant_timesteps
[7]:
({'tas': ''}, {'tas': ''}, {'tas': ''})
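Because the test file has a clean time axis, all three results above are empty. To make the three categories concrete, here is a minimal sketch in plain Python that does not use pyhomogenize. The helper classify_timesteps and its arguments are hypothetical, and the sketch assumes a fixed-length time step (daily here), which is not how pyhomogenize itself necessarily works.

```python
from datetime import datetime, timedelta


def classify_timesteps(times, start, step, n_expected):
    """Compare a time axis with the regular axis it is expected to follow.

    Hypothetical helper for illustration only; assumes a fixed-length step.
    """
    expected = [start + i * step for i in range(n_expected)]
    expected_set = set(expected)
    seen = set()
    duplicates, redundants = [], []
    for t in times:
        if t in seen:
            duplicates.append(t)   # same time step occurs more than once
        elif t not in expected_set:
            redundants.append(t)   # does not fit the expected frequency
        seen.add(t)
    # expected time steps that never occur on the axis
    missings = [t for t in expected if t not in seen]
    return duplicates, redundants, missings


times = [
    datetime(2007, 1, 1),
    datetime(2007, 1, 2),
    datetime(2007, 1, 2),        # duplicated time step
    datetime(2007, 1, 2, 12),    # redundant: off the daily frequency
    datetime(2007, 1, 4),        # 2007-01-03 is missing
]
duplicates, redundants, missings = classify_timesteps(
    times, start=datetime(2007, 1, 1), step=timedelta(days=1), n_expected=4
)
print(duplicates, redundants, missings)
```
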

We want to test the time axis only for duplicated time steps.

timechecker2 = time_control.check_timestamps(selection="duplicates")
timechecker2.duplicated_timesteps

By setting the parameter correct to True, we can delete duplicated and redundant time steps if any exist. Of course, in our example this is not the case, so the time axis stays unchanged.

[8]:
timechecker3 = time_control.check_timestamps(correct=True)
timechecker3.time
[8]:
CFTimeIndex([2007-01-16 12:00:00, 2007-02-15 00:00:00, 2007-03-16 12:00:00,
             2007-04-16 00:00:00, 2007-05-16 12:00:00, 2007-06-16 00:00:00,
             2007-07-16 12:00:00],
            dtype='object', length=7, calendar='noleap', freq='None')
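Since nothing is removed from the test file, the effect of such a correction can be illustrated on a small hand-made time axis. The following is a sketch in plain Python, not pyhomogenize's implementation: the helper corrected_axis is hypothetical and assumes a fixed-length step, so that "redundant" simply means "not on the regular grid".

```python
from datetime import datetime, timedelta


def corrected_axis(times, start, step):
    """Keep the first occurrence of every time step lying on the regular grid.

    Hypothetical helper for illustration only; assumes a fixed-length step.
    """
    kept, seen = [], set()
    for t in times:
        on_grid = (t - start) % step == timedelta(0)
        if on_grid and t not in seen:
            kept.append(t)   # valid time step, first occurrence
        seen.add(t)
    return kept


times = [
    datetime(2007, 1, 1),
    datetime(2007, 1, 2),
    datetime(2007, 1, 2),        # duplicate, dropped
    datetime(2007, 1, 2, 12),    # off the daily grid, dropped
    datetime(2007, 1, 4),
]
kept = corrected_axis(times, start=datetime(2007, 1, 1), step=timedelta(days=1))
print(kept)
```

Note that the sketch only removes time steps. The missing time step 2007-01-03 is not filled in.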

We can set the parameter output to choose the file name under which the dataset is written to disk. If output is set, the parameter correct is automatically set to True.

[9]:
timechecker4 = time_control.check_timestamps(output="output.nc")

Now, we want to select a specific time range. We copy our time_control object to keep the original object.

[10]:
from copy import copy

time_control1 = copy(time_control)
selected1 = time_control1.select_time_range(["2007-02-01", "2007-03-30"])
selected1
[10]:
<pyhomogenize._time_control.time_control at 0x7f7dc97fab80>

Here again, we get a time_control object, but now with a different time axis.

[11]:
selected1.time
[11]:
CFTimeIndex([2007-02-15 00:00:00, 2007-03-16 12:00:00],
            dtype='object', length=2, calendar='noleap', freq=None)

Of course, we can write the result to disk as a netCDF file.

[12]:
time_control2 = copy(time_control)
selected2 = time_control2.select_time_range(
    ["2007-02-01", "2007-03-30"], output="output.nc"
)
selected2.ds
[12]:
<xarray.Dataset>
Dimensions:       (time: 2, bnds: 2, rlat: 412, rlon: 424, vertices: 4)
Coordinates:
  * time          (time) object 2007-02-15 00:00:00 2007-03-16 12:00:00
    lon           (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
    lat           (rlat, rlon) float64 dask.array<chunksize=(412, 424), meta=np.ndarray>
  * rlon          (rlon) float64 -28.38 -28.26 -28.16 ... 17.93 18.05 18.16
  * rlat          (rlat) float64 -23.38 -23.26 -23.16 ... 21.61 21.73 21.83
    height        float64 ...
Dimensions without coordinates: bnds, vertices
Data variables:
    time_bnds     (time, bnds) object dask.array<chunksize=(1, 2), meta=np.ndarray>
    lon_bnds      (rlat, rlon, vertices) float64 dask.array<chunksize=(412, 424, 4), meta=np.ndarray>
    lat_bnds      (rlat, rlon, vertices) float64 dask.array<chunksize=(412, 424, 4), meta=np.ndarray>
    rotated_pole  int32 ...
    tas           (time, rlat, rlon) float32 dask.array<chunksize=(1, 412, 424), meta=np.ndarray>
Attributes: (12/26)
    CDI:                            Climate Data Interface version ?? (http:/...
    history:                        Fri Mar 25 10:44:26 2022: cdo seldate,200...
    source:                         CLMcom-CCLM4-8-17
    institution:                    CLMcom, Climate Limited-area Modelling Co...
    Conventions:                    CF-1.4
    contact:                        klima.projektionen@dwd.de
    ...                             ...
    project_id:                     CORDEX
    product:                        output
    frequency:                      mon
    tracking_id:                    490ab140-e096-11e7-b22c-81c28a935756
    creation_date:                  2017-12-14T07:16:13Z
    CDO:                            Climate Data Operators version 1.9.3 (htt...
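The selection itself can be pictured as a simple filter on the time axis. The sketch below reproduces the two selected time steps with plain Python datetimes instead of the noleap cftime objects of the dataset. The helper select_between, the inclusive bounds and the default format string are assumptions made for illustration, not pyhomogenize's code.

```python
from datetime import datetime


def select_between(times, time_range, fmt="%Y-%m-%d"):
    """Return the time steps between two date strings (bounds inclusive).

    Hypothetical helper for illustration only.
    """
    left, right = (datetime.strptime(s, fmt) for s in time_range)
    return [t for t in times if left <= t <= right]


# The monthly time axis of the test file, written as standard datetimes.
axis = [
    datetime(2007, 1, 16, 12), datetime(2007, 2, 15), datetime(2007, 3, 16, 12),
    datetime(2007, 4, 16), datetime(2007, 5, 16, 12), datetime(2007, 6, 16),
    datetime(2007, 7, 16, 12),
]
selected = select_between(axis, ["2007-02-01", "2007-03-30"])
print(selected)
```
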

If we want to crop the time axis to user-specified start and end months, as shown in the example for basics.date_range_to_frequency_limits, we can do this with netCDF files as well. In the following example, the time axis should start at the beginning of an arbitrary season and end at the end of an arbitrary season.

[13]:
time_control3 = copy(time_control)
selected3 = time_control3.select_limited_time_range(
    smonth=[3, 6, 9, 12], emonth=[2, 5, 8, 11], output="output.nc"
)
selected3.time
[13]:
CFTimeIndex([2007-03-16 12:00:00, 2007-04-16 00:00:00, 2007-05-16 12:00:00],
            dtype='object', length=3, calendar='noleap', freq='732H')
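The result above is consistent with a simple rule: start at the first time step whose month is in smonth and stop at the last time step whose month is in emonth. The following sketch applies that rule to the test file's time axis using plain Python datetimes. The helper limit_to_months is hypothetical and only mirrors the observed behaviour; it is not taken from pyhomogenize.

```python
from datetime import datetime


def limit_to_months(times, smonth, emonth):
    """Crop a time axis to start in one of smonth and end in one of emonth.

    Hypothetical helper for illustration only.
    """
    # index of the first time step in an allowed start month
    first = next((i for i, t in enumerate(times) if t.month in smonth), None)
    # index of the last time step in an allowed end month
    last = next(
        (i for i in range(len(times) - 1, -1, -1) if times[i].month in emonth),
        None,
    )
    if first is None or last is None or first > last:
        return []
    return times[first:last + 1]


axis = [
    datetime(2007, 1, 16, 12), datetime(2007, 2, 15), datetime(2007, 3, 16, 12),
    datetime(2007, 4, 16), datetime(2007, 5, 16, 12), datetime(2007, 6, 16),
    datetime(2007, 7, 16, 12),
]
limited = limit_to_months(axis, smonth=[3, 6, 9, 12], emonth=[2, 5, 8, 11])
print(limited)
```

January and February are dropped because the axis must begin in March, June, September or December. June and July are dropped because the last complete season on the axis ends in May.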

Now, we want to check whether certain left and right bounds lie within the dataset’s time axis.

[14]:
time_control.within_time_range(["2007-02-01", "2007-03-30"])
[14]:
True
[15]:
time_control.within_time_range(["2007-02-01", "2008-03-30"])
[15]:
False
[16]:
time_control.within_time_range(["20070201", "20070330"], fmt="%Y%m%d")
[16]:
True
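The three outputs above are consistent with a check that returns True when both requested bounds lie inside the span of the time axis. A minimal sketch of such a check in plain Python follows. The helper bounds_within_axis and its default format string are assumptions for illustration, and standard datetimes stand in for the dataset's noleap cftime objects.

```python
from datetime import datetime


def bounds_within_axis(times, time_range, fmt="%Y-%m-%d"):
    """Check whether both bounds lie between the first and last time step.

    Hypothetical helper for illustration only.
    """
    left, right = (datetime.strptime(s, fmt) for s in time_range)
    return min(times) <= left and right <= max(times)


axis = [
    datetime(2007, 1, 16, 12), datetime(2007, 2, 15), datetime(2007, 3, 16, 12),
    datetime(2007, 4, 16), datetime(2007, 5, 16, 12), datetime(2007, 6, 16),
    datetime(2007, 7, 16, 12),
]
inside = bounds_within_axis(axis, ["2007-02-01", "2007-03-30"])
outside = bounds_within_axis(axis, ["2007-02-01", "2008-03-30"])
compact = bounds_within_axis(axis, ["20070201", "20070330"], fmt="%Y%m%d")
print(inside, outside, compact)
```

The fmt argument plays the same role as in the last cell above: it tells the parser how the date strings are written.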