MetOcean data types¶
Martin Schultz, Forschungszentrum Jülich, Germany (firstname.lastname@example.org) - 11 July 2014
An essential element of a robust data model for interoperable web applications such as JOIN is the definition and recognition of data types. As explained in the Coverage primer, most datasets in the fields of atmospheric and ocean sciences (short: MetOcean sciences) are discrete grid point coverages, but in order to fully understand and process such data, we also need information on the data coordinates, i.e. we must know how to place the data onto the Earth surface, to which altitude (or pressure) they refer, and for which date and time they are valid.
This document attempts a systematic classification of MetOcean data types as a preparatory step for defining and coding the JOIN data model, but also as an information resource for anyone dealing with MetOcean data interoperability. I will be happy to collect feedback and receive criticism or updates concerning additional data types. The basis of this work comes from the community data model of Unidata, and from a CF convention track ticket which I opened, because I found inconsistencies among various typologies of data (often confused with coverages), and because I felt that none of the existing typologies provides the complete hierarchy of MetOcean data types that a flexible data model needs.
Discrete grid point coverages¶
As explained in the coverage primer, the majority of MetOcean datasets can be subsumed under the class of "gridded datasets", where the grid is defined on discrete points and can have between 1 and 5 dimensions, which we call x, y, z, t, and i (first and second horizontal dimension, vertical dimension, time dimension, and ensemble dimension). Through association with a coordinate reference system (or at least a coordinate system), these dimensions become coordinates. Usually, the coordinates will be 1-dimensional, in which case we refer to the multi-dimensional grid as rectified (see the discussion on rectified grids in the coverage primer). The most prominent exceptions are the 4-dimensional pressure or altitude coordinates of numerical model grids (see vertical coordinates). Scalar (numerical) features (i.e. 0-dimensional datasets) are considered as 1-element 1-dimensional coverages, i.e. point type datasets.
The following classification is structured according to the number of spatial dimensions of the MetOcean dataset. The t and i dimensions are treated independently and lead to the definition of subclasses of the spatial dataset types. For example, a point coverage type can be extended to a point_timeseries, a point_collection, or a point_timeseries_collection, while a rectified_grid_2d_xy may become a rectified_grid_2d_xy_timeseries, a rectified_grid_2d_xy_ensemble, or a rectified_grid_2d_xy_timeseries_ensemble.
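The naming pattern in these examples (spatial base type, then an optional timeseries suffix, then an optional collection or ensemble suffix) can be sketched as a small helper. This is only an illustration: the function name `extended_type_name` and its parameters are hypothetical, and the rule that point and trajectory types take "collection" while grid types take "ensemble" is an assumption read off the examples above.

```python
def extended_type_name(base, timeseries=False, multiple=False):
    """Build a subclass name from a spatial base type.

    Assumption: point and trajectory types use the suffix "collection",
    all other (grid) types use "ensemble", as in the examples in the text.
    """
    parts = [base]
    if timeseries:
        parts.append("timeseries")
    if multiple:
        is_point_like = base.startswith(("point", "trajectory"))
        parts.append("collection" if is_point_like else "ensemble")
    return "_".join(parts)


print(extended_type_name("point", timeseries=True, multiple=True))
print(extended_type_name("rectified_grid_2d_xy", timeseries=True))
```

Treating t and i as independent switches keeps the number of base types small, since every spatial type yields its variants by the same rule.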
In the listings below, a pattern such as x(i), y(i), [z(i)], v(i,t) lists the existing coordinates with their dimensions and the variable v with its dimensions; square brackets mark an optional coordinate. Normally, we implicitly assume that x and y are longitude and latitude coordinates, and that z is either a pressure or altitude coordinate. However, this is not a fundamental requirement and, at least in theory, the datatype classes given below should also work with other coordinate types (the conversion to longitudes, latitudes, etc. is actually a role of the coordinate reference system).
Note that all coverage types define only the locations of one (central) point of each "grid cell". The description of boundaries may add extra complexity, for example if the 2d nonrectified grid describes a curved plane that cuts across an arbitrary part of the 3d sphere.
Zero (spatial) dimensions: Points and trajectories¶
- point: a single location with optional vertical coordinate. All coordinates and the value are scalars unless they are extended to timeseries or collections. One-element one-dimensional or even multi-dimensional coordinates and variables that can be collapsed into scalars (example: extraction of a single grid point from a model grid) should not be considered point data, unless they are actually converted to scalar values.
x, y, [z], v
x, y, [z], v(t)
x(i), y(i), [z(i)], v(i)
x(i), y(i), [z(i)], v(i,t). In this case, all points share the same time axis. It is also conceivable to store multiple point_timeseries with individual time axes in one collection. However, as this will often result in sparsely populated matrices (timeseries of different lengths), we ignore this case for now. A suitable name would be point_collection_timeseries.
1. Data from irregular grids (e.g. icosahedral grids) may also be described as point_collection (or point_collection_timeseries).
2. Conceivably, x and y may not always be defined or may have different dimensionalities (for example points along one latitude line). It should be good practice to always specify x and y, and always define them with the same dimensions. If this is not the case, we should flag this as an error and deal with it when it actually occurs.
- trajectory: a location in space that changes with time. A trajectory will therefore always be a trajectory_timeseries or else it collapses into a point.
x(t), y(t), [z(t)], v(t)
x(i,t), y(i,t), [z(i,t)], v(i,t). The trajectories of a collection share the same time axis.
Figure 1: Point and trajectory data types
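The distinction between the point and trajectory variants can be read directly from the dimension names of the coordinates and the variable. The following sketch assumes that y and z carry the same dimensions as x, as recommended in note 2. The function name `classify_zero_dim` is hypothetical, and the names point_timeseries_collection and trajectory_timeseries_collection for the patterns with both i and t are an assumption based on the naming scheme introduced earlier.

```python
def classify_zero_dim(x_dims, v_dims):
    """Return the data type name for a dataset without spatial dimensions.

    x_dims: dimension names of the x coordinate (y and z are assumed to match).
    v_dims: dimension names of the variable v.
    """
    x_dims, v_dims = tuple(x_dims), tuple(v_dims)
    moving = "t" in x_dims   # the location itself changes with time
    many = "i" in v_dims     # several points or trajectories
    if moving:
        if "t" not in v_dims:
            raise ValueError("a trajectory needs a time-dependent variable")
        name = "trajectory_timeseries"
    else:
        name = "point_timeseries" if "t" in v_dims else "point"
    return name + "_collection" if many else name


print(classify_zero_dim((), ()))                   # x, y, [z], v
print(classify_zero_dim(("i",), ("i", "t")))       # x(i), y(i), [z(i)], v(i,t)
print(classify_zero_dim(("i", "t"), ("i", "t")))   # x(i,t), y(i,t), [z(i,t)], v(i,t)
```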
- rectified_grid_1d: a list of values that are arranged along a 1-dimensional coordinate axis (i.e. x, y, or z). The coordinate values need not be regularly spaced, although this will often be the case (and in practice we may indeed write subclasses for regular grids). Generally, coordinate values will be sorted in ascending order (latitude or vertical grids may sometimes be sorted in descending order instead).
x(x), [y], [z], v(x)
x(x), [y], [z], v(t,x)
x(x), [y], [z], v(i,x)
x(x), [y], [z], v(i,t,x)
[x], y(y), [z], v(y).
[x], y(y), [z], v(t,y).
[x], y(y), [z], v(i,y).
[x], y(y), [z], v(i,t,y).
[x], [y], z(z), v(z).
[x], [y], z(z), v(t,z).
[x], [y], z(z), v(i,z).
[x], [y], z(z), v(i,t,z).
1. As discussed in the coverage primer, datasets with the frequently used sigma- or hybrid sigma-pressure vertical coordinates are not rectified coverages. Yet, as long as we have a single column, or all columns in the ensemble share the same z coordinate, such data can be treated as rectified coverages. In the case of simple altitude or pressure coordinates this is also true.
- nonrectified_grid_1d: a list of values that are arranged along an arbitrary "line" in 3-dimensional space. Any x, y, z coordinate may be a 1-dimensional array, a scalar, or missing. All 1-dimensional coordinate axes must have the same dimension (i.e. length). Examples are:
x, y(n), z(n), v(n) (here, x is a scalar value),
x(n), y(n), v(n) (here, z is a missing coordinate, i.e. one usually assumes that this line is at the Earth surface), or
x(n), y(n), z(n), v(n) (here, all spatial coordinates are defined).
x(n), y(n), z(n), v(t,n)
x(n), y(n), z(n), v(i,n)
x(n), y(n), z(n), v(i,t,n)
1. Conceivably, the x, y, or z coordinates of a nonrectified_grid_1d_ensemble could be different for each ensemble member. If this is a relevant datatype (I am not aware of such data myself), then one should probably use the term "collection" rather than "ensemble" for this and define the appropriate classes accordingly.
2. Even though datasets from models with irregular grids (e.g. icosahedral, or various ocean model grids) may formally be similar to nonrectified_grid_1d data types, they are actually point_collection data, because the grid type requires all coordinates to be sorted in ascending or descending order.
Figure 2: 1-dimensional discrete grid point data types
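The coordinate rule for the plain nonrectified_grid_1d type (each of x, y, z is a 1-dimensional array, a scalar, or missing, and all 1-dimensional coordinates share one length) can be checked mechanically. The following is a minimal sketch using NumPy; the function name `check_nonrectified_grid_1d` is hypothetical, and it deliberately does not test the sort order discussed in note 2.

```python
import numpy as np


def check_nonrectified_grid_1d(v, x=None, y=None, z=None):
    """Check the coordinate rule for a plain nonrectified_grid_1d, v(n).

    Each coordinate may be None (missing), a scalar, or a 1-d array.
    Every 1-d coordinate must have the same length n as v.
    Returns the common length n.
    """
    v = np.asarray(v)
    if v.ndim != 1:
        raise ValueError("v must be 1-dimensional, v(n)")
    n = v.shape[0]
    for name, coord in (("x", x), ("y", y), ("z", z)):
        if coord is None:
            continue  # missing coordinate
        coord = np.asarray(coord)
        if coord.ndim == 0:
            continue  # scalar coordinate
        if coord.shape != (n,):
            raise ValueError(f"{name} has shape {coord.shape}, expected ({n},)")
    return n


# Example pattern x, y(n), z(n), v(n): x is a scalar value.
n = check_nonrectified_grid_1d(
    v=np.zeros(5), x=7.5, y=np.linspace(50.0, 52.0, 5), z=np.linspace(0.0, 1000.0, 5)
)
print(n)
```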
- rectified_grid_2d: an array of values that are arranged along two 1-dimensional coordinate axes (i.e. longitude/latitude, longitude/vertical, or latitude/vertical). The coordinate values need not be regularly spaced. Generally, coordinate values will be sorted in ascending order (latitude or vertical grids may sometimes be sorted in descending order instead).
x(x), y(y), [z], v(y,x); the timeseries and ensemble variants change the variable dimensions to
x(x), y(y), [z], v(t,y,x)
x(x), y(y), [z], v(i,y,x)
x(x), y(y), [z], v(i,t,y,x)
x(x), [y], z(z), v(z,x)
x(x), [y], z(z), v(t,z,x)
x(x), [y], z(z), v(i,z,x)
x(x), [y], z(z), v(i,t,z,x)
[x], y(y), z(z), v(z,y)
[x], y(y), z(z), v(t,z,y)
[x], y(y), z(z), v(i,z,y)
[x], y(y), z(z), v(i,t,z,y)
- nonrectified_grid_2d: an array of values that are arranged along an arbitrary "plane" in 3-dimensional space. At least one of the x, y, z coordinates will be a 2-dimensional array; the other coordinates may be 2-d, 1-d, scalar, or missing. All 2-dimensional coordinate axes must have the same shape. Example:
x(k,n), y(k,n), z(n), v(k,n).
x(k,n), y(k,n), z(k,n), v(t,k,n)
x(k,n), y(k,n), z(k,n), v(i,k,n)
x(k,n), y(k,n), z(k,n), v(i,t,k,n)
x(t), y(t), z(t,z), v(t,z). Vertical profiling from a moving object (e.g. aircraft) yields a special type of 2-dimensional grid, similar to a trajectory (see figure 4 below). If only one time point is extracted from such a "grid", it can be collapsed into a (rectified) 1-dimensional grid coverage (rectified_grid_1d_z).
Figure 3: 2-dimensional discrete grid point data types
Figure 4: Example of a profile_trajectory_timeseries datatype (image from http://www.esrl.noaa.gov/csd/groups/csd3/instruments/topaz/)
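Extracting a single time point from the profile pattern x(t), y(t), z(t,z), v(t,z) leaves scalar x and y together with 1-dimensional z and v, i.e. the pattern [x], [y], z(z), v(z). The following NumPy sketch illustrates this collapse; the function name `extract_profile` and the sample values are made up for illustration.

```python
import numpy as np


def extract_profile(x, y, z, v, k):
    """Select time index k from x(t), y(t), z(t,z), v(t,z).

    Returns scalar x and y plus 1-d arrays z(z) and v(z).
    """
    x, y, z, v = (np.asarray(a) for a in (x, y, z, v))
    if (x.ndim != 1 or y.shape != x.shape or z.ndim != 2
            or z.shape != v.shape or z.shape[0] != x.shape[0]):
        raise ValueError("expected x(t), y(t), z(t,z), v(t,z)")
    return x[k].item(), y[k].item(), z[k, :], v[k, :]


# Three time steps along a flight track, four altitude levels per profile.
x = np.array([7.0, 7.1, 7.2])
y = np.array([50.0, 50.1, 50.2])
z = np.array([[100.0, 200.0, 300.0, 400.0],
              [150.0, 250.0, 350.0, 450.0],
              [200.0, 300.0, 400.0, 500.0]])
v = np.arange(12.0).reshape(3, 4)

px, py, pz, pv = extract_profile(x, y, z, v, k=1)
print(px, py, pz.shape, pv.shape)
```

Note that z depends on t here, so the vertical axis of the extracted profile is only known after the time index has been chosen.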
x(x), y(y), z(z), v(z,y,x). An array of values that are arranged along three 1-dimensional coordinate axes (for example longitude/latitude/altitude or pressure).
x(x), y(y), z(z), v(t,z,y,x)
x(x), y(y), z(z), v(i,z,y,x)
x(x), y(y), z(z), v(i,t,z,y,x)
x(x), y(y), z(z), v(z,y,x). Similar to the generic rectified_grid_3d, but with special handling of vertical coordinates (see note 1 below).
x(x), y(y), z(z), v(t,z,y,x)
x(x), y(y), z(z), v(i,z,y,x)
x(x), y(y), z(z), v(i,t,z,y,x)
1. As mentioned several times, model datasets with sigma- or hybrid sigma-pressure vertical coordinates do not constitute rectified coverages. Since this is a very common case, however, and the vertical coordinates of such data are described with 1-dimensional coefficients, we introduce a special class hierarchy under the general category of rectified grids (the handling of such data is more similar to rectified grids than to nonrectified grids).
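To illustrate why 1-dimensional coefficients suffice even though the resulting pressure field is not a rectified coordinate, the following sketch uses the hybrid sigma-pressure formulation of the CF conventions, p = ap(k) + b(k) * ps, where ps is the surface pressure. The coefficient names ap and b follow CF; the function name `hybrid_pressure` and the sample values are assumptions made for this example, and other models may define their coefficients differently.

```python
import numpy as np


def hybrid_pressure(ap, b, ps):
    """Pressure p(..., z, y, x) = ap(z) + b(z) * ps(..., y, x).

    ap, b: 1-d coefficient arrays of equal length (ap in Pa, b dimensionless).
    ps: surface pressure with trailing dimensions (y, x), optionally
        preceded by t and/or i.
    """
    ap = np.asarray(ap, dtype=float)
    b = np.asarray(b, dtype=float)
    ps = np.asarray(ps, dtype=float)
    if ap.ndim != 1 or ap.shape != b.shape:
        raise ValueError("ap and b must be 1-d arrays of equal length")
    if ps.ndim < 2:
        raise ValueError("ps needs at least the dimensions (y, x)")
    # Insert a level axis before (y, x) so that broadcasting yields (..., z, y, x).
    return ap[:, None, None] + b[:, None, None] * ps[..., None, :, :]


ap = [0.0, 5000.0, 0.0]          # hypothetical coefficients for 3 levels
b = [0.0, 0.5, 1.0]
ps = np.full((2, 3, 4), 1.0e5)   # ps(t, y, x): 2 time steps on a 3 x 4 grid
p = hybrid_pressure(ap, b, ps)
print(p.shape)
```

The stored coordinate information stays 1-dimensional, while the full pressure coordinate inherits the dimensions of the surface pressure, which is how the 4-dimensional vertical coordinates of model grids arise.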
- nonrectified_grid_3d:
x(k,m,n), y(k,m,n), z(k,m,n), v(k,m,n). An array of values whose positions are determined by 3-dimensional coordinate arrays.
x(k,m,n), y(k,m,n), z(k,m,n), v(t,k,m,n)
x(k,m,n), y(k,m,n), z(k,m,n), v(i,k,m,n)
x(k,m,n), y(k,m,n), z(k,m,n), v(i,t,k,m,n)
Similarly to the 2-dimensional profile_trajectory_timeseries one can also define a cross_section_trajectory_timeseries which depicts for example an "aircraft flight corridor". This will require some further thinking, because the orientation of the cross sections will usually be constant relative to the aircraft, which means that it will change over time when projected onto a latitude-longitude coordinate system. A possible representation could be
x(t,k,m,n), y(t,k,m,n), z(t,k,m,n), v(t,k,m,n)
Figure 5: Examples for 3-dimensional MetOcean data types
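Across the rectified grid listings in this document, the dimensions of the variable v always appear in the order i, t, z, y, x, with absent dimensions simply left out. A small helper can make this ordering explicit; the function name `variable_dims` and its parameters are hypothetical and only serve to summarise the patterns listed above.

```python
def variable_dims(spatial, timeseries=False, ensemble=False):
    """Return the dimension order of v for a rectified grid.

    spatial: iterable containing any of "x", "y", "z".
    The result follows the ordering (i, t, z, y, x) used in the listings.
    """
    spatial = set(spatial)
    unknown = spatial - {"x", "y", "z"}
    if unknown:
        raise ValueError(f"unknown spatial dimensions: {sorted(unknown)}")
    dims = []
    if ensemble:
        dims.append("i")
    if timeseries:
        dims.append("t")
    dims.extend(d for d in ("z", "y", "x") if d in spatial)
    return tuple(dims)


print(variable_dims("xy"))                                   # v(y,x)
print(variable_dims("xz", timeseries=True, ensemble=True))   # v(i,t,z,x)
print(variable_dims("xyz", timeseries=True))                 # v(t,z,y,x)
```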