ochanticipy.utils package
Submodules
ochanticipy.utils.check_extra_imports module
Check imports that are in extras_require.
- ochanticipy.utils.check_extra_imports.check_extra_imports(libraries: list, subpackage: str)
Check that libraries are installed and available.
- Parameters:
libraries (list) – List of library names to check.
subpackage (str) – Name of the subpackage defined in extras_require that the warning should tell users to install from.
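The check can be sketched with importlib.util.find_spec, which reports whether a module can be imported without actually importing it. This is a minimal sketch, not the library's implementation. The helper names are hypothetical, and raising ImportError (rather than emitting a warning) is an assumption.

```python
import importlib.util
from typing import List


def find_missing_libraries(libraries: List[str]) -> List[str]:
    """Return the names in `libraries` that cannot be imported (hypothetical helper)."""
    # find_spec returns None when no module with that top-level name is installed
    return [lib for lib in libraries if importlib.util.find_spec(lib) is None]


def check_extra_imports_sketch(libraries: List[str], subpackage: str) -> None:
    """Fail with an install hint if any library is missing (assumed behaviour)."""
    missing = find_missing_libraries(libraries)
    if missing:
        # pip's extras syntax installs the optional dependencies of a subpackage
        raise ImportError(
            f"Missing optional libraries {missing}. "
            f'Install them with: pip install "ochanticipy[{subpackage}]"'
        )
```

The install hint uses pip's standard extras syntax. The subpackage name passed in is whatever key the package defines in extras_require.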
ochanticipy.utils.check_file_existence module
Function for checking file existence.
- ochanticipy.utils.check_file_existence.check_file_existence(wrapper=None, enabled=None, adapter=None, proxy=<class 'FunctionWrapper'>) F
Don’t overwrite existing data.
Avoid recreating data if it already exists and the user has not set clobber. Used to wrap functions that accept filepath as a keyword argument.
- Parameters:
wrapped (function) – The function to wrap. The function must have “filepath” as a keyword parameter, and it can also have an optional “clobber” boolean keyword parameter.
instance (Optional[DataSource]) – Object the wrapped function is bound to. Not used within, but ensures that instance methods do not pass self to args.
args (list) – List of positional arguments.
kwargs (dict) – Dictionary of keyword arguments.
- Returns:
If filepath exists and clobber is False, returns filepath.
Otherwise, returns the result of the decorated function.
- Raises:
KeyError – If filepath or clobber is not passed as a keyword argument.
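The skip-if-exists behaviour can be illustrated with a plain functools decorator. This is an analogue, not the library's code. The documented signature follows the wrapt decorator convention (wrapped, instance, args, kwargs), whereas the hypothetical check_file_existence_sketch below uses functools.wraps so it runs without extra dependencies. Treating an omitted clobber as False is an assumption.

```python
import functools
from pathlib import Path
from typing import Any, Callable


def check_file_existence_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    """Skip calling `func` when its output file already exists (hypothetical sketch)."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        filepath = kwargs["filepath"]  # KeyError if filepath is not passed by keyword
        clobber = kwargs.get("clobber", False)  # assumed default when clobber is omitted
        if Path(filepath).exists() and not clobber:
            return filepath  # reuse the existing data instead of recreating it
        return func(*args, **kwargs)

    return wrapper


@check_file_existence_sketch
def write_data(filepath: Path, clobber: bool = False) -> str:
    """Hypothetical function that writes a file."""
    Path(filepath).write_text("data")
    return "written"
```

The first call to write_data(filepath=...) writes the file. Later calls return the filepath unchanged unless clobber=True is passed.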
ochanticipy.utils.dates module
Functions for dealing with dates.
- ochanticipy.utils.dates.compare_dekads_gt(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) bool
Check whether year1/dekad1 is greater than year2/dekad2.
Compare two (year, dekad) pairs and check that the first pair is greater than the second pair.
- ochanticipy.utils.dates.compare_dekads_gte(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) bool
Check whether year1/dekad1 is greater than or equal to year2/dekad2.
Compare two (year, dekad) pairs and check that the first pair is greater than or equal to the second pair.
- ochanticipy.utils.dates.compare_dekads_lt(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) bool
Check whether year1/dekad1 is less than year2/dekad2.
Compare two (year, dekad) pairs and check that the first pair is less than the second pair.
- ochanticipy.utils.dates.compare_dekads_lte(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) bool
Check whether year1/dekad1 is less than or equal to year2/dekad2.
Compare two (year, dekad) pairs and check that the first pair is less than or equal to the second pair.
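A (year, dekad) pair is ordered by year first and then by dekad, so all four comparisons reduce to lexicographic comparison. The sketch below assumes that ordering, and the function name is hypothetical. Python's built-in tuple comparison gives the same result.

```python
from typing import Tuple

Dekad = Tuple[int, int]  # (year, dekad)


def compare_dekads_gt_sketch(dekad1: Dekad, dekad2: Dekad) -> bool:
    """Return True if dekad1 is later than dekad2 (hypothetical sketch)."""
    year1, d1 = dekad1
    year2, d2 = dekad2
    # A later year wins; within the same year, compare the dekad numbers
    return year1 > year2 or (year1 == year2 and d1 > d2)


# Tuples compare element by element, so the built-in operators agree:
# dekad1 > dekad2, dekad1 >= dekad2, dekad1 < dekad2 and dekad1 <= dekad2
```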
- ochanticipy.utils.dates.date_to_dekad(date_obj: date) Tuple[int, int]
Compute dekad and year from date.
The dekad is computed from the date using the common dekadal definition: the 1st and 2nd dekads of a month are its first two 10-day periods (days 1 to 10 and 11 to 20), and the 3rd dekad is the remaining days of that month.
- ochanticipy.utils.dates.dekad_to_date(dekad: Tuple[int, int]) date
Compute date from dekad and year.
The date is computed from the dekad and year as a datetime.date object corresponding to the first day of the dekad. This is based on the common dekadal definition: the 1st and 2nd dekads of a month are its first two 10-day periods, and the 3rd dekad is the remaining days of that month.
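The dekadal definition above can be sketched as a pair of conversions. This sketch assumes dekads are numbered 1 to 36 within the year (three per month), which is common practice but is not stated in the text above. Both function names are hypothetical.

```python
from datetime import date
from typing import Tuple


def date_to_dekad_sketch(date_obj: date) -> Tuple[int, int]:
    """Map a date to (year, dekad), assuming dekads are numbered 1 to 36 per year."""
    # Days 1-10 -> 0, 11-20 -> 1, 21 to month end -> 2 (day 31 is capped at 2)
    dekad_in_month = min((date_obj.day - 1) // 10, 2)
    return date_obj.year, (date_obj.month - 1) * 3 + dekad_in_month + 1


def dekad_to_date_sketch(dekad: Tuple[int, int]) -> date:
    """Return the first day of a (year, dekad) pair, using the same assumed numbering."""
    year, dk = dekad
    month = (dk - 1) // 3 + 1
    day = ((dk - 1) % 3) * 10 + 1  # 1, 11 or 21
    return date(year, month, day)
```

Round-tripping a date through both functions yields the first day of its dekad. For example, 15 February 2021 maps to (2021, 5) and back to 11 February 2021.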
- ochanticipy.utils.dates.expand_dekads(dekad1: Tuple[int, int], dekad2: Tuple[int, int]) List[Tuple[int, int]]
Expand to all years/dekads between two dekads.
Takes two (year, dekad) pairs as input and returns a list of all (year, dekad) tuples between them.
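The expansion can be sketched as a loop that steps one dekad at a time and rolls over into the next year. This sketch has two assumptions: that both endpoints are included, and that dekads run from 1 to 36. The text above does not confirm either. The function name is hypothetical.

```python
from typing import List, Tuple

Dekad = Tuple[int, int]  # (year, dekad), dekad assumed to run 1 to 36


def expand_dekads_sketch(dekad1: Dekad, dekad2: Dekad) -> List[Dekad]:
    """List every (year, dekad) from dekad1 to dekad2, both ends included (assumed)."""
    year, dk = dekad1
    result: List[Dekad] = []
    while (year, dk) <= dekad2:  # tuple comparison orders by year, then dekad
        result.append((year, dk))
        dk += 1
        if dk > 36:  # roll over into the next year
            year, dk = year + 1, 1
    return result
```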
- ochanticipy.utils.dates.get_date_from_user_input(input_date: date | str) date
Return date from string or date input.
Processes an input date given either in datetime.date format or as an ISO8601 string. Generates an error message if a different object is provided.
- Parameters:
input_date (Union[date, str]) – datetime.date object or ISO8601 string.
- Returns:
datetime.date
- Return type:
date
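Input normalisation of this kind can be sketched with date.fromisoformat. This is an assumption-labelled sketch, not the library's code. The function name is hypothetical, and the specific exception types (ValueError for malformed strings, TypeError for other objects) are assumptions.

```python
from datetime import date
from typing import Union


def get_date_from_user_input_sketch(input_date: Union[date, str]) -> date:
    """Accept a datetime.date or an ISO8601 string and return a date (hypothetical sketch)."""
    if isinstance(input_date, date):
        return input_date
    if isinstance(input_date, str):
        # Raises ValueError for strings that are not valid ISO dates
        return date.fromisoformat(input_date)
    raise TypeError(
        f"Expected datetime.date or ISO8601 string, got {type(input_date).__name__}"
    )
```

Before Python 3.11, date.fromisoformat only accepts the YYYY-MM-DD form. Newer versions accept more ISO8601 variants.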
- ochanticipy.utils.dates.get_dekadal_date(input_date: date | str | Tuple[int, int] | None, default_date: date | str | Tuple[int, int] | None = None) Tuple[int, int]
Calculate dekadal date from general input.
Processes input_date and returns two values, the year and dekad. Input can be a datetime.date, an ISO8601 date string, an already calculated date in (year, dekad) format, or None. If None, default_date is returned. default_date can also be passed in any of the above formats.
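The dispatch over input formats can be sketched as follows. Several details are assumptions: the 1 to 36 dekad numbering, the conversion of default_date into (year, dekad) form, and the TypeError when neither argument is usable. The helper names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple, Union

DateInput = Optional[Union[date, str, Tuple[int, int]]]


def _date_to_dekad(d: date) -> Tuple[int, int]:
    # Assumed numbering: dekads 1 to 36 within the year
    return d.year, (d.month - 1) * 3 + min((d.day - 1) // 10, 2) + 1


def get_dekadal_date_sketch(
    input_date: DateInput, default_date: DateInput = None
) -> Tuple[int, int]:
    """Normalise a date-like input to a (year, dekad) tuple (hypothetical sketch)."""
    if input_date is None:
        input_date = default_date  # fall back to the default, in any supported format
    if isinstance(input_date, tuple):
        return input_date  # already (year, dekad)
    if isinstance(input_date, str):
        input_date = date.fromisoformat(input_date)  # ISO8601 string
    if isinstance(input_date, date):
        return _date_to_dekad(input_date)
    raise TypeError("No usable date was provided")
```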
ochanticipy.utils.geoboundingbox module
Functionality to retrieve and modify boundary coordinates.
It is possible to create a GeoBoundingBox object either from lat_max, lat_min, lon_max, lon_min coordinates, or from a shapefile that has been read in with geopandas.
- class ochanticipy.utils.geoboundingbox.GeoBoundingBox(lat_max: float, lat_min: float, lon_max: float, lon_min: float)
Bases: object
Create an object containing the bounds of an area.
The standard geographic coordinate system is used, where latitude runs from -90 to 90 degrees and longitude from -180 to 180 degrees. The northern boundary must always be greater than or equal to the southern boundary, and the eastern boundary greater than or equal to the western boundary.
- Parameters:
lat_max (float) – The northern latitude boundary of the area (degrees). The value must be between -90 and 90, and greater than or equal to the southern boundary.
lat_min (float) – The southern latitude boundary of the area (degrees). The value must be between -90 and 90, and less than or equal to the northern boundary.
lon_max (float) – The easternmost longitude boundary of the area (degrees). The value must be between -180 and 180, and greater than or equal to the western boundary.
lon_min (float) – The westernmost longitude boundary of the area (degrees). The value must be between -180 and 180, and less than or equal to the eastern boundary.
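The constraints listed above can be expressed as validation in a dataclass. This is a hypothetical illustration of the rules only. The class name is invented, and raising ValueError is an assumption; the library may use a different exception.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBoxSketch:
    """Hypothetical bounding box that enforces the documented coordinate rules."""

    lat_max: float
    lat_min: float
    lon_max: float
    lon_min: float

    def __post_init__(self) -> None:
        # Each value must lie within the standard geographic ranges
        for name in ("lat_max", "lat_min"):
            if not -90 <= getattr(self, name) <= 90:
                raise ValueError(f"{name} must be between -90 and 90")
        for name in ("lon_max", "lon_min"):
            if not -180 <= getattr(self, name) <= 180:
                raise ValueError(f"{name} must be between -180 and 180")
        # North must not be below south, and east must not be west of west
        if self.lat_max < self.lat_min:
            raise ValueError("lat_max must be greater than or equal to lat_min")
        if self.lon_max < self.lon_min:
            raise ValueError("lon_max must be greater than or equal to lon_min")
```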
- classmethod from_shape(shape: GeoSeries | GeoDataFrame) GeoBoundingBox
Create GeoBoundingBox from a geopandas object.
- Parameters:
shape (geopandas.GeoSeries, geopandas.GeoDataFrame) – A shape whose bounds will be retrieved.
- Return type:
GeoBoundingBox from the total bounds of the input shape
Examples
>>> import geopandas as gpd
>>> from ochanticipy.utils.geoboundingbox import GeoBoundingBox
>>> df_admin_boundaries = gpd.read_file("admin0_boundaries.gpkg")
>>> geobb = GeoBoundingBox.from_shape(df_admin_boundaries)
- get_filename_repr(precision: int = 0) str
Get succinct boundary representation for usage in filenames.
- Parameters:
precision (int, default = 0) – Precision, i.e. number of decimal places to round to. Default is 0 for ints.
- Return type:
String containing N, S, E and W coordinates.
- property lat_max: float
Get the northern latitude boundary of the area (degrees).
- property lat_min: float
Get the southern latitude boundary of the area (degrees).
- property lon_max: float
Get the eastern longitude boundary of the area (degrees).
- property lon_min: float
Get the western longitude boundary of the area (degrees).
- round_coords(offset_val: float = 0.0, round_val: int | float = 1) GeoBoundingBox
Round the bounding box coordinates.
Rounding is always done outward from the original bounding box, i.e. the resulting bounding box is always equal to or larger than the original bounding box. Rounding can only be done once per instance.
- Parameters:
offset_val (float, default = 0.0) – Offset the coordinates by this factor.
round_val (int or float, default = 1) – Rounds to the nearest round_val. Can be an int for integer rounding or float for decimal rounding. If 1, round to integers.
- Return type:
GeoBoundingBox instance with rounded and offset coordinates
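Outward rounding can be sketched with ceiling for the maximum bounds and floor for the minimum bounds, which guarantees that the box never shrinks. The function name is hypothetical. Treating offset_val as an additive outward shift applied before rounding is an assumption, since the text above only says the coordinates are offset "by this factor".

```python
import math
from typing import Tuple, Union


def round_bounds_outward_sketch(
    lat_max: float,
    lat_min: float,
    lon_max: float,
    lon_min: float,
    offset_val: float = 0.0,
    round_val: Union[int, float] = 1,
) -> Tuple[float, float, float, float]:
    """Round bounds outward to multiples of round_val after an outward offset (assumed)."""

    def up(v: float) -> float:
        return math.ceil(v / round_val) * round_val

    def down(v: float) -> float:
        return math.floor(v / round_val) * round_val

    # Max bounds move up and min bounds move down, so the box can only grow
    return (
        up(lat_max + offset_val),
        down(lat_min - offset_val),
        up(lon_max + offset_val),
        down(lon_min - offset_val),
    )
```

For example, bounds (10.3, 2.1, 30.7, 20.2) round to (11, 2, 31, 20) with the defaults, and to (10.5, 2.0, 31.0, 20.0) with round_val=0.5.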
ochanticipy.utils.hdx_api module
Use HDX python API to download data.
- ochanticipy.utils.hdx_api.load_resource_from_hdx(hdx_dataset: str, hdx_resource_name: str, output_filepath: Path) Path
Use the HDX API to download a resource from an HDX dataset, given the dataset name and resource name.
- Parameters:
hdx_dataset (str) – The name of the HDX dataset where the resource is located. Can be found by taking the portion of the URL after data.humdata.org/dataset/.
hdx_resource_name (str) – Resource name on HDX. Can be found by taking the filename as it appears on the dataset page.
output_filepath (Path) – Target filepath for the dataset.
- Return type:
The full path of the downloaded dataset
ochanticipy.utils.io module
Function for I/O.
- ochanticipy.utils.io.download_url(url: str, save_path: Path, chunk_size: int = 2048)
Download the file located at url to save_path.
- Parameters:
url (str) – url that contains the file to be downloaded
save_path (Path) – path to the location the file should be saved
chunk_size (int) – number of bytes to save at once
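Chunked downloading can be sketched with the standard library. This sketch uses urllib.request and makes no claim about which HTTP client the library actually uses. The function name is hypothetical. Reading at most chunk_size bytes at a time keeps memory use bounded for large files.

```python
import urllib.request
from pathlib import Path


def download_url_sketch(url: str, save_path: Path, chunk_size: int = 2048) -> None:
    """Stream the resource at `url` into `save_path`, chunk_size bytes at a time."""
    with urllib.request.urlopen(url) as response, open(save_path, "wb") as f:
        while True:
            chunk = response.read(chunk_size)  # at most chunk_size bytes
            if not chunk:  # empty bytes means the stream is exhausted
                break
            f.write(chunk)
```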
ochanticipy.utils.raster module
Utilities to manipulate and analyze raster data.
The raster module provides utilities for xarray data arrays and datasets, accessible through the oap accessor.
These functions become available simply by importing the library with import ochanticipy.
Since rioxarray already extends xarray, this module’s extensions inherit from the RasterArray and RasterDataset extensions respectively. This keeps the code in the module cleaner, as rio methods are available immediately, but it also means the following design decisions apply.
The xarray.DataArray and xarray.Dataset
extensions here inherit from rioxarray base classes.
Thus, methods that are identical for both objects
are defined in a mixin class OapRasterMixin which
can be inherited by the two respective extensions.
- class ochanticipy.utils.raster.OapRasterArray(xarray_object)
Bases: OapRasterMixin, RasterArray
OCHA AnticiPy extension for xarray.DataArray.
- compute_raster_stats(gdf: GeoDataFrame, feature_col: str, stats_list: List[str] | None = None, percentile_list: List[int] | None = None, all_touched: bool = False) DataFrame
Compute raster statistics for polygon geometry.
compute_raster_stats() is designed to quickly compute raster statistics across a polygon and its features.
- Parameters:
gdf (geopandas.GeoDataFrame) – GeoDataFrame with one row per area for stats computation. If a pd.DataFrame is passed, the geometry column must be named geometry.
feature_col (str) – Column in gdf to use as row/feature identifier.
stats_list (Optional[List[str]], optional) – List of statistics to calculate, by default None. Passed to get_attr().
percentile_list (Optional[List[int]], optional) – List of percentiles to compute, by default None.
all_touched (bool, optional) – If True, all cells touching the region will be included, by default False. If False, only cells with their centre in the region will be included.
- Returns:
Dataframe with computed statistics.
- Return type:
pandas.DataFrame
Examples
>>> import geopandas as gpd
>>> import xarray as xr
>>> import rioxarray
>>> import ochanticipy  # registers the oap accessor
>>> from shapely.geometry import Polygon
>>>
>>> # compute raster stats on simple data
>>> d = {
...     "name": ["area_a", "area_b"],
...     "geometry": [
...         Polygon([(0, 0), (0, 2), (2, 2), (2, 0)]),
...         Polygon([(2, 0), (2, 2), (3, 2), (3, 0)]),
...     ],
... }
>>> gdf = gpd.GeoDataFrame(d)
>>>
>>> da = xr.DataArray(
...     [[1, 2, 3], [4, 5, 6]],
...     dims=("y", "x"),
...     coords={"y": [1.5, 0.5], "x": [0.5, 1.5, 2.5]},
... ).rio.write_crs("EPSG:4326")
>>>
>>> da.oap.compute_raster_stats(
...     gdf=gdf,
...     feature_col="name"
... )
   mean_name std_name min_name max_name sum_name count_name name
0 3.0 1.5811388300841898 1 5 12.0 4 area_a
1 4.5 1.5 3 6 9.0 2 area_b
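The statistics in the example above can be reproduced with plain NumPy. This is a simplified sketch, not the library's method. It uses axis-aligned boxes instead of arbitrary polygons, and it selects cells whose centres fall inside the box, which mirrors all_touched=False. The helper name is hypothetical. The standard deviation is the population value (ddof=0), which is what matches the example output.

```python
from typing import Dict, Sequence, Tuple

import numpy as np


def box_stats_sketch(
    values: np.ndarray,
    x: Sequence[float],
    y: Sequence[float],
    box: Tuple[float, float, float, float],
) -> Dict[str, float]:
    """Statistics for cells whose centres fall inside box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    xx, yy = np.meshgrid(x, y)  # both shaped (len(y), len(x)), like values
    inside = (xx >= xmin) & (xx <= xmax) & (yy >= ymin) & (yy <= ymax)
    cells = values[inside]
    return {
        "mean": float(cells.mean()),
        "std": float(cells.std()),  # population standard deviation (ddof=0)
        "min": float(cells.min()),
        "max": float(cells.max()),
        "sum": float(cells.sum()),
        "count": int(cells.size),
    }


values = np.array([[1, 2, 3], [4, 5, 6]])
x, y = [0.5, 1.5, 2.5], [1.5, 0.5]  # cell-centre coordinates from the example
area_a = box_stats_sketch(values, x, y, (0, 0, 2, 2))
area_b = box_stats_sketch(values, x, y, (2, 0, 3, 2))
```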
- class ochanticipy.utils.raster.OapRasterDataset(xarray_object)
Bases: OapRasterMixin, RasterDataset
OCHA AnticiPy extension for xarray.Dataset.
- compute_raster_stats(var_names: List[str] | str | None = None, **kwargs: Any)
Compute raster statistics across dataset arrays.
compute_raster_stats() calculates raster statistics on the component data arrays of a dataset. By default, statistics are calculated on all non-coordinate variables; if a list of variable names is passed in, statistics are calculated only for those variables.
- Parameters:
var_names (Union[List[str], str, None], optional) – Dataset data array variables to calculate raster statistics on.
kwargs (Any) – Keyword arguments passed to the array method compute_raster_stats().
- Returns:
List of raster statistics data frames.
- Return type:
List[pandas.DataFrame]
Examples
>>> import geopandas as gpd
>>> import xarray as xr
>>> import rioxarray
>>> import ochanticipy  # registers the oap accessor
>>> from shapely.geometry import Polygon
>>>
>>> # compute raster stats on simple data
>>> d = {
...     "name": ["area_a", "area_b"],
...     "geometry": [
...         Polygon([(0, 0), (0, 2), (2, 2), (2, 0)]),
...         Polygon([(2, 0), (2, 2), (3, 2), (3, 0)]),
...     ],
... }
>>> gdf = gpd.GeoDataFrame(d)
>>>
>>> ds = xr.DataArray(
...     [[1, 2, 3], [4, 5, 6]],
...     dims=("y", "x"),
...     coords={"y": [1.5, 0.5], "x": [0.5, 1.5, 2.5]},
... ).rio.write_crs("EPSG:4326").to_dataset(name="data")
>>>
>>> ds.oap.compute_raster_stats(
...     var_names=["data"],
...     gdf=gdf,
...     feature_col="name"
... )
[   mean std min max sum count name
0 3.0 1.5811388300841898 1 5 12.0 4 area_a
1 4.5 1.5 3 6 9.0 2 area_b]
- get_raster_array(var_name: str) DataArray
Get xarray.DataArray from variable and keep dimensions.
Accessing a component xarray.DataArray using the non-coordinate variable name loses any dimensions set through rio or oap. This includes x_dim and y_dim, which have to be set explicitly using rio.set_spatial_dims(), and t_dim, which has to be set using oap.set_time_dim(). For any dataset ds, ds.oap.get_raster_array("var") will retrieve the data array without losing the dimensions, whereas ds["var"] will lose them.
- Parameters:
var_name (str) – Name of variable.
- Returns:
A data array.
- Return type:
xarray.DataArray
Examples
>>> import numpy
>>> import pandas
>>> import xarray
>>> import ochanticipy  # registers the oap accessor
>>> temp = 15 + 8 * numpy.random.randn(4, 4, 3)
>>> precip = 10 * numpy.random.rand(4, 4, 3)
>>> ds = xarray.Dataset(
...     {
...         "temperature": (["lat", "lon", "F"], temp),
...         "precipitation": (["lat", "lon", "F"], precip)
...     },
...     coords={
...         "lat": numpy.array([87, 88, 89, 90]),
...         "lon": numpy.array([5, 120, 199, 360]),
...         "F": pandas.date_range("2014-09-06", periods=3)
...     }
... )
>>> ds.oap.set_time_dim("F", inplace=True)
>>> da = ds.oap.get_raster_array("temperature")
>>> da.oap.t_dim
'F'
>>> # directly accessing array loses set dimensions
>>> ds['temperature'].oap.t_dim
Traceback (most recent call last):
...
rioxarray.exceptions.DimensionError: Time dimension not found. 'oap.set_time_dim()' or using 'rename()' to change the dimension name to 't' can address this. Data variable: temperature
- class ochanticipy.utils.raster.OapRasterMixin(xarray_obj)
Bases: object
OCHA AnticiPy mixin base class.
- change_longitude_range(to_180_range: bool = True, inplace: bool = False) DataArray | Dataset | None
Convert longitude range between -180 to 180 and 0 to 360.
The standard longitude range is from -180 to 180, while some applications use 0 to 360. This includes rasterstats.zonal_stats (https://pypi.org/project/rasterstats/), which assumes ranges from 0 to 360. change_longitude_range() will convert between the two coordinate ranges based on its current state. By default it will use the -180 to 180 range; if to_180_range is False, it will use the 0 to 360 range. If coordinates lie solely between 0 and 180, there is no need for conversion and the input will be returned.
- Parameters:
to_180_range (bool, default = True) – If True, the returned range is -180 to 180. Otherwise, the returned range is 0 to 360.
inplace (bool, optional) – If True, will overwrite existing data array. Default is False.
- Returns:
Data array or dataset with transformed longitude coordinates.
- Return type:
Union[xarray.DataArray, xarray.Dataset]
Examples
>>> import numpy
>>> import pandas
>>> import xarray
>>> import ochanticipy  # registers the oap accessor
>>> temp = 15 + 8 * numpy.random.randn(4, 4, 3)
>>> precip = 10 * numpy.random.rand(4, 4, 3)
>>> ds = xarray.Dataset(
...     {
...         "temperature": (["lat", "lon", "time"], temp),
...         "precipitation": (["lat", "lon", "time"], precip)
...     },
...     coords={
...         "lat": numpy.array([87, 88, 89, 90]),
...         "lon": numpy.array([5, 120, 199, 360]),
...         "time": pandas.date_range("2014-09-06", periods=3)
...     }
... )
>>> ds_inv = ds.oap.change_longitude_range()
>>> ds_inv.get_index("lon")
Index([-161, 0, 5, 120], dtype='int64', name='lon')
>>> # convert back to the 0 to 360 range, in place
>>> ds_inv.oap.change_longitude_range(to_180_range=False, inplace=True)
>>> ds_inv.get_index("lon")
Index([0, 5, 120, 199], dtype='int64', name='lon')
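The coordinate arithmetic behind the conversion can be sketched on a bare NumPy vector using modular arithmetic. This is a standard formula, not necessarily the one the library uses. The library method also reorders the data along the longitude axis, which this sketch does not do. The function names are hypothetical.

```python
import numpy as np


def to_180_range(lon: np.ndarray) -> np.ndarray:
    """Map longitudes into [-180, 180); note that 180 itself maps to -180."""
    return ((lon + 180) % 360) - 180


def to_360_range(lon: np.ndarray) -> np.ndarray:
    """Map longitudes into [0, 360); NumPy's % takes the sign of the divisor."""
    return lon % 360


lon = np.array([5, 120, 199, 360])
# Sorting afterwards mirrors the reordered index in the example above
lon_180 = np.sort(to_180_range(lon))  # -> [-161, 0, 5, 120]
lon_360 = np.sort(to_360_range(lon_180))  # -> [0, 5, 120, 199]
```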
- correct_calendar(inplace: bool = False) DataArray | Dataset | None
Correct calendar attribute for recognition by xarray.
Some datasets come with a wrong calendar attribute that isn’t recognized by xarray. This function corrects the coordinate attribute to ensure that a calendar attribute exists and specifies a calendar alias that is supported by xarray.cftime_range and NetCDF in general.
Currently, it ensures that coordinates whose units start with "months since" or whose calendar is "360" explicitly have calendar="360_day". This is based on discussions in this GitHub issue. If and when further issues are found with calendar attributes, support for conversion will be added here.
- Parameters:
inplace (bool, optional) – If True, it will modify the data array in place. Otherwise it will return a modified copy.
- Returns:
Data array or dataset with transformed calendar coordinate.
- Return type:
Union[xarray.DataArray, xarray.Dataset]
Examples
>>> import numpy
>>> import xarray
>>> import ochanticipy  # registers the oap accessor
>>> da = xarray.DataArray(
...     numpy.arange(64).reshape(4, 4, 4),
...     dims=("lat", "lon", "t"),
...     coords={"lat": numpy.array([87, 88, 89, 90]),
...             "lon": numpy.array([5, 120, 199, 360]),
...             "t": numpy.array([10, 11, 12, 13])}
... )
>>> da["t"].attrs["units"] = "months since 1960-01-01"
>>> da_crct = da.oap.correct_calendar()
>>> da_crct["t"].attrs["calendar"]
'360_day'
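The attribute rule described above can be sketched on a plain dictionary of coordinate attributes. This hypothetical helper covers only the two conditions named in the text. It works on a copy, in the spirit of inplace=False, and the real method may check other cases.

```python
from typing import Dict


def corrected_calendar_attrs(attrs: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of time-coordinate attrs with an explicit 360_day calendar where needed."""
    new_attrs = dict(attrs)  # leave the input untouched
    units = new_attrs.get("units", "")
    # "months since ..." units or a bare "360" calendar both imply 360-day years
    if units.startswith("months since") or new_attrs.get("calendar") == "360":
        new_attrs["calendar"] = "360_day"
    return new_attrs
```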
- invert_coordinates(inplace: bool = False) DataArray | Dataset | None
Invert latitude and longitude in data array.
This function checks for inversion of latitude and longitude and inverts them if needed. Datasets with inverted coordinates can produce incorrect results in certain functions like rasterstats.zonal_stats(). Correctly ordered coordinates should be:
latitude: largest to smallest.
longitude: smallest to largest.
If the data array already has correct coordinate ordering, it is returned directly. Function largely copied from https://github.com/perrygeo/python-rasterstats/issues/218.
- Parameters:
inplace (bool, optional) – If True, will overwrite existing data array. Default is False.
- Returns:
Data array or dataset with correct coordinate ordering.
- Return type:
Union[xarray.DataArray, xarray.Dataset]
Examples
>>> import numpy
>>> import xarray
>>> import ochanticipy  # registers the oap accessor
>>> da = xarray.DataArray(
...     numpy.arange(16).reshape(4, 4),
...     dims=("lat", "lon"),
...     coords={"lat": numpy.array([87, 88, 89, 90]),
...             "lon": numpy.array([70, 69, 68, 67])}
... )
>>> da.oap.invert_coordinates(inplace=True)
>>> da.get_index("lon")
Index([67, 68, 69, 70], dtype='int64', name='lon')
>>> da.get_index("lat")
Index([90, 89, 88, 87], dtype='int64', name='lat')
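The reordering rule (latitude descending, longitude ascending) can be sketched on a bare NumPy grid whose rows follow latitude and whose columns follow longitude. This sketch assumes monotonic coordinate vectors, and the function name is hypothetical. The library method operates on xarray objects instead.

```python
from typing import Tuple

import numpy as np


def invert_coordinates_sketch(
    data: np.ndarray, lat: np.ndarray, lon: np.ndarray
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Reorder a (lat, lon) grid so latitude descends and longitude ascends."""
    if lat[0] < lat[-1]:  # latitude ascending: flip rows
        data, lat = data[::-1, :], lat[::-1]
    if lon[0] > lon[-1]:  # longitude descending: flip columns
        data, lon = data[:, ::-1], lon[::-1]
    return data, lat, lon


grid = np.arange(16).reshape(4, 4)
out, new_lat, new_lon = invert_coordinates_sketch(
    grid, np.array([87, 88, 89, 90]), np.array([70, 69, 68, 67])
)
```

With the inputs from the example above, the value stored at (lat=87, lon=70) moves from the top-left corner to the bottom-right corner, so each value stays attached to its coordinates.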
- property longitude_range
The longitude range.
The longitude range indicates if coordinates are between -180 and 180 (indicated by ‘180’) or 0 and 360 (indicated by ‘360’).
- Type:
str
- set_time_dim(t_dim: str, inplace: bool = False) DataArray | Dataset | None
Set the time dimension of the dataset.
- Parameters:
t_dim (str) – The name of the time dimension.
inplace (bool, optional) – If True, it will modify the data array in place. Otherwise it will return a modified copy.
- Returns:
Data array or dataset with time dimension.
- Return type:
Union[xarray.DataArray, xarray.Dataset]
Examples
>>> import numpy
>>> import xarray
>>> import ochanticipy  # registers the oap accessor
>>> da = xarray.DataArray(
...     numpy.arange(64).reshape(4, 4, 4),
...     dims=("lat", "lon", "F"),
...     coords={"lat": numpy.array([87, 88, 89, 90]),
...             "lon": numpy.array([5, 120, 199, 360]),
...             "F": numpy.array([10, 11, 12, 13])}
... )
>>> da.oap.set_time_dim(t_dim="F", inplace=True)
>>> da.oap.t_dim
'F'
- property t_dim
The dimension for time.
- Type:
str
- x_dim: str
- y_dim: str
Module contents
General utilities.