waterpyk package¶
Submodules¶
waterpyk.main¶
- class waterpyk.main.StudyArea(coords, layers=None, saving_dir='/home/docs/checkouts/readthedocs.org/user_builds/waterpyk/checkouts/latest/docs/source/data/output', **kwargs)¶
Bases: object
- describe()¶
Print statements describing StudyArea attributes and deficit parameters, if deficit was calculated.
- get_data(layers, **kwargs)¶
Updates self with attributes containing dataframes for the site. If data already exists in saving_dir, data is simply loaded. Otherwise, data from layers is extracted from GEE and the USGS.
- Parameters
layers (str or df, optional) – If str, specify ‘minimal’ or ‘all’ to extract default set of assets. If df, columns that must be present include: asset_id, start_date, end_date, relative_date, scale, bands, bands_to_scale, new_bandnames, scaling_factor. These are the same parameters required for extract_basic().
**interp (bool, optional) – (default: True), currently no option to change to False.
**combine_ET_bands (bool, optional) – (default True) add ET bands to make one ET band.
**bands_to_combine (list of str, optional) – (default [Es, Ec]) ET bands to combine
**band_names_combined (str, optional) – (default ‘ET’) name of combined ET band
**et_asset (str, optional) – asset to be used for creation of ET column (default ‘pml’).
**et_band (str, optional) – band name to be used for creation of ET column (default ‘ET’).
**ppt_asset (str, optional) – asset name for precipitation column (default ‘prism’).
**ppt_band (str, optional) – band name for precipitation column (default ‘ppt’)
**snow_asset (str, optional) – defaults to ‘modis_snow’
**snow_band (str, optional) – defaults to ‘snow’. Note: you will need to change this if you did not rename the band to ‘snow’ upon extraction.
**snow_correction (bool, optional) – (default True) use snow correction factor when calculating deficit
**snow_frac (int, optional) – (default 10) if snow_correction = True, ET is set to 0 wherever snow is greater than this value (%)
- get_kind()¶
Adds attribute of point or watershed to object.
- get_location(**kwargs)¶
Convert coordinates to GEE feature. For a USGS watershed, coordinates are extracted from USGS website using gage ID.
- Parameters
site_name (str, optional) – Watersheds are already named. Default for lat/long sites is empty string.
- Returns
gee feature with location geometry
- Return type
feature
- plot(kind, **kwargs)¶
Make common plots of the data specifying what ‘kind’ of plot. Each kind of plot has defaults pre-set, which may differ from the arg defaults specified below. To see the full list of defaults, see the plots submodule. In general, updating kwargs should not be necessary except for small details like legend, title, or colors.
- Parameters
kind (str) – The kind of plot you want. Not to be confused with the kind of site (i.e. watershed or point). Supported types are: ‘timeseries’, ‘spearman’, ‘wateryear’, ‘RWS’, and ‘distribution’.
**plot_PET (bool, optional) –
**plot_Q (bool, optional) –
**plot_D (bool, optional) –
**plot_Dwy (bool, optional) –
**plot_ET (bool, optional) – include wateryear ET in plot
**plot_ET_dry (bool, optional) – include dry season ET in plot
**color_PET (str, optional) – hex code string for color
**color_Q (str, optional) – hex code string for color
**color_P (str, optional) – hex code string for color
**color_D (str, optional) – hex code string for color
**color_ET (str, optional) – hex code string for color
**color_WY (str, optional) – hex code string for color
**markeredgecolor (str, optional) – default = black
**linestyle_PET (str, optional) –
**linestyle_Q (str, optional) –
**linestyle_P (str, optional) –
**linestyle_D (str, optional) –
**linestyle_Dwy (str, optional) –
**linestyle_ET (str, optional) –
**lw (float, optional) – line weight. Default = 1.5
**xmin (date or float or int, optional) –
**xmax (date or float or int, optional) –
**legend (bool, optional) – default = True
**dpi (int, optional) – default = 300
**figsize (tuple, optional) – default = (6,4)
**xlabel (str, optional) –
**ylabel (str, optional) –
**twinx (bool, optional) – some plots allow for twin x-axes. Default = False.
**title (str, optional) – default None
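The description of plot() says each kind of plot has pre-set defaults that keyword arguments can override. A minimal sketch of that pattern follows, assuming a plain dictionary of defaults. The helper name `resolve_plot_kwargs` and the dictionary `PLOT_DEFAULTS` are hypothetical and not part of waterpyk; the values are copied from the defaults documented above.

```python
# Hypothetical defaults, copied from the values documented for plot();
# the real per-plot defaults live in the waterpyk.plots submodule.
PLOT_DEFAULTS = {
    "legend": True,
    "dpi": 300,
    "figsize": (6, 4),
    "lw": 1.5,
    "twinx": False,
    "title": None,
}


def resolve_plot_kwargs(defaults, **kwargs):
    """Return a new dict in which user-supplied kwargs override the defaults."""
    resolved = dict(defaults)  # copy so the shared defaults are never mutated
    resolved.update(kwargs)
    return resolved


settings = resolve_plot_kwargs(PLOT_DEFAULTS, title="My site", legend=False)
```

Only the keys passed by the caller change; everything else keeps its documented default.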
waterpyk.gee¶
- waterpyk.gee.extract(layers, gee_feature, kind, reducer_type=None, **kwargs)¶
Extract data at site for several assets at once. Uses extract_basic().
- Parameters
layers (str or df, optional) – If str, specify ‘minimal’ or ‘all’ to extract default set of assets. If df, columns that must be present include: asset_id, start_date, end_date, relative_date, scale, bands, bands_to_scale, new_bandnames, scaling_factor. These are the same parameters required for extract_basic().
gee_feature (gee feature) – GEE feature for region geometry
kind (str) – ‘point’ or ‘watershed’
reducer_type (GEE reducer function) – defaults to None, in which case the GEE reduceRegion reducer function is first() for points and mean() for watersheds. See the GEE documentation for more available types.
**interp (bool, optional) – (default: True), currently no option to change to False.
**combine_ET_bands (bool, optional) – (default True) add ET bands to make one ET band.
**bands_to_combine (list of str, optional) – (default [Es, Ec]) ET bands to combine
**band_names_combined (str, optional) – (default ‘ET’) name of combined ET band
- Returns
2 long-style pandas dataframes, the first containing all of the daily data and the second containing all of the non-daily data (i.e. extractions from images or from single-timestep ImageCollections).
- Return type
df, df
- waterpyk.gee.extract_basic(gee_feature, kind, asset_id, scale, bands, start_date=None, end_date=None, relative_date=None, bands_to_scale=None, scaling_factor=1, reducer_type=None, new_bandnames=None)¶
Extract data from a single asset. For timeseries, specify start_date and end_date for an asset_id. For an image or to get an image from an ImageCollection (i.e. one date), specify relative_date as either ‘first’, ‘most_recent’, or ‘image’. Either start_date and end_date, or relative_date, must be specified; otherwise an error is raised.
- Parameters
gee_feature (gee feature) – GEE feature for region geometry
kind (str) – ‘point’ or ‘watershed’
asset_id (str) – GEE asset identification string
start_date (str, optional) – format mm/dd/yyyy or similar date format
end_date (str, optional) – format mm/dd/yyyy or similar date format
relative_date (str, optional) – If not using start and end date, specify ‘first’, ‘most_recent’, or ‘image’ to extract a single date from an ImageCollection or to extract data from an Image.
scale (str) – scale in meters for GEE reducer function
bands (list of str) – bands of GEE asset to extract
new_bandnames (list of str, optional) – rename bands. Default is to not rename bands, where the final band name is the original band name before the underscore; for example, pml stays pml but LC_Type1 becomes LC by default. This param must be the same length as bands or an exception is raised.
bands_to_scale (list of str, optional) – (default = None) bands for which each value will be multiplied by scaling_factor.
scaling_factor (float, optional) – (default = 1) scaling factor to apply to all values in bands_to_scale
reducer_type (gee reducer function, optional) – reducer_type defaults to first() for points and mean() for watersheds. See available GEE reduceRegion options online for other possible inputs.
- Returns
dataframe of all extracted data
- Return type
df
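The argument rules described for extract_basic() can be sketched in plain Python. This is only an illustration of the documented behavior, not the package's code: the helper names `check_date_args`, `default_band_name`, and `default_reducer_name` are hypothetical, and the exact exception type raised by waterpyk is an assumption.

```python
def check_date_args(start_date=None, end_date=None, relative_date=None):
    """Return 'timeseries' or 'single' depending on which date arguments are given."""
    allowed = ("first", "most_recent", "image")
    if start_date is not None and end_date is not None:
        return "timeseries"
    if relative_date in allowed:
        return "single"
    # Assumption: the real function may raise a different exception type.
    raise ValueError(
        "Specify both start_date and end_date, or a relative_date of "
        "'first', 'most_recent', or 'image'."
    )


def default_band_name(band):
    """Keep the part of the band name before the first underscore."""
    return band.split("_")[0]


def default_reducer_name(kind):
    """Name of the documented default reducer for each kind of site."""
    return {"point": "first", "watershed": "mean"}[kind]
```

For example, `default_band_name("LC_Type1")` gives `"LC"`, matching the renaming rule described for new_bandnames.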
waterpyk.watershed¶
- waterpyk.watershed.extract_geometry(gage)¶
Get the geometry of a USGS gage in Google Earth Engine (GEE) form and as a geopandas dataframe.
- Parameters
gage (str or int) – USGS 8-number gage ID. If int, leading 0s will automatically be added.
- Returns
GEE feature containing the basin’s exterior polygon coordinates and geopandas dataframe containing the basin’s coordinates.
- Return type
feature and gdf
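The functions in this submodule accept the gage ID as a str or an int and add leading zeros to an int. A minimal sketch of that normalization, assuming simple zero-padding to 8 characters; the helper name `normalize_gage_id` is hypothetical and the gage numbers are illustrative only.

```python
def normalize_gage_id(gage):
    """Return the gage ID as an 8-character string, left-padded with zeros."""
    return str(gage).zfill(8)


padded = normalize_gage_id(1013500)        # int that lost its leading zero
unchanged = normalize_gage_id("11475560")  # already 8 characters
```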
- waterpyk.watershed.extract_geometry_flowline(gage)¶
Get geopandas dataframe of USGS basin flowline geometry for plotting.
- Parameters
gage (str or int) – USGS 8-number gage ID. If int, leading 0s will automatically be added.
- Returns
geopandas dataframe with geometry of flowlines (rivers) for plotting.
- Return type
df
- waterpyk.watershed.extract_latitude(gage)¶
Get the latitude of the centroid of the USGS gage for calculating PET.
- Parameters
gage (str or int) – USGS 8-number gage ID. If int, leading 0s will automatically be added.
- Returns
latitude
- Return type
float
- waterpyk.watershed.extract_metadata(gage)¶
Get metadata for a USGS gage.
- Parameters
gage (str or int) – USGS 8-number gage ID. If int, leading 0s will automatically be added.
- Returns
2 strings. (1) USGS long-name of gage. (2) description of the form ‘USGS Basin + gage ID + imported at + site_name + CRS: + coordinate system’
- Return type
str, str
- waterpyk.watershed.extract_streamflow(gage, **kwargs)¶
Get a dataframe with streamflow (i.e. discharge) for a USGS gage. Units are converted to mm using the area of the basin, calculated from the exterior geometry.
- Parameters
gage (str or int) – USGS 8-number gage ID. If int, leading 0s will automatically be added.
**flow_start_date (str, optional) – default: ‘1980-10-01’
**flow_end_date (str, optional) – default: ‘2021-10-01’
- Returns
dataframe with daily discharge (Q) in units of cfs, m3/day, m, and mm.
- Return type
df
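The unit chain described above (cfs to m3/day, then to a depth over the basin in m and mm) can be sketched as follows. This is an illustration under stated assumptions: the function name `convert_discharge` and the keys other than ‘Q_mm’ are hypothetical, and the basin area is assumed to be supplied in square meters.

```python
CFS_TO_M3_PER_S = 0.028316846592  # one cubic foot in cubic meters, (0.3048 m)**3
SECONDS_PER_DAY = 86400


def convert_discharge(q_cfs, basin_area_m2):
    """Convert daily mean discharge in cfs to m3/day, m, and mm over the basin."""
    q_m3_day = q_cfs * CFS_TO_M3_PER_S * SECONDS_PER_DAY
    q_m = q_m3_day / basin_area_m2  # depth of water spread evenly over the basin
    q_mm = q_m * 1000
    return {"Q_cfs": q_cfs, "Q_m3day": q_m3_day, "Q_m": q_m, "Q_mm": q_mm}


# 100 cfs from a 10 km2 basin (illustrative numbers only)
result = convert_discharge(100.0, 1.0e7)
```

Dividing by basin area is what makes discharge directly comparable with precipitation and ET, which are also expressed as depths in mm.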
- waterpyk.watershed.extract_urls(gage, **kwargs)¶
Return relevant urls for a USGS gage.
- Parameters
gage (str or int) – USGS 8-number gage ID. If int, leading 0s will automatically be added.
**flow_start_date (str, optional) – default: ‘1980-10-01’
**flow_end_date (str, optional) – default: ‘2021-10-01’
- Returns
4 strings with urls which (1) access basin lat/long geometry. (2) access geometry of flowline (i.e. rivers) lat/long geometry. (3) access basin metadata. (4) access basin discharge timeseries (between the dates of **kwargs).
- Return type
str, str, str, str
waterpyk.calcs¶
- waterpyk.calcs.combine_bands(df, bands_to_combine, band_name_final)¶
Defaults to add together soil and vegetation ET bands (Es and Ec) for PML to create ET band. Can be customized to add any bands together to create a new band.
- Parameters
bands_to_combine (list of str) – list of strings of band names (example: [‘Es’, ‘Ec’]) to combine into a single band
band_name_final (str) – the final band name to be stored in the ‘band’ column of the final dataframe, where the ‘value’ column will contain the added values from bands_to_combine
- Returns
long-form dataframe from input with an appended section which contains date, asset_name, value, band, and value_raw (set to np.nan) columns with data from the combined bands
- Return type
df
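To make the long-form layout concrete, here is a sketch of the combining step using plain dictionaries instead of a pandas dataframe. The function name `combine_bands_sketch` is hypothetical, and summing per date and asset is an assumption about how the bands are aligned.

```python
from collections import defaultdict


def combine_bands_sketch(rows, bands_to_combine, band_name_final):
    """Append rows whose value is the per-date sum of the selected bands."""
    totals = defaultdict(float)
    for row in rows:
        if row["band"] in bands_to_combine:
            totals[(row["date"], row["asset_name"])] += row["value"]
    combined = [
        {"date": date, "asset_name": asset, "band": band_name_final,
         "value": total, "value_raw": float("nan")}  # raw value is undefined
        for (date, asset), total in sorted(totals.items())
    ]
    return rows + combined


rows = [
    {"date": "2010-01-01", "asset_name": "pml", "band": "Es", "value": 0.5, "value_raw": 50},
    {"date": "2010-01-01", "asset_name": "pml", "band": "Ec", "value": 1.25, "value_raw": 125},
]
out = combine_bands_sketch(rows, ["Es", "Ec"], "ET")
```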
- waterpyk.calcs.deficit(df_long, df_wide=None, **kwargs)¶
Calculate D(t) after McCormick et al., 2021 and Dralle et al., 2020.
- Parameters
df_long (df) – original long-style dataframe with snow, P, and ET data at a minimum
df_wide (df) – dataframe created from make_wide_df(). (default = None, in which case new df_wide is created from df_long using default args). This dataframe must have ET, P, and wateryear columns.
**interp (bool, optional) – (default: True), currently no option to change to False.
**combine_ET_bands (bool, optional) – (default True) add ET bands to make one ET band.
**bands_to_combine (list of str, optional) – (default [Es, Ec]) ET bands to combine
**band_names_combined (str, optional) – (default ET) name of combined ET band
**et_asset (str, optional) – (default pml) ET dataset to use for deficit calculation, if multiple are given
**et_band (str, optional) – (default ET) band from ET dataset to use for deficit calculation, if multiple are given
**ppt_asset (str, optional) – (default prism) precipitation dataset to use for deficit calculation, if multiple are given
**ppt_band (str, optional) – (default ppt) band from precipitation dataset to use for deficit calculation, if multiple are given
**snow_band (str, optional) – defaults to ‘snow’. Note: you will need to change this if you did not rename the band to ‘snow’ upon extraction.
**snow_correction (bool, optional) – (default True) use snow correction factor when calculating deficit
**snow_frac (int, optional) – (default 10) if snow_correction = True, ET is set to 0 wherever snow is greater than this value (%)
- Returns
dataframe with root-zone water storage deficit data where deficit is column ‘D’ and wateryear deficit is ‘D_wy’.
- Return type
df
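A running deficit of this kind is commonly written as D(t) = max(0, D(t-1) + ET(t) - P(t)). The sketch below applies that recurrence to daily lists, with the snow correction described above. It is an illustration under assumptions, not the package's implementation: the function name `running_deficit` is hypothetical, inputs are assumed to be daily values in mm, and the wateryear deficit D_wy is not reproduced here.

```python
def running_deficit(et, p, snow=None, snow_correction=True, snow_frac=10):
    """Return D(t) = max(0, D(t-1) + ET(t) - P(t)) for daily series in mm."""
    deficits = []
    d = 0.0
    for i, (et_i, p_i) in enumerate(zip(et, p)):
        if snow_correction and snow is not None and snow[i] > snow_frac:
            et_i = 0.0  # ET is set to zero on days with snow above snow_frac
        d = max(0.0, d + et_i - p_i)  # the deficit can never go below zero
        deficits.append(d)
    return deficits


et = [2, 2, 2, 2]
p = [0, 0, 5, 0]
snow = [0, 0, 0, 50]
d_series = running_deficit(et, p, snow)
```

In this example the deficit grows on dry days, is partly refilled by the 5 mm of precipitation on day three, and stays flat on day four because ET is zeroed by the snow correction.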
- waterpyk.calcs.deficit_bursts(df)¶
Still under development!! Get a dataframe with the length and maximum deficit of each “burst” (i.e. deficits that are continuously above zero).
- Parameters
df (df) – dataframe with columns for date and D (for deficit), at minimum.
- Returns
dataframe with columns for start and end dates, and max_D [mm], duration [days] and start and end wateryear of each burst.
- Return type
df
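Since a burst is defined as a run of deficits continuously above zero, the grouping can be sketched as a single pass over the series. The function name `find_bursts` is hypothetical, counting duration inclusively in days is an assumption, and the start and end wateryear columns are omitted for brevity.

```python
from datetime import date


def find_bursts(dates, deficits):
    """Return one dict per run of consecutive days with deficit above zero."""
    bursts = []
    current = None
    for day, d in zip(dates, deficits):
        if d > 0:
            if current is None:
                current = {"start": day, "end": day, "max_D": d}
            else:
                current["end"] = day
                current["max_D"] = max(current["max_D"], d)
        elif current is not None:
            bursts.append(current)  # deficit returned to zero, close the burst
            current = None
    if current is not None:  # series ended while a burst was still open
        bursts.append(current)
    for burst in bursts:
        burst["duration"] = (burst["end"] - burst["start"]).days + 1
    return bursts


dates = [date(2010, 6, d) for d in range(1, 8)]
bursts = find_bursts(dates, [0, 3, 7, 2, 0, 0, 4])
```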
- waterpyk.calcs.interp_daily(df)¶
Interpolate all data to daily.
- Parameters
df (df) – initial long-form dataframe for a single asset (may contain multiple bands) with column ‘value’ for interpolating
- Returns
dataframe with ‘value’ column containing daily interpolated values
- Return type
df
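The documentation does not state which interpolation method is used. As an illustration only, the sketch below fills daily steps between observations by linear interpolation, which is an assumption; the function name `interp_daily_sketch` is hypothetical and the input is a date-sorted list of (date, value) pairs rather than a dataframe.

```python
from datetime import date, timedelta


def interp_daily_sketch(observations):
    """Linearly interpolate (date, value) pairs, sorted by date, to daily steps."""
    daily = []
    for (d0, v0), (d1, v1) in zip(observations, observations[1:]):
        span = (d1 - d0).days
        for step in range(span):
            value = v0 + (v1 - v0) * step / span
            daily.append((d0 + timedelta(days=step), value))
    if observations:
        daily.append(observations[-1])  # include the final observation
    return daily


obs = [(date(2010, 1, 1), 0.0), (date(2010, 1, 5), 8.0), (date(2010, 1, 7), 4.0)]
daily = interp_daily_sketch(obs)
```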
- waterpyk.calcs.make_wide_df(df_long, **kwargs)¶
Uses ET and P asset and band names (designated in **kwargs) to return a wide-form dataframe with columns: date, ET, Ei, P, and P-Ei (where Ei is the interception band from PML and is only extracted if it exists). merge_wide_streamflow() can then be used to merge with streamflow df.
- Parameters
df_long – dataframe such as that produced by extract_assets_at_site() where columns contain asset_name, band, and value.
**et_asset (str, optional) – asset to be used for creation of ET column (default ‘pml’).
**et_band (str, optional) – band name to be used for creation of ET column (default ‘ET’).
**ppt_asset (str, optional) – asset name for precipitation column (default ‘prism’).
**ppt_band (str, optional) – band name for precipitation column (default ‘ppt’)
- Returns
wide-form dataframe with columns for date, ET, Ei, P, and P-Ei (where Ei is the interception band from PML and is only included if it exists.)
- Return type
df
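The long-to-wide reshaping can be sketched without pandas. The function name `make_wide_sketch` is hypothetical, and taking the interception band from (‘pml’, ‘Ei’) is an assumption based on the description above.

```python
def make_wide_sketch(rows, et=("pml", "ET"), ppt=("prism", "ppt"), ei=("pml", "Ei")):
    """Pivot long-form rows into one dict per date with ET, P, and optional Ei."""
    column_for = {et: "ET", ppt: "P", ei: "Ei"}
    by_date = {}
    for row in rows:
        column = column_for.get((row["asset_name"], row["band"]))
        if column is not None:  # other asset/band pairs are ignored
            record = by_date.setdefault(row["date"], {"date": row["date"]})
            record[column] = row["value"]
    wide = [by_date[d] for d in sorted(by_date)]
    for record in wide:
        if "Ei" in record and "P" in record:  # P-Ei only when Ei exists
            record["P-Ei"] = record["P"] - record["Ei"]
    return wide


long_rows = [
    {"date": "2010-01-01", "asset_name": "pml", "band": "ET", "value": 1.5},
    {"date": "2010-01-01", "asset_name": "prism", "band": "ppt", "value": 10.0},
    {"date": "2010-01-01", "asset_name": "pml", "band": "Ei", "value": 0.5},
    {"date": "2010-01-01", "asset_name": "pml", "band": "Es", "value": 0.7},
]
wide = make_wide_sketch(long_rows)
```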
- waterpyk.calcs.merge(df_wide, df_merge, merge_with, column_names=None)¶
Merge df_wide (dataframe with columns representing variable names) with a streamflow or deficit dataframe.
- Parameters
df_wide (df) – wide-form dataframe with ‘date’ column and other columns with variable names (i.e. ET, P, etc.)
df_merge (df) – dataframe with data to merge (streamflow or deficit). Must contain columns for ‘date’ and column_names if given.
merge_with (str) – choice of ‘streamflow’ or ‘deficit’ to merge with df_wide.
column_names (list of str) – names of the columns in df_merge that you wish to merge (default for streamflow: [‘Q_mm’], default for deficit: [‘D’, ‘D_wy’]). ‘date’ is automatically included.
- Returns
merged dataframes.
- Return type
df
- waterpyk.calcs.wateryear(df_wide)¶
Create total and cumulative wateryear dataframes for all variables (ie columns) given in df_wide.
- Parameters
df_wide (df) – wide-form dataframe with ‘date’ column and other columns with variable names (i.e. ET, P, Q_mm, etc.). This dataframe can also have D, D_wy, Q_mm, etc. merged into it.
- Returns
2 dataframes: (1) the original wide-form dataframe with added columns (in the naming form of ORIGINALNAME_cumulative) with cumulative wateryear values. For example, if ‘P’ was in the original dataframe, then ‘P_cumulative’ now exists. If ‘Q_mm’ was in the dataframe, then ‘Q_mm_cumulative’ now exists. (2) dataframe with one row for each wateryear with columns such as wateryear, ET, P, D_wy_max, ET_summer, Q_mm, etc., which represent the wateryear total (or maximum, for D_wy_max).
- Return type
df, df
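The two outputs (daily cumulative values and per-wateryear totals) can be sketched for a single variable. This assumes the wateryear starts on 1 October and is labeled by the calendar year in which it ends, which is consistent with the ‘-10-01’ default dates elsewhere on this page but is not stated explicitly. The function names are hypothetical.

```python
from datetime import date


def wateryear_of(day):
    """Water year of a date, assuming it starts on 1 October (assumption)."""
    return day.year + 1 if day.month >= 10 else day.year


def wateryear_cumulative(dates, values):
    """Running total that resets each water year, plus the per-year totals."""
    cumulative, totals = [], {}
    for day, value in zip(dates, values):
        wy = wateryear_of(day)
        totals[wy] = totals.get(wy, 0.0) + value
        cumulative.append(totals[wy])  # running total within this water year
    return cumulative, totals


dates = [date(2019, 9, 29), date(2019, 9, 30), date(2019, 10, 1), date(2019, 10, 2)]
p_cumulative, p_totals = wateryear_cumulative(dates, [1.0, 2.0, 4.0, 8.0])
```

The running total restarts on 1 October, so the last two days accumulate under wateryear 2020.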
waterpyk.plots¶
- waterpyk.plots.plot_RWS(studyareaobject=None, df_deficit=None, smax=None, **plot_kwargs)¶
Plot root-zone water storage (RWS) timeseries, following Rempe-McCormick et al. (in prep) SI Fig 1. Either studyarea object must be supplied OR the df_deficit and smax params with the necessary data for plotting (ie columns for ‘D’ and ‘date’ and smax value)
- Parameters
studyareaobject (object from studyarea class, optional) –
deficit_timeseries (df, optional) – dataframe with deficit timeseries and columns ‘D’, and ‘date’ at minimum.
smax (int or float, optional) – root-zone water storage capacity (Smax) in mm
**xmin (str, optional) – default = ‘2003-10-01’
**xmax (str, optional) – default = ‘2020-10-01’
**xlabel (str, optional) – default = ‘Date’
**ylabel (str, optional) – default = ‘RWS (mm)’
**dpi (int, optional) – default = 300
**figsize (tuple, optional) – default = (6,4)
**legend (bool, optional) – default = False
**title (str, optional) – default = None
**color_D (str, optional) – default = ‘black’
**linestyle_D (str, optional) – default = ‘-’
**lw (float, optional) – lineweight, default = 1
- Returns
fig
- waterpyk.plots.plot_p_distribution(studyareaobject=None, df_wateryear_totals=None, smax=None, **plot_kwargs)¶
Plot distribution of wateryear precipitation (P) and root-zone water storage capacity (Smax) in the form of Rempe-McCormick et al. (in prep) Fig 1c,f. Either studyarea object must be supplied OR the df_wateryear_totals and smax param with the necessary data for plotting.
- Parameters
studyareaobject (object from studyarea class, optional) –
df_wateryear_totals (df, optional) – dataframe with wateryear cumulative P and a column named ‘P’ at minimum.
smax (int or float, optional) – value for root-zone water storage capacity (Smax) in mm
**xmin (int, optional) – default = 0
**xmax (int, optional) – default = 4000
**xlabel (str, optional) – default = ‘mm’
**ylabel (str, optional) – default = ‘Density’
**dpi (int, optional) – default = 300
**figsize (tuple, optional) – default = (6,4)
**legend (bool, optional) – default = True
**title (str, optional) – default = None
- Returns
fig
- waterpyk.plots.plot_spearman(studyareaobject=None, df_wateryear_totals=None, **plot_kwargs)¶
Scatter plot of wateryear precipitation and summer (ie dry season) ET, following Rempe-McCormick et al. (in prep) Fig 1b,e. Spearman correlation coefficient (rho) and p-value is given in upper right corner of plot. Either studyarea object must be supplied OR the df_wateryear_totals param with the necessary data for plotting (ie wateryear, P, and ET_summer columns).
- Parameters
studyareaobject (object from studyarea class, optional) –
df_wateryear_totals (df, optional) – dataframe with columns ‘P’, ‘ET_summer’, and ‘wateryear’
deficit_timeseries (df, optional) – dataframe with deficit timeseries and columns ‘D’, ‘D_wy’ and ‘date’ at minimum.
**xmin (int, optional) – default = 0
**xmax (int, optional) – default = 3000
**dpi (int, optional) – default = 300
**figsize (tuple, optional) – default = (6,4)
**color_WY (bool, optional) – This puts a wateryear legend and colors the dots by wateryear. default = True
**title (str, optional) – default = None
**markeredgecolor (str, optional) – default = ‘black’
**lw (float, optional) – lineweight, default = 1.5
- Returns
fig
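For readers unfamiliar with the statistic shown on this plot, Spearman's rho is the Pearson correlation of the ranks of the two variables. The standard-library sketch below computes rho only; it does not compute the p-value, and it is not necessarily how waterpyk obtains the value (scipy.stats.spearmanr returns both rho and a p-value). The function names and sample numbers are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the group of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Illustrative wateryear P and summer ET values in mm (not real data)
rho = spearman_rho([800, 1200, 1500, 2000, 950], [300, 310, 305, 320, 290])
```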
- waterpyk.plots.plot_timeseries(studyareaobject=None, df_daily_wide=None, deficit_timeseries=None, **plot_kwargs)¶
Plot daily (wateryear cumulative) timeseries, following Rempe-McCormick et al. (in prep) Fig 1b,e. Either studyarea object must be supplied OR the df_daily_wide and deficit_timeseries params with the necessary data for plotting. Use **kwargs to choose which datasets to plot, from the options of P, ET, deficit, deficit_wateryear, and streamflow (Q) if available.
- Parameters
studyareaobject (object from studyarea class, optional) –
df_daily_wide (df, optional) – dataframe with daily cumulative wateryear timeseries and columns corresponding to the chosen data (from **kwargs), at minimum, such as ‘ET_cumulative’, ‘P_cumulative’, and ‘Q_mm_cumulative’, and ‘date’ columns.
deficit_timeseries (df, optional) – dataframe with deficit timeseries and columns ‘D’, ‘D_wy’ and ‘date’ at minimum.
**xmin (str, optional) – default = ‘2003-10-01’
**xmax (str, optional) – default = ‘2020-10-01’
**xlabel (str, optional) – default = ‘Date’
**ylabel (str, optional) – default = ‘[mm]’
**dpi (int, optional) – default = 300
**figsize (tuple, optional) – default = (6,4)
**legend (bool, optional) – default = True
**title (str, optional) – default = None
**plot_Q (bool, optional) – default = False
**plot_P (bool, optional) – default = True
**plot_D (bool, optional) – default = True
**plot_Dwy (bool, optional) – default = True
**plot_ET (bool, optional) – default = False
**plot_ET_dry (bool, optional) – default = False
**color_Q (str, optional) – default = ‘blue’
**color_P (str, optional) – default = ‘#b1d6f0’
**color_D (str, optional) – default = ‘black’
**color_Dwy (str, optional) – default = ‘black’
**color_ET (str, optional) – default = ‘purple’
**linestyle_Q (str, optional) – default = ‘-’
**linestyle_P (str, optional) – default = ‘-’
**linestyle_D (str, optional) – default = ‘-’
**linestyle_Dwy (str, optional) – default = ‘--’
**linestyle_ET (str, optional) – default = ‘-’
**lw (float, optional) – lineweight, default = 1.5
- Returns
fig
- waterpyk.plots.plot_wateryear_totals(studyareaobject=None, df_wateryear_totals=None, **plot_kwargs)¶
Plot summed wateryear timeseries. Either studyarea object must be supplied OR the df_wateryear_totals param with the necessary data for plotting. Use **kwargs to choose which datasets to plot, from the options of P, ET, ET_summer, deficit, and streamflow (Q) if available.
- Parameters
studyareaobject (object from studyarea class, optional) –
df_wateryear_totals (df, optional) – dataframe with cumulative wateryear timeseries and columns corresponding to the chosen data (from **kwargs), at minimum, such as ‘P’, ‘D’, ‘ET’, ‘ET_summer’, and ‘Q_mm’, and ‘wateryear’ columns.
**xmin (int, optional) – default = 2003
**xmax (int, optional) – default = 2020
**xlabel (str, optional) – default = ‘Wateryear’
**ylabel (str, optional) – default = ‘[mm]’
**dpi (int, optional) – default = 300
**figsize (tuple, optional) – default = (6,4)
**legend (bool, optional) – default = True
**title (str, optional) – default = None
**plot_Q (bool, optional) – default = True
**plot_P (bool, optional) – default = True
**plot_D (bool, optional) – default = True
**plot_ET (bool, optional) – default = True
**plot_ET_dry (bool, optional) – default = False
**color_Q (str, optional) – default = ‘blue’
**color_P (str, optional) – default = ‘#b1d6f0’
**color_D (str, optional) – default = ‘black’
**color_ET (str, optional) – default = ‘purple’
**linestyle_Q (str, optional) – default = ‘-o’
**linestyle_P (str, optional) – default = ‘-o’
**linestyle_D (str, optional) – default = ‘-’
**linestyle_ET (str, optional) – default = ‘-o’
**lw (float, optional) – lineweight, default = 1.5
**twinx (bool, optional) – Put ET data on twin x-axis to P data. default = True
- Returns
fig
waterpyk¶
Top-level package for waterpyk.
- waterpyk.in_colab_shell()¶
Checks if code is being executed within Google Colab to decide on default_saving_dir (default saving directory path). Function from geemap by Qiusheng Wu
- Returns
bool
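One common way to detect Google Colab is to check whether the google.colab module has been loaded into the interpreter. The sketch below shows that check; it is an assumption that waterpyk's function works this way, and the name `in_colab_shell_sketch` is hypothetical.

```python
import sys


def in_colab_shell_sketch():
    """Return True when the google.colab module has been imported."""
    return "google.colab" in sys.modules


running_in_colab = in_colab_shell_sketch()
```

A caller could use the returned bool to pick a saving directory suited to the environment.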
- waterpyk.load_data(layers)¶
Finds the import data (either all.csv or minimal.csv corresponding to layers input) and loads it.
- Parameters
layers (str, optional) – the GEE input layers to be extracted. Options if string are ‘minimal’ or ‘all’.
- Returns
dataframe of read csv from layers.
- Return type
df