waterpyk package

Submodules

waterpyk.main

class waterpyk.main.StudyArea(coords, layers=None, saving_dir='/home/docs/checkouts/readthedocs.org/user_builds/waterpyk/checkouts/latest/docs/source/data/output', **kwargs)

Bases: object

describe()

Prints statements describing the StudyArea attributes and, if the deficit was calculated, the deficit parameters.

get_data(layers, **kwargs)

Updates self with attributes containing dataframes for the site. If data already exists in saving_dir, data is simply loaded. Otherwise, data from layers is extracted from GEE and the USGS.

Parameters
  • layers (str or df, optional) – If str, specify ‘minimal’ or ‘all’ to extract the default set of assets. If df, columns that must be present include: asset_id, start_date, end_date, relative_date, scale, bands, bands_to_scale, new_bandnames, scaling_factor. These are the same parameters required for extract_basic().

  • **interp (bool, optional) – (default: True), currently no option to change to False.

  • **combine_ET_bands (bool, optional) – (default True) add ET bands to make one ET band.

  • **bands_to_combine (list of str, optional) – (default [Es, Ec]) ET bands to combine

  • **band_names_combined (str, optional) – (default ‘ET’) name of combined ET band

  • **et_asset (str, optional) – asset to be used for creation of ET column (default ‘pml’).

  • **et_band (str, optional) – band name to be used for creation of ET column (default ‘ET’).

  • **ppt_asset (str, optional) – asset name for precipitation column (default ‘prism’).

  • **ppt_band (str, optional) – band name for precipitation column (default ‘ppt’)

  • **snow_asset (str, optional) – defaults to ‘modis_snow’

  • **snow_band (str, optional) – defaults to ‘snow’. Note: if the snow band is not renamed to ‘snow’ upon extraction, you will need to change this to match its actual name.

  • **snow_correction (bool, optional) – (default True) use snow correction factor when calculating deficit

  • **snow_frac (int, optional) – (default 10) if snow_correction = True, ET is set to 0 on all days when snow cover is greater than this percentage

get_kind()

Adds an attribute to the object indicating whether the site is a point or a watershed.

get_location(**kwargs)

Convert coordinates to GEE feature. For a USGS watershed, coordinates are extracted from USGS website using gage ID.

Parameters

site_name (str, optional) – Watersheds are already named. Default for lat/long sites is empty string.

Returns

gee feature with location geometry

Return type

feature

plot(kind, **kwargs)

Make common plots of the data, specifying the ‘kind’ of plot. Each kind of plot has preset defaults, which may differ from the defaults listed below; for the full list of defaults, see the plots submodule. In general, kwargs only need to be updated for small details such as the legend, title, or colors.

Parameters
  • kind (str) – The kind of plot you want. Not to be confused with the kind of site (i.e. watershed or point). Supported types are: ‘timeseries’, ‘spearman’, ‘wateryear’, ‘RWS’, and ‘distribution’.

  • **plot_PET (bool, optional) –

  • **plot_Q (bool, optional) –

  • **plot_D (bool, optional) –

  • **plot_Dwy (bool, optional) –

  • **plot_ET (bool, optional) – include wateryear ET in plot

  • **plot_ET_dry (bool, optional) – include dry season ET in plot

  • **color_PET (str, optional) – hex code string for color

  • **color_Q (str, optional) – hex code string for color

  • **color_P (str, optional) – hex code string for color

  • **color_D (str, optional) – hex code string for color

  • **color_ET (str, optional) – hex code string for color

  • **color_WY (str, optional) – hex code string for color

  • **markeredgecolor (str, optional) – default = black

  • **linestyle_PET (str, optional) –

  • **linestyle_Q (str, optional) –

  • **linestyle_P (str, optional) –

  • **linestyle_D (str, optional) –

  • **linestyle_Dwy (str, optional) –

  • **linestyle_ET (str, optional) –

  • **lw (float, optional) – line weight. Default = 1.5

  • **xmin (date or float or int, optional) –

  • **xmax (date or float or int, optional) –

  • **legend (bool, optional) – default = True

  • **dpi (int, optional) – default = 300

  • **figsize (tuple, optional) – default = (6,4)

  • **xlabel (str, optional) –

  • **ylabel (str, optional) –

  • **twinx (bool, optional) – some plots allow twin axes (a second y-axis sharing the same x-axis). Default = False.

  • **title (str, optional) – default None
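Because each kind of plot carries its own preset defaults and kwargs only override them, the behavior amounts to a dictionary merge. The following is a minimal sketch of that override pattern under that assumption; DEFAULTS and resolve_plot_kwargs are hypothetical names, not part of waterpyk, and the preset values are copied from the plots submodule entries.

```python
# Hypothetical preset tables; values copied from the plots submodule docs.
DEFAULTS = {
    "timeseries": {"xlabel": "Date", "ylabel": "[mm]", "legend": True, "lw": 1.5},
    "RWS": {"xlabel": "Date", "ylabel": "RWS (mm)", "legend": False, "lw": 1},
}


def resolve_plot_kwargs(kind, **kwargs):
    """Return the presets for `kind`, with user-supplied kwargs taking precedence."""
    if kind not in DEFAULTS:
        raise ValueError(f"unsupported plot kind: {kind!r}")
    # In a dict merge, later keys win, so kwargs replace the presets.
    return {**DEFAULTS[kind], **kwargs}


# Override only the title and line width; everything else keeps its preset.
opts = resolve_plot_kwargs("RWS", title="Site A", lw=2)
```

Building a new dict rather than updating the preset in place keeps the presets unchanged between calls.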

waterpyk.gee

waterpyk.gee.extract(layers, gee_feature, kind, reducer_type=None, **kwargs)

Extract data at site for several assets at once. Uses extract_basic().

Parameters
  • layers (str or df, optional) – If str, specify ‘minimal’ or ‘all’ to extract the default set of assets. If df, columns that must be present include: asset_id, start_date, end_date, relative_date, scale, bands, bands_to_scale, new_bandnames, scaling_factor. These are the same parameters required for extract_basic().

  • gee_feature (gee feature) – GEE feature for region geometry

  • kind (str) – ‘point’ or ‘watershed’

  • reducer_type (GEE reducer function, optional) – defaults to None, in which case the GEE reduceRegion reducer is first() for points and mean() for watersheds. See the GEE documentation for other available reducers.

  • **interp (bool, optional) – (default: True), currently no option to change to False.

  • **combine_ET_bands (bool, optional) – (default True) add ET bands to make one ET band.

  • **bands_to_combine (list of str, optional) – (default [Es, Ec]) ET bands to combine

  • **band_names_combined (str, optional) – (default ‘ET’) name of combined ET band

Returns

2 long-style pandas dataframes, the first containing all of the daily data and the second containing all of the non-daily data (i.e. extractions from images or from single-timestep ImageCollections).

Return type

df, df

waterpyk.gee.extract_basic(gee_feature, kind, asset_id, scale, bands, start_date=None, end_date=None, relative_date=None, bands_to_scale=None, scaling_factor=1, reducer_type=None, new_bandnames=None)

Extract data from a single asset. For a timeseries, specify start_date and end_date for an asset_id. For an image, or to get a single image (i.e. one date) from an ImageCollection, specify relative_date as ‘first’, ‘most_recent’, or ‘image’. Either both start_date and end_date or relative_date must be specified; otherwise an error is raised.

Parameters
  • gee_feature (gee feature) – GEE feature for region geometry

  • kind (str) – ‘point’ or ‘watershed’

  • asset_id (str) – GEE asset identification string

  • start_date (str, optional) – format mm/dd/yyyy or similar date format

  • end_date (str, optional) – format mm/dd/yyyy or similar date format

  • relative_date (str, optional) – If not using start and end dates, specify ‘first’, ‘most_recent’, or ‘image’ to extract a single date from an ImageCollection or to extract data from an Image.

  • scale (str) – scale in meters for GEE reducer function

  • bands (list of str) – bands of GEE asset to extract

  • new_bandnames (list of str, optional) – new names for the bands. By default bands are not renamed explicitly; instead, the final band name is the part of the original band name before the underscore. For example, pml stays pml but LC_Type1 becomes LC. If given, this param must be the same length as bands or an exception will be raised.

  • bands_to_scale (list of str, optional) – (default = None) bands for which each value will be multiplied by scaling_factor.

  • scaling_factor (float, optional) – (default = 1) scaling factor to apply to all values in bands_to_scale

  • reducer_type (GEE reducer function, optional) – defaults to first() for points and mean() for watersheds. See the available GEE reduceRegion options online for other possible inputs.

Returns

dataframe of all extracted data

Return type

df
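The argument rules for extract_basic() (a complete date range or a relative_date, and new_bandnames matching bands in length) can be expressed as two small checks. This is a sketch of the validation logic only, assuming the renaming rule described for new_bandnames; check_date_args and resolve_bandnames are hypothetical helpers, and the real function's error types and messages may differ.

```python
VALID_RELATIVE_DATES = ("first", "most_recent", "image")


def check_date_args(start_date=None, end_date=None, relative_date=None):
    """Return 'timeseries' or 'single' depending on which date arguments are set."""
    # In this sketch a complete date range takes precedence over relative_date.
    if start_date is not None and end_date is not None:
        return "timeseries"
    if relative_date is not None:
        if relative_date not in VALID_RELATIVE_DATES:
            raise ValueError(f"relative_date must be one of {VALID_RELATIVE_DATES}")
        return "single"
    raise ValueError("specify both start_date and end_date, or relative_date")


def resolve_bandnames(bands, new_bandnames=None):
    """Return output band names following the documented renaming rule."""
    if new_bandnames is None:
        # Default: keep the part of each band name before the first underscore.
        return [band.split("_")[0] for band in bands]
    if len(new_bandnames) != len(bands):
        raise ValueError("new_bandnames must be the same length as bands")
    return list(new_bandnames)
```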

waterpyk.watershed

waterpyk.watershed.extract_geometry(gage)

Get the geometry of a USGS gage in Google Earth Engine (GEE) form and as a geopandas dataframe.

Parameters

gage (str or int) – 8-digit USGS gage ID. If int, leading zeros are added automatically.

Returns

GEE feature containing the basin’s exterior polygon coordinates and geopandas dataframe containing the basin’s coordinates.

Return type

feature and gdf
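Passing a gage ID as an int drops its leading zeros, which is why they must be restored. A minimal sketch of that padding step, assuming the ID consists only of digits; format_gage_id is a hypothetical helper, not a waterpyk function.

```python
def format_gage_id(gage):
    """Return a USGS gage ID as a zero-padded 8-digit string."""
    gage_str = str(gage).strip()
    if not gage_str.isdigit():
        raise ValueError(f"gage ID must contain only digits, got {gage!r}")
    # zfill left-pads with zeros and never truncates longer IDs.
    return gage_str.zfill(8)
```

For example, format_gage_id(1646500) gives '01646500', while a string that is already 8 digits is returned unchanged.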

waterpyk.watershed.extract_geometry_flowline(gage)

Get geopandas dataframe of USGS basin flowline geometry for plotting.

Parameters

gage (str or int) – 8-digit USGS gage ID. If int, leading zeros are added automatically.

Returns

geopandas dataframe with geometry of flowlines (rivers) for plotting.

Return type

df

waterpyk.watershed.extract_latitude(gage)

Get the latitude of the centroid of the USGS gage’s basin for calculating PET.

Parameters

gage (str or int) – 8-digit USGS gage ID. If int, leading zeros are added automatically.

Returns

latitude

Return type

float

waterpyk.watershed.extract_metadata(gage)

Get metadata for a USGS gage.

Parameters

gage (str or int) – 8-digit USGS gage ID. If int, leading zeros are added automatically.

Returns

2 strings: (1) the USGS long name of the gage and (2) a description of the form ‘USGS Basin + gage ID + imported at + site_name + CRS: + coordinate system’.

Return type

str, str

waterpyk.watershed.extract_streamflow(gage, **kwargs)

Get a dataframe with streamflow (i.e. discharge) for a USGS gage. Units are converted to mm using the area of the basin, calculated from the exterior geometry.

Parameters
  • gage (str or int) – 8-digit USGS gage ID. If int, leading zeros are added automatically.

  • **flow_start_date (str, optional) – default: ‘1980-10-01’

  • **flow_end_date (str, optional) – default: ‘2021-10-01’

Returns

dataframe with daily discharge (Q) in units of cfs, m3/day, m, and mm.

Return type

df
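Converting discharge to a depth divides the daily volume by the basin area. The sketch below shows that arithmetic for a single value, assuming the basin area is in square meters and the discharge is a daily mean in cfs; convert_discharge is hypothetical, and apart from Q_mm (used elsewhere in these docs) the output key names are illustrative rather than waterpyk’s actual column names.

```python
CUBIC_M_PER_CUBIC_FT = 0.028316846592  # exact, from the international foot (0.3048 m)
SECONDS_PER_DAY = 86_400


def convert_discharge(q_cfs, basin_area_m2):
    """Convert a daily mean discharge in cfs to volume and basin-normalized depth."""
    if basin_area_m2 <= 0:
        raise ValueError("basin area must be positive")
    q_m3_per_day = q_cfs * CUBIC_M_PER_CUBIC_FT * SECONDS_PER_DAY
    q_m = q_m3_per_day / basin_area_m2  # depth of water spread over the basin, m/day
    return {
        "Q_cfs": q_cfs,
        "Q_m3_per_day": q_m3_per_day,
        "Q_m": q_m,
        "Q_mm": q_m * 1000,
    }


# 100 cfs draining a 100 km^2 (1e8 m^2) basin
flow = convert_discharge(100, 100e6)
```

In this example 100 cfs is about 244,658 m³/day, or roughly 2.45 mm/day over the basin.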

waterpyk.watershed.extract_urls(gage, **kwargs)

Return relevant urls for a USGS gage.

Parameters
  • gage (str or int) – 8-digit USGS gage ID. If int, leading zeros are added automatically.

  • **flow_start_date (str, optional) – default: ‘1980-10-01’

  • **flow_end_date (str, optional) – default: ‘2021-10-01’

Returns

4 strings containing URLs that access (1) the basin lat/long geometry, (2) the lat/long geometry of the flowlines (i.e. rivers), (3) the basin metadata, and (4) the basin discharge timeseries between flow_start_date and flow_end_date.

Return type

str, str, str, str

waterpyk.calcs

waterpyk.calcs.combine_bands(df, bands_to_combine, band_name_final)

Used by default to add together the soil and vegetation ET bands (Es and Ec) from PML to create an ET band. Can be customized to add any bands together to create a new band.

Parameters
  • df (df) – long-form dataframe containing the bands to combine

  • bands_to_combine (list of str) – list of strings of band names (example: [‘Es’, ‘Ec’]) to combine into a single band

  • band_name_final (str) – the final band name to be stored in the ‘band’ column of the final dataframe, where the ‘value’ column will contain the added values from bands_to_combine

Returns

long-form dataframe from the input with an appended section containing date, asset_name, value, band, and value_raw (set to np.nan) columns with the data from the combined band

Return type

df
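Combining bands means summing the selected bands per date and appending the result as new long-form rows. The following is a stdlib sketch of that step, using a list of dicts in place of a dataframe; combine_bands_sketch is hypothetical, and requiring every band to be present on a date before emitting a sum is a design choice of this sketch, not necessarily waterpyk’s behavior.

```python
import math
from collections import defaultdict


def combine_bands_sketch(rows, bands_to_combine, band_name_final):
    """Append summed rows for `band_name_final` to a long-form list of dict rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["band"] in bands_to_combine:
            key = (row["date"], row["asset_name"])
            totals[key] += row["value"]
            counts[key] += 1
    combined = [
        {"date": day, "asset_name": asset, "band": band_name_final,
         "value": totals[(day, asset)], "value_raw": math.nan}
        for (day, asset) in totals
        # Only emit a combined value when every requested band was present.
        if counts[(day, asset)] == len(bands_to_combine)
    ]
    return rows + combined


rows = [
    {"date": "2020-01-01", "asset_name": "pml", "band": "Es", "value": 1.0, "value_raw": 1.0},
    {"date": "2020-01-01", "asset_name": "pml", "band": "Ec", "value": 2.5, "value_raw": 2.5},
    {"date": "2020-01-02", "asset_name": "pml", "band": "Es", "value": 0.5, "value_raw": 0.5},
]
out = combine_bands_sketch(rows, ["Es", "Ec"], "ET")
```

Here only 2020-01-01 gets an ET row (value 3.5), because Ec is missing on 2020-01-02.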

waterpyk.calcs.deficit(df_long, df_wide=None, **kwargs)

Calculate D(t) after McCormick et al., 2021 and Dralle et al., 2020.

Parameters
  • df_long (df) – original long-style dataframe with snow, P, and ET data at a minimum

  • df_wide (df) – dataframe created from make_wide_df(). (default = None, in which case new df_wide is created from df_long using default args). This dataframe must have ET, P, and wateryear columns.

  • **interp (bool, optional) – (default: True), currently no option to change to False.

  • **combine_ET_bands (bool, optional) – (default True) add ET bands to make one ET band.

  • **bands_to_combine (list of str, optional) – (default [Es, Ec]) ET bands to combine

  • **band_names_combined (str, optional) – (default ET) name of combined ET band

  • **et_asset (str, optional) – (default pml) ET dataset to use for deficit calculation, if multiple are given

  • **et_band (str, optional) – (default ET) band from ET dataset to use for deficit calculation, if multiple are given

  • **ppt_asset (str, optional) – (default prism) precipitation dataset to use for deficit calculation, if multiple are given

  • **ppt_band (str, optional) – (default ppt) band from precipitation dataset to use for deficit calculation, if multiple are given

  • **snow_band (str, optional) – defaults to ‘snow’. Note: if the snow band is not renamed to ‘snow’ upon extraction, you will need to change this to match its actual name.

  • **snow_correction (bool, optional) – (default True) use snow correction factor when calculating deficit

  • **snow_frac (int, optional) – (default 10) if snow_correction = True, ET is set to 0 on all days when snow cover is greater than this percentage

Returns

dataframe with root-zone water storage deficit data where deficit is column ‘D’ and wateryear deficit is ‘D_wy’.

Return type

df
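The deficit in McCormick et al. (2021) and Dralle et al. (2020) is commonly written as a running sum of ET minus P that cannot drop below zero: D(t) = max(0, D(t-1) + ET(t) - P(t)). The following is a minimal sketch of that recurrence with the snow correction described for snow_frac, assuming that D_wy is the same recurrence reset to zero at the start of each wateryear (October 1). It works on plain lists, deficit_sketch and wateryear_of are hypothetical names, and it is not waterpyk’s implementation.

```python
from datetime import date


def wateryear_of(day):
    """US wateryear: October through December belong to the following year."""
    return day.year + 1 if day.month >= 10 else day.year


def deficit_sketch(dates, et, p, snow=None, snow_correction=True, snow_frac=10):
    """Return (D, D_wy) lists of root-zone storage deficit in the units of et and p."""
    D, D_wy = [], []
    d_run = 0.0    # deficit carried across wateryears
    dwy_run = 0.0  # deficit reset at the start of every wateryear
    current_wy = None
    for i, day in enumerate(dates):
        et_i = et[i]
        # Snow correction: treat ET as zero on days with snow cover above snow_frac (%).
        if snow_correction and snow is not None and snow[i] > snow_frac:
            et_i = 0.0
        flux = et_i - p[i]
        d_run = max(0.0, d_run + flux)
        wy = wateryear_of(day)
        if wy != current_wy:
            current_wy, dwy_run = wy, 0.0
        dwy_run = max(0.0, dwy_run + flux)
        D.append(d_run)
        D_wy.append(dwy_run)
    return D, D_wy


days = [date(2020, 9, 29), date(2020, 9, 30), date(2020, 10, 1), date(2020, 10, 2)]
D, D_wy = deficit_sketch(days, et=[3, 3, 3, 3], p=[0, 0, 0, 10])
```

With these inputs D climbs to 9 mm before the 10 mm rain brings it down to 2, while D_wy restarts on October 1 and is pushed back to 0 by the same rain.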

waterpyk.calcs.deficit_bursts(df)

Still under development. Get a dataframe with the length and maximum deficit of each “burst” (i.e. each period during which the deficit stays continuously above zero).

Parameters

df (df) – dataframe with columns for date and D (for deficit), at minimum.

Returns

dataframe with columns for start and end dates, and max_D [mm], duration [days] and start and end wateryear of each burst.

Return type

df
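Finding bursts is a run-length scan over the daily deficit: a burst opens when D rises above zero and closes on the next day it returns to zero. The following is a stdlib sketch of that scan, assuming consecutive daily dates and an inclusive day count for duration; deficit_bursts_sketch, _burst, and wateryear_of are hypothetical names, and the key names only mirror the columns listed above.

```python
from datetime import date, timedelta


def wateryear_of(day):
    return day.year + 1 if day.month >= 10 else day.year


def _burst(dates, i_start, i_end, peak):
    start, end = dates[i_start], dates[i_end]
    return {
        "start": start,
        "end": end,
        "max_D": peak,
        "duration": (end - start).days + 1,  # inclusive day count
        "wateryear_start": wateryear_of(start),
        "wateryear_end": wateryear_of(end),
    }


def deficit_bursts_sketch(dates, D):
    """Split a daily deficit series into runs where D stays above zero."""
    bursts, start, peak = [], None, 0.0
    for i, d in enumerate(D):
        if d > 0:
            if start is None:
                start, peak = i, d
            else:
                peak = max(peak, d)
        elif start is not None:
            bursts.append(_burst(dates, start, i - 1, peak))
            start = None
    if start is not None:  # the series ends in the middle of a burst
        bursts.append(_burst(dates, start, len(D) - 1, peak))
    return bursts


days = [date(2020, 9, 29) + timedelta(days=k) for k in range(6)]
bursts = deficit_bursts_sketch(days, [0, 2, 5, 1, 0, 3])
```

The first burst in this example spans September 30 to October 2, so it starts in wateryear 2020 and ends in wateryear 2021.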

waterpyk.calcs.interp_daily(df)

Interpolate all data to daily.

Parameters

df (df) – initial long-form dataframe for a single asset (may contain multiple bands) with column ‘value’ for interpolating

Returns

dataframe with ‘value’ column containing daily interpolated values

Return type

df
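Interpolating to daily fills every day between two observations by drawing a straight line between their values. The following is a stdlib sketch of linear interpolation for one band, assuming the input is a list of (date, value) pairs; interp_daily_sketch is hypothetical, and it does not rescale multi-day composite totals to daily rates, which the real function may or may not do. With pandas, a similar result is typically obtained with something like series.resample("D").interpolate().

```python
from datetime import date, timedelta


def interp_daily_sketch(obs):
    """Linearly interpolate (date, value) observations onto every day in their range."""
    obs = sorted(obs)
    if not obs:
        return []
    daily = []
    for (d0, v0), (d1, v1) in zip(obs, obs[1:]):
        span = (d1 - d0).days
        for k in range(span):  # days d0 .. d1 - 1; d1 is added by the next pair
            daily.append((d0 + timedelta(days=k), v0 + (v1 - v0) * k / span))
    daily.append(obs[-1])
    return daily


series = interp_daily_sketch([(date(2020, 1, 5), 8.0), (date(2020, 1, 1), 0.0)])
```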

waterpyk.calcs.make_wide_df(df_long, **kwargs)

Uses ET and P asset and band names (designated in **kwargs) to return a wide-form dataframe with columns: date, ET, Ei, P, and P-Ei (where Ei is the interception band from PML and is only extracted if it exists). merge_wide_streamflow() can then be used to merge with streamflow df.

Parameters
  • df_long – dataframe such as that produced by extract_assets_at_site() where columns contain asset_name, band, and value.

  • **et_asset (str, optional) – asset to be used for creation of ET column (default ‘pml’).

  • **et_band (str, optional) – band name to be used for creation of ET column (default ‘ET’).

  • **ppt_asset (str, optional) – asset name for precipitation column (default ‘prism’).

  • **ppt_band (str, optional) – band name for precipitation column (default ‘ppt’)

Returns

wide-form dataframe with columns for date, ET, Ei, P, and P-Ei (where Ei is the interception band from PML and is only included if it exists.)

Return type

df
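Building the wide-form table is a pivot: each (asset_name, band) pair that matches the ET, P, or Ei selection becomes a column, keyed by date. The following is a stdlib sketch of that pivot on a list of dict rows; make_wide_sketch is hypothetical, and looking up Ei under the ET asset is an assumption based on Ei being described as the PML interception band.

```python
def make_wide_sketch(rows, et_asset="pml", et_band="ET", ppt_asset="prism", ppt_band="ppt"):
    """Pivot long-form dict rows into one record per date with ET, P, Ei, and P-Ei."""
    columns = {
        (et_asset, et_band): "ET",
        (ppt_asset, ppt_band): "P",
        (et_asset, "Ei"): "Ei",  # interception band, used only if present
    }
    by_date = {}
    for row in rows:
        col = columns.get((row["asset_name"], row["band"]))
        if col is None:
            continue  # other assets and bands are not part of the wide table
        by_date.setdefault(row["date"], {"date": row["date"]})[col] = row["value"]
    records = [by_date[d] for d in sorted(by_date)]
    for rec in records:
        if "P" in rec and "Ei" in rec:
            rec["P-Ei"] = rec["P"] - rec["Ei"]
    return records


long_rows = [
    {"date": "2020-01-01", "asset_name": "pml", "band": "ET", "value": 2.0},
    {"date": "2020-01-01", "asset_name": "pml", "band": "Ei", "value": 0.5},
    {"date": "2020-01-01", "asset_name": "prism", "band": "ppt", "value": 4.0},
    {"date": "2020-01-01", "asset_name": "modis_snow", "band": "snow", "value": 0.0},
]
wide = make_wide_sketch(long_rows)
```

The snow row is skipped because it is not one of the selected ET, P, or Ei bands.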

waterpyk.calcs.merge(df_wide, df_merge, merge_with, column_names=None)

Merge df_wide (dataframe with columns representing variable names) with a streamflow or deficit dataframe.

Parameters
  • df_wide (df) – wide-form dataframe with ‘date’ column and other columns with variable names (i.e. ET, P, etc.)

  • df_merge (df) – dataframe with the data to merge (streamflow or deficit). Must contain a ‘date’ column and the columns in column_names, if given.

  • merge_with (str) – choice of ‘streamflow’ or ‘deficit’ to merge with df_wide.

  • column_names (list of str, optional) – names of the columns in df_merge that you wish to merge (default for streamflow: [‘Q_mm’]; default for deficit: [‘D’, ‘D_wy’]). ‘date’ is automatically included.

Returns

merged dataframe.

Return type

df

waterpyk.calcs.wateryear(df_wide)

Create total and cumulative wateryear dataframes for all variables (ie columns) given in df_wide.

Parameters

df_wide (df) – wide-form dataframe with ‘date’ column and other columns with variable names (i.e. ET, P, etc.). Columns such as D, D_wy, and Q_mm can also be merged into it.

Returns

2 dataframes: (1) the original wide-form dataframe with added columns (named ORIGINALNAME_cumulative) containing cumulative wateryear values. For example, if ‘P’ was in the original dataframe, then ‘P_cumulative’ now exists; if ‘Q_mm’ was in the dataframe, then ‘Q_mm_cumulative’ now exists. (2) a dataframe with one row for each wateryear and columns such as wateryear, ET, P, D_wy_max, ET_summer, Q_mm, etc., which represent the wateryear total (or maximum, for D_wy_max).

Return type

df, df
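Both outputs rest on assigning each date to a US wateryear (October 1 through September 30, labeled by the year in which it ends) and then accumulating within it. The following is a stdlib sketch of the cumulative columns and per-wateryear totals, assuming a list of dict records with a datetime.date under ‘date’; wateryear_sketch and wateryear_of are hypothetical, and the sketch does not reproduce derived columns such as ET_summer or D_wy_max.

```python
from datetime import date


def wateryear_of(day):
    return day.year + 1 if day.month >= 10 else day.year


def wateryear_sketch(records, columns):
    """Return (records with <col>_cumulative added, per-wateryear totals)."""
    out, totals, running, current_wy = [], {}, {}, None
    for rec in sorted(records, key=lambda r: r["date"]):
        rec = dict(rec)  # copy so the caller's records are not modified
        wy = wateryear_of(rec["date"])
        if wy != current_wy:
            current_wy, running = wy, {c: 0.0 for c in columns}
        total = totals.setdefault(wy, {"wateryear": wy, **{c: 0.0 for c in columns}})
        for c in columns:
            running[c] += rec[c]
            rec[f"{c}_cumulative"] = running[c]
            total[c] += rec[c]
        out.append(rec)
    return out, list(totals.values())


daily = [
    {"date": date(2020, 9, 30), "P": 1.0},
    {"date": date(2020, 10, 1), "P": 2.0},
    {"date": date(2020, 10, 2), "P": 3.0},
]
cumulative, yearly = wateryear_sketch(daily, ["P"])
```

The cumulative P restarts on October 1, so the three days give 1, 2, and 5 mm, and the totals split into wateryears 2020 and 2021.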

waterpyk.plots

waterpyk.plots.plot_RWS(studyareaobject=None, df_deficit=None, smax=None, **plot_kwargs)

Plot root-zone water storage (RWS) timeseries, following Rempe-McCormick et al. (in prep) SI Fig 1. Either studyarea object must be supplied OR the df_deficit and smax params with the necessary data for plotting (ie columns for ‘D’ and ‘date’ and smax value)

Parameters
  • studyareaobject (object from studyarea class, optional) –

  • df_deficit (df, optional) – dataframe with deficit timeseries and columns ‘D’ and ‘date’ at minimum.

  • smax (int or float, optional) – root-zone water storage capacity (Smax) in mm

  • **xmin (str, optional) – default = ‘2003-10-01’

  • **xmax (str, optional) – default = ‘2020-10-01’

  • **xlabel (str, optional) – default = ‘Date’

  • **ylabel (str, optional) – default = ‘RWS (mm)’

  • **dpi (int, optional) – default = 300

  • **figsize (tuple, optional) – default = (6,4)

  • **legend (bool, optional) – default = False

  • **title (str, optional) – default = None

  • **color_D (str, optional) – default = ‘black’

  • **linestyle_D (str, optional) – default = ‘-’

  • **lw (float, optional) – lineweight, default = 1

Returns

fig

waterpyk.plots.plot_p_distribution(studyareaobject=None, df_wateryear_totals=None, smax=None, **plot_kwargs)

Plot distribution of wateryear precipitation (P) and root-zone water storage capacity (Smax) in the form of Rempe-McCormick et al. (in prep) Fig 1c,f. Either studyarea object must be supplied OR the df_wateryear_totals and smax param with the necessary data for plotting.

Parameters
  • studyareaobject (object from studyarea class, optional) –

  • df_wateryear_totals (df, optional) – dataframe of wateryear totals with wateryear P in a column named ‘P’ at minimum.

  • smax (int or float, optional) – value for root-zone water storage capacity (Smax) in mm

  • **xmin (int, optional) – default = 0

  • **xmax (int, optional) – default = 4000

  • **xlabel (str, optional) – default = ‘mm’

  • **ylabel (str, optional) – default = ‘Density’

  • **dpi (int, optional) – default = 300

  • **figsize (tuple, optional) – default = (6,4)

  • **legend (bool, optional) – default = True

  • **title (str, optional) – default = None

Returns

fig

waterpyk.plots.plot_spearman(studyareaobject=None, df_wateryear_totals=None, **plot_kwargs)

Scatter plot of wateryear precipitation and summer (i.e. dry season) ET, following Rempe-McCormick et al. (in prep) Fig 1b,e. The Spearman correlation coefficient (rho) and p-value are given in the upper right corner of the plot. Either the studyarea object must be supplied OR the df_wateryear_totals param with the necessary data for plotting (i.e. wateryear, P, and ET_summer columns).

Parameters
  • studyareaobject (object from studyarea class, optional) –

  • df_wateryear_totals (df, optional) – dataframe with columns ‘P’, ‘ET_summer’, and ‘wateryear’

  • deficit_timeseries (df, optional) – dataframe with deficit timeseries and columns ‘D’, ‘D_wy’ and ‘date’ at minimum.

  • **xmin (int, optional) – default = 0

  • **xmax (int, optional) – default = 3000

  • **dpi (int, optional) – default = 300

  • **figsize (tuple, optional) – default = (6,4)

  • **color_WY (bool, optional) – This puts a wateryear legend and colors the dots by wateryear. default = True

  • **title (str, optional) – default = None

  • **markeredgecolor (str, optional) – default = ‘black’

  • **lw (float, optional) – lineweight, default = 1.5

Returns

fig
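Spearman’s rho is the Pearson correlation of the ranks, with tied values assigned the average of their ranks. The following stdlib sketch computes rho only and is not necessarily how waterpyk computes it; in practice scipy.stats.spearmanr returns both the coefficient and the p-value shown on the plot. The P and ET_summer numbers are made up for illustration.

```python
def average_ranks(xs):
    """1-based ranks, with ties sharing the average of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up wateryear P and summer ET values (mm), for illustration only.
p_wy = [800, 1200, 950, 1500]
et_summer = [300, 420, 350, 410]
rho = spearman_rho(p_wy, et_summer)
```

Only the two wettest years swap order between the series, which gives rho = 0.8.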

waterpyk.plots.plot_timeseries(studyareaobject=None, df_daily_wide=None, deficit_timeseries=None, **plot_kwargs)

Plot daily (wateryear cumulative) timeseries, following Rempe-McCormick et al. (in prep) Fig 1b,e. Either the studyarea object must be supplied OR the df_daily_wide and deficit_timeseries params with the necessary data for plotting. Use **kwargs to choose which datasets to plot from the options of P, ET, deficit, wateryear deficit, and streamflow (Q), if available.

Parameters
  • studyareaobject (object from studyarea class, optional) –

  • df_daily_wide (df, optional) – dataframe with daily cumulative wateryear timeseries and, at minimum, columns corresponding to the chosen data (from **kwargs), such as ‘ET_cumulative’, ‘P_cumulative’, and ‘Q_mm_cumulative’, plus a ‘date’ column.

  • deficit_timeseries (df, optional) – dataframe with deficit timeseries and columns ‘D’, ‘D_wy’ and ‘date’ at minimum.

  • **xmin (str, optional) – default = ‘2003-10-01’

  • **xmax (str, optional) – default = ‘2020-10-01’

  • **xlabel (str, optional) – default = ‘Date’

  • **ylabel (str, optional) – default = ‘[mm]’

  • **dpi (int, optional) – default = 300

  • **figsize (tuple, optional) – default = (6,4)

  • **legend (bool, optional) – default = True

  • **title (str, optional) – default = None

  • **plot_Q (bool, optional) – default = False

  • **plot_P (bool, optional) – default = True

  • **plot_D (bool, optional) – default = True

  • **plot_Dwy (bool, optional) – default = True

  • **plot_ET (bool, optional) – default = False

  • **plot_ET_dry (bool, optional) – default = False

  • **color_Q (str, optional) – default = ‘blue’

  • **color_P (str, optional) – default = ‘#b1d6f0’

  • **color_D (str, optional) – default = ‘black’

  • **color_Dwy (str, optional) – default = ‘black’

  • **color_ET (str, optional) – default = ‘purple’

  • **linestyle_Q (str, optional) – default = ‘-’

  • **linestyle_P (str, optional) – default = ‘-’

  • **linestyle_D (str, optional) – default = ‘-’

  • **linestyle_Dwy (str, optional) – default = ‘--’

  • **linestyle_ET (str, optional) – default = ‘-’

  • **lw (float, optional) – lineweight, default = 1.5

Returns

fig

waterpyk.plots.plot_wateryear_totals(studyareaobject=None, df_wateryear_totals=None, **plot_kwargs)

Plot summed wateryear timeseries. Either studyarea object must be supplied OR the df_wateryear_totals param with the necessary data for plotting. Use **kwargs to choose which datasets to plot, from the options of P, ET, ET_summer, deficit, and streamflow (Q) if available.

Parameters
  • studyareaobject (object from studyarea class, optional) –

  • df_wateryear_totals (df, optional) – dataframe of wateryear totals with, at minimum, columns corresponding to the chosen data (from **kwargs), such as ‘P’, ‘D’, ‘ET’, ‘ET_summer’, and ‘Q_mm’, plus a ‘wateryear’ column.

  • **xmin (int, optional) – default = 2003

  • **xmax (int, optional) – default = 2020

  • **xlabel (str, optional) – default = ‘Wateryear’

  • **ylabel (str, optional) – default = ‘[mm]’

  • **dpi (int, optional) – default = 300

  • **figsize (tuple, optional) – default = (6,4)

  • **legend (bool, optional) – default = True

  • **title (str, optional) – default = None

  • **plot_Q (bool, optional) – default = True

  • **plot_P (bool, optional) – default = True

  • **plot_D (bool, optional) – default = True

  • **plot_ET (bool, optional) – default = True

  • **plot_ET_dry (bool, optional) – default = False

  • **color_Q (str, optional) – default = ‘blue’

  • **color_P (str, optional) – default = ‘#b1d6f0’

  • **color_D (str, optional) – default = ‘black’

  • **color_ET (str, optional) – default = ‘purple’

  • **linestyle_Q (str, optional) – default = ‘-o’

  • **linestyle_P (str, optional) – default = ‘-o’

  • **linestyle_D (str, optional) – default = ‘-’

  • **linestyle_ET (str, optional) – default = ‘-o’

  • **lw (float, optional) – lineweight, default = 1.5

  • **twinx (bool, optional) – Plot ET data on a second y-axis that shares the x-axis with the P data. default = True

Returns

fig

waterpyk

Top-level package for waterpyk.

waterpyk.in_colab_shell()

Checks if code is being executed within Google Colab to decide on default_saving_dir (default saving directory path). Function from geemap by Qiusheng Wu

Returns

bool
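A common way to detect Colab is to check whether the google.colab module has been imported. The following sketch shows that check plus a hypothetical choose_saving_dir helper for picking a default output directory. Neither is waterpyk’s code, and the actual default paths are not reproduced here.

```python
import sys


def in_colab_shell():
    """Return True if google.colab has been imported, as happens inside Colab."""
    return "google.colab" in sys.modules


def choose_saving_dir(colab_dir, local_dir, in_colab=None):
    """Pick a default output directory; `in_colab` can be forced for testing."""
    if in_colab is None:
        in_colab = in_colab_shell()
    return colab_dir if in_colab else local_dir
```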

waterpyk.load_data(layers)

Finds and loads the input data (either all.csv or minimal.csv, corresponding to the layers input).

Parameters

layers (str, optional) – the GEE input layers to be extracted. Options if string are ‘minimal’ or ‘all’.

Returns

dataframe read from the CSV file corresponding to layers.

Return type

df