Usage

Start by importing Benchmarks for Hydrologic Timeseries.

import hydrobm

Main calculation function

HydroBM provides a main function to calculate the benchmark timeseries. This is a catch-all function that lets you set up a complete benchmarking exercise for a given time series of observed streamflow (and optionally other variables, depending on the selected benchmarks). Functions are accessible outside of this main function too for more granular setups.

hydrobm.calculate.calc_bm(data, cal_mask, val_mask=[], precipitation='precipitation', streamflow='streamflow', benchmarks=['daily_mean_flow'], metrics=['rmse'], optimization_method='brute_force', calc_snowmelt=False, temperature='temperature', snowmelt_threshold=0.0, snowmelt_rate=3.0)

Calculate benchmark model scores for a given set of benchmark models and metrics.

Parameters:

data: pandas DataFrame or xarray Dataset: Input data containing precipitation and streamflow columns.
cal_mask: pandas Series: Boolean mask for the calculation period.
val_mask: pandas Series, optional: Boolean mask for the validation period. Default is [] (no validation scores returned).
precipitation: str, optional: Name of the precipitation column in the input data. Default is ‘precipitation’.
streamflow: str, optional: Name of the streamflow column in the input data. Default is ‘streamflow’.
benchmarks: list, optional: List of benchmark models to calculate. Default is [‘daily_mean_flow’].
metrics: list, optional: List of metrics to calculate. Default is [‘rmse’].
optimization_method: str, optional: Optimization method to use for benchmark model calibration. Default is ‘brute_force’.
calc_snowmelt: bool, optional: Flag to run a basic snow accumulation and melt model. Default is False.
temperature: str, optional: Name of the temperature column in the input data. Default is ‘temperature’.
snowmelt_threshold: float, optional: Threshold temperature for snowmelt calculation. Default is 0.0 [C].
snowmelt_rate: float, optional: Rate of snowmelt. Default is 3.0.

Returns:

benchmark_flows: pandas DataFrame: DataFrame containing benchmark flows for each benchmark model.
metrics: dict: Dictionary containing metric scores for each benchmark model.

Benchmark Efficiency (BME) function

HydroBM also provides a function to calculate skill scores, termed benchmark efficiencies (BME) (Schaefli & Gupta, 2007), between hydrological model simulations and benchmark timeseries. This function supports the Seibert (2001) and Schaefli and Gupta (2007) formulation of the BME skill score, as well as a skill score formulation of the KGE (Knoben et al., 2019). It takes the same arguments as calc_bm, plus the name of the simulated streamflow column and the desired BME formulation.

hydrobm.calculate.calc_bme(data, cal_mask, val_mask=[], precipitation='precipitation', streamflow='streamflow', simulated_flow='simulated_flow', benchmarks=['daily_mean_flow'], metrics=['rmse'], optimization_method='brute_force', formulation='bme_nse', calc_snowmelt=False, temperature='temperature', snowmelt_threshold=0.0, snowmelt_rate=3.0)

Calculate Benchmark Efficiency (BME) scores alongside standard metric scores.

Parameters:

data: pandas DataFrame or xarray Dataset: Input data containing precipitation, streamflow, and simulated flow columns.
cal_mask: pandas Series: Boolean mask for the calibration period.
val_mask: pandas Series, optional: Boolean mask for the validation period. Default is [] (no validation scores returned).
precipitation: str, optional: Name of the precipitation column in the input data. Default is ‘precipitation’.
streamflow: str, optional: Name of the streamflow column in the input data. Default is ‘streamflow’.
simulated_flow: str, optional: Name of the simulated flow column in the input data. Default is ‘simulated_flow’.
benchmarks: list, optional: List of benchmark models to calculate. Default is [‘daily_mean_flow’].
metrics: list, optional: List of metrics to calculate via calc_bm. Default is [‘rmse’].
optimization_method: str, optional: Optimization method for benchmark model calibration. Default is ‘brute_force’.
formulation: str, optional: BME formulation. Options:
- ‘bme_nse’ (default): BME = 1 - sum((q_obs - q_sim)^2) / sum((q_obs - q_bm)^2)
- ‘bme_kge’: BME = (KGE_model - KGE_benchmark) / (1 - KGE_benchmark)
calc_snowmelt: bool, optional: Flag to run a basic snow accumulation and melt model. Default is False.
temperature: str, optional: Name of the temperature column in the input data. Default is ‘temperature’.
snowmelt_threshold: float, optional: Threshold temperature for snowmelt. Default is 0.0 [C].
snowmelt_rate: float, optional: Rate of snowmelt. Default is 3.0.

Returns:

bme_scores: dict: Dictionary of BME scores for each benchmark.
benchmark_flows: pandas DataFrame: DataFrame containing benchmark flows for each benchmark model (from calc_bm).
results: dict: Dictionary of standard metric scores for each benchmark (from calc_bm).

Benchmarks

Within their respective category, benchmarks are all set up to require the same inputs.

Benchmarks that rely on streamflow data only

`hydrobm.benchmarks.bm_mean_flow`(data, cal_mask)	Calculate the mean flow over the calculation period and use that as a predictor for all timesteps in the whole dataframe.
`hydrobm.benchmarks.bm_median_flow`(data, cal_mask)	Calculate the median flow over the calculation period and use that as a predictor for all timesteps in the whole dataframe.
`hydrobm.benchmarks.bm_annual_mean_flow`(data, ...)	Calculate the annual mean flow over the calculation period and use that as a predictor for each year in the calculation period.
`hydrobm.benchmarks.bm_annual_median_flow`(...)	Calculate the annual median flow over the calculation period and use that as a predictor for each year in the calculation period.
`hydrobm.benchmarks.bm_monthly_mean_flow`(...)	Calculate the monthly mean flow over the calculation period and use that as a predictor for each month in the whole dataframe.
`hydrobm.benchmarks.bm_monthly_median_flow`(...)	Calculate the monthly median flow over the calculation period and use that as a predictor for each month in the whole dataframe.
`hydrobm.benchmarks.bm_daily_mean_flow`(data, ...)	Calculate the daily mean flow over the calculation period and use that as a predictor for each day in the whole dataframe.
`hydrobm.benchmarks.bm_daily_median_flow`(...)	Calculate the daily median flow over the calculation period and use that as a predictor for each day in the whole dataframe.
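
As an illustration of the streamflow-only benchmarks above, the following is a minimal plain-Python sketch of the daily mean flow idea: average the flow for each calendar day over the calculation period, then use that average as the prediction for the same calendar day everywhere. The function name `daily_mean_flow_sketch`, the list-based inputs, and the NaN fallback for unseen days are assumptions made for this example; this is not HydroBM's implementation.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_mean_flow_sketch(dates, flows, cal_mask):
    """Mean flow per (month, day) over the calculation period, applied to every timestep."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, q, in_cal in zip(dates, flows, cal_mask):
        if in_cal:
            key = (d.month, d.day)
            sums[key] += q
            counts[key] += 1
    means = {key: sums[key] / counts[key] for key in sums}
    # Calendar days never seen in the calculation period get NaN (an assumption).
    return [means.get((d.month, d.day), float("nan")) for d in dates]


# Two years of synthetic daily data: flow is 1.0 in 2001 and 3.0 in 2002.
start = date(2001, 1, 1)
dates = [start + timedelta(days=i) for i in range(730)]
flows = [1.0 if d.year == 2001 else 3.0 for d in dates]

# Only 2001 is used to compute the benchmark; it is then applied to both years.
cal_mask = [d.year == 2001 for d in dates]
benchmark = daily_mean_flow_sketch(dates, flows, cal_mask)
```

Because the benchmark is fitted on 2001 only, it predicts 1.0 for every day of 2002 as well, which is what makes scores on a separate validation period informative.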

Benchmarks that rely on precipitation and streamflow

`hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_all`(...)	Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff using precipitation totals from the calculation period and non-calculation period respectively.
`hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_annual`(...)	Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each year in the whole dataframe.
`hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_monthly`(...)	Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each month in the whole dataframe.
`hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_daily`(...)	Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each day in the whole dataframe.
`hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_timestep`(...)	Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
`hydrobm.benchmarks.bm_monthly_rainfall_runoff_ratio_to_monthly`(...)	Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each month in the whole dataframe.
`hydrobm.benchmarks.bm_monthly_rainfall_runoff_ratio_to_daily`(...)	Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each day in the whole dataframe.
`hydrobm.benchmarks.bm_monthly_rainfall_runoff_ratio_to_timestep`(...)	Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
`hydrobm.benchmarks.bm_annual_scaled_daily_mean_flow`(...)	Calculate the daily mean flow scaled by annual precipitation anomalies.
`hydrobm.benchmarks.bm_monthly_scaled_daily_mean_flow`(...)	Calculate the daily mean flow scaled by monthly precipitation anomalies.
`hydrobm.benchmarks.bm_scaled_precipitation_benchmark`(...)	Calculate the scaled precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
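
The rainfall-runoff ratio benchmarks in this table share one idea: estimate a single long-term ratio of streamflow to precipitation over the calculation period, then multiply precipitation by that ratio. The sketch below shows the per-timestep variant in plain Python. The function name and list-based inputs are assumptions for this example, and details such as unit handling or NaN treatment in HydroBM may differ.

```python
def rainfall_runoff_ratio_to_timestep_sketch(precip, flow, cal_mask):
    """Scale every precipitation value by the long-term Q/P ratio of the calculation period."""
    p_cal = sum(p for p, m in zip(precip, cal_mask) if m)
    q_cal = sum(q for q, m in zip(flow, cal_mask) if m)
    ratio = q_cal / p_cal
    return ratio, [ratio * p for p in precip]


precip = [0.0, 10.0, 4.0, 6.0, 0.0, 8.0]
flow = [1.0, 3.0, 4.0, 2.0, 1.0, 3.0]
cal_mask = [True, True, True, True, False, False]

ratio, benchmark = rainfall_runoff_ratio_to_timestep_sketch(precip, flow, cal_mask)
```

In this toy case the calculation period has 20 units of precipitation and 10 units of flow, so the ratio is 0.5 and the benchmark predicts half of each timestep's precipitation as flow, including for the two timesteps outside the calculation period.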

Parsimonious model benchmarks

`hydrobm.benchmarks.bm_eckhardt_baseflow`(...)	Baseflow separation using Eckhardt filter to create a mean annual baseflow signal.
`hydrobm.benchmarks.bm_adjusted_precipitation_benchmark`(...)	Calculate the adjusted precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
`hydrobm.benchmarks.bm_adjusted_smoothed_precipitation_benchmark`(...)	Calculate the adjusted smoothed precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.

Benchmark support functions

hydrobm.benchmarks.create_bm(data, benchmark, cal_mask, precipitation='precipitation', streamflow='streamflow', optimization_method='brute_force')

Helper function to call the correct benchmark model function; makes looping over benchmark models easier.

Parameters:

data: pandas DataFrame: Input data containing precipitation and streamflow columns.
benchmark: str: Benchmark model to calculate.
cal_mask: pandas Series: Boolean mask for the calculation period.
precipitation: str, optional: Name of the precipitation column in the input data. Default is [‘precipitation’].
streamflow: str, optional: Name of the streamflow column in the input data. Default is [‘streamflow’].
optimization_method: str, optional: Optimization method to create adjusted (smoothed) precipitation benchmark. Default is [‘brute_force’].

Returns:

bm_values: pandas Series: Benchmark values for the given benchmark model.
qbm: pandas DataFrame: Benchmark flow time series for the given benchmark model.

hydrobm.benchmarks.evaluate_bm(data, benchmark_flow, metric, cal_mask, val_mask=None, streamflow='streamflow', ignore_nan=True)

Helper function to calculate calculation and evaluation metric scores for a given set of observations and benchmark flows.

Parameters:

data: pandas DataFrame: Input data containing streamflow observation column.
benchmark_flow: pandas DataFrame: Benchmark flow time series as returned by one of the benchmark model functions.
metric: str: Name of the metric to calculate. See hydrobm/metrics for a list.
cal_mask: pandas Series: Boolean mask for the calculation period.
val_mask: pandas Series, optional: Boolean mask for the evaluation period. Default is None (no evaluation score returned).
streamflow: str, optional: Name of the streamflow column in the input data. Default is [‘streamflow’].
ignore_nan: bool, optional: Flag to consider only non-NaN values. Default is True.

Returns:

cal_score: float: Metric score for the calculation period.
val_score: float: Metric score for the evaluation period. NaN if no val_mask specified.

Benchmark optimization functions

Only used by the Eckhardt Baseflow, Adjusted Precipitation Benchmark (APB), and Adjusted Smoothed Precipitation Benchmark (ASPB) to optimize or estimate their respective parameters.

hydrobm.utils.optimize_apb(scaled_precip, streamflow, method, max_lag=30)

Wrapper function around adjusted precipitation benchmark model optimization functions.

Parameters:

scaled_precip: pandas Series: Scaled precipitation data.
streamflow: pandas Series: Streamflow data.
method: str: Optimization method to use. Currently supports “brute_force” and “minimize”.
max_lag: int, optional: Maximum lag to consider. Default is 30.

Returns:

best_lag: int: Best lag value.
best_mse: float: Best mean squared error value.

hydrobm.utils.brute_force_apb(scaled_precip, streamflow, max_lag=30)

Optimize the lag for the adjusted precipitation benchmark model using brute force.

Parameters:

scaled_precip: pandas Series: Scaled precipitation data.
streamflow: pandas Series: Streamflow data.
max_lag: int, optional: Maximum lag to consider. Default is 30.

Returns:

best_lag: int: Best lag value.
best_mse: float: Best mean squared error value.
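
A brute-force lag search can be sketched as follows: try every integer lag from 0 to max_lag, pair each flow value with the precipitation value that many timesteps earlier, and keep the lag with the lowest mean squared error. This is a plain-Python illustration under stated assumptions (list inputs, the first `lag` timesteps excluded from the error, the name `brute_force_lag_sketch`); how HydroBM treats the start of the series is not documented here and may differ.

```python
def brute_force_lag_sketch(scaled_precip, streamflow, max_lag=30):
    """Return (best_lag, best_mse) by testing every integer lag from 0 to max_lag."""
    n = len(streamflow)
    best_lag, best_mse = None, float("inf")
    for lag in range(0, min(max_lag, n - 1) + 1):
        # Pair flow at time t with precipitation at time t - lag.
        pairs = [(streamflow[t], scaled_precip[t - lag]) for t in range(lag, n)]
        mse = sum((q - p) ** 2 for q, p in pairs) / len(pairs)
        if mse < best_mse:
            best_lag, best_mse = lag, mse
    return best_lag, best_mse


precip = [0, 5, 0, 0, 2, 0, 0, 7, 0, 0, 0, 3]
flow = [0, 0] + precip[:-2]  # synthetic flow: precipitation delayed by two timesteps

best_lag, best_mse = brute_force_lag_sketch(precip, flow, max_lag=5)
```

Because every candidate lag is evaluated, the search cannot miss the optimum within the tested range, at the cost of one error calculation per lag.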

hydrobm.utils.minimize_scalar_apb(scaled_precip, streamflow, max_lag=30)

Optimize the lag for the adjusted precipitation benchmark model using scipy.optimize.minimize_scalar.

Parameters:

scaled_precip: pandas Series: Scaled precipitation data.
streamflow: pandas Series: Streamflow data.
max_lag: int, optional: Maximum lag to consider. Default is 30.

Returns:

best_lag: int: Best lag value.
best_mse: float: Best mean squared error value.

Notes

scipy.optimize.minimize_scalar is not designed for use with integer-only solutions. Here we use the round function to enforce integer solutions. This seems to work for simple test cases, but results for real data may vary. User caution is advised. Use brute force optimization if 100% accurate solutions are required.

hydrobm.utils.optimize_aspb(scaled_precip, streamflow, method, max_lag=30, max_window=90)

Wrapper function around adjusted smoothed precipitation benchmark model optimization functions.

Parameters:

scaled_precip: pandas Series: Scaled precipitation data.
streamflow: pandas Series: Streamflow data.
method: str: Optimization method to use. Currently supports “brute_force” and “minimize”.
max_lag: int, optional: Maximum lag to consider. Default is 30.
max_window: int, optional: Maximum smoothing window length to consider. Default is 90.

Returns:

best_lag: int: Best lag value.
best_window: int: Best window value.
best_mse: float: Best mean squared error value.

hydrobm.utils.brute_force_aspb(scaled_precip, streamflow, max_lag=30, max_window=90)

Optimize the lag and window for adjusted smoothed precipitation benchmark model using brute force.

Parameters:

scaled_precip: pandas Series: Scaled precipitation data.
streamflow: pandas Series: Streamflow data.
max_lag: int, optional: Maximum lag to consider. Default is 30.
max_window: int, optional: Maximum smoothing window length to consider. Default is 90.

Returns:

best_lag: int: Best lag value.
best_window: int: Best window value.
best_mse: float: Best mean squared error value.
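
The two-parameter search can be illustrated the same way, as a grid over every (window, lag) combination. The sketch below assumes that smoothing is a trailing moving average and that the lag is applied after smoothing; both are assumptions made for this example, as are the function names, and HydroBM's actual smoothing and alignment may differ.

```python
def moving_average_sketch(x, window):
    """Trailing mean over up to `window` values (fewer at the start of the series)."""
    out = []
    for t in range(len(x)):
        chunk = x[max(0, t - window + 1): t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def brute_force_lag_window_sketch(scaled_precip, streamflow, max_lag=30, max_window=90):
    """Return (best_lag, best_window, best_mse) from an exhaustive grid search."""
    n = len(streamflow)
    best = (None, None, float("inf"))
    for window in range(1, max_window + 1):
        smoothed = moving_average_sketch(scaled_precip, window)
        for lag in range(0, min(max_lag, n - 1) + 1):
            # Pair flow at time t with smoothed precipitation at time t - lag.
            mse = sum((streamflow[t] - smoothed[t - lag]) ** 2 for t in range(lag, n)) / (n - lag)
            if mse < best[2]:
                best = (lag, window, mse)
    return best


precip = [0, 6, 0, 0, 3, 0, 0, 9, 0, 0, 0, 3, 0, 0]
# Synthetic flow: precipitation smoothed over 3 timesteps, then delayed by 1 timestep.
flow = [0] + moving_average_sketch(precip, 3)[:-1]

best_lag, best_window, best_mse = brute_force_lag_window_sketch(
    precip, flow, max_lag=3, max_window=5
)
```

The cost grows with max_lag times max_window, which is the trade-off against the faster but approximate scipy-based alternative.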

hydrobm.utils.minimize_aspb(scaled_precip, streamflow, max_lag=30, max_window=90, method='Powell')

Optimize the lag and window for the ASPB model using scipy.optimize.minimize.

Parameters:

scaled_precip: pandas Series: Scaled precipitation data.
streamflow: pandas Series: Streamflow data.
max_lag: int, optional: Maximum lag to consider. Default is 30.
max_window: int, optional: Maximum smoothing window length to consider. Default is 90.
method: str, optional: Optimization method to use. Default is ‘Powell’. See scipy.optimize.minimize for more options.

Returns:

best_lag: int: Best lag value.
best_window: int: Best window value.
best_mse: float: Best mean squared error value.

Notes

scipy.optimize.minimize is not designed for use with integer-only solutions. Here we use the round function to enforce integer solutions. The ‘Powell’ optimization method seems to return appropriate lag and window values in simple test cases, but results for real data may vary. User caution is advised. Use brute force optimization if 100% accurate solutions are required.

hydrobm.utils.estimate_eckhardt_parameters(streamflow, precip, precip_window_days=3, precip_threshold=0.1)

Estimate both recession coefficient (k) and maximum baseflow index (BFI_max) which are required for baseflow separation as outlined by Eckhardt (2005).

This function combines recession analysis to estimate k with the backward filter method from Collischonn & Fan (2013) to estimate BFI_max. Automatically detects the timestep from the data and adjusts time window accordingly.

Parameters:

streamflow: pandas Series: Observed streamflow with DatetimeIndex.
precip: pandas Series: Precipitation data with DatetimeIndex.
precip_window_days: float, optional: Number of DAYS to check for precipitation when identifying recessions. Automatically converted to appropriate number of timesteps based on data frequency. Default is 3 days.
precip_threshold: float, optional: Precipitation threshold in same units as precip data (e.g., mm/day or kg/m²/s). Default is 0.1.

Returns:

k: float: Recession coefficient estimated from recession periods (for native timestep).
BFI_max: float: Maximum baseflow index estimated using backward filter method.

References

Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507-515.

Collischonn, W., & Fan, F. M. (2013). Defining parameters for Eckhardt’s digital baseflow filter. Hydrological Processes, 27(18), 2614–2622. https://doi.org/10.1002/hyp.9391

hydrobm.utils.eckhardt_filter(Q, BFI_max, k)

Eckhardt two-parameter digital filter for baseflow separation.

The Eckhardt filter was found to be the best of 9 evaluated baseflow separation methods in Xie et al. (2020), showing superior performance across diverse catchment conditions.

Parameters:

Q: pandas Series or numpy array: Streamflow time series.
BFI_max: float: Maximum baseflow index.
k: float: Recession constant.

Returns:

baseflow: pandas Series or numpy array: Separated baseflow component.
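
For orientation, the recursion published in Eckhardt (2005) is b_t = ((1 - BFI_max) * k * b_(t-1) + (1 - k) * BFI_max * Q_t) / (1 - k * BFI_max), with baseflow capped at total flow. The plain-Python sketch below applies that recursion. The initial condition (first baseflow value set to BFI_max times the first flow value) and the function name are assumptions for this example and are not taken from HydroBM's code.

```python
def eckhardt_filter_sketch(q, bfi_max, k):
    """Two-parameter recursive baseflow filter following the Eckhardt (2005) recursion."""
    # Assumed initial condition: start baseflow at BFI_max times the first flow value.
    baseflow = [bfi_max * q[0]]
    for t in range(1, len(q)):
        b = ((1 - bfi_max) * k * baseflow[-1] + (1 - k) * bfi_max * q[t]) / (1 - k * bfi_max)
        # Baseflow cannot exceed total streamflow.
        baseflow.append(min(b, q[t]))
    return baseflow


flow = [10.0, 30.0, 20.0, 12.0, 8.0, 6.0]
baseflow = eckhardt_filter_sketch(flow, bfi_max=0.8, k=0.98)
```

The cap matters during recessions: once the filtered value would exceed the observed flow, baseflow is set equal to streamflow for that timestep.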

References

Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507-515.

Xie, J., Liu, X., Wang, K., Yang, T., Liang, K., & Liu, C. (2020). Evaluation of typical methods for baseflow separation in the contiguous United States. Journal of Hydrology, 583, 124628. https://doi.org/10.1016/j.jhydrol.2020.124628

Metrics

hydrobm.metrics.mse(obs, sim, ignore_nan=True)

Calculate mean square error.

Parameters:

obs: array-like: Observed values.
sim: array-like: Simulated values.
ignore_nan: bool, optional: Flag to consider only non-NaN values. Default is True.

Returns:

float: Mean square error.

hydrobm.metrics.rmse(obs, sim, ignore_nan=True)

Calculate root mean square error.

Parameters:

obs: array-like: Observed values.
sim: array-like: Simulated values.
ignore_nan: bool, optional: Flag to consider only non-NaN values. Default is True.

Returns:

float: Root mean square error.
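
The two error metrics are closely related: RMSE is the square root of MSE. The following plain-Python sketch shows both on a small example. The function names are chosen for this illustration, and NaN handling (the ignore_nan flag) is left out for brevity.

```python
import math


def mse_sketch(obs, sim):
    """Mean of the squared differences between observed and simulated values."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)


def rmse_sketch(obs, sim):
    """Square root of the mean squared error, in the same units as the data."""
    return math.sqrt(mse_sketch(obs, sim))


obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 5.0, 2.0]

mse_value = mse_sketch(obs, sim)    # squared errors are 0, 0, 4, 4
rmse_value = rmse_sketch(obs, sim)
```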

hydrobm.metrics.nse(obs, sim, ignore_nan=True)

Calculate Nash-Sutcliffe efficiency.

Parameters:

obs: array-like: Observed values.
sim: array-like: Simulated values.
ignore_nan: bool, optional: Flag to consider only non-NaN values. Default is True.

Returns:

float: Nash-Sutcliffe efficiency.
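
The standard definition is NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2), so a simulation equal to the observed mean scores 0 and a perfect simulation scores 1. A minimal plain-Python sketch of that definition, with a function name chosen for this example and NaN handling omitted:

```python
def nse_sketch(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error relative to the observed-mean predictor."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - s) ** 2 for o, s in zip(obs, sim))
    denominator = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - numerator / denominator


obs = [1.0, 2.0, 3.0, 4.0]

nse_perfect = nse_sketch(obs, obs)                  # simulation equals observations
nse_mean = nse_sketch(obs, [2.5, 2.5, 2.5, 2.5])    # simulation equals the observed mean
nse_poor = nse_sketch(obs, [1.0, 2.0, 5.0, 2.0])    # worse than the observed mean
```

The mean of the observations is thus the benchmark built into NSE, which is why the other benchmarks in this package can give a more demanding reference.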

hydrobm.metrics.kge(obs, sim, ignore_nan=True)

Calculate Kling-Gupta efficiency.

Parameters:

obs: array-like: Observed values.
sim: array-like: Simulated values.
ignore_nan: bool, optional: Flag to consider only non-NaN values. Default is True.

Returns:

float: Kling-Gupta efficiency.
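
This page does not state which KGE variant HydroBM implements. As a reference point, the sketch below uses the widely cited original form, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the linear correlation, alpha the ratio of standard deviations, and beta the ratio of means (simulated over observed). The function name is an assumption for this example, and NaN handling is omitted.

```python
import math
from statistics import mean, pstdev


def kge_sketch(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mean_obs, mean_sim = mean(obs), mean(sim)
    std_obs, std_sim = pstdev(obs), pstdev(sim)
    covariance = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / len(obs)
    r = covariance / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs              # variability ratio
    beta = mean_sim / mean_obs             # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]

kge_perfect = kge_sketch(obs, obs)
# Doubling every value keeps r = 1 but gives alpha = 2 and beta = 2.
kge_doubled = kge_sketch(obs, [2.0 * o for o in obs])
```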

Metric support functions

hydrobm.metrics.calculate_metric(obs, sim, metric, ignore_nan=True)

Helper function to check metric existence and simplify loops.

Parameters:

obs: array-like: Observed values.
sim: array-like: Simulated values.
metric: str: Name of the metric to calculate.
ignore_nan: bool, optional: Flag to consider only non-NaN values. Default is True.

Returns:

float: Metric score.

hydrobm.metrics.filter_nan(obs, sim)

Select only non-NaN values from both observed and simulated data.

Parameters:

obs: array-like: Observed values.
sim: array-like: Simulated values.

Returns:

tuple: Tuple of arrays with non-NaN values from both observed and simulated data.
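
The key point of pairwise filtering is that a timestep is dropped when either series is NaN, so the two outputs stay aligned. A minimal plain-Python sketch of that behavior, using lists rather than arrays and a function name chosen for this example:

```python
import math


def filter_nan_sketch(obs, sim):
    """Keep only the positions where both observed and simulated values are not NaN."""
    pairs = [(o, s) for o, s in zip(obs, sim) if not (math.isnan(o) or math.isnan(s))]
    obs_filtered = [o for o, _ in pairs]
    sim_filtered = [s for _, s in pairs]
    return obs_filtered, sim_filtered


nan = float("nan")
obs = [1.0, nan, 3.0, 4.0]
sim = [1.5, 2.0, nan, 4.5]

obs_clean, sim_clean = filter_nan_sketch(obs, sim)
```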

Utilities

hydrobm.utils.rain_to_melt(data, precipitation='precipitation', temperature='temperature', snow_and_melt_temp=0.0, snow_and_melt_rate=3.0)

Calculate snow accumulation and melt based on temperature thresholds.

Parameters:

data: pandas DataFrame: Input data containing precipitation and temperature columns.
precipitation: str, optional: Name of the precipitation column in the input data. Default is ‘precipitation’.
temperature: str, optional: Name of the temperature column in the input data. Default is ‘temperature’.
snow_and_melt_temp: float, optional: Temperature threshold for snow accumulation and melt. Default is 0.0 [C].
snow_and_melt_rate: float, optional: Snow melt rate if temperature above threshold. Default is 3.0 [mm/timestep/degree C].

Returns:

data: pandas DataFrame: Input data with additional columns for snow depth and rain plus melt.

Notes

The default values for snow_and_melt_temp and snow_and_melt_rate are given in degrees Celsius and millimeters per timestep per degree Celsius, respectively. The code itself does not assume these units, however, so the function works with any consistent set of units.

For example, providing the input data in Kelvin and setting snow_and_melt_temp to 273.15 will work as expected. Similarly, if the input precipitation data is not in millimeters, simply providing the snow_and_melt_rate in those same units will yield the correct output.
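
A simple degree-day scheme of the kind described here can be sketched in plain Python as below. The exact rules are assumptions for this example: precipitation is stored as snow when temperature is at or below the threshold, and above the threshold melt equals the rate times the temperature excess, limited by the available snow. HydroBM's implementation and its output column names may differ.

```python
def rain_to_melt_sketch(precip, temp, snow_and_melt_temp=0.0, snow_and_melt_rate=3.0):
    """Return (snow_depth, rain_plus_melt) lists from a basic degree-day snow model."""
    snow_depth, rain_plus_melt = [], []
    storage = 0.0
    for p, t in zip(precip, temp):
        if t > snow_and_melt_temp:
            # Melt is proportional to the temperature excess, limited by stored snow.
            melt = min(storage, snow_and_melt_rate * (t - snow_and_melt_temp))
            storage -= melt
            rain_plus_melt.append(p + melt)
        else:
            # At or below the threshold, all precipitation accumulates as snow.
            storage += p
            rain_plus_melt.append(0.0)
        snow_depth.append(storage)
    return snow_depth, rain_plus_melt


precip = [5.0, 5.0, 0.0, 2.0, 0.0]
temp_celsius = [-2.0, -1.0, 1.0, 2.0, 5.0]
snow_c, water_c = rain_to_melt_sketch(precip, temp_celsius)

# Same data in Kelvin: shifting the threshold by 273.15 gives the same result
# up to floating-point rounding.
temp_kelvin = [t + 273.15 for t in temp_celsius]
snow_k, water_k = rain_to_melt_sketch(precip, temp_kelvin, snow_and_melt_temp=273.15)
```

In this example all 12 units of precipitation eventually leave the snowpack, so the totals of precipitation and rain plus melt match.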

hydrobm.utils.bme_nse(q_obs, q_sim, q_bm, cal_mask, val_mask=None)

Calculate NSE-based Benchmark Efficiency (BME) for cal and val periods. The formulation can be found in Seibert (2001) and Schaefli and Gupta (2007).

BME = 1 - sum((q_obs - q_sim)^2) / sum((q_obs - q_bm)^2)

Parameters:

q_obs: pandas Series: Observed streamflow.
q_sim: pandas Series: Simulated streamflow.
q_bm: pandas Series: Benchmark streamflow.
cal_mask: pandas Series: Boolean mask for the calibration period.
val_mask: pandas Series, optional: Boolean mask for the validation period. Default is None (no val score returned).

Returns:

cal_score: float: NSE-based BME score for the calibration period.
val_score: float: NSE-based BME score for the validation period. NaN if no val_mask specified.
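
The BME formula above can be evaluated per period by restricting both sums to the timesteps selected by a mask. The plain-Python sketch below does this for one mask at a time; the function name and list inputs are assumptions for this example, and NaN handling is omitted.

```python
def bme_nse_sketch(q_obs, q_sim, q_bm, mask):
    """BME = 1 - sum((q_obs - q_sim)^2) / sum((q_obs - q_bm)^2) over the masked timesteps."""
    numerator = sum((o - s) ** 2 for o, s, m in zip(q_obs, q_sim, mask) if m)
    denominator = sum((o - b) ** 2 for o, b, m in zip(q_obs, q_bm, mask) if m)
    return 1 - numerator / denominator


q_obs = [2.0, 4.0, 6.0, 8.0]
q_sim = [2.0, 5.0, 6.0, 7.0]
q_bm = [5.0, 5.0, 5.0, 5.0]   # a constant mean-flow benchmark

cal_mask = [True, True, False, False]
val_mask = [False, False, True, True]

cal_score = bme_nse_sketch(q_obs, q_sim, q_bm, cal_mask)
val_score = bme_nse_sketch(q_obs, q_sim, q_bm, val_mask)
```

A score of 1 means a perfect simulation, 0 means the simulation is no better than the benchmark, and negative values mean the benchmark outperforms the model.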

References

Seibert, J. (2001). On the need for benchmarks in hydrological modelling. Hydrological Processes, 15(6), 1063–1064. https://doi.org/10.1002/hyp.446

Schaefli, B., & Gupta, H. V. (2007). Do Nash values have value? Hydrological Processes, 21(15), 2075–2080. https://doi.org/10.1002/hyp.6825

hydrobm.utils.bme_kge(q_obs, q_sim, q_bm, cal_mask, val_mask=None)

Calculate KGE-based Benchmark Model Efficiency (KGE skill score) for cal and val periods. This skill score formulation can be found in Knoben et al. (2019) among others.

KGE_skill = (KGE_model - KGE_benchmark) / (1 - KGE_benchmark)

Parameters:

q_obs: pandas Series: Observed streamflow.
q_sim: pandas Series: Simulated streamflow.
q_bm: pandas Series: Benchmark streamflow.
cal_mask: pandas Series: Boolean mask for the calibration period.
val_mask: pandas Series, optional: Boolean mask for the validation period. Default is None (no val score returned).

Returns:

cal_score: float: KGE skill score for the calibration period.
val_score: float: KGE skill score for the validation period. NaN if no val_mask specified.
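
Once the two KGE values are known, the skill score is a one-line rescaling. The sketch below takes the KGE of the model and of the benchmark as plain numbers; the function name is an assumption for this example. The second call uses 1 - sqrt(2) as the benchmark value, which is the KGE of a mean-flow predictor reported by Knoben et al. (2019).

```python
import math


def kge_skill_sketch(kge_model, kge_benchmark):
    """KGE_skill = (KGE_model - KGE_benchmark) / (1 - KGE_benchmark)."""
    return (kge_model - kge_benchmark) / (1 - kge_benchmark)


# A model with KGE 0.8 against a benchmark with KGE 0.5.
skill_vs_benchmark = kge_skill_sketch(0.8, 0.5)

# A model with KGE 0.3 against the mean-flow benchmark value of 1 - sqrt(2).
kge_mean_flow = 1 - math.sqrt(2.0)
skill_vs_mean_flow = kge_skill_sketch(0.3, kge_mean_flow)
```

As with the NSE-based BME, positive values mean the model improves on the benchmark, 0 means it matches the benchmark, and 1 is a perfect score.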

References

Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019

References

Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019

Schaefli, B., & Gupta, H. V. (2007). Do Nash values have value? Hydrological Processes, 21(15), 2075–2080. https://doi.org/10.1002/hyp.6825

Seibert, J. (2001). On the need for benchmarks in hydrological modelling. Hydrological Processes, 15(6), 1063–1064. https://doi.org/10.1002/hyp.446