Usage

Start by importing HydroBM (Benchmarks for Hydrologic Timeseries):

import hydrobm

Main calculation function

HydroBM provides a main function to calculate benchmark timeseries. This catch-all function sets up a complete benchmarking exercise for a given time series of observed streamflow (and, depending on the selected benchmarks, other variables such as precipitation or temperature). The individual functions are also accessible outside this main function for more granular setups.

hydrobm.calculate.calc_bm(data, cal_mask, val_mask=[], precipitation='precipitation', streamflow='streamflow', benchmarks=['daily_mean_flow'], metrics=['rmse'], optimization_method='brute_force', calc_snowmelt=False, temperature='temperature', snowmelt_threshold=0.0, snowmelt_rate=3.0)

Calculate benchmark model scores for a given set of benchmark models and metrics.

Parameters:
data : pandas DataFrame or xarray Dataset

Input data containing precipitation and streamflow columns.

cal_mask : pandas Series

Boolean mask for the calculation period.

val_mask : pandas Series, optional

Boolean mask for the validation period. Default is [] (no validation scores returned).

precipitation : str, optional

Name of the precipitation column in the input data. Default is ‘precipitation’.

streamflow : str, optional

Name of the streamflow column in the input data. Default is ‘streamflow’.

benchmarks : list, optional

List of benchmark models to calculate. Default is [‘daily_mean_flow’].

metrics : list, optional

List of metrics to calculate. Default is [‘rmse’].

optimization_method : str, optional

Optimization method to use for benchmark model calibration. Default is ‘brute_force’.

calc_snowmelt : bool, optional

Flag to run a basic snow accumulation and melt model. Default is False.

temperature : str, optional

Name of the temperature column in the input data. Default is ‘temperature’.

snowmelt_threshold : float, optional

Threshold temperature for snowmelt calculation. Default is 0.0 [C].

snowmelt_rate : float, optional

Rate of snowmelt. Default is 3.0.

Returns:
benchmark_flows : pandas DataFrame

DataFrame containing benchmark flows for each benchmark model.

metrics : dict

Dictionary containing metric scores for each benchmark model.
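A minimal sketch of preparing inputs for calc_bm, assuming a daily pandas DataFrame with a DatetimeIndex and the default column names. The synthetic values and the 2000–2003 calibration / 2004–2005 validation split are placeholders. The calc_bm call is left as a comment, using only the parameters documented above, so the snippet runs without HydroBM installed.

```python
import numpy as np
import pandas as pd

# Synthetic daily record (placeholder values, not real observations)
dates = pd.date_range("2000-01-01", "2005-12-31", freq="D")
rng = np.random.default_rng(42)
data = pd.DataFrame(
    {
        "precipitation": rng.gamma(shape=0.5, scale=4.0, size=len(dates)),
        "streamflow": 1.0 + rng.gamma(shape=2.0, scale=0.5, size=len(dates)),
    },
    index=dates,
)

# Boolean masks aligned with the data index:
# calibrate on 2000-2003, validate on 2004-2005 (illustrative split)
cal_mask = pd.Series(data.index.year <= 2003, index=data.index)
val_mask = ~cal_mask

# With HydroBM installed, a run could then look like (shown as a comment):
# from hydrobm.calculate import calc_bm
# benchmark_flows, metrics = calc_bm(
#     data, cal_mask, val_mask=val_mask,
#     benchmarks=["daily_mean_flow"], metrics=["rmse"],
# )
```

Keeping both masks as boolean Series on the same index as the data avoids any ambiguity about which timesteps belong to which period.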

Benchmark Efficiency (BME) function

HydroBM also provides a function to calculate skill scores, termed benchmark efficiencies (BME) (Schaefli & Gupta, 2007), that compare hydrological model simulations against benchmark timeseries. This function supports the BME skill score formulation of Seibert (2001) and Schaefli and Gupta (2007), as well as a skill score formulation of the KGE (Knoben et al., 2019). It takes the same inputs as calc_bm, but additionally requires simulated streamflow and the desired BME formulation.

hydrobm.calculate.calc_bme(data, cal_mask, val_mask=[], precipitation='precipitation', streamflow='streamflow', simulated_flow='simulated_flow', benchmarks=['daily_mean_flow'], metrics=['rmse'], optimization_method='brute_force', formulation='bme_nse', calc_snowmelt=False, temperature='temperature', snowmelt_threshold=0.0, snowmelt_rate=3.0)

Calculate Benchmark Efficiency (BME) scores alongside standard metric scores.

Parameters:
data : pandas DataFrame or xarray Dataset

Input data containing precipitation, streamflow, and simulated flow columns.

cal_mask : pandas Series

Boolean mask for the calibration period.

val_mask : pandas Series, optional

Boolean mask for the validation period. Default is [] (no validation scores returned).

precipitation : str, optional

Name of the precipitation column in the input data. Default is ‘precipitation’.

streamflow : str, optional

Name of the streamflow column in the input data. Default is ‘streamflow’.

simulated_flow : str, optional

Name of the simulated flow column in the input data. Default is ‘simulated_flow’.

benchmarks : list, optional

List of benchmark models to calculate. Default is [‘daily_mean_flow’].

metrics : list, optional

List of metrics to calculate via calc_bm. Default is [‘rmse’].

optimization_method : str, optional

Optimization method for benchmark model calibration. Default is ‘brute_force’.

formulation : str, optional

BME formulation. Options:
- ‘bme_nse’ (default): BME = 1 - sum((q_obs - q_sim)^2) / sum((q_obs - q_bm)^2)
- ‘bme_kge’: BME = (KGE_model - KGE_benchmark) / (1 - KGE_benchmark)

calc_snowmelt : bool, optional

Flag to run a basic snow accumulation and melt model. Default is False.

temperature : str, optional

Name of the temperature column in the input data. Default is ‘temperature’.

snowmelt_threshold : float, optional

Threshold temperature for snowmelt. Default is 0.0 [C].

snowmelt_rate : float, optional

Rate of snowmelt. Default is 3.0.

Returns:
bme_scores : dict

Dictionary of BME scores for each benchmark.

benchmark_flows : pandas DataFrame

DataFrame containing benchmark flows for each benchmark model (from calc_bm).

results : dict

Dictionary of standard metric scores for each benchmark (from calc_bm).

Benchmarks

Within each category, all benchmarks require the same inputs.

Benchmarks that rely on streamflow data only

hydrobm.benchmarks.bm_mean_flow(data, cal_mask)

Calculate the mean flow over the calculation period and use that as a predictor for all timesteps in the whole dataframe.

hydrobm.benchmarks.bm_median_flow(data, cal_mask)

Calculate the median flow over the calculation period and use that as a predictor for all timesteps in the whole dataframe.

hydrobm.benchmarks.bm_annual_mean_flow(data, ...)

Calculate the annual mean flow over the calculation period and use that as a predictor for each year in the calculation period.

hydrobm.benchmarks.bm_annual_median_flow(...)

Calculate the annual median flow over the calculation period and use that as a predictor for each year in the calculation period.

hydrobm.benchmarks.bm_monthly_mean_flow(...)

Calculate the monthly mean flow over the calculation period and use that as a predictor for each month in the whole dataframe.

hydrobm.benchmarks.bm_monthly_median_flow(...)

Calculate the monthly median flow over the calculation period and use that as a predictor for each month in the whole dataframe.

hydrobm.benchmarks.bm_daily_mean_flow(data, ...)

Calculate the daily mean flow over the calculation period and use that as a predictor for each day in the whole dataframe.

hydrobm.benchmarks.bm_daily_median_flow(...)

Calculate the daily median flow over the calculation period and use that as a predictor for each day in the whole dataframe.
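The streamflow-only benchmarks share a common pattern: aggregate observed flow over the calculation period, then map that aggregate back onto every timestep. The sketch below illustrates the daily-mean variant under the assumption that "day" means calendar day of year (pandas DatetimeIndex.dayofyear). The function name is hypothetical, and HydroBM's handling of leap years or sub-daily data may differ.

```python
import pandas as pd


def daily_mean_flow_sketch(streamflow: pd.Series, cal_mask: pd.Series) -> pd.Series:
    """Illustrative daily mean flow benchmark (hypothetical, not HydroBM's implementation)."""
    doy = streamflow.index.dayofyear
    in_cal = cal_mask.to_numpy(dtype=bool)
    # Mean observed flow for each day of year, using calculation-period data only
    climatology = streamflow[in_cal].groupby(doy[in_cal]).mean()
    # Look up every timestep's day of year in that climatology
    return pd.Series(climatology.reindex(doy).to_numpy(), index=streamflow.index)


# Usage: calibrate on 2001, predict 2001-2002
idx = pd.date_range("2001-01-01", "2002-12-31", freq="D")
q = pd.Series(range(len(idx)), index=idx, dtype=float)
cal = pd.Series(idx.year == 2001, index=idx)
bm = daily_mean_flow_sketch(q, cal)
```

Swapping `.mean()` for `.median()`, or grouping by `index.month` instead of `index.dayofyear`, gives the analogous median and monthly variants under the same assumptions.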

Benchmarks that rely on precipitation and streamflow

hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_all(...)

Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff using precipitation totals from the calculation period and non-calculation period respectively.

hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_annual(...)

Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each year in the whole dataframe.

hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_monthly(...)

Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each month in the whole dataframe.

hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_daily(...)

Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each day in the whole dataframe.

hydrobm.benchmarks.bm_rainfall_runoff_ratio_to_timestep(...)

Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.

hydrobm.benchmarks.bm_monthly_rainfall_runoff_ratio_to_monthly(...)

Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each month in the whole dataframe.

hydrobm.benchmarks.bm_monthly_rainfall_runoff_ratio_to_daily(...)

Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each day in the whole dataframe.

hydrobm.benchmarks.bm_monthly_rainfall_runoff_ratio_to_timestep(...)

Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.

hydrobm.benchmarks.bm_annual_scaled_daily_mean_flow(...)

Calculate the daily mean flow scaled by annual precipitation anomalies.

hydrobm.benchmarks.bm_monthly_scaled_daily_mean_flow(...)

Calculate the daily mean flow scaled by monthly precipitation anomalies.

hydrobm.benchmarks.bm_scaled_precipitation_benchmark(...)

Calculate the scaled precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
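The rainfall-runoff ratio benchmarks rescale precipitation by a runoff coefficient estimated from the calculation period. A sketch of the simplest, per-timestep variant, assuming precipitation and streamflow share the same depth units per timestep (e.g. mm/day). The function name is hypothetical and this is not HydroBM's implementation.

```python
import pandas as pd


def rr_ratio_to_timestep_sketch(precip: pd.Series, streamflow: pd.Series,
                                cal_mask: pd.Series) -> pd.Series:
    """Illustrative rainfall-runoff ratio benchmark (hypothetical name)."""
    # Long-term runoff ratio: total flow divided by total precipitation in the calculation period
    rr = streamflow[cal_mask].sum() / precip[cal_mask].sum()
    # Predict each timestep's flow as that fraction of the same timestep's precipitation
    return rr * precip


idx = pd.date_range("2000-01-01", periods=4, freq="D")
p = pd.Series([10.0, 0.0, 5.0, 5.0], index=idx)
q = pd.Series([2.0, 2.0, 1.0, 3.0], index=idx)
cal = pd.Series([True, True, False, False], index=idx)
bm = rr_ratio_to_timestep_sketch(p, q, cal)  # runoff ratio = 4 / 10 = 0.4
```

The annual, monthly, and daily variants would differ only in how precipitation is aggregated before the ratio is applied.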

Parsimonious model benchmarks

hydrobm.benchmarks.bm_eckhardt_baseflow(...)

Baseflow separation using the Eckhardt filter to create a mean annual baseflow signal.

hydrobm.benchmarks.bm_adjusted_precipitation_benchmark(...)

Calculate the adjusted precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.

hydrobm.benchmarks.bm_adjusted_smoothed_precipitation_benchmark(...)

Calculate the adjusted smoothed precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.

Benchmark support functions

hydrobm.benchmarks.create_bm(data, benchmark, cal_mask, precipitation='precipitation', streamflow='streamflow', optimization_method='brute_force')

Helper function to call the correct benchmark model function; makes looping over benchmark models easier.

Parameters:
data : pandas DataFrame

Input data containing precipitation and streamflow columns.

benchmark : str

Benchmark model to calculate.

cal_mask : pandas Series

Boolean mask for the calculation period.

precipitation : str, optional

Name of the precipitation column in the input data. Default is ‘precipitation’.

streamflow : str, optional

Name of the streamflow column in the input data. Default is ‘streamflow’.

optimization_method : str, optional

Optimization method used to calibrate the adjusted (smoothed) precipitation benchmarks. Default is ‘brute_force’.

Returns:
bm_values : pandas Series

Benchmark values for the given benchmark model.

qbm : pandas DataFrame

Benchmark flow time series for the given benchmark model.

hydrobm.benchmarks.evaluate_bm(data, benchmark_flow, metric, cal_mask, val_mask=None, streamflow='streamflow', ignore_nan=True)

Helper function to calculate calculation and evaluation metric scores for a given set of observations and benchmark flows.

Parameters:
data : pandas DataFrame

Input data containing the streamflow observation column.

benchmark_flow : pandas DataFrame

Benchmark flow time series as returned by one of the benchmark model functions.

metric : str

Name of the metric to calculate. See hydrobm.metrics for the available metrics.

cal_mask : pandas Series

Boolean mask for the calculation period.

val_mask : pandas Series, optional

Boolean mask for the evaluation period. Default is None (no evaluation score returned).

streamflow : str, optional

Name of the streamflow column in the input data. Default is ‘streamflow’.

ignore_nan : bool, optional

Flag to consider only non-NaN values. Default is True.

Returns:
cal_score : float

Metric score for the calculation period.

val_score : float

Metric score for the evaluation period. NaN if no val_mask is specified.
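The calculation/evaluation split can be sketched as scoring the same pair of series under two boolean masks. This illustration hard-codes RMSE for brevity, uses a hypothetical function name, and assumes pandas Series aligned on a common index; it mirrors the documented behaviour (NaN evaluation score without a val_mask) but is not HydroBM's code.

```python
import numpy as np
import pandas as pd


def rmse(obs: pd.Series, sim: pd.Series) -> float:
    return float(np.sqrt(np.mean((obs.to_numpy() - sim.to_numpy()) ** 2)))


def evaluate_sketch(obs, bm_flow, cal_mask, val_mask=None, ignore_nan=True):
    """Score a benchmark over calculation and evaluation masks (hypothetical name)."""
    def score(mask):
        o, s = obs[mask], bm_flow[mask]
        if ignore_nan:
            # Keep only timesteps where both series have values
            keep = o.notna() & s.notna()
            o, s = o[keep], s[keep]
        return rmse(o, s)

    cal_score = score(cal_mask)
    # NaN when no evaluation mask is given
    val_score = score(val_mask) if val_mask is not None else np.nan
    return cal_score, val_score


idx = pd.date_range("2000-01-01", periods=4, freq="D")
obs = pd.Series([1.0, np.nan, 3.0, 4.0], index=idx)
bm = pd.Series([2.0, 2.0, 3.0, 2.0], index=idx)
cal = pd.Series([True, True, False, False], index=idx)
cal_score, val_score = evaluate_sketch(obs, bm, cal, ~cal)
```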

Benchmark optimization functions

These functions are used only by the Eckhardt baseflow, Adjusted Precipitation Benchmark (APB), and Adjusted Smoothed Precipitation Benchmark (ASPB) models to optimize or estimate their respective parameters.

hydrobm.utils.optimize_apb(scaled_precip, streamflow, method, max_lag=30)

Wrapper function around adjusted precipitation benchmark model optimization functions.

Parameters:
scaled_precip : pandas Series

Scaled precipitation data.

streamflow : pandas Series

Streamflow data.

method : str

Optimization method to use. Currently supports “brute_force” and “minimize”.

max_lag : int, optional

Maximum lag to consider. Default is 30.

Returns:
best_lag : int

Best lag value.

best_mse : float

Best mean squared error value.

hydrobm.utils.brute_force_apb(scaled_precip, streamflow, max_lag=30)

Optimize the lag for the adjusted precipitation benchmark model using brute force.

Parameters:
scaled_precip : pandas Series

Scaled precipitation data.

streamflow : pandas Series

Streamflow data.

max_lag : int, optional

Maximum lag to consider. Default is 30.

Returns:
best_lag : int

Best lag value.

best_mse : float

Best mean squared error value.
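A sketch of a brute-force lag search, assuming the lag delays the scaled precipitation so that input at timestep t is compared with flow at t + lag, and that the MSE is computed over overlapping non-NaN values only. Both assumptions and the function name are illustrative rather than taken from HydroBM.

```python
import numpy as np
import pandas as pd


def brute_force_lag_sketch(scaled_precip: pd.Series, streamflow: pd.Series, max_lag: int = 30):
    """Exhaustive search over integer lags 0..max_lag (hypothetical name)."""
    best_lag, best_mse = None, np.inf
    for lag in range(max_lag + 1):
        # Delay precipitation by `lag` timesteps: input at t is compared with flow at t + lag
        shifted = scaled_precip.shift(lag)
        err = (streamflow - shifted).dropna()
        if err.empty:
            continue
        mse = float((err ** 2).mean())
        if mse < best_mse:
            best_lag, best_mse = lag, mse
    return best_lag, best_mse


# Synthetic check: flow is precipitation delayed by exactly 3 timesteps
idx = pd.date_range("2000-01-01", periods=60, freq="D")
p = pd.Series(np.random.default_rng(0).random(60), index=idx)
q = p.shift(3)
best_lag, best_mse = brute_force_lag_sketch(p, q, max_lag=10)
```

Because every integer lag in the range is evaluated, the result is the best lag within 0..max_lag by construction, at the cost of max_lag + 1 evaluations.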

hydrobm.utils.minimize_scalar_apb(scaled_precip, streamflow, max_lag=30)

Optimize the lag for the adjusted precipitation benchmark model using scipy.optimize.minimize_scalar.

Parameters:
scaled_precip : pandas Series

Scaled precipitation data.

streamflow : pandas Series

Streamflow data.

max_lag : int, optional

Maximum lag to consider. Default is 30.

Returns:
best_lag : int

Best lag value.

best_mse : float

Best mean squared error value.

Notes

scipy.optimize.minimize_scalar is not designed for integer-only solutions. Here the round function is used to enforce integer solutions. This appears to work for simple test cases, but results for real data may vary, so use caution. Use brute force optimization if the exact optimum over all candidate lags is required.

hydrobm.utils.optimize_aspb(scaled_precip, streamflow, method, max_lag=30, max_window=90)

Wrapper function around adjusted smoothed precipitation benchmark model optimization functions.

Parameters:
scaled_precip : pandas Series

Scaled precipitation data.

streamflow : pandas Series

Streamflow data.

method : str

Optimization method to use. Currently supports “brute_force” and “minimize”.

max_lag : int, optional

Maximum lag to consider. Default is 30.

max_window : int, optional

Maximum smoothing window length to consider. Default is 90.

Returns:
best_lag : int

Best lag value.

best_window : int

Best window value.

best_mse : float

Best mean squared error value.

hydrobm.utils.brute_force_aspb(scaled_precip, streamflow, max_lag=30, max_window=90)

Optimize the lag and window for the adjusted smoothed precipitation benchmark model using brute force.

Parameters:
scaled_precip : pandas Series

Scaled precipitation data.

streamflow : pandas Series

Streamflow data.

max_lag : int, optional

Maximum lag to consider. Default is 30.

max_window : int, optional

Maximum smoothing window length to consider. Default is 90.

Returns:
best_lag : int

Best lag value.

best_window : int

Best window value.

best_mse : float

Best mean squared error value.
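Extending the brute-force idea to ASPB means searching a two-dimensional grid of (lag, window) pairs, i.e. (max_lag + 1) × max_window candidates. The sketch below assumes a trailing moving average as the smoothing step and windows from 1 to max_window; HydroBM's exact smoothing may differ, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def brute_force_lag_window_sketch(scaled_precip, streamflow, max_lag=30, max_window=90):
    """Exhaustive (lag, window) grid search (hypothetical name, not HydroBM's code)."""
    best_lag, best_window, best_mse = None, None, np.inf
    for window in range(1, max_window + 1):
        # Assumed smoothing step: trailing moving average over `window` timesteps
        smoothed = scaled_precip.rolling(window).mean()
        for lag in range(max_lag + 1):
            err = (streamflow - smoothed.shift(lag)).dropna()
            if err.empty:
                continue
            mse = float((err ** 2).mean())
            if mse < best_mse:
                best_lag, best_window, best_mse = lag, window, mse
    return best_lag, best_window, best_mse


# Synthetic check: flow is a 5-step moving average of precipitation, delayed by 2 steps
idx = pd.date_range("2000-01-01", periods=100, freq="D")
p = pd.Series(np.random.default_rng(1).random(100), index=idx)
q = p.rolling(5).mean().shift(2)
best_lag, best_window, best_mse = brute_force_lag_window_sketch(p, q, max_lag=5, max_window=10)
```

With the default limits this grid has 31 × 90 = 2790 candidates, which illustrates why a continuous optimizer such as Powell is offered as a faster, if less certain, alternative.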

hydrobm.utils.minimize_aspb(scaled_precip, streamflow, max_lag=30, max_window=90, method='Powell')

Optimize the lag and window for the ASPB model using scipy.optimize.minimize.

Parameters:
scaled_precip : pandas Series

Scaled precipitation data.

streamflow : pandas Series

Streamflow data.

max_lag : int, optional

Maximum lag to consider. Default is 30.

max_window : int, optional

Maximum smoothing window length to consider. Default is 90.

method : str, optional

Optimization method to use. Default is ‘Powell’. See scipy.optimize.minimize for more options.

Returns:
best_lag : int

Best lag value.

best_window : int

Best window value.

best_mse : float

Best mean squared error value.

Notes

scipy.optimize.minimize is not designed for integer-only solutions. Here the round function is used to enforce integer solutions. The ‘Powell’ optimization method appears to return appropriate lag and window values in simple test cases, but results for real data may vary, so use caution. Use brute force optimization if the exact optimum over all candidate lag and window combinations is required.

hydrobm.utils.estimate_eckhardt_parameters(streamflow, precip, precip_window_days=3, precip_threshold=0.1)

Estimate the recession coefficient (k) and the maximum baseflow index (BFI_max), both of which are required for baseflow separation as outlined by Eckhardt (2005).

This function combines recession analysis to estimate k with the backward filter method of Collischonn & Fan (2013) to estimate BFI_max. It automatically detects the timestep from the data and adjusts the precipitation window accordingly.

Parameters:
streamflow : pandas Series

Observed streamflow with DatetimeIndex.

precip : pandas Series

Precipitation data with DatetimeIndex.

precip_window_days : float, optional

Number of DAYS to check for precipitation when identifying recessions. Automatically converted to the appropriate number of timesteps based on data frequency. Default is 3 days.

precip_threshold : float, optional

Precipitation threshold in the same units as the precip data (e.g., mm/day or kg/m²/s). Default is 0.1.

Returns:
k : float

Recession coefficient estimated from recession periods (for native timestep).

BFI_max : float

Maximum baseflow index estimated using the backward filter method.

References

Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507–515.

Collischonn, W., & Fan, F. M. (2013). Defining parameters for Eckhardt’s digital baseflow filter. Hydrological Processes, 27(18), 2614–2622. https://doi.org/10.1002/hyp.9391
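Two pieces of this procedure can be sketched compactly: converting precip_window_days into a number of timesteps from the data spacing, and the backward filter of Collischonn & Fan (2013). The filter starts at the end of the record with baseflow equal to streamflow, steps backwards in time dividing by k, caps the result at the observed flow, and takes BFI_max as total baseflow over total streamflow. The recession analysis for k is not shown. Function names are hypothetical and regular timestep spacing is assumed.

```python
import numpy as np
import pandas as pd


def window_in_timesteps(index: pd.DatetimeIndex, days: float) -> int:
    """Convert a window length in days into a count of timesteps (hypothetical helper)."""
    step = index[1] - index[0]  # assumes regularly spaced timestamps
    return max(1, int(round(pd.Timedelta(days=days) / step)))


def bfi_max_backward_sketch(q, k: float) -> float:
    """BFI_max via a Collischonn & Fan (2013) style backward filter (hypothetical name)."""
    q = np.asarray(q, dtype=float)
    b = np.empty_like(q)
    b[-1] = q[-1]  # start at the end of the record with baseflow equal to streamflow
    for t in range(len(q) - 1, 0, -1):
        # Step backwards, undoing one recession step, never exceeding observed flow
        b[t - 1] = min(b[t] / k, q[t - 1])
    return float(b.sum() / q.sum())


daily = pd.date_range("2000-01-01", periods=10, freq="D")
hourly = pd.date_range("2000-01-01", periods=10, freq=pd.offsets.Hour(1))
steps_daily = window_in_timesteps(daily, 3)
steps_hourly = window_in_timesteps(hourly, 3)
bfi_max = bfi_max_backward_sketch([10.0, 2.0, 1.0, 1.0], k=0.5)
```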

hydrobm.utils.eckhardt_filter(Q, BFI_max, k)

Eckhardt two-parameter digital filter for baseflow separation.

The Eckhardt filter was found to be the best of 9 evaluated baseflow separation methods in Xie et al. (2020), showing superior performance across diverse catchment conditions.

Parameters:
Q : pandas Series or numpy array

Streamflow time series.

BFI_max : float

Maximum baseflow index.

k : float

Recession constant.

Returns:
baseflow : pandas Series or numpy array

Separated baseflow component.

References

Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507–515.

Xie, J., Liu, X., Wang, K., Yang, T., Liang, K., & Liu, C. (2020). Evaluation of typical methods for baseflow separation in the contiguous United States. Journal of Hydrology, 583, 124628. https://doi.org/10.1016/j.jhydrol.2020.124628
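For reference, Eckhardt's (2005) two-parameter filter is the recursion b_t = ((1 - BFI_max) k b_{t-1} + (1 - k) BFI_max Q_t) / (1 - k BFI_max), with b_t constrained not to exceed Q_t. Below is a minimal numpy sketch; initialising b_0 = Q_0 is an assumption (implementations differ), and the function name is hypothetical.

```python
import numpy as np


def eckhardt_sketch(q, bfi_max: float, k: float) -> np.ndarray:
    """Eckhardt (2005) two-parameter recursive baseflow filter (hypothetical name)."""
    q = np.asarray(q, dtype=float)
    b = np.empty_like(q)
    b[0] = q[0]  # initialisation is an assumption; implementations differ
    for t in range(1, len(q)):
        b[t] = ((1 - bfi_max) * k * b[t - 1] + (1 - k) * bfi_max * q[t]) / (1 - k * bfi_max)
        # Baseflow can never exceed total streamflow
        b[t] = min(b[t], q[t])
    return b


# Under constant flow the filter converges to BFI_max * Q
baseflow = eckhardt_sketch(np.full(2000, 5.0), bfi_max=0.8, k=0.98)
```

Setting b_t = b_{t-1} = b and Q_t = Q in the recursion gives b = BFI_max × Q, which is why BFI_max is read as the maximum baseflow index the filter can produce.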

Metrics

hydrobm.metrics.mse(obs, sim, ignore_nan=True)

Calculate mean square error.

Parameters:
obs : array-like

Observed values.

sim : array-like

Simulated values.

ignore_nan : bool, optional

Flag to consider only non-NaN values. Default is True.

Returns:
float

Mean square error.

hydrobm.metrics.rmse(obs, sim, ignore_nan=True)

Calculate root mean square error.

Parameters:
obs : array-like

Observed values.

sim : array-like

Simulated values.

ignore_nan : bool, optional

Flag to consider only non-NaN values. Default is True.

Returns:
float

Root mean square error.

hydrobm.metrics.nse(obs, sim, ignore_nan=True)

Calculate Nash-Sutcliffe efficiency.

Parameters:
obs : array-like

Observed values.

sim : array-like

Simulated values.

ignore_nan : bool, optional

Flag to consider only non-NaN values. Default is True.

Returns:
float

Nash-Sutcliffe efficiency.

hydrobm.metrics.kge(obs, sim, ignore_nan=True)

Calculate Kling-Gupta efficiency.

Parameters:
obs : array-like

Observed values.

sim : array-like

Simulated values.

ignore_nan : bool, optional

Flag to consider only non-NaN values. Default is True.

Returns:
float

Kling-Gupta efficiency.
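The four metrics can be sketched in a few lines of numpy. These versions omit the ignore_nan handling, and the KGE assumes the original Gupta et al. (2009) form, KGE = 1 - sqrt((r - 1)^2 + (α - 1)^2 + (β - 1)^2), with r the linear correlation, α the ratio of standard deviations, and β the ratio of means; HydroBM may use a different variant.

```python
import numpy as np


def mse(obs, sim) -> float:
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean((obs - sim) ** 2))


def rmse(obs, sim) -> float:
    return float(np.sqrt(mse(obs, sim)))


def nse(obs, sim) -> float:
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    # 1 minus squared model error relative to the spread of observations around their mean
    return float(1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def kge(obs, sim) -> float:
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]  # linear correlation
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return float(1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


obs = np.array([1.0, 2.0, 3.0, 4.0])
```

A simulation equal to the observed mean scores NSE = 0, which is exactly the implicit mean-flow benchmark that the BME formulation generalises to arbitrary benchmarks.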

Metric support functions

hydrobm.metrics.calculate_metric(obs, sim, metric, ignore_nan=True)

Helper function to check metric existence and simplify loops.

Parameters:
obs : array-like

Observed values.

sim : array-like

Simulated values.

metric : str

Name of the metric to calculate.

ignore_nan : bool, optional

Flag to consider only non-NaN values. Default is True.

Returns:
float

Metric score.

hydrobm.metrics.filter_nan(obs, sim)

Select only non-NaN values from both observed and simulated data.

Parameters:
obs : array-like

Observed values.

sim : array-like

Simulated values.

Returns:
tuple

Tuple of arrays with non-NaN values from both observed and simulated data.
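A sketch of how the two support functions can work together: a name-to-function registry lets the helper reject unknown metric names early, and the NaN filter keeps only positions where both inputs have values. The registry and the function names are hypothetical, and only two metrics are registered for brevity.

```python
import numpy as np


def filter_nan_sketch(obs, sim):
    """Keep only positions where both inputs are non-NaN (hypothetical name)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    keep = ~np.isnan(obs) & ~np.isnan(sim)
    return obs[keep], sim[keep]


# Hypothetical name-to-function registry; only two metrics registered for brevity
METRICS = {
    "mse": lambda o, s: float(np.mean((o - s) ** 2)),
    "rmse": lambda o, s: float(np.sqrt(np.mean((o - s) ** 2))),
}


def calculate_metric_sketch(obs, sim, metric, ignore_nan=True):
    """Validate the metric name, optionally drop NaNs, then compute (hypothetical name)."""
    if metric not in METRICS:
        raise ValueError(f"Unknown metric {metric!r}; choose from {sorted(METRICS)}")
    if ignore_nan:
        obs, sim = filter_nan_sketch(obs, sim)
    else:
        obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return METRICS[metric](obs, sim)
```

With ignore_nan=False, any NaN propagates into the score, so the result itself becomes NaN.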

Utilities

hydrobm.utils.rain_to_melt(data, precipitation='precipitation', temperature='temperature', snow_and_melt_temp=0.0, snow_and_melt_rate=3.0)

Calculate snow accumulation and melt based on temperature thresholds.

Parameters:
data : pandas DataFrame

Input data containing precipitation and temperature columns.

precipitation : str, optional

Name of the precipitation column in the input data. Default is ‘precipitation’.

temperature : str, optional

Name of the temperature column in the input data. Default is ‘temperature’.

snow_and_melt_temp : float, optional

Temperature threshold for snow accumulation and melt. Default is 0.0 [C].

snow_and_melt_rate : float, optional

Snowmelt rate when the temperature is above the threshold. Default is 3.0 [mm/timestep/degree C].

Returns:
data : pandas DataFrame

Input data with additional columns for snow depth and rain plus melt.

Notes

The default values for snow_and_melt_temp and snow_and_melt_rate are given in degrees Celsius and millimeters per timestep per degree Celsius, respectively. These units are not enforced in the code, however, because the function is designed to work with any consistent set of units.

For example, providing the input data in Kelvin and setting snow_and_melt_temp to 273.15 will work as expected. Similarly, if the input precipitation data is not in millimeters, simply providing the snow_and_melt_rate in those same units will yield the correct output.
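A degree-day sketch of the snow routine, assuming that precipitation at or below the threshold is stored as snow and that melt above the threshold equals snow_and_melt_rate × (temperature − threshold), limited by the snow available. The output column names snow_depth and rain_plus_melt, like the function name, are hypothetical. Since nothing in the arithmetic depends on the units, Kelvin inputs with a 273.15 threshold behave the same way.

```python
import pandas as pd


def rain_to_melt_sketch(data, precipitation="precipitation", temperature="temperature",
                        snow_and_melt_temp=0.0, snow_and_melt_rate=3.0):
    """Degree-day snow accumulation and melt (hypothetical name, not HydroBM's code)."""
    snow, depth, liquid = 0.0, [], []
    for p, t in zip(data[precipitation], data[temperature]):
        if t <= snow_and_melt_temp:
            snow += p  # at or below the threshold, precipitation is stored as snow
            out = 0.0
        else:
            # Melt scales with degrees above the threshold, limited by the snow available
            melt = min(snow, snow_and_melt_rate * (t - snow_and_melt_temp))
            snow -= melt
            out = p + melt  # rain plus meltwater
        depth.append(snow)
        liquid.append(out)
    # Hypothetical output column names
    return data.assign(snow_depth=depth, rain_plus_melt=liquid)


df = pd.DataFrame({"precipitation": [5.0, 5.0, 0.0, 2.0],
                   "temperature": [-2.0, -1.0, 2.0, 5.0]})
result = rain_to_melt_sketch(df)
```

In this example all 12 units of precipitation leave the snowpack as rain plus melt by the last timestep, so water is conserved.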

hydrobm.utils.bme_nse(q_obs, q_sim, q_bm, cal_mask, val_mask=None)

Calculate the NSE-based Benchmark Efficiency (BME) for the calibration and validation periods. This formulation can be found in Seibert (2001) and Schaefli and Gupta (2007).

BME = 1 - sum((q_obs - q_sim)^2) / sum((q_obs - q_bm)^2)

Parameters:
q_obs : pandas Series

Observed streamflow.

q_sim : pandas Series

Simulated streamflow.

q_bm : pandas Series

Benchmark streamflow.

cal_mask : pandas Series

Boolean mask for the calibration period.

val_mask : pandas Series, optional

Boolean mask for the validation period. Default is None (no validation score returned).

Returns:
cal_score : float

NSE-based BME score for the calibration period.

val_score : float

NSE-based BME score for the validation period. NaN if no val_mask is specified.

References

Seibert, J. (2001). On the need for benchmarks in hydrological modelling. Hydrological Processes, 15(6), 1063–1064. https://doi.org/10.1002/hyp.446

Schaefli, B., & Gupta, H. V. (2007). Do Nash values have value? Hydrological Processes, 21(15), 2075–2080. https://doi.org/10.1002/hyp.6825
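A sketch of the NSE-based BME evaluated separately under each mask, without NaN handling. A score of 1 means a perfect simulation, 0 means the model is no better than the benchmark, and negative values mean it is worse; when the benchmark is the mean observed flow of the same period, the expression reduces to the standard NSE. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def bme_nse_sketch(q_obs, q_sim, q_bm, cal_mask, val_mask=None):
    """NSE-based benchmark efficiency per period (hypothetical name)."""
    def score(mask):
        o, s, b = q_obs[mask], q_sim[mask], q_bm[mask]
        # Model squared error relative to benchmark squared error
        return float(1 - ((o - s) ** 2).sum() / ((o - b) ** 2).sum())

    cal_score = score(cal_mask)
    val_score = score(val_mask) if val_mask is not None else np.nan
    return cal_score, val_score


idx = pd.date_range("2000-01-01", periods=4, freq="D")
q_obs = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)
q_sim = pd.Series([1.0, 2.0, 3.0, 5.0], index=idx)
q_bm = pd.Series([2.0, 2.0, 2.0, 2.0], index=idx)
cal = pd.Series([True, True, False, False], index=idx)
cal_score, val_score = bme_nse_sketch(q_obs, q_sim, q_bm, cal, ~cal)
```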

hydrobm.utils.bme_kge(q_obs, q_sim, q_bm, cal_mask, val_mask=None)

Calculate the KGE-based Benchmark Efficiency (KGE skill score) for the calibration and validation periods. This skill score formulation can be found in Knoben et al. (2019), among others.

KGE_skill = (KGE_model - KGE_benchmark) / (1 - KGE_benchmark)

Parameters:
q_obs : pandas Series

Observed streamflow.

q_sim : pandas Series

Simulated streamflow.

q_bm : pandas Series

Benchmark streamflow.

cal_mask : pandas Series

Boolean mask for the calibration period.

val_mask : pandas Series, optional

Boolean mask for the validation period. Default is None (no validation score returned).

Returns:
cal_score : float

KGE skill score for the calibration period.

val_score : float

KGE skill score for the validation period. NaN if no val_mask is specified.

References

Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019
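A worked example of the KGE skill score. Knoben et al. (2019) show that using the mean observed flow as the simulation yields KGE = 1 - sqrt(2) ≈ -0.41, so that value serves here as the benchmark score. The helper name is hypothetical, and the score is undefined when KGE_benchmark = 1.

```python
import math


def kge_skill_sketch(kge_model: float, kge_benchmark: float) -> float:
    """KGE skill score relative to a benchmark (hypothetical name; undefined if kge_benchmark == 1)."""
    return (kge_model - kge_benchmark) / (1 - kge_benchmark)


# Mean-flow benchmark score reported by Knoben et al. (2019)
kge_mean_flow = 1 - math.sqrt(2)  # about -0.41
skill = kge_skill_sketch(0.5, kge_mean_flow)  # about 0.65
```

A model with KGE = 0.5 therefore closes roughly 65% of the gap between the mean-flow benchmark and a perfect score.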

References

Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019

Schaefli, B., & Gupta, H. V. (2007). Do Nash values have value? Hydrological Processes, 21(15), 2075–2080. https://doi.org/10.1002/hyp.6825

Seibert, J. (2001). On the need for benchmarks in hydrological modelling. Hydrological Processes, 15(6), 1063–1064. https://doi.org/10.1002/hyp.446