Usage
Start by importing Benchmarks for Hydrologic Timeseries.
import hydrobm
Main calculation function
HydroBM provides a main function to calculate the benchmark timeseries. This is a catch-all function that lets you set up a complete benchmarking exercise for a given time series of observed streamflow (and optionally other variables, depending on the selected benchmarks). The individual functions can also be called directly for more granular setups.
- hydrobm.calculate.calc_bm(data, cal_mask, val_mask=[], precipitation='precipitation', streamflow='streamflow', benchmarks=['daily_mean_flow'], metrics=['rmse'], optimization_method='brute_force', calc_snowmelt=False, temperature='temperature', snowmelt_threshold=0.0, snowmelt_rate=3.0)
Calculate benchmark model scores for a given set of benchmark models and metrics.
- Parameters:
- data: pandas DataFrame or xarray Dataset
Input data containing precipitation and streamflow columns.
- cal_mask: pandas Series
Boolean mask for the calculation period.
- val_mask: pandas Series, optional
Boolean mask for the validation period. Default is [] (no validation scores returned).
- precipitation: str, optional
Name of the precipitation column in the input data. Default is ‘precipitation’.
- streamflow: str, optional
Name of the streamflow column in the input data. Default is ‘streamflow’.
- benchmarks: list, optional
List of benchmark models to calculate. Default is [‘daily_mean_flow’].
- metrics: list, optional
List of metrics to calculate. Default is [‘rmse’].
- optimization_method: str, optional
Optimization method to use for benchmark model calibration. Default is ‘brute_force’.
- calc_snowmelt: bool, optional
Flag to run a basic snow accumulation and melt model. Default is False.
- temperature: str, optional
Name of the temperature column in the input data. Default is ‘temperature’.
- snowmelt_threshold: float, optional
Threshold temperature for snowmelt calculation. Default is 0.0 [C].
- snowmelt_rate: float, optional
Rate of snowmelt. Default is 3.0.
- Returns:
- benchmark_flows: pandas DataFrame
DataFrame containing benchmark flows for each benchmark model.
- metrics: dict
Dictionary containing metric scores for each benchmark model.
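As a minimal sketch of how the inputs described above might be prepared, the example below builds a synthetic daily DataFrame and two boolean masks with pandas. The synthetic data, the year-based split, and the unpacking of two return values are assumptions made for illustration; the call itself only runs if HydroBM is installed.

```python
import numpy as np
import pandas as pd

# Synthetic daily data for two years; column names match the calc_bm defaults.
index = pd.date_range("2001-01-01", "2002-12-31", freq="D")
rng = np.random.default_rng(42)
data = pd.DataFrame(
    {
        "precipitation": rng.gamma(shape=0.5, scale=4.0, size=len(index)),
        "streamflow": rng.gamma(shape=2.0, scale=1.5, size=len(index)),
    },
    index=index,
)

# Boolean masks as pandas Series aligned with the data index:
# the first year is the calculation period, the second year is for validation.
cal_mask = pd.Series(data.index.year == 2001, index=data.index)
val_mask = pd.Series(data.index.year == 2002, index=data.index)

try:
    from hydrobm.calculate import calc_bm
except ImportError:  # HydroBM not installed; the inputs above are still valid
    calc_bm = None

if calc_bm is not None:
    # Assumes the two documented return values come back as a tuple.
    benchmark_flows, scores = calc_bm(
        data,
        cal_mask,
        val_mask=val_mask,
        benchmarks=["daily_mean_flow"],
        metrics=["rmse"],
    )
```

Keeping the masks as Series on the same index as the data makes it easy to check that the two periods do not overlap before running the benchmarks.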
Benchmark Efficiency (BME) function
HydroBM also provides a function to calculate skill scores termed benchmark efficiencies (BME) (Schaefli & Gupta, 2007) between hydrological model simulations and benchmark timeseries. This function supports the Schaefli and Gupta (2007) and Seibert (2001) formulation of the BME skill score, as well as a skill score formulation of the KGE (Knoben et al., 2019). It works the same way as calc_bm, but additionally requires simulated streamflow and the desired formulation of the BME.
- hydrobm.calculate.calc_bme(data, cal_mask, val_mask=[], precipitation='precipitation', streamflow='streamflow', simulated_flow='simulated_flow', benchmarks=['daily_mean_flow'], metrics=['rmse'], optimization_method='brute_force', formulation='bme_nse', calc_snowmelt=False, temperature='temperature', snowmelt_threshold=0.0, snowmelt_rate=3.0)
Calculate Benchmark Efficiency (BME) scores alongside standard metric scores.
- Parameters:
- data: pandas DataFrame or xarray Dataset
Input data containing precipitation, streamflow, and simulated flow columns.
- cal_mask: pandas Series
Boolean mask for the calibration period.
- val_mask: pandas Series, optional
Boolean mask for the validation period. Default is [] (no validation scores returned).
- precipitation: str, optional
Name of the precipitation column in the input data. Default is ‘precipitation’.
- streamflow: str, optional
Name of the streamflow column in the input data. Default is ‘streamflow’.
- simulated_flow: str, optional
Name of the simulated flow column in the input data. Default is ‘simulated_flow’.
- benchmarks: list, optional
List of benchmark models to calculate. Default is [‘daily_mean_flow’].
- metrics: list, optional
List of metrics to calculate via calc_bm. Default is [‘rmse’].
- optimization_method: str, optional
Optimization method for benchmark model calibration. Default is ‘brute_force’.
- formulation: str, optional
BME formulation. Options:
- ‘bme_nse’ (default): BME = 1 - sum((q_obs - q_sim)^2) / sum((q_obs - q_bm)^2)
- ‘bme_kge’: BME = (KGE_model - KGE_benchmark) / (1 - KGE_benchmark)
- calc_snowmelt: bool, optional
Flag to run a basic snow accumulation and melt model. Default is False.
- temperature: str, optional
Name of the temperature column in the input data. Default is ‘temperature’.
- snowmelt_threshold: float, optional
Threshold temperature for snowmelt. Default is 0.0 [C].
- snowmelt_rate: float, optional
Rate of snowmelt. Default is 3.0.
- Returns:
- bme_scores: dict
Dictionary of BME scores for each benchmark.
- benchmark_flows: pandas DataFrame
DataFrame containing benchmark flows for each benchmark model (from calc_bm).
- results: dict
Dictionary of standard metric scores for each benchmark (from calc_bm).
Benchmarks
Within their respective category, benchmarks are all set up to require the same inputs.
Benchmarks that rely on streamflow data only
- Calculate the mean flow over the calculation period and use that as a predictor for all timesteps in the whole dataframe.
- Calculate the median flow over the calculation period and use that as a predictor for all timesteps in the whole dataframe.
- Calculate the annual mean flow over the calculation period and use that as a predictor for each year in the calculation period.
- Calculate the annual median flow over the calculation period and use that as a predictor for each year in the calculation period.
- Calculate the monthly mean flow over the calculation period and use that as a predictor for each month in the whole dataframe.
- Calculate the monthly median flow over the calculation period and use that as a predictor for each month in the whole dataframe.
- Calculate the daily mean flow over the calculation period and use that as a predictor for each day in the whole dataframe.
- Calculate the daily median flow over the calculation period and use that as a predictor for each day in the whole dataframe.
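To make the idea behind these streamflow-only benchmarks concrete, here is a small pandas sketch of a monthly mean flow benchmark: the mean per calendar month is computed over the calculation period only and then used as the prediction for that month everywhere. The variable names and the synthetic flow series are assumptions for illustration, and this is not necessarily how HydroBM implements it internally.

```python
import pandas as pd

# Two years of synthetic daily flow; the first year is the calculation period.
index = pd.date_range("2001-01-01", "2002-12-31", freq="D")
flow = pd.Series(range(len(index)), index=index, dtype=float)
cal_mask = pd.Series(index.year == 2001, index=index)

# Mean flow per calendar month, using calculation-period data only.
cal_flow = flow[cal_mask]
monthly_mean = cal_flow.groupby(cal_flow.index.month).mean()

# Use the monthly means as the predictor for every timestep in the full series.
benchmark = pd.Series(monthly_mean.reindex(index.month).to_numpy(), index=index)
```

Because the statistics come from the calculation period alone, the values predicted for the second year are a genuine out-of-sample benchmark.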
Benchmarks that rely on precipitation and streamflow
- Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff using precipitation totals from the calculation period and non-calculation period respectively.
- Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each year in the whole dataframe.
- Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each month in the whole dataframe.
- Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each day in the whole dataframe.
- Calculate the long-term rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
- Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each month in the whole dataframe.
- Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each day in the whole dataframe.
- Calculate the mean monthly rainfall-runoff ratio over the calculation period and use that as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
- Calculate the daily mean flow scaled by annual precipitation anomalies.
- Calculate the daily mean flow scaled by monthly precipitation anomalies.
- Calculate the scaled precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
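The rainfall-runoff ratio benchmarks can be illustrated with a short sketch: the ratio of total streamflow to total precipitation over the calculation period is applied to precipitation at every timestep. The tiny dataset below is made up, and the per-timestep variant shown is only one of the aggregation levels listed above.

```python
import pandas as pd

index = pd.date_range("2001-01-01", periods=6, freq="D")
data = pd.DataFrame(
    {
        "precipitation": [0.0, 10.0, 10.0, 0.0, 20.0, 5.0],
        "streamflow": [1.0, 3.0, 3.0, 1.0, 6.0, 2.0],
    },
    index=index,
)
cal_mask = pd.Series([True, True, True, True, False, False], index=index)

# Long-term rainfall-runoff ratio over the calculation period only.
cal = data[cal_mask]
ratio = cal["streamflow"].sum() / cal["precipitation"].sum()

# Predict runoff-from-precipitation for each timestep in the whole dataframe.
benchmark = ratio * data["precipitation"]
```

A benchmark of this kind predicts zero flow on dry days, which is one reason the lagged and smoothed variants further down exist.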
Parsimonious model benchmarks
- Baseflow separation using Eckhardt filter to create a mean annual baseflow signal.
- Calculate the adjusted precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
- Calculate the adjusted smoothed precipitation benchmark model as a predictor of runoff-from-precipitation for each timestep in the whole dataframe.
Benchmark support functions
- hydrobm.benchmarks.create_bm(data, benchmark, cal_mask, precipitation='precipitation', streamflow='streamflow', optimization_method='brute_force')
Helper function to call the correct benchmark model function; makes looping over benchmark models easier.
- Parameters:
- data: pandas DataFrame
Input data containing precipitation and streamflow columns.
- benchmark: str
Benchmark model to calculate.
- cal_mask: pandas Series
Boolean mask for the calculation period.
- precipitation: str, optional
Name of the precipitation column in the input data. Default is ‘precipitation’.
- streamflow: str, optional
Name of the streamflow column in the input data. Default is ‘streamflow’.
- optimization_method: str, optional
Optimization method to create the adjusted (smoothed) precipitation benchmark. Default is ‘brute_force’.
- Returns:
- bm_values: pandas Series
Benchmark values for the given benchmark model.
- qbm: pandas DataFrame
Benchmark flow time series for the given benchmark model.
- hydrobm.benchmarks.evaluate_bm(data, benchmark_flow, metric, cal_mask, val_mask=None, streamflow='streamflow', ignore_nan=True)
Helper function to calculate metric scores for the calculation and evaluation periods, given a set of observations and benchmark flows.
- Parameters:
- data: pandas DataFrame
Input data containing streamflow observation column.
- benchmark_flow: pandas DataFrame
Benchmark flow time series as returned by one of the benchmark model functions.
- metric: str
Name of the metric to calculate. See hydrobm/metrics for a list.
- cal_mask: pandas Series
Boolean mask for the calculation period.
- val_mask: pandas Series, optional
Boolean mask for the evaluation period. Default is None (no evaluation score returned).
- streamflow: str, optional
Name of the streamflow column in the input data. Default is ‘streamflow’.
- ignore_nan: bool, optional
Flag to consider only non-NaN values. Default is True.
- Returns:
- cal_score: float
Metric score for the calculation period.
- val_score: float
Metric score for the evaluation period. NaN if no val_mask specified.
Benchmark optimization functions
Only used by the Eckhardt Baseflow, Adjusted Precipitation Benchmark (APB), and Adjusted Smoothed Precipitation Benchmark (ASPB) to optimize or estimate their respective parameters.
- hydrobm.utils.optimize_apb(scaled_precip, streamflow, method, max_lag=30)
Wrapper function around adjusted precipitation benchmark model optimization functions.
- Parameters:
- scaled_precip: pandas Series
Scaled precipitation data.
- streamflow: pandas Series
Streamflow data.
- method: str
Optimization method to use. Currently supports “brute_force” and “minimize”.
- max_lag: int, optional
Maximum lag to consider. Default is 30.
- Returns:
- best_lag: int
Best lag value.
- best_mse: float
Best mean squared error value.
- hydrobm.utils.brute_force_apb(scaled_precip, streamflow, max_lag=30)
Optimize the lag for the adjusted precipitation benchmark model using brute force.
- Parameters:
- scaled_precip: pandas Series
Scaled precipitation data.
- streamflow: pandas Series
Streamflow data.
- max_lag: int, optional
Maximum lag to consider. Default is 30.
- Returns:
- best_lag: int
Best lag value.
- best_mse: float
Best mean squared error value.
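The brute-force approach can be sketched as an exhaustive loop over all integer lags from 0 to max_lag, keeping the lag with the lowest mean squared error. The function name brute_force_lag and the use of Series.shift to apply the lag are assumptions for illustration; the library's own implementation may differ in detail.

```python
import numpy as np
import pandas as pd

def brute_force_lag(scaled_precip, streamflow, max_lag=30):
    """Hypothetical helper: try every integer lag and keep the lowest MSE."""
    best_lag, best_mse = 0, np.inf
    for lag in range(max_lag + 1):
        lagged = scaled_precip.shift(lag)  # leading values become NaN
        mse = ((streamflow - lagged) ** 2).mean()  # pandas skips NaN here
        if mse < best_mse:
            best_lag, best_mse = lag, mse
    return best_lag, best_mse

# Synthetic check: the flow is the precipitation delayed by two timesteps.
precip = pd.Series([0, 5, 0, 0, 8, 0, 0, 2, 0, 0, 0, 6, 0, 0, 0], dtype=float)
flow = precip.shift(2).fillna(0.0)
best_lag, best_mse = brute_force_lag(precip, flow, max_lag=5)
```

Since every candidate lag is evaluated, the search cost grows linearly with max_lag, but the returned lag is guaranteed to be the best one within the searched range.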
- hydrobm.utils.minimize_scalar_apb(scaled_precip, streamflow, max_lag=30)
Optimize the lag for the adjusted precipitation benchmark model using scipy.optimize.minimize_scalar.
- Parameters:
- scaled_precip: pandas Series
Scaled precipitation data.
- streamflow: pandas Series
Streamflow data.
- max_lag: int, optional
Maximum lag to consider. Default is 30.
- Returns:
- best_lag: int
Best lag value.
- best_mse: float
Best mean squared error value.
Notes
scipy.optimize.minimize_scalar is not designed for use with integer-only solutions. Here we use the round function to enforce integer solutions. This seems to work for simple test cases, but results for real data may vary. User caution is advised. Use brute force optimization if 100% accurate solutions are required.
- hydrobm.utils.optimize_aspb(scaled_precip, streamflow, method, max_lag=30, max_window=90)
Wrapper function around adjusted smoothed precipitation benchmark model optimization functions.
- Parameters:
- scaled_precip: pandas Series
Scaled precipitation data.
- streamflow: pandas Series
Streamflow data.
- method: str
Optimization method to use. Currently supports “brute_force” and “minimize”.
- max_lag: int, optional
Maximum lag to consider. Default is 30.
- max_window: int, optional
Maximum smoothing window length to consider. Default is 90.
- Returns:
- best_lag: int
Best lag value.
- best_window: int
Best window value.
- best_mse: float
Best mean squared error value.
- hydrobm.utils.brute_force_aspb(scaled_precip, streamflow, max_lag=30, max_window=90)
Optimize the lag and window for the adjusted smoothed precipitation benchmark model using brute force.
- Parameters:
- scaled_precip: pandas Series
Scaled precipitation data.
- streamflow: pandas Series
Streamflow data.
- max_lag: int, optional
Maximum lag to consider. Default is 30.
- max_window: int, optional
Maximum smoothing window length to consider. Default is 90.
- Returns:
- best_lag: int
Best lag value.
- best_window: int
Best window value.
- best_mse: float
Best mean squared error value.
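For the smoothed variant, a brute-force search has to cover every combination of lag and window. The sketch below assumes the smoothing is a trailing moving average applied after the lag; that choice, the function name brute_force_lag_window, and the min_periods=1 setting are illustrative assumptions rather than a description of the library's internals.

```python
import numpy as np
import pandas as pd

def brute_force_lag_window(scaled_precip, streamflow, max_lag=30, max_window=90):
    """Hypothetical helper: exhaustive search over (lag, window) pairs."""
    best_lag, best_window, best_mse = 0, 1, np.inf
    for lag in range(max_lag + 1):
        lagged = scaled_precip.shift(lag)
        for window in range(1, max_window + 1):
            # Trailing moving average (assumed smoothing scheme).
            smoothed = lagged.rolling(window, min_periods=1).mean()
            mse = ((streamflow - smoothed) ** 2).mean()
            if mse < best_mse:
                best_lag, best_window, best_mse = lag, window, mse
    return best_lag, best_window, best_mse

# Synthetic check: flow is precipitation lagged by 1 and smoothed over 3 steps.
precip = pd.Series([0, 4, 0, 0, 9, 1, 0, 0, 3, 0, 7, 0, 0, 2, 0, 0], dtype=float)
flow = precip.shift(1).rolling(3, min_periods=1).mean()
best_lag, best_window, best_mse = brute_force_lag_window(
    precip, flow, max_lag=3, max_window=5
)
```

With the defaults (max_lag=30, max_window=90) this evaluates 31 x 90 combinations, which is why a faster but approximate optimizer is offered as an alternative.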
- hydrobm.utils.minimize_aspb(scaled_precip, streamflow, max_lag=30, max_window=90, method='Powell')
Optimize the lag and window for the ASPB model using scipy.optimize.minimize.
- Parameters:
- scaled_precip: pandas Series
Scaled precipitation data.
- streamflow: pandas Series
Streamflow data.
- max_lag: int, optional
Maximum lag to consider. Default is 30.
- max_window: int, optional
Maximum smoothing window length to consider. Default is 90.
- method: str, optional
Optimization method to use. Default is ‘Powell’. See scipy.optimize.minimize for more options.
- Returns:
- best_lag: int
Best lag value.
- best_window: int
Best window value.
- best_mse: float
Best mean squared error value.
Notes
scipy.optimize.minimize is not designed for use with integer-only solutions. Here we use the round function to enforce integer solutions. The ‘Powell’ optimization method seems to return appropriate lag and window values in simple test cases, but results for real data may vary. User caution is advised. Use brute force optimization if 100% accurate solutions are required.
- hydrobm.utils.estimate_eckhardt_parameters(streamflow, precip, precip_window_days=3, precip_threshold=0.1)
Estimate both the recession coefficient (k) and the maximum baseflow index (BFI_max), which are required for baseflow separation as outlined by Eckhardt (2005).
This function combines recession analysis to estimate k with the backward filter method from Collischonn & Fan (2013) to estimate BFI_max. It automatically detects the timestep from the data and adjusts the time window accordingly.
- Parameters:
- streamflow: pandas Series
Observed streamflow with DatetimeIndex.
- precip: pandas Series
Precipitation data with DatetimeIndex.
- precip_window_days: float, optional
Number of days to check for precipitation when identifying recessions. Automatically converted to the appropriate number of timesteps based on data frequency. Default is 3 days.
- precip_threshold: float, optional
Precipitation threshold in the same units as the precip data (e.g., mm/day or kg/m²/s). Default is 0.1.
- Returns:
- k: float
Recession coefficient estimated from recession periods (for native timestep).
- BFI_max: float
Maximum baseflow index estimated using backward filter method.
References
Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507-515.
Collischonn, W., & Fan, F. M. (2013). Defining parameters for Eckhardt’s digital baseflow filter. Hydrological Processes, 27(18), 2614–2622. https://doi.org/10.1002/hyp.9391
- hydrobm.utils.eckhardt_filter(Q, BFI_max, k)
Eckhardt two-parameter digital filter for baseflow separation.
The Eckhardt filter was found to be the best of 9 evaluated baseflow separation methods in Xie et al. (2020), showing superior performance across diverse catchment conditions.
- Parameters:
- Q: pandas Series or numpy array
Streamflow time series.
- BFI_max: float
Maximum baseflow index.
- k: float
Recession constant.
- Returns:
- baseflow: pandas Series or numpy array
Separated baseflow component.
References
Eckhardt, K. (2005). How to construct recursive digital filters for baseflow separation. Hydrological Processes, 19(2), 507-515.
Xie, J., Liu, X., Wang, K., Yang, T., Liang, K., & Liu, C. (2020). Evaluation of typical methods for baseflow separation in the contiguous United States. Journal of Hydrology, 583, 124628. https://doi.org/10.1016/j.jhydrol.2020.124628
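For readers unfamiliar with the filter, the recursion from Eckhardt (2005) can be sketched in a few lines of NumPy: each baseflow value is a weighted combination of the previous baseflow and the current streamflow, capped at the streamflow itself. The function name eckhardt_sketch and the initial condition (first baseflow value set to BFI_max times the first flow) are assumptions; the library function may initialise differently.

```python
import numpy as np

def eckhardt_sketch(Q, BFI_max, k):
    """Illustrative Eckhardt (2005) recursion; not the library implementation."""
    Q = np.asarray(Q, dtype=float)
    baseflow = np.empty_like(Q)
    baseflow[0] = BFI_max * Q[0]  # assumed initial condition
    for t in range(1, len(Q)):
        b = (
            (1.0 - BFI_max) * k * baseflow[t - 1] + (1.0 - k) * BFI_max * Q[t]
        ) / (1.0 - k * BFI_max)
        baseflow[t] = min(b, Q[t])  # baseflow cannot exceed total streamflow
    return baseflow

# Constant flow: the filter settles at BFI_max * Q.
steady = eckhardt_sketch([2.0] * 10, BFI_max=0.8, k=0.98)

# A flow peak: baseflow responds slowly and stays at or below streamflow.
Q_event = np.array([1.0, 1.0, 6.0, 4.0, 2.5, 1.5, 1.2, 1.0])
event = eckhardt_sketch(Q_event, BFI_max=0.8, k=0.98)
```

For a constant flow the recursion has the fixed point BFI_max * Q, which gives a quick sanity check on any implementation.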
Metrics
- hydrobm.metrics.mse(obs, sim, ignore_nan=True)
Calculate mean square error.
- Parameters:
- obs: array-like
Observed values.
- sim: array-like
Simulated values.
- ignore_nan: bool, optional
Flag to consider only non-NaN values. Default is True.
- Returns:
- float
Mean square error.
- hydrobm.metrics.rmse(obs, sim, ignore_nan=True)
Calculate root mean square error.
- Parameters:
- obs: array-like
Observed values.
- sim: array-like
Simulated values.
- ignore_nan: bool, optional
Flag to consider only non-NaN values. Default is True.
- Returns:
- float
Root mean square error.
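The effect of the ignore_nan flag can be shown with a small NumPy sketch. The function rmse_sketch below is a hypothetical stand-in, and it assumes that ignoring NaN means dropping any timestep where either series is missing; how the library treats NaN internally is not stated here.

```python
import numpy as np

def rmse_sketch(obs, sim, ignore_nan=True):
    """Hypothetical RMSE with optional NaN handling (illustration only)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    squared_error = (obs - sim) ** 2
    if ignore_nan:
        # NaN in either input gives a NaN squared error, which nanmean skips.
        return float(np.sqrt(np.nanmean(squared_error)))
    return float(np.sqrt(np.mean(squared_error)))

obs = [1.0, 2.0, np.nan, 4.0]
sim = [1.0, 4.0, 3.0, 2.0]
score_ignoring_nan = rmse_sketch(obs, sim)
score_with_nan = rmse_sketch(obs, sim, ignore_nan=False)
```

With gaps in the observations, leaving ignore_nan at True yields a score over the remaining pairs, while setting it to False propagates NaN into the result.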
Metric support functions
- hydrobm.metrics.calculate_metric(obs, sim, metric, ignore_nan=True)
Helper function to check metric existence and simplify loops.
- Parameters:
- obs: array-like
Observed values.
- sim: array-like
Simulated values.
- metric: str
Name of the metric to calculate.
- ignore_nan: bool, optional
Flag to consider only non-NaN values. Default is True.
- Returns:
- float
Metric score.
Utilities
- hydrobm.utils.rain_to_melt(data, precipitation='precipitation', temperature='temperature', snow_and_melt_temp=0.0, snow_and_melt_rate=3.0)
Calculate snow accumulation and melt based on temperature thresholds.
- Parameters:
- data: pandas DataFrame
Input data containing precipitation and temperature columns.
- precipitation: str, optional
Name of the precipitation column in the input data. Default is ‘precipitation’.
- temperature: str, optional
Name of the temperature column in the input data. Default is ‘temperature’.
- snow_and_melt_temp: float, optional
Temperature threshold for snow accumulation and melt. Default is 0.0 [C].
- snow_and_melt_rate: float, optional
Snow melt rate if temperature above threshold. Default is 3.0 [mm/timestep/degree C].
- Returns:
- data: pandas DataFrame
Input data with additional columns for snow depth and rain plus melt.
Notes
The default values for snow_and_melt_temp and snow_and_melt_rate are given in units of degrees Celsius and millimeters per time step per degree Celsius, respectively. The code does not enforce these units, however, so the function works with any consistent set of units.
For example, providing the input data in Kelvin and setting snow_and_melt_temp to 273.15 will work as expected. Similarly, if the input precipitation data is not in millimeters, simply providing the snow_and_melt_rate in those same units will yield the correct output.
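A basic snow accumulation and melt scheme of this kind can be sketched as a degree-day loop. The function name degree_day_sketch, the treatment of temperatures exactly at the threshold as snow, and the rule that melt is limited by the available snow pack are assumptions for illustration; they are not taken from the library code.

```python
def degree_day_sketch(precip, temp, threshold=0.0, rate=3.0):
    """Hypothetical degree-day snow model; returns (snow_storage, rain_plus_melt)."""
    snow = 0.0
    snow_storage, rain_plus_melt = [], []
    for p, t in zip(precip, temp):
        if t <= threshold:
            # At or below the threshold: precipitation accumulates as snow.
            snow += p
            liquid = 0.0
        else:
            # Above the threshold: melt scales with the temperature excess,
            # limited by the snow that is actually available.
            melt = min(snow, rate * (t - threshold))
            snow -= melt
            liquid = p + melt
        snow_storage.append(snow)
        rain_plus_melt.append(liquid)
    return snow_storage, rain_plus_melt

precip = [5.0, 5.0, 0.0, 2.0, 0.0]
temp = [-2.0, -1.0, 1.0, 2.0, 5.0]
storage, liquid = degree_day_sketch(precip, temp)
```

In this example all 12 units of precipitation eventually leave as rain plus melt, which shows that the scheme delays water rather than creating or losing it.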
- hydrobm.utils.bme_nse(q_obs, q_sim, q_bm, cal_mask, val_mask=None)
Calculate NSE-based Benchmark Efficiency (BME) for cal and val periods. The formulation can be found in Seibert (2001) and Schaefli and Gupta (2007).
BME = 1 - sum((q_obs - q_sim)^2) / sum((q_obs - q_bm)^2)
- Parameters:
- q_obs: pandas Series
Observed streamflow.
- q_sim: pandas Series
Simulated streamflow.
- q_bm: pandas Series
Benchmark streamflow.
- cal_mask: pandas Series
Boolean mask for the calibration period.
- val_mask: pandas Series, optional
Boolean mask for the validation period. Default is None (no val score returned).
- Returns:
- cal_score: float
NSE-based BME score for the calibration period.
- val_score: float
NSE-based BME score for the validation period. NaN if no val_mask specified.
References
Seibert, J. (2001). On the need for benchmarks in hydrological modelling. Hydrological Processes, 15(6), 1063–1064. https://doi.org/10.1002/hyp.446
Schaefli, B., & Gupta, H. V. (2007). Do Nash values have value? Hydrological Processes, 21(15), 2075–2080. https://doi.org/10.1002/hyp.6825
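The BME formula above translates directly into NumPy. The sketch below evaluates it for a single period without masks; the function name bme_nse_sketch and the example values are made up for illustration.

```python
import numpy as np

def bme_nse_sketch(q_obs, q_sim, q_bm):
    """BME = 1 - sum((q_obs - q_sim)^2) / sum((q_obs - q_bm)^2), for one period."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    q_bm = np.asarray(q_bm, dtype=float)
    return 1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_bm) ** 2)

q_obs = [1.0, 2.0, 3.0, 4.0]
q_bm = [2.0, 2.0, 2.0, 2.0]   # e.g. a constant benchmark
q_sim = [1.0, 2.0, 3.0, 5.0]  # model misses only the last value

bme = bme_nse_sketch(q_obs, q_sim, q_bm)
bme_equal = bme_nse_sketch(q_obs, q_bm, q_bm)
bme_perfect = bme_nse_sketch(q_obs, q_obs, q_bm)
```

A score of 1 means a perfect simulation, 0 means the model is no better than the benchmark, and negative values mean the benchmark outperforms the model.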
- hydrobm.utils.bme_kge(q_obs, q_sim, q_bm, cal_mask, val_mask=None)
Calculate KGE-based Benchmark Model Efficiency (KGE skill score) for cal and val periods. This skill score formulation can be found in Knoben et al. (2019) among others.
KGE_skill = (KGE_model - KGE_benchmark) / (1 - KGE_benchmark)
- Parameters:
- q_obs: pandas Series
Observed streamflow.
- q_sim: pandas Series
Simulated streamflow.
- q_bm: pandas Series
Benchmark streamflow.
- cal_mask: pandas Series
Boolean mask for the calibration period.
- val_mask: pandas Series, optional
Boolean mask for the validation period. Default is None (no val score returned).
- Returns:
- cal_score: float
KGE skill score for the calibration period.
- val_score: float
KGE skill score for the validation period. NaN if no val_mask specified.
References
Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019
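The KGE skill score can be sketched by first computing the KGE for the model and for the benchmark and then combining them as in the formula above. The helper below uses the common three-component KGE (correlation, variability ratio, bias ratio); the function names and example values are assumptions for illustration. Note that a constant benchmark has zero variance, so the correlation is undefined; the example therefore uses a benchmark that varies in time.

```python
import numpy as np

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio and bias ratio."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

def kge_skill(q_obs, q_sim, q_bm):
    """KGE_skill = (KGE_model - KGE_benchmark) / (1 - KGE_benchmark)."""
    kge_model = kge(q_obs, q_sim)
    kge_benchmark = kge(q_obs, q_bm)
    return (kge_model - kge_benchmark) / (1.0 - kge_benchmark)

q_obs = [1.0, 3.0, 2.0, 5.0, 4.0]
q_bm = [2.0, 2.5, 2.0, 4.0, 3.0]  # a time-varying benchmark

skill_perfect = kge_skill(q_obs, q_obs, q_bm)
skill_no_gain = kge_skill(q_obs, q_bm, q_bm)
```

As with the NSE-based BME, a skill of 1 corresponds to a perfect simulation and 0 to a model that matches the benchmark's KGE.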
References
Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019
Schaefli, B., & Gupta, H. V. (2007). Do Nash values have value? Hydrological Processes, 21(15), 2075–2080. https://doi.org/10.1002/hyp.6825
Seibert, J. (2001). On the need for benchmarks in hydrological modelling. Hydrological Processes, 15(6), 1063–1064. https://doi.org/10.1002/hyp.446