
pipeline

The pipeline module orchestrates the full AFML research workflow in a single function call. It chains: CUSUM event detection → triple-barrier labeling → bet sizing → portfolio allocation → risk metrics → backtest statistics. Each stage passes its output to the next, and built-in leakage checks verify that inputs are aligned, events are chronologically ordered, and no forward-looking bias is present.
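To make the first stage concrete, here is a minimal pure-Python sketch of a symmetric CUSUM event filter in the AFML style. It is illustrative only: `cusum_events` is a hypothetical helper, not the library's API, and the pipeline's internal implementation may differ.

```python
import math

def cusum_events(close, threshold):
    """Return indices where cumulative log-returns breach +/- threshold.

    Symmetric CUSUM filter: accumulate positive and negative log-returns
    separately; emit an event and reset when either side exceeds threshold.
    """
    events, s_pos, s_neg = [], 0.0, 0.0
    for i in range(1, len(close)):
        r = math.log(close[i] / close[i - 1])
        s_pos = max(0.0, s_pos + r)
        s_neg = min(0.0, s_neg + r)
        if s_pos > threshold:
            s_pos = 0.0
            events.append(i)
        elif s_neg < -threshold:
            s_neg = 0.0
            events.append(i)
    return events
```

Events sampled this way are what the triple-barrier labeling stage then labels.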

This is designed for rapid research iteration: change a parameter, re-run the pipeline, and compare the summary table. The `_frames` variant enriches the output with Polars DataFrames for each stage, making notebook exploration ergonomic.

Use this when you want to run a complete AFML workflow without manually chaining individual modules. It’s the fastest path from “I have prices and a model” to “I have a backtested strategy with risk metrics.”

Prerequisites: Timestamps, close prices, model probability forecasts, and multi-asset price matrix.

Alternatives: Call individual modules (filters, labeling, bet_sizing, etc.) for more control over each stage.

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `cusum_threshold` | `float` | CUSUM event filter threshold | `0.001` |
| `num_classes` | `int` | Number of label classes for bet sizing | `2` |
| `step_size` | `float` | Bet size discretization step | `0.1` |
| `risk_free_rate` | `float` | Risk-free rate for Sharpe calculations | `0.0` |
| `confidence_level` | `float` | Confidence level for VaR/ES | `0.05` |
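To illustrate what `step_size` controls in the bet-sizing stage: continuous bet sizes are snapped to a discrete grid so that small fluctuations in model probabilities do not churn the position. A hedged sketch with a hypothetical helper; the library's exact rounding rule may differ:

```python
def discretize_bet(size: float, step_size: float = 0.1) -> float:
    """Round a continuous bet size in [-1, 1] to the nearest multiple
    of step_size (a common AFML-style discretization)."""
    return round(size / step_size) * step_size
```

With the default `step_size=0.1`, a raw bet of 0.37 snaps to 0.4 and -0.82 snaps to -0.8.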
```python
from openquant.pipeline import run_mid_frequency_pipeline_frames, summarize_pipeline

out = run_mid_frequency_pipeline_frames(
    timestamps=timestamps,
    close=close,
    model_probabilities=probabilities,
    asset_prices=asset_prices,
    model_sides=sides,
    asset_names=["CL", "NG", "RB", "GC"],
    cusum_threshold=0.001,
)

# Polars DataFrames for each stage
signals_df = out["frames"]["signals"]
backtest_df = out["frames"]["backtest"]
weights_df = out["frames"]["weights"]

# One-row summary with key metrics
summary = summarize_pipeline(out)
print(summary)
# portfolio_sharpe | realized_sharpe | value_at_risk | has_forward_look_bias
```
Common pitfalls:

  • Not checking leakage_checks in the output — the pipeline flags forward-look bias but doesn’t stop execution.
  • Using the raw dict output when DataFrames are more convenient — prefer run_mid_frequency_pipeline_frames.
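The first pitfall can be turned into a hard stop. A small sketch, assuming `out["leakage_checks"]` is a dict of boolean flags (the exact key layout is an assumption, suggested by the `has_forward_look_bias` summary column; adapt the key names to the actual pipeline output):

```python
def assert_no_leakage(out: dict) -> None:
    """Raise instead of silently continuing when any leakage flag is set.

    Assumes out["leakage_checks"] maps check names to booleans -- an
    assumption about the pipeline output, not a documented contract.
    """
    checks = out.get("leakage_checks", {})
    failed = [name for name, flagged in checks.items() if flagged]
    if failed:
        raise RuntimeError(f"pipeline leakage checks failed: {failed}")
```

Calling this right after the pipeline keeps a flagged run from silently reaching the backtest comparison.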
API:

  • pipeline.run_mid_frequency_pipeline
  • pipeline.run_mid_frequency_pipeline_frames
  • pipeline.summarize_pipeline
Key points:

  • The pipeline enforces input alignment and event ordering as leakage guards.
  • run_mid_frequency_pipeline_frames adds Polars DataFrames to the raw dict output.
  • summarize_pipeline extracts key metrics into a single-row DataFrame for notebook display.