pipeline
Concept Overview
The pipeline module orchestrates the full AFML (Advances in Financial Machine Learning) research workflow in a single function call. It chains CUSUM event detection → triple-barrier labeling → bet sizing → portfolio allocation → risk metrics → backtest statistics, with each stage passing its output to the next. Built-in leakage checks verify that inputs are aligned and that events are chronologically ordered, and they flag any forward-looking bias.
This is designed for rapid research iteration: change a parameter, re-run the pipeline, and compare the summary tables. The run_mid_frequency_pipeline_frames variant also returns a Polars DataFrame for each stage, which makes exploration in notebooks more convenient.
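To make the first two stages concrete, here is a minimal pure-Python sketch of a symmetric CUSUM filter on log returns, followed by a simplified triple-barrier labeler. It is not openquant's implementation. The function names, the fixed simple-return barriers pt and sl (AFML usually scales barriers by volatility), and the max_holding vertical barrier are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cusum_events(close: Sequence[float], threshold: float) -> List[int]:
    """Symmetric CUSUM filter: return bar indices where the cumulative
    log-return drift since the last reset exceeds +threshold or -threshold."""
    events: List[int] = []
    s_pos = s_neg = 0.0
    for i in range(1, len(close)):
        r = math.log(close[i] / close[i - 1])
        s_pos = max(0.0, s_pos + r)  # accumulates upward drift
        s_neg = min(0.0, s_neg + r)  # accumulates downward drift
        if s_neg < -threshold:
            s_neg = 0.0
            events.append(i)
        elif s_pos > threshold:
            s_pos = 0.0
            events.append(i)
    return events


def triple_barrier_labels(
    close: Sequence[float],
    events: Sequence[int],
    pt: float,         # profit-taking barrier as a simple return (assumption)
    sl: float,         # stop-loss barrier as a simple return (assumption)
    max_holding: int,  # vertical barrier in bars (assumption)
) -> List[int]:
    """Label each event +1 or -1 by whichever horizontal barrier is touched
    first; if neither is touched, use the sign of the return at the vertical barrier."""
    labels: List[int] = []
    for t0 in events:
        end = min(t0 + max_holding, len(close) - 1)
        for t in range(t0 + 1, end + 1):  # only prices after the event bar
            ret = close[t] / close[t0] - 1.0
            if ret >= pt:
                labels.append(1)
                break
            if ret <= -sl:
                labels.append(-1)
                break
        else:  # no horizontal barrier touched before the vertical barrier
            final = close[end] / close[t0] - 1.0
            labels.append((final > 0) - (final < 0))
    return labels


# Usage: detect events, then label them.
prices = [100.0, 101.0, 102.0, 101.0, 99.0]
events = cusum_events(prices, threshold=0.005)
labels = triple_barrier_labels(prices, events, pt=0.01, sl=0.01, max_holding=2)
# events == [1, 2, 3, 4]; labels == [0, -1, -1, 0]
```

Note that the labeler only looks at prices after each event bar, which is the property the pipeline's leakage checks are meant to protect.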
When to Use
Use this when you want to run a complete AFML workflow without manually chaining the individual modules. It is the fastest path from “I have prices and a model” to “I have a backtested strategy with risk metrics.”
Prerequisites: timestamps, close prices, model probability forecasts, and a multi-asset price matrix.
Alternatives: Call individual modules (filters, labeling, bet_sizing, etc.) for more control over each stage.
Key Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| cusum_threshold | float | CUSUM event filter threshold | 0.001 |
| num_classes | int | Number of label classes used in bet sizing | 2 |
| step_size | float | Bet size discretization step | 0.1 |
| risk_free_rate | float | Risk-free rate used in Sharpe ratio calculations | 0.0 |
| confidence_level | float | Tail level for VaR/ES (the 0.05 default corresponds to 95% confidence) | 0.05 |
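The num_classes and step_size parameters match the probability-based bet sizing described in AFML Chapter 10. The sketch below implements that formula, assuming (not confirmed here) that the pipeline follows it. The probability is turned into a z-statistic against the uninformative 1/num_classes baseline, mapped through the standard normal CDF to a size in [-1, 1], multiplied by the model's side, and rounded to multiples of step_size. bet_size is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def bet_size(prob: float, side: int, num_classes: int = 2, step_size: float = 0.1) -> float:
    """AFML-style bet size from a predicted probability (sketch, not openquant's code).

    prob: model probability of the predicted label; side: +1 (long) or -1 (short).
    """
    eps = 1e-12
    p = min(max(prob, eps), 1.0 - eps)  # avoid division by zero at p = 0 or 1
    # z-statistic of p against the uninformative 1/num_classes baseline
    z = (p - 1.0 / num_classes) / math.sqrt(p * (1.0 - p))
    size = side * (2.0 * NormalDist().cdf(z) - 1.0)  # continuous size in [-1, 1]
    if step_size > 0:
        size = round(size / step_size) * step_size  # snap to the discretization grid
    return max(-1.0, min(1.0, size))


for p in (0.5, 0.6, 0.9):
    print(p, bet_size(p, side=1))
```

With two classes, a probability of 0.5 gives a zero bet, while 0.9 gives roughly 0.82 before discretization and 0.8 after rounding to the 0.1 step. Discretizing this way limits how often small probability changes trigger trades.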
Usage Examples
Python
Run a complete research pipeline
```python
from openquant.pipeline import run_mid_frequency_pipeline_frames, summarize_pipeline

# timestamps, close, probabilities, sides and asset_prices are your own inputs (see Prerequisites)
out = run_mid_frequency_pipeline_frames(
    timestamps=timestamps,
    close=close,
    model_probabilities=probabilities,
    asset_prices=asset_prices,
    model_sides=sides,
    asset_names=["CL", "NG", "RB", "GC"],
    cusum_threshold=0.001,
)

# Polars DataFrames for each stage
signals_df = out["frames"]["signals"]
backtest_df = out["frames"]["backtest"]
weights_df = out["frames"]["weights"]

# One-row summary with key metrics
summary = summarize_pipeline(out)
print(summary)
# Columns include: portfolio_sharpe | realized_sharpe | value_at_risk | has_forward_look_bias
```

Common Pitfalls
- Not checking leakage_checks in the output. The pipeline flags forward-look bias but does not stop execution, so a run can complete with biased results.
- Using the raw dict output when DataFrames would be more convenient. Prefer run_mid_frequency_pipeline_frames for exploration.
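Because the flags do not halt execution, it can help to fail fast in your own research code. The sketch below recomputes the two structural guards named on this page (input alignment and chronological event ordering) from plain Python lists and raises on any failure. The check names and the leakage_report and require_clean helpers are hypothetical; the actual keys inside the pipeline's leakage_checks output may differ.

```python
from typing import Dict, Sequence


def leakage_report(
    timestamps: Sequence,
    close: Sequence[float],
    probabilities: Sequence[float],
    event_indices: Sequence[int],
) -> Dict[str, bool]:
    """Hypothetical structural leakage guards (names are illustrative)."""
    n = len(timestamps)
    return {
        # every per-bar input must describe the same bars
        "inputs_aligned": len(close) == n and len(probabilities) == n,
        # bars must be in strictly increasing time order
        "timestamps_increasing": all(a < b for a, b in zip(timestamps, timestamps[1:])),
        # events must be strictly ordered and point at existing bars
        "events_ordered": all(a < b for a, b in zip(event_indices, event_indices[1:])),
        "events_in_range": all(0 <= e < n for e in event_indices),
    }


def require_clean(report: Dict[str, bool]) -> None:
    """Raise instead of silently continuing when any guard fails."""
    failed = [name for name, ok in report.items() if not ok]
    if failed:
        raise ValueError("leakage checks failed: " + ", ".join(failed))


report = leakage_report([1, 2, 3], [10.0, 10.5, 10.2], [0.6, 0.4, 0.7], [0, 2])
require_clean(report)  # passes: all guards hold
```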
API Reference
Python API
- pipeline.run_mid_frequency_pipeline
- pipeline.run_mid_frequency_pipeline_frames
- pipeline.summarize_pipeline
Key Functions
- run_mid_frequency_pipeline
- run_mid_frequency_pipeline_frames
- summarize_pipeline
Implementation Notes
- The pipeline enforces input alignment and event ordering as leakage guards.
- run_mid_frequency_pipeline_frames adds per-stage Polars DataFrames (accessible as out["frames"]) to the raw dict output.
- summarize_pipeline extracts key metrics into a single-row DataFrame for notebook display.
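For intuition about what a one-row summary contains, here is a minimal sketch that computes a per-period Sharpe ratio (using risk_free_rate) and historical VaR and expected shortfall (using confidence_level as the tail probability) from a list of strategy returns. summarize_returns is a hypothetical helper, not summarize_pipeline. It assumes per-period, non-annualized returns and a per-period risk-free rate; the library may annualize or estimate VaR differently.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_returns(
    returns: Sequence[float],
    risk_free_rate: float = 0.0,     # per-period rate (assumption)
    confidence_level: float = 0.05,  # tail probability, as in the table above
) -> Dict[str, float]:
    """Hypothetical one-row summary: Sharpe ratio, historical VaR and ES."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free_rate for r in returns]
    sd = statistics.stdev(excess)
    sharpe = statistics.mean(excess) / sd if sd > 0 else float("nan")
    # worst k returns form the tail; always keep at least one observation
    k = max(1, math.ceil(confidence_level * len(returns)))
    tail = sorted(returns)[:k]
    return {
        "sharpe_ratio": sharpe,
        "value_at_risk": -tail[-1],                 # loss at the tail cutoff
        "expected_shortfall": -statistics.mean(tail),  # average loss in the tail
    }


daily = [0.004, -0.002, 0.001, 0.003, -0.006, 0.002, 0.0, 0.005]
print(summarize_returns(daily, risk_free_rate=0.0, confidence_level=0.05))
```

Losses are reported as positive numbers, so expected shortfall is always at least as large as VaR.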