By AFML Chapter

Information-driven bars replace fixed-time sampling with activity-based aggregation, producing returns that are closer to IID normal than those of time bars. CUSUM and z-score filters then select structurally meaningful events from the bar stream.

  • data_structures — bar construction (dollar, volume, tick, imbalance, run bars). Entry point for raw trade data.
  • filters — CUSUM and z-score event filters
  • etf_trick — synthetic ETF and futures roll utilities
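
As a rough illustration of both ideas, the sketch below groups raw trades into dollar bars and then runs a symmetric CUSUM filter over a series of values such as bar closes. The function names `dollar_bars` and `cusum_events`, the fixed dollar threshold, and the filter threshold `h` are assumptions made for this example; they are not the signatures exposed by data_structures or filters.

```python
def dollar_bars(trades, threshold):
    """Group (price, size) trades into OHLC bars.

    A bar closes as soon as the traded dollar value accumulated since the
    previous bar reaches `threshold`. A trailing partial bar is dropped.
    """
    bars = []
    bucket = []      # prices seen in the bar under construction
    dollars = 0.0    # dollar value accumulated in the current bar
    for price, size in trades:
        bucket.append(price)
        dollars += price * size
        if dollars >= threshold:
            bars.append((bucket[0], max(bucket), min(bucket), bucket[-1]))
            bucket, dollars = [], 0.0
    return bars


def cusum_events(values, h):
    """Symmetric CUSUM filter.

    Returns the indices at which the cumulative upward or downward drift
    since the last reset exceeds `h`.
    """
    events = []
    s_pos = s_neg = 0.0
    for i in range(1, len(values)):
        diff = values[i] - values[i - 1]
        s_pos = max(0.0, s_pos + diff)
        s_neg = min(0.0, s_neg + diff)
        if s_neg < -h:
            s_neg = 0.0
            events.append(i)
        elif s_pos > h:
            s_pos = 0.0
            events.append(i)
    return events
```

In practice the threshold is usually tied to recent volatility rather than held constant; a constant is used here only to keep the sketch short.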

Triple-barrier labeling converts filtered events into ML labels with controlled profit-taking, stop-loss, and holding-period barriers. Meta-labeling separates direction from sizing.

  • labeling — triple-barrier and meta-labeling. Entry point for label generation.
  • bet_sizing — probability-to-position-size conversion
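
The barrier logic for a single event can be sketched as follows. This is a simplified illustration, not the labeling module's API: `triple_barrier_label` is a hypothetical name, the barriers are fixed return thresholds rather than volatility-scaled ones, and labeling a vertical-barrier exit as 0 is one convention among several.

```python
def triple_barrier_label(prices, start, profit_take, stop_loss, max_holding):
    """Label one event that starts at index `start`.

    Returns (label, exit_index), where label is
      +1 if the profit-taking barrier is touched first,
      -1 if the stop-loss barrier is touched first,
       0 if the holding period expires first (vertical barrier).
    `profit_take` and `stop_loss` are positive fractional returns.
    """
    entry = prices[start]
    end = min(start + max_holding, len(prices) - 1)
    for t in range(start + 1, end + 1):
        ret = prices[t] / entry - 1.0
        if ret >= profit_take:
            return 1, t
        if ret <= -stop_loss:
            return -1, t
    return 0, end
```

For meta-labeling, a second model would then be trained to predict whether acting on the primary model's side was correct, and its probability would feed position sizing.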

Overlapping labels violate IID assumptions. Sequential bootstrap and uniqueness weighting correct for this by measuring and accounting for label overlap.

  • sampling — indicator matrix and sequential bootstrap. Entry point for overlap-aware sampling.
  • sample_weights — uniqueness and time-decay weighting
  • sb_bagging — sequentially bootstrapped bagging ensembles
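
Overlap can be measured with average uniqueness: count how many labels are active at each bar, then average the reciprocal of that count over each label's lifespan. The sketch below assumes labels are given as inclusive `(start, end)` bar indices; `average_uniqueness` is a hypothetical helper, not a function from sampling or sample_weights.

```python
def average_uniqueness(spans, n_bars):
    """Average uniqueness of each label.

    spans[i] = (start, end) gives the inclusive bar range over which label i
    is active. A label that overlaps no other label gets uniqueness 1.0.
    """
    # Number of labels active at each bar.
    concurrency = [0] * n_bars
    for start, end in spans:
        for t in range(start, end + 1):
            concurrency[t] += 1

    # Mean of 1 / concurrency over each label's own lifespan.
    return [
        sum(1.0 / concurrency[t] for t in range(start, end + 1)) / (end - start + 1)
        for start, end in spans
    ]
```

These values can serve directly as sample weights, and the sequential bootstrap uses the same quantity to lower the draw probability of samples that overlap those already drawn.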

Fractional differencing finds the minimum differencing order d that achieves stationarity while preserving as much of the price series' long memory, and hence predictive signal, as possible.

  • fracdiff — FFD and standard fractional differencing. Entry point.
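
A minimal sketch of fixed-width-window fractional differencing (FFD) follows. The weights come from the binomial expansion of (1 - B)^d, with w_0 = 1 and w_k = -w_{k-1} (d - k + 1) / k, truncated once they fall below a threshold. The names `ffd_weights` and `frac_diff_ffd` and the default threshold are assumptions for this example, not the fracdiff module's interface.

```python
def ffd_weights(d, threshold=1e-4):
    """Weights of (1 - B)^d, truncated when |w_k| drops below `threshold`.

    `threshold` must be positive, otherwise the loop does not terminate
    for non-integer d.
    """
    weights = [1.0]
    k = 1
    while True:
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
        k += 1
    return weights


def frac_diff_ffd(series, d, threshold=1e-4):
    """Apply the truncated weights as a fixed-width rolling dot product.

    The first len(weights) - 1 observations are consumed as warm-up, so the
    output is shorter than the input.
    """
    w = ffd_weights(d, threshold)
    width = len(w)
    return [
        sum(w[k] * series[t - k] for k in range(width))
        for t in range(width - 1, len(series))
    ]
```

With d = 1 the weights reduce to [1, -1], which is ordinary first differencing. Searching for the minimum d would then amount to increasing d until a stationarity test such as ADF rejects a unit root.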

Bias-variance decomposition and bagging diagnostics determine whether bagging or boosting improves ensemble quality under financial label structure.

  • ensemble_methods — bias/variance diagnostics, aggregation, bagging-vs-boosting recommendation

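Standard k-fold CV leaks information when training and test labels overlap in time. Purged k-fold removes training samples whose labels overlap the test fold, and an embargo additionally drops the training samples that immediately follow it.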
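
The splitting logic can be sketched in a few lines. The generator below assumes each sample carries an inclusive `(start, end)` label span in bar indices, that samples are sorted by start, and that there are at least as many samples as folds. `purged_kfold` and its `embargo` parameter (measured in bars) are hypothetical names for this illustration.

```python
def purged_kfold(spans, n_splits, embargo=0):
    """Yield (train_idx, test_idx) pairs with purging and embargo.

    A training sample is purged if its label span overlaps the test fold's
    span, and embargoed if it starts within `embargo` bars after the test
    fold ends.
    """
    n = len(spans)
    fold_size, remainder = divmod(n, n_splits)

    # Contiguous, non-shuffled folds, as is usual for time-ordered data.
    bounds, lo = [], 0
    for k in range(n_splits):
        hi = lo + fold_size + (1 if k < remainder else 0)
        bounds.append((lo, hi))
        lo = hi

    for lo, hi in bounds:
        test = list(range(lo, hi))
        test_start = min(spans[i][0] for i in test)
        test_end = max(spans[i][1] for i in test)
        train = []
        for i in range(n):
            if lo <= i < hi:
                continue
            start, end = spans[i]
            overlaps = start <= test_end and end >= test_start
            embargoed = test_end < start <= test_end + embargo
            if not overlaps and not embargoed:
                train.append(i)
        yield train, test
```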

Using several importance methods together (MDI, MDA, SFI) helps detect substitution effects and unstable features before a model is deployed.

Tuning must use purged CV to avoid leakage-inflated scores. Randomized search is preferred for large parameter spaces.

Chapters 10-12: Position Sizing and Robust Backtesting

Backtesting is a scenario sanity check, not a performance estimator. Combinatorial purged cross-validation (CPCV) provides a distribution of backtest paths instead of a single point estimate.

  • bet_sizing — dynamic and reserve sizing for execution
  • backtesting_engine — walk-forward, purged CV, and CPCV. Entry point for validation.
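
To see where the path distribution comes from, consider N contiguous groups of observations with k groups held out per split. There are C(N, k) splits, every group is tested C(N - 1, k - 1) times, and the test predictions can therefore be recombined into (k / N) C(N, k) complete backtest paths. The sketch below only enumerates the combinations; `cpcv_splits` and `cpcv_num_paths` are hypothetical helpers, and purging and embargo around each test group are omitted.

```python
from itertools import combinations
from math import comb


def cpcv_splits(n_groups, n_test_groups):
    """All ways of choosing which groups form the test set."""
    return list(combinations(range(n_groups), n_test_groups))


def cpcv_num_paths(n_groups, n_test_groups):
    """Number of full backtest paths: (k / N) * C(N, k) = C(N - 1, k - 1)."""
    return comb(n_groups, n_test_groups) * n_test_groups // n_groups
```

For example, 6 groups with 2 test groups give 15 splits and 5 paths, so a Sharpe ratio can be computed per path and examined as a small distribution.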

Selecting trading rules on a single historical path overfits. Ensembles of synthetic paths drawn from calibrated Ornstein-Uhlenbeck (O-U) processes test whether a rule is robust.
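
A minimal sketch of this idea, assuming the O-U parameters have already been calibrated: simulate many paths from the discrete process P_t = (1 - phi) * mean + phi * P_{t-1} + sigma * eps_t, apply a profit-take and stop-loss rule to each, and summarize the outcomes as a Sharpe-like ratio. All function names and parameters here are illustrative assumptions.

```python
import random


def ou_path(start, mean, phi, sigma, n_steps, rng):
    """Simulate one discrete O-U path of length n_steps + 1."""
    path = [start]
    for _ in range(n_steps):
        shock = sigma * rng.gauss(0.0, 1.0)
        path.append((1 - phi) * mean + phi * path[-1] + shock)
    return path


def rule_pnl(path, profit_take, stop_loss):
    """P&L of a long position that exits at the first barrier touched,
    or at the end of the path if neither barrier is hit."""
    entry = path[0]
    for price in path[1:]:
        pnl = price - entry
        if pnl >= profit_take or pnl <= -stop_loss:
            return pnl
    return path[-1] - entry


def evaluate_rule(profit_take, stop_loss, ou_params, n_paths=1000, seed=0):
    """Mean P&L divided by its standard deviation across synthetic paths."""
    rng = random.Random(seed)
    pnls = [
        rule_pnl(ou_path(rng=rng, **ou_params), profit_take, stop_loss)
        for _ in range(n_paths)
    ]
    mean = sum(pnls) / len(pnls)
    var = sum((x - mean) ** 2 for x in pnls) / (len(pnls) - 1)
    return mean / var ** 0.5 if var > 0 else float("nan")
```

Evaluating a grid of `(profit_take, stop_loss)` pairs this way shows how a rule behaves under the assumed process, rather than on the one path that history happened to produce.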

Chapters 14-15: Diagnostics and Strategy Risk

Strategy risk (probability of failing a Sharpe target) is distinct from portfolio risk (VaR, ES, drawdown).
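
One way to make the Sharpe target concrete is the symmetric-payout case: if every bet wins or loses the same amount, precision is p, and there are n independent bets per year, the annualized Sharpe ratio is (2p - 1) / (2 sqrt(p (1 - p))) * sqrt(n). Inverting this gives the precision a strategy needs to reach a target. The sketch below assumes exactly that simplified setting; the function names are hypothetical.

```python
from math import sqrt


def sharpe_from_precision(p, n_bets_per_year):
    """Annualized Sharpe ratio for symmetric payouts with precision p."""
    return (2 * p - 1) / (2 * sqrt(p * (1 - p))) * sqrt(n_bets_per_year)


def implied_precision(target_sharpe, n_bets_per_year):
    """Precision required to reach a non-negative `target_sharpe`
    under symmetric payouts."""
    n = n_bets_per_year
    return 0.5 * (1 + sqrt(1 - n / (target_sharpe ** 2 + n)))
```

Strategy risk is then the probability that the realized precision falls below `implied_precision(target, n)`, which depends on the strategy's own betting statistics and not on the volatility of the underlying assets.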

Hierarchical methods (HRP, HCAA) avoid covariance inversion fragility. CLA solves constrained mean-variance problems exactly.

  • hrp — Hierarchical Risk Parity
  • hcaa — Hierarchical Clustering Asset Allocation
  • onc — Optimal Number of Clusters
  • portfolio_optimization — mean-variance allocators (min-vol, max-Sharpe, efficient risk)
  • cla — Critical Line Algorithm
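
The allocation step of HRP can be sketched without any matrix inversion. The example below assumes the assets have already been ordered by the clustering and quasi-diagonalization steps, which are omitted; it then splits the ordered list in half repeatedly and divides weight between halves in inverse proportion to each half's variance. The function names are hypothetical and the covariance matrix is a plain nested list.

```python
def ivp_weights(cov, idx):
    """Inverse-variance weights for the assets in `idx`."""
    inv = [1.0 / cov[i][i] for i in idx]
    total = sum(inv)
    return [x / total for x in inv]


def cluster_variance(cov, idx):
    """Variance of the inverse-variance portfolio formed from `idx`."""
    w = ivp_weights(cov, idx)
    return sum(
        w[a] * w[b] * cov[i][j]
        for a, i in enumerate(idx)
        for b, j in enumerate(idx)
    )


def recursive_bisection(cov, order):
    """Top-down weight allocation over an already-sorted asset order."""
    weights = {i: 1.0 for i in order}
    clusters = [list(order)]
    while clusters:
        next_clusters = []
        for cluster in clusters:
            if len(cluster) < 2:
                continue
            half = len(cluster) // 2
            left, right = cluster[:half], cluster[half:]
            var_left = cluster_variance(cov, left)
            var_right = cluster_variance(cov, right)
            # The lower-variance half receives the larger share.
            alpha = 1.0 - var_left / (var_left + var_right)
            for i in left:
                weights[i] *= alpha
            for i in right:
                weights[i] *= 1.0 - alpha
            next_clusters += [left, right]
        clusters = next_clusters
    return weights
```

Only diagonal entries are inverted, which is why the procedure stays stable when the full covariance matrix is ill-conditioned.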

Chapters 17-19: Microstructure, Dependence, and Regime Detection

Microstructure features capture liquidity and order-flow dynamics invisible in OHLC bars. Structural break detection flags regime changes that invalidate model assumptions.

Atom/molecule parallelism scales independent computations. Streaming analytics maintain bounded-memory indicators for real-time early warning.
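
Both ideas can be sketched briefly. Atoms are the indivisible units of work and molecules are contiguous groups of atoms handed to one worker each; a streaming indicator keeps only a fixed number of running values regardless of how much data has passed through. The names below are illustrative assumptions, and a thread pool is used only to keep the example self-contained (CPU-bound work would normally go to a process pool).

```python
from concurrent.futures import ThreadPoolExecutor


def split_molecules(n_atoms, n_molecules):
    """Partition range(n_atoms) into contiguous (lo, hi) molecules."""
    m = max(1, min(n_molecules, n_atoms))
    bounds = [(i * n_atoms) // m for i in range(m + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(m)]


def run_molecules(func, atoms, n_molecules=4):
    """Apply `func` (list in, list out) to each molecule in parallel and
    concatenate the results in the original order."""
    parts = split_molecules(len(atoms), n_molecules)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        chunks = pool.map(lambda p: func(atoms[p[0]:p[1]]), parts)
        return [x for chunk in chunks for x in chunk]


class RunningStats:
    """Welford's online mean and variance: O(1) memory per stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance, or 0.0 with fewer than two observations."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

An early-warning signal could then compare each new observation with the running mean and variance without storing the history.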

See the full per-module detail in the Module Reference Index.