Skip to content

data_structures

Traditional financial data uses fixed-time bars (1-minute, daily), but these sample uniformly regardless of market activity. During quiet periods you get noise; during volatile periods you under-sample important information.

Information-driven bars (AFML Chapter 2) sample based on market activity instead of clock time. Dollar bars trigger a new bar when cumulative traded dollar volume reaches a threshold, producing roughly equal-information observations. Volume bars trigger on cumulative share volume. Tick bars trigger on trade count.

Imbalance bars go further: they detect when the net signed trade flow (buy minus sell) exceeds its expected magnitude, capturing points where informed trading pressure shifts. Run bars detect runs of same-signed trades exceeding expectations.

The key insight is that information-driven bars produce returns that are closer to IID normal, which makes downstream ML models (labeling, feature importance, cross-validation) better behaved. All AFML workflows assume information-driven bars as input.

This is the first module in any AFML pipeline. Raw tick or trade data goes in; structured OHLCV bars come out. Everything downstream — labeling, features, sampling — consumes these bars.

Prerequisites: Raw trade or tick data with timestamps, prices, and volumes.

Alternatives: Standard time bars if your data is already aggregated. For pre-aggregated OHLCV data, use the data module’s load_ohlcv and clean_ohlcv functions instead.

i=t0tpiviθ\sum_{i=t_0}^{t} p_i v_i \ge \theta

biE[bi]\left|\sum b_i\right| \ge E[|\sum b_i|]

ParameterTypeDescriptionDefault
dollar_value_per_barfloatDollar notional threshold for dollar bars (Python)5_000_000.0
volume_per_barfloatCumulative volume threshold for volume bars (Python)100_000.0
ticks_per_barintTrade count threshold for tick bars (Python)50
intervalstrTime interval for time bars, e.g. ‘1d’, ‘5m’, ‘1h’ (Python)‘1d’
thresholdf64Bar trigger threshold for standard_bars, run_bars, imbalance_bars (Rust)
bar_typeStandardBarTypeTick, Volume, or Dollar — selects accumulation metric (Rust)
from openquant.bars import build_dollar_bars, bar_diagnostics
import polars as pl
# Input: Polars DataFrame with ts, symbol, open, high, low, close, volume columns
df = pl.read_parquet("trades.parquet")
# Dollar bars: each bar aggregates ~$5M of notional
bars = build_dollar_bars(df, dollar_value_per_bar=5_000_000.0)
# Returns: Polars DataFrame with ts, symbol, open, high, low, close, volume, adj_close, start_ts, n_obs, dollar_value
# Check bar quality: low autocorrelation = good
diag = bar_diagnostics(bars)
print(diag) # {"n_bars": 482.0, "lag1_return_autocorr": -0.02, ...}
from openquant.bars import build_tick_bars, build_volume_bars, build_time_bars
tick_bars = build_tick_bars(df, ticks_per_bar=50)
vol_bars = build_volume_bars(df, volume_per_bar=100_000.0)
time_bars = build_time_bars(df, interval="5m")
use chrono::Duration;
use openquant::data_structures::{
standard_bars, time_bars, run_bars, imbalance_bars,
Trade, StandardBarType, ImbalanceBarType,
};
let trades: Vec<Trade> = vec![];
// Fixed-time bars
let t_bars = time_bars(&trades, Duration::minutes(5));
// Dollar bars via standard_bars
let d_bars = standard_bars(&trades, 50_000.0, StandardBarType::Dollar);
// Run bars
let r_bars = run_bars(&trades, 100);
// Tick imbalance bars
let ib = imbalance_bars(&trades, 500.0, ImbalanceBarType::Tick);
  • Using time bars when your data has highly variable activity — dollar or volume bars will produce more stationary returns.
  • Setting the threshold too low, creating extremely noisy high-frequency bars, or too high, losing intraday resolution.
  • Forgetting to assign trade direction (buy/sell sign) before constructing imbalance or run bars — these require signed volume.
  • Mixing bar types across train and inference: if you train on dollar bars, your live pipeline must also use dollar bars with the same threshold.
  • Run bars and imbalance bars are available in Python via bars.build_run_bars and bars.build_imbalance_bars.
  • bars.build_time_bars
  • bars.build_tick_bars
  • bars.build_volume_bars
  • bars.build_dollar_bars
  • bars.build_run_bars
  • bars.build_imbalance_bars
  • bars.bar_diagnostics
  • standard_bars
  • time_bars
  • run_bars
  • imbalance_bars
  • Trade
  • StandardBar
  • StandardBarType
  • ImbalanceBarType
  • Threshold selection controls bar frequency and noise level.
  • Keep OHLCV semantics consistent across downstream features.
  • Run bars and imbalance bars are available via bars.build_run_bars and bars.build_imbalance_bars.
  • bar_diagnostics is Python-only; use it to verify low return autocorrelation after bar construction.