sample_weights
Concept Overview
In AFML’s event-driven framework (Chapter 4), labels are derived from overlapping price paths. When two events overlap in time, their labels share information: the price observations that determine event A’s outcome also influence event B’s outcome. Treating these labels as independent samples overstates the effective sample size and biases model training.
Uniqueness-based weighting addresses this by measuring how unique each sample is at each time step. If a bar contributes to 3 concurrent events, each event gets 1/3 credit for that bar. A sample's average uniqueness is the mean of its per-bar uniqueness scores over the bars it spans. Samples that overlap with many others are down-weighted, and isolated samples get full weight (an average uniqueness of 1).
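The per-bar credit scheme can be sketched in plain Python. This is a minimal illustration, not openquant's implementation. `average_uniqueness` is a hypothetical helper, and it assumes each event is given as an inclusive `(start_bar, end_bar)` pair of bar indices rather than the timestamps the labeling module produces. It follows AFML's definition of average uniqueness: the mean of the per-bar 1/c_t scores over the bars an event spans.

```python
from typing import List, Tuple


def average_uniqueness(events: List[Tuple[int, int]], num_bars: int) -> List[float]:
    """Hypothetical helper: AFML-style average uniqueness per event.

    events: inclusive (start_bar, end_bar) index pairs (assumed input format).
    """
    # Concurrency c_t: how many events span each bar t
    concurrency = [0] * num_bars
    for start, end in events:
        for t in range(start, end + 1):
            concurrency[t] += 1

    weights = []
    for start, end in events:
        # Per-bar uniqueness 1/c_t over the bars this event spans
        per_bar = [1.0 / concurrency[t] for t in range(start, end + 1)]
        # Average uniqueness: mean of the per-bar scores
        weights.append(sum(per_bar) / len(per_bar))
    return weights
```

For example, `average_uniqueness([(0, 2), (1, 3), (5, 6)], 7)` gives about `[0.667, 0.667, 1.0]`. The first two events share bars 1 and 2, and the third event overlaps nothing.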
Return-attribution weighting scales each sample by the magnitude of its return, giving more training influence to economically significant events.
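AFML's version of return attribution also accounts for overlap. Each bar's log return is split evenly among the events active at that bar and summed over each event's lifespan. The absolute values are then rescaled so the weights sum to the number of events. The sketch below assumes inclusive `(start_bar, end_bar)` event spans and per-bar log returns. `return_attribution_weights` is a hypothetical helper, not openquant's `get_weights_by_return`, whose exact formula this page does not spell out.

```python
from typing import List, Tuple


def return_attribution_weights(
    events: List[Tuple[int, int]], bar_log_returns: List[float]
) -> List[float]:
    """Hypothetical helper: AFML-style return attribution (not openquant's API).

    events: inclusive (start_bar, end_bar) index pairs.
    bar_log_returns[t]: log return from bar t-1 to bar t.
    """
    # Concurrency c_t: number of events spanning each bar
    concurrency = [0] * len(bar_log_returns)
    for start, end in events:
        for t in range(start, end + 1):
            concurrency[t] += 1

    # Raw weight: |sum over the event's bars of r_t / c_t|
    raw = [
        abs(sum(bar_log_returns[t] / concurrency[t] for t in range(start, end + 1)))
        for start, end in events
    ]
    total = sum(raw)
    if total == 0:
        # Degenerate case: no attributable return, fall back to equal weights
        return [1.0] * len(events)
    # Rescale so the weights sum to the number of events
    return [w * len(events) / total for w in raw]
```

For example, `return_attribution_weights([(0, 1), (1, 2)], [0.02, -0.01, 0.03])` gives roughly `[0.75, 1.25]`. The shared bar's -1% return is split between the two events.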
Time-decay weighting applies a power-law decay so that recent observations contribute more than older ones. This is useful when the data-generating process evolves over time.
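The page does not give openquant's exact decay formula. One power-law form consistent with the `delta` semantics in the Key Parameters table (0 = uniform, 1 = linear, >1 = stronger recency bias) ranks events from oldest to newest and raises each event's relative position to the power `delta`. `power_law_decay` below is a hypothetical sketch of that assumption, not the library function.

```python
from typing import List


def power_law_decay(num_events: int, delta: float) -> List[float]:
    """Hypothetical power-law time decay (assumed form, not openquant's API).

    Events are ordered oldest (index 0) to newest; the newest gets weight 1.0.
    delta = 0 -> uniform, delta = 1 -> linear ramp, delta > 1 -> stronger recency bias.
    """
    # Relative position in (0, 1], raised to the decay exponent
    return [((i + 1) / num_events) ** delta for i in range(num_events)]
```

With four events, `power_law_decay(4, 1.0)` returns `[0.25, 0.5, 0.75, 1.0]` and `power_law_decay(4, 0.0)` returns four weights of `1.0`.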
These weights should be passed as sample_weight to your classifier or loss function.
When to Use
Apply sample weights after labeling and before model training. They correct for the non-IID structure caused by overlapping triple-barrier labels.
Prerequisites: Labeled events from the labeling module, with event start/end times.
Alternatives: Equal weights (ignores overlap, biases toward dense clusters), or sequential bootstrap (sampling-based approach instead of weighting).
Mathematical Foundations
Uniqueness Weight
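The standard definitions from AFML Chapter 4 are below. The notation is assumed here, and openquant may use different symbols. Let $\mathbf{1}_{t,i}$ be 1 if event $i$ spans bar $t$ and 0 otherwise, with $I$ events in total. Then $c_t$ is the concurrency at bar $t$, $u_{t,i}$ is event $i$'s per-bar uniqueness, and $\bar{u}_i$ is its average uniqueness over its lifespan:

$$
c_t = \sum_{i=1}^{I} \mathbf{1}_{t,i}, \qquad
u_{t,i} = \frac{\mathbf{1}_{t,i}}{c_t}, \qquad
\bar{u}_i = \frac{\sum_{t} u_{t,i}}{\sum_{t} \mathbf{1}_{t,i}}
$$

An event that never overlaps another has $c_t = 1$ on every bar it spans, so $\bar{u}_i = 1$.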
Time Decay
Key Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| delta | f64 | Time-decay exponent; 0 = uniform, 1 = linear decay, >1 = aggressive recency bias | 1.0 |
Usage Examples
Python
Compute sample weights for overlapping labels
```python
from openquant._core import sample_weights

# Returns from labeled events (used for return-attribution weighting)
returns = [0.01, -0.005, 0.007, -0.002, 0.003, 0.01, -0.008]

# Weight by absolute return (higher-impact events get more weight)
w_return = sample_weights.get_weights_by_return(returns)

# Weight by time decay (more recent events weighted higher, delta=0.5)
w_decay = sample_weights.get_weights_by_time_decay(returns, 0.5)

# Use these weights in model training:
# model.fit(X, y, sample_weight=w_return)
```
Rust
Compute event weights
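```rust
use openquant::sample_weights::get_weights_by_time_decay;

// Event returns from the labeling step (same values as the Python example)
let returns = vec![0.01, -0.005, 0.007, -0.002, 0.003, 0.01, -0.008];
let w = get_weights_by_time_decay(&returns, 0.5);
```
Common Pitfalls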
- Training without any overlap correction: highly overlapping labels effectively duplicate data, so the model overfits to the dense-event regime.
- Computing uniqueness weights without the indicator matrix from the sampling module: uniqueness depends on which bars each event spans, and the indicator matrix is what records that.
- Adding time-decay and uniqueness weights together: multiply them element-wise instead, so that a sample either scheme down-weights stays down-weighted.
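Here is a minimal sketch of the element-wise combination. `combine_weights` is a hypothetical helper, not part of openquant. Rescaling the product so the weights average to 1 is an assumed convention, borrowed from how AFML normalizes return-attribution weights.

```python
from typing import List


def combine_weights(uniqueness: List[float], decay: List[float]) -> List[float]:
    """Hypothetical helper: element-wise product, rescaled to sum to len(weights)."""
    if len(uniqueness) != len(decay):
        raise ValueError("weight vectors must have the same length")
    # Multiply, don't add: a zero in either factor keeps the sample at zero weight
    combined = [u * d for u, d in zip(uniqueness, decay)]
    total = sum(combined)
    if total == 0:
        raise ValueError("all combined weights are zero")
    # Rescale so the mean weight is 1 (assumed convention)
    return [w * len(combined) / total for w in combined]
```

The result can be passed directly as `sample_weight`, for example `model.fit(X, y, sample_weight=combine_weights(u, d))`.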
API Reference
Python API
- sample_weights.get_weights_by_return
- sample_weights.get_weights_by_time_decay
Rust API
- get_weights_by_return
- get_weights_by_time_decay
Implementation Notes
- Pair with sequential bootstrap for robust label sampling.
- The time-decay exponent (delta) controls recency bias explicitly.