Skip to content

sample_weights

In AFML’s event-driven framework (Chapter 4), labels are derived from overlapping price paths. When two events overlap in time, their labels share information — the price observations that determine event A’s outcome also influence event B’s outcome. Treating these labels as independent samples inflates effective sample size and biases model training.

Uniqueness-based weighting addresses this by computing how unique each sample is at each time step. If a bar contributes to 3 concurrent events, each event gets 1/3 credit for that bar. The total weight of a sample is the sum of its per-bar uniqueness scores. Samples that overlap with many others get down-weighted; isolated samples get full weight.

Return-attribution weighting weights samples by their absolute return, giving more training influence to economically significant events.

Time-decay weighting applies a power-law decay so recent observations contribute more than older ones, useful when the data-generating process evolves over time.

These weights should be passed as sample_weight to your classifier or loss function.

Apply sample weights after labeling and before model training. They correct for the non-IID structure caused by overlapping triple-barrier labels.

Prerequisites: Labeled events from the labeling module, with event start/end times.

Alternatives: Equal weights (ignores overlap, biases toward dense clusters), or sequential bootstrap (sampling-based approach instead of weighting).

wi=tIt,ijIt,jw_i=\sum_t\frac{I_{t,i}}{\sum_j I_{t,j}}

wi=(iT)δw_i=(\frac{i}{T})^\delta

ParameterTypeDescriptionDefault
deltaf64Time-decay exponent; 0 = uniform, 1 = linear decay, >1 = aggressive recency bias1.0

Compute sample weights for overlapping labels

Section titled “Compute sample weights for overlapping labels”
from openquant._core import sample_weights
# Returns from labeled events (used for return-attribution weighting)
returns = [0.01, -0.005, 0.007, -0.002, 0.003, 0.01, -0.008]
# Weight by absolute return (higher-impact events get more weight)
w_return = sample_weights.get_weights_by_return(returns)
# Weight by time decay (more recent events weighted higher, delta=0.5)
w_decay = sample_weights.get_weights_by_time_decay(returns, 0.5)
# Use these weights in model training:
# model.fit(X, y, sample_weight=w_return)
use openquant::sample_weights::get_weights_by_time_decay;
let w = get_weights_by_time_decay(&returns, 0.5);
  • Training without any overlap correction — highly overlapping labels effectively duplicate data and overfit the dense-event regime.
  • Using uniqueness weights without the indicator matrix from the sampling module — the weights require knowledge of which bars each event spans.
  • Combining time-decay and uniqueness weights incorrectly — multiply them element-wise, don’t add.
  • sample_weights.get_weights_by_return
  • sample_weights.get_weights_by_time_decay
  • get_weights_by_return
  • get_weights_by_time_decay
  • Pair with sequential bootstrap for robust label sampling.
  • Time-decay controls recency bias explicitly.