Machine learning for markets

Unsupervised learning and clustering

Unsupervised learning has no labels — no 'right answer' to predict. Instead it finds structure hidden in the data. In markets it is used less for direct prediction and more for understanding and organising.

Clustering

Clustering groups similar items together. Algorithms like k-means or hierarchical clustering can:

  • Group stocks that behave alike into data-driven 'sectors' that may differ from the official classification.
  • Identify market regimes — clusters of days that share volatility and correlation characteristics, so a strategy can adapt to which regime it is in.
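
As a rough illustration of the regime idea, the sketch below runs a plain k-means (Lloyd's algorithm) over synthetic daily observations. Everything about the data is an assumption made for the example: the two features (a volatility figure and an average correlation figure), the "calm" and "stressed" parameters, and the choice of two clusters. It uses only the Python standard library so it runs anywhere; in practice a library routine such as scikit-learn's KMeans would be the usual choice.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points

    def nearest(p):
        # Index of the centroid with the smallest squared Euclidean distance to p.
        return min(range(k),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]  # assignment step
        updated = []
        for j in range(k):  # update step: move each centroid to the mean of its members
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[j])
        if updated == centroids:  # nothing moved, so the assignment is stable
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]

# Synthetic days (assumed values): (realised volatility, average pairwise correlation).
rng = random.Random(42)
calm = [(rng.gauss(0.10, 0.02), rng.gauss(0.30, 0.05)) for _ in range(200)]
stressed = [(rng.gauss(0.35, 0.05), rng.gauss(0.70, 0.05)) for _ in range(50)]
days = calm + stressed

centroids, labels = kmeans(days, k=2)
for j, (vol, corr) in enumerate(centroids):
    print(f"regime {j}: days={labels.count(j)}, mean vol={vol:.2f}, mean corr={corr:.2f}")
```

k-means measures similarity by Euclidean distance, so features on very different scales should be standardised first; here both features were generated on comparable scales. The cluster numbers themselves carry no meaning: which index ends up as the "stressed" regime depends on the random starting points.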

Dimensionality reduction

Markets generate enormous numbers of correlated variables. Principal Component Analysis (PCA) compresses them into a handful of uncorrelated components, ordered by how much of the variation each one explains, so that the first few capture most of it. Applied to changes in a yield curve, for instance, the first three components are usually read as three intuitive factors: level, slope and curvature.
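
The yield-curve case can be sketched on synthetic data. The example below builds daily curve changes from three planted shapes (a parallel shift, a tilt and a hump) plus noise, then extracts the top three principal components from the covariance matrix by power iteration. The eight tenor positions, the factor volatilities and the noise level are all made-up assumptions, and the hand-rolled eigen-solver is only there to keep the example dependency-free; a linear-algebra library would normally do this step.

```python
import random

def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_components(cov, k, n_iter=300, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    rng = random.Random(seed)
    d = len(cov)
    work = [row[:] for row in cov]  # copy, because deflation modifies the matrix
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]  # random start vector
        for _step in range(n_iter):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives its eigenvalue.
        val = sum(v[i] * sum(work[i][j] * v[j] for j in range(d)) for i in range(d))
        pairs.append((val, v))
        for i in range(d):  # deflate: subtract the component just found
            for j in range(d):
                work[i][j] -= val * v[i] * v[j]
    return pairs

# Assumed setup: 8 tenors placed evenly from -1 (short end) to +1 (long end).
n_tenors = 8
x = [-1 + 2 * i / (n_tenors - 1) for i in range(n_tenors)]
level = [1.0] * n_tenors                                   # parallel shift
slope = x                                                  # tilt
mean_sq = sum(xi ** 2 for xi in x) / n_tenors
curve = [mean_sq - xi ** 2 for xi in x]                    # hump in the middle

rng = random.Random(42)
changes = []
for _ in range(500):  # 500 synthetic days; factor sizes are illustrative only
    lv, sl, cv = rng.gauss(0, 10), rng.gauss(0, 5), rng.gauss(0, 2)
    changes.append([lv * level[i] + sl * slope[i] + cv * curve[i] + rng.gauss(0, 0.5)
                    for i in range(n_tenors)])

cov = covariance(changes)
pairs = top_components(cov, k=3)
total_var = sum(cov[i][i] for i in range(n_tenors))
explained = [val / total_var for val, _ in pairs]
for name, (val, vec), share in zip(("PC1", "PC2", "PC3"), pairs, explained):
    print(name, f"{share:.1%}", [round(v, 2) for v in vec])
```

The printed loadings should show one component with the same sign at every tenor, one that changes sign once along the curve, and one whose ends share a sign opposite to the middle. The overall sign of each component is arbitrary, so a "level" vector of all negative loadings means the same thing as one of all positive loadings.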

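This matters because feeding hundreds of redundant, correlated features into a model invites overfitting: near-duplicate inputs give the model many ways to fit noise. Reducing them to a few components that carry most of the variance leaves fewer parameters to estimate, which tends to make results more stable on new data.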

The honest caveat

Unsupervised results are interpretations, not truths. An algorithm such as k-means returns exactly as many clusters as it is asked for, even on pure noise; the clusters are only as meaningful as the data and the human reading them. Used well, unsupervised learning is a microscope for structure. Used carelessly, it manufactures patterns that are not there.
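
The point about noise is easy to check. The sketch below feeds uniformly random points, which contain no groups by construction, into a minimal k-means and still gets three tidy-looking clusters back. The point count, the choice of three clusters and the seeds are arbitrary assumptions, and the k-means here is a bare-bones standard-library version written for the example.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def nearest(p):
        return min(range(k),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]

def sum_sq(points, centre):
    """Total squared distance from each point to `centre`."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centre)) for p in points)

# Pure noise: 300 points drawn uniformly from the unit square.
rng = random.Random(7)
noise = [(rng.random(), rng.random()) for _ in range(300)]

centroids, labels = kmeans(noise, k=3)
sizes = [labels.count(j) for j in range(3)]

# Within-cluster scatter relative to the scatter around the overall mean.
overall = tuple(sum(col) / len(noise) for col in zip(*noise))
within = sum(sum_sq([p for p, lab in zip(noise, labels) if lab == j], centroids[j])
             for j in range(3))
ratio = within / sum_sq(noise, overall)

print("cluster sizes:", sizes)
print(f"within-cluster scatter as a share of total: {ratio:.2f}")
```

Every cluster comes back populated, and the within-cluster scatter is lower than the total scatter even though nothing real separates the groups. A drop in scatter is therefore not evidence of structure on its own; comparing against a noise baseline like this one is one simple sanity check.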

Risk disclaimer

This content is for educational and informational purposes only and is not investment, financial, tax or legal advice. Trading and investing carry risk, including the possible loss of capital. Any performance shown by third-party tools is hypothetical and not a promise of future results. Do your own research and consider professional advice before making any decision.