Machine learning for markets
Reinforcement learning
Reinforcement learning (RL) is a different paradigm again. There are no labelled examples. Instead, an agent takes actions in an environment, receives rewards, and learns a policy (a rule mapping each situation to an action) that maximises cumulative reward over time.
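To make "cumulative reward" concrete, here is a minimal sketch of how a sequence of per-step rewards is commonly rolled up into one number. The discount factor `gamma` and the reward values are assumptions for illustration; the text above does not specify a discounting scheme, and `gamma=1.0` gives the plain sum.

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative reward from the first step, weighting later rewards by gamma."""
    total = 0.0
    # Work backwards: each step's value is its reward plus the discounted future.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [1.0, -0.5, 2.0]  # hypothetical per-step rewards
undiscounted = discounted_return(rewards, gamma=1.0)  # plain sum of rewards
discounted = discounted_return(rewards, gamma=0.5)    # later rewards count less
print(undiscounted, discounted)
```

A policy is judged by the expected value of this quantity, not by any single step's reward.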
Why it is appealing for trading
Trading is naturally sequential: each decision changes your position, affects the market slightly, and shapes future opportunities. RL frames the whole problem the way a trader actually faces it:
- State: current prices, your position, volatility, time of day.
- Action: buy, sell, or hold, and how large a position to take.
- Reward: profit and loss, ideally penalised for risk and transaction costs.
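The state, action, and reward framing above can be sketched as a toy environment. Everything here is hypothetical and deliberately simplified: the class name `ToyTradingEnv`, the three-valued action, and the cost and risk parameters are illustrative assumptions, not a realistic market simulator.

```python
from dataclasses import dataclass


@dataclass
class ToyTradingEnv:
    """Hypothetical one-asset environment driven by a fixed price path."""
    prices: list
    cost_per_unit: float = 0.01   # assumed transaction cost per unit traded
    risk_penalty: float = 0.001   # assumed penalty on squared position
    t: int = 0
    position: int = 0

    def state(self):
        # State: current price, current position, time step.
        return (self.prices[self.t], self.position, self.t)

    def step(self, action):
        # Action: target position, one of -1 (short), 0 (flat), +1 (long).
        trade = action - self.position
        self.position = action
        self.t += 1
        price_change = self.prices[self.t] - self.prices[self.t - 1]
        pnl = self.position * price_change
        # Reward: P&L minus transaction costs and a simple risk penalty.
        reward = (pnl
                  - self.cost_per_unit * abs(trade)
                  - self.risk_penalty * self.position ** 2)
        done = self.t == len(self.prices) - 1
        return self.state(), reward, done


# Usage: a fixed "always long" policy on a made-up price path.
env = ToyTradingEnv(prices=[100.0, 101.0, 100.5, 102.0])
total, done = 0.0, False
while not done:
    _, reward, done = env.step(1)
    total += reward
print(round(total, 4))
```

An RL agent would replace the fixed "always long" rule with a policy learned from many such interactions.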
RL has been applied successfully to trade execution (slicing a large order into smaller pieces to minimise market impact) and to dynamic position sizing.
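A rough illustration of why slicing helps, under an assumed quadratic temporary-impact model in which a child order of size q costs k * q**2. The model, the coefficient `k`, and the order size are all assumptions chosen for illustration, and the sketch ignores the risk that prices drift while the order is being worked.

```python
def impact_cost(child_sizes, k=1e-6):
    """Total cost under an assumed quadratic impact model: k * q**2 per child order."""
    return sum(k * q ** 2 for q in child_sizes)


total_shares = 100_000  # hypothetical parent order

single = impact_cost([total_shares])            # one large order
sliced = impact_cost([total_shares / 10] * 10)  # ten equal child orders

print(single, sliced)  # under this model, slicing into 10 cuts impact cost about 10x
```

Equal slicing is only a fixed baseline. The appeal of RL in execution is learning a schedule that adapts slice sizes to the current state, trading off impact against the risk of waiting.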
Why it is brutally hard for alpha
- Reward is noisy. A good decision can lose money and a bad one can win, so the learning signal is drowned in randomness.
- The environment is non-stationary. The 'rules' the agent learns shift as markets change, unlike a board game with fixed rules.
- Backtesting RL is treacherous. The agent's actions would have moved real markets, but a historical replay assumes they did not, so the backtest ignores market impact and flatters the results.
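The first point, noisy reward, can be made concrete with a small simulation. The numbers are assumptions for illustration only: a strategy with a true edge of 0.05% per day and daily volatility of 1%, with returns drawn from a normal distribution (real returns are not normal).

```python
import random

edge = 0.0005   # assumed true mean daily return (0.05%)
sd = 0.01       # assumed daily standard deviation (1%)

# Days of data needed before the average return sits about two
# standard errors above zero: solve edge = 2 * sd / sqrt(n) for n.
days_needed = (2 * sd / edge) ** 2

# Simulate daily rewards and count how often a genuinely good
# strategy still loses money on a given day.
random.seed(0)
daily = [random.gauss(edge, sd) for _ in range(10_000)]
losing_fraction = sum(r < 0 for r in daily) / len(daily)

print(round(days_needed), round(losing_fraction, 3))
```

Under these assumptions it takes roughly 1,600 trading days, more than six years, for the edge to stand out from noise, and the strategy loses on close to half of all days. An agent learning from individual rewards sees mostly randomness, and the non-stationarity described above means the edge may have changed before enough data accumulates.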
RL is powerful where the environment is well-defined and the reward is clean (execution). For predicting the market itself, it is a research frontier, not a reliable production tool — treat any claim otherwise with deep suspicion.
This content is for educational and informational purposes only and is not investment, financial, tax or legal advice. Trading and investing carry risk, including the possible loss of capital. Any performance shown by third-party tools is hypothetical and not a promise of future results. Do your own research and consider professional advice before making any decision.