Forecasting Stocks

Reinforcement learning (RL) is a different paradigm again. There are no labelled examples. Instead, an agent takes actions in an environment, receives rewards, and learns a policy that maximises expected cumulative reward over time.

Why it is appealing for trading

Trading is naturally sequential: each decision changes your position, affects the market slightly, and shapes future opportunities. RL frames the whole problem the way a trader actually faces it:

State: current prices, your position, volatility, time of day.
Action: buy, sell, or hold, and how large a trade to place.
Reward: profit and loss, ideally penalised for risk and transaction costs.
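
The state, action and reward framing above can be sketched as a single environment step. This is a minimal illustration, not a production environment: the State class, the step function, the flat per-unit cost and the quadratic risk penalty are all assumptions chosen for clarity, and the state is cut down to price and position only.

```python
from dataclasses import dataclass


@dataclass
class State:
    price: float    # current market price
    position: int   # units held; negative means short


def step(state, action, next_price, cost_per_unit=0.01, risk_penalty=0.001):
    """Trade `action` units at the current price, then observe `next_price`.

    Reward = P&L on the new position
             - transaction cost (hypothetical flat fee per unit traded)
             - risk penalty (hypothetical quadratic charge on position size).
    Returns (next_state, reward).
    """
    new_position = state.position + action
    pnl = new_position * (next_price - state.price)
    cost = cost_per_unit * abs(action)
    risk = risk_penalty * new_position ** 2
    reward = pnl - cost - risk
    return State(next_price, new_position), reward


# Buy 10 units at 100, price moves to 101.
s0 = State(price=100.0, position=0)
s1, r1 = step(s0, action=10, next_price=101.0)

# Hold while the price falls back to 100.
s2, r2 = step(s1, action=0, next_price=100.0)
```

Holding (action=0) still incurs the risk penalty, which is one simple way to discourage the agent from carrying large positions for no reason.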

RL is used successfully in trade execution (slicing a large order to minimise market impact) and in dynamic position sizing.
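
To see why slicing a large order helps, here is a toy calculation. It assumes, purely for illustration, that each child order pays an impact cost proportional to the square of its size (price impact linear in quantity); the coefficient k and the function names are hypothetical. Real impact models are more involved, and an RL execution agent would learn an uneven schedule rather than the even split shown here.

```python
def impact_cost(child_orders, k=1e-6):
    """Total impact cost under the assumed model: cost of one child order = k * q^2."""
    return sum(k * q * q for q in child_orders)


def even_slices(total, n):
    """Split `total` units into n near-equal child orders that sum to `total`."""
    base, extra = divmod(total, n)
    return [base + (1 if i < extra else 0) for i in range(n)]


# One block trade of 100,000 units versus ten equal slices.
one_shot = impact_cost([100_000])
sliced = impact_cost(even_slices(100_000, 10))
```

Under this assumed model, splitting into n equal slices divides the impact cost by n, which is the clean, measurable reward that makes execution a good fit for RL.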

Why it is brutally hard for alpha

Reward is noisy. A good decision can lose money and a bad one can win, so the learning signal is drowned in randomness.
The environment is non-stationary. The 'rules' the agent learns shift as markets change, unlike a board game with fixed rules.
Backtesting RL is treacherous. The agent's actions would have moved real markets, but a historical replay assumes they did not — flattering the results.
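
Two of the points above, noisy reward and flattering backtests, can be made concrete with simple arithmetic. The sketch below is illustrative only: the edge, volatility, price path and impact coefficient are made-up numbers, and both function names are hypothetical. The first function asks how many independent trades are needed before an average reward stands a given number of standard errors away from zero. The second replays a fixed price path with and without an assumed temporary price impact on each fill.

```python
def trades_needed(edge, vol, t_stat=2.0):
    """Trades required for the mean reward to sit `t_stat` standard errors above zero.

    Assumes independent trades with mean `edge` and standard deviation `vol`,
    so the standard error after n trades is vol / sqrt(n).
    """
    return (t_stat * vol / edge) ** 2


def replay_pnl(prices, qty, impact_per_unit=0.0):
    """Buy `qty` at every bar except the last and mark to the final price.

    impact_per_unit is a hypothetical price move per unit traded, paid on the fill.
    With impact_per_unit=0 this is the naive historical replay.
    """
    final = prices[-1]
    pnl = 0.0
    for p in prices[:-1]:
        fill = p + impact_per_unit * qty
        pnl += qty * (final - fill)
    return pnl


# An edge of 0.05% per trade against 1% volatility per trade.
n = trades_needed(edge=0.0005, vol=0.01)

# The same strategy replayed on a rising price path.
prices = [100.0, 100.5, 101.0, 101.5]
naive = replay_pnl(prices, qty=1000)
realistic = replay_pnl(prices, qty=1000, impact_per_unit=0.0005)
```

With these made-up inputs, the agent needs on the order of 1,600 trades before its edge is distinguishable from luck, and the assumed impact cuts the replayed profit roughly in half. Neither number means anything outside this toy setup, but the direction of both effects is the point.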

RL is powerful where the environment is well-defined and the reward is clean (execution). For predicting the market itself, it is a research frontier, not a reliable production tool — treat any claim otherwise with deep suspicion.