Roostoo

Train an RL trading agent live – then battle-test it.

This is an RL agent backtesting replay. Press ▶ Play to initialize a fresh PPO trading agent and train it inside a market sandbox. The agent collects market observations, takes actions, receives rewards, updates its policy, and repeats. Each episode becomes part of its learning record, while the reward curve shows whether the agent is actually improving.

This is the first step in the Roostoo pipeline: train the agent, evaluate its behavior, then graduate it into live-market competitions.

Agent Factory — configure your trading agent DNA before launch

Choose the market signals your agent can observe, the objective it tries to optimize, and the risk limits it must adhere to. Then launch it into training and watch how its configuration shapes the strategy it learns.
Agent DNA Templates

Features · which inputs the agent can see

Reward function · what the agent maximizes

Risk profile · cap on position size

Max position: 100%
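The three settings above (features, reward function, risk profile) could be captured in a small configuration object. This is only a sketch: `AgentDNA`, its field names, the example feature and reward names, and the long-only assumption in `cap_position` are all hypothetical and not taken from Roostoo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDNA:
    # Hypothetical field names and defaults, for illustration only.
    features: tuple[str, ...] = ("returns", "volume")  # inputs the agent can see
    reward: str = "pnl"                                # what the agent maximizes
    max_position: float = 1.0                          # 1.0 means 100% of portfolio value

    def __post_init__(self):
        if not 0.0 < self.max_position <= 1.0:
            raise ValueError("max_position must be in (0, 1]")


def cap_position(target_fraction: float, dna: AgentDNA) -> float:
    """Clamp a requested position to [0, max_position].

    Assumes a long-only sandbox; the page does not say whether shorting is allowed.
    """
    return max(0.0, min(dna.max_position, target_fraction))


dna = AgentDNA(max_position=0.5)
capped = cap_position(0.8, dna)  # a request for 80% is cut to the 50% cap
```

Keeping the cap outside the policy means the agent can ask for any position size while the risk limit is enforced separately.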
Episode: 0
Episode reward
Best 50-episode average
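The page does not say how the "best 50-episode average" is computed. One plausible reading, shown here as an assumption, is the highest mean reward over any 50 consecutive episodes seen so far. The function name `best_window_average` is hypothetical.

```python
def best_window_average(rewards, window=50):
    """Highest mean over any `window` consecutive episode rewards.

    Returns None until at least `window` episodes have been recorded.
    """
    if len(rewards) < window:
        return None
    running = sum(rewards[:window])  # sum of the first full window
    best = running
    for i in range(window, len(rewards)):
        # Slide the window forward by one episode.
        running += rewards[i] - rewards[i - window]
        best = max(best, running)
    return best / window


history = [0.0] * 50 + [1.0] * 50
best = best_window_average(history)
```

Tracking the best window rather than the latest one makes the metric monotone: it only moves when the agent sets a new record.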
Chart legend: ▲ buy · ▼ sell · color = outcome (profitable round trip, losing round trip, or position still open)
What the policy wants to do
For the current bar at the playhead — updates every step as the state changes.
SELL 33% · HOLD 33% · BUY 33%
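An even 33% split across the three actions is what a freshly initialized policy would show if it produces near-equal scores for each action. The sketch below assumes a softmax over three logits, which is a common choice for discrete-action PPO but is not confirmed by the page. `action_probabilities` is a hypothetical name.

```python
import math

ACTIONS = ("SELL", "HOLD", "BUY")


def action_probabilities(logits):
    """Numerically stable softmax over per-action logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Equal logits give a uniform distribution over the three actions.
probs = action_probabilities([0.0, 0.0, 0.0])
for name, p in zip(ACTIONS, probs):
    print(f"{name} {p:.0%}")
```

As training shifts the logits, the probabilities move away from uniform and the panel shows which action the policy currently favors.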
What the agent sees
The 17 features fed into the policy network this bar — grouped by source.
Portfolio this episode
Position and profit so far in the current 200-bar rollout.
Position: FLAT
Portfolio value: $10,000
Episode profit: +0.00%
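The two portfolio numbers can be related with simple mark-to-market arithmetic. This sketch assumes the episode starts at the $10,000 shown above and ignores fees and slippage, which the page does not describe. The function names are hypothetical.

```python
def portfolio_value(cash, units, price):
    """Mark-to-market value: cash plus the open position at the current price."""
    return cash + units * price


def episode_profit_pct(start_value, current_value):
    """Percentage change in portfolio value since the start of the episode."""
    return (current_value / start_value - 1.0) * 100.0


start = 10_000.0
flat = portfolio_value(cash=10_000.0, units=0.0, price=123.0)  # FLAT: no open position
print(f"{episode_profit_pct(start, flat):+.2f}%")
```

While the position is FLAT, the price does not affect the portfolio value, so the episode profit stays at zero until the agent buys.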

The PPO update loop — live values from the algorithm

Every box updates with live numbers from the algorithm. One episode = a full pass through the loop: 200 rollout steps (boxes 1–3 fire 200 times) → 1 update phase running K=5 epochs of clipped-surrogate gradient descent (box 4). The center counter shows your position in both loops: the outer episode count and the inner rollout step.
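The loop structure described above can be sketched as two phases per episode. The step and epoch counts come from the page. The callback names (`observe`, `act`, `reward_fn`, `update`) are hypothetical stand-ins for the real environment and optimizer.

```python
ROLLOUT_STEPS = 200  # boxes 1-3 fire once per step
UPDATE_EPOCHS = 5    # K epochs over the collected rollout (box 4)


def run_episode(observe, act, reward_fn, update):
    """One pass through the loop: collect a rollout, then update the policy."""
    trajectory = []
    for step in range(ROLLOUT_STEPS):
        state = observe(step)              # 1 SEE
        action = act(state)                # 2 ACT
        reward = reward_fn(state, action)  # 3 REWARD
        trajectory.append((state, action, reward))
    for _ in range(UPDATE_EPOCHS):
        update(trajectory)                 # 4 UPDATE, reusing the same rollout
    return trajectory


# Dummy callbacks just to exercise the control flow.
rollout = run_episode(
    observe=lambda step: step,
    act=lambda state: "HOLD",
    reward_fn=lambda state, action: 0.0,
    update=lambda trajectory: None,
)
```

Reusing one rollout for several epochs is what makes the clipping in box 4 necessary: it limits how far the policy can drift from the one that collected the data.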

1 SEE · Observe state · waiting…
2 ACT · Sample an action · waiting…
3 REWARD · Reward & advantage · waiting…
4 UPDATE · Clipped policy update · waiting…
Center counter: episode 0 · step 0/200 in this loop
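Box 3 turns rewards into advantages, but the page does not say which estimator is used. The sketch below uses generalized advantage estimation (GAE), a common pairing with PPO, purely as an assumption. The function name and the `gamma` and `lam` defaults are hypothetical.

```python
def gae_advantages(rewards, values, last_value=0.0, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of this and all later TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


adv = gae_advantages([1.0, 1.0], [0.0, 0.0], gamma=1.0, lam=1.0)
```

With `gamma=1.0`, `lam=1.0`, and zero value estimates, each advantage reduces to the sum of remaining rewards, which is a quick sanity check on the recursion.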
PPO clipped surrogate objective · Schulman et al. 2017, Eq. 7
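$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\,1-\varepsilon,\,1+\varepsilon\right)\hat{A}_t\right)\right], \qquad \varepsilon = 0.2$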
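The objective can be evaluated directly for a batch of steps. This is a minimal plain-Python sketch, assuming the probability ratio r_t(θ) is computed from log-probabilities under the new and old policies. The function and argument names are hypothetical, and a real trainer would compute this with an autodiff library so that gradients flow through the ratio.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of steps."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                # r_t(theta)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # clip(r_t, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic of the two terms
    return total / len(advantages)


# Ratio 1.5 with a positive advantage: the gain is capped at 1.2 * advantage.
capped_gain = clipped_surrogate([math.log(1.5)], [0.0], [1.0])
# Ratio 1.5 with a negative advantage: the unclipped, more negative term is kept.
full_penalty = clipped_surrogate([math.log(1.5)], [0.0], [-1.0])
```

Taking the minimum of the two terms means clipping only ever removes the incentive to move further, never the penalty for a move that made things worse.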