A trailing-stop that ratchets.
In plain English — This is a new auto-pilot rule for a trading bot. When a trade is winning, the bot keeps a “safety net” price below it: if the market falls back to that line, the trade closes and locks in the gains. This feature makes the net smarter — as the price climbs past milestones, the net follows it upward so more profit is protected, and at each milestone the bot can optionally cash out a slice of the position.
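The ratcheting rule described above can be sketched in a few lines. This is a minimal illustration, not the implementation that shipped: the class name `RatchetingStop`, the `trail_pct` and `milestones` parameters, and the `on_price` method are all hypothetical, and the sketch assumes a long position with milestones expressed as fractional gains over the entry price.

```python
class RatchetingStop:
    """Hypothetical trailing stop for a long position (not the bot's real API)."""

    def __init__(self, entry, trail_pct, milestones=()):
        self.entry = entry
        self.trail_pct = trail_pct          # stop distance below the high, e.g. 0.10
        # Each milestone: (gain over entry that triggers it,
        #                  tighter trail_pct to apply from then on,
        #                  fraction of the remaining position to sell).
        self.milestones = sorted(milestones)
        self.high = entry                   # highest price seen so far
        self.stop = entry * (1 - trail_pct)
        self.remaining = 1.0                # fraction of the position still open
        self.next_idx = 0                   # index of the next unreached milestone

    def on_price(self, price):
        """Process one price tick and return a list of (event, fraction) tuples."""
        events = []
        if self.remaining <= 0:
            return events                   # position already closed
        if price <= self.stop:
            # Price fell back to the safety net: close whatever is left.
            events.append(("stop_out", self.remaining))
            self.remaining = 0.0
            return events
        self.high = max(self.high, price)
        # Apply every milestone the price has now climbed past.
        while self.next_idx < len(self.milestones):
            gain, new_trail, sell_frac = self.milestones[self.next_idx]
            if price < self.entry * (1 + gain):
                break
            self.trail_pct = new_trail
            if sell_frac > 0:
                sold = self.remaining * sell_frac
                self.remaining -= sold
                events.append(("partial_exit", sold))
            self.next_idx += 1
        # Ratchet: the stop may rise with the high but never moves down.
        self.stop = max(self.stop, self.high * (1 - self.trail_pct))
        return events
```

With an entry at 100, a 10% initial trail, and a first milestone at +10% that tightens the trail to 5% and sells a quarter, the stop starts at 90. A tick at 112 sells 25% of the position and lifts the stop to about 106.4, so a later fall to 106 closes the remaining 75%.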
The LGTM loop.
Every PR ran the same gate: claude-code-action@v1 with Opus 4.8 on xhigh, reviewing locally — building, testing, verifying claims against the tree, not just reading the diff.
Review. The action posts a verdict on the PR.
Not LGTM? Send it back to the model's harness to fix every finding.
Re-review. The action re-checks the updated PR.
Repeat until LGTM.
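As a rough sketch, the loop reduces to the control flow below. The `review` and `fix` callables are hypothetical stand-ins for the review action and the model's harness, an empty findings list stands for an LGTM verdict, and the `max_rounds` safety cap is an assumption added here rather than part of the described process.

```python
from typing import Callable, List


def lgtm_loop(
    review: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    max_rounds: int = 5,
) -> int:
    """Run review/fix cycles and return the number of re-reviews needed for LGTM."""
    findings = review()                 # first review of the PR
    re_reviews = 0
    while findings:                     # non-empty list means "not LGTM"
        if re_reviews >= max_rounds:
            raise RuntimeError("no LGTM after max_rounds re-reviews")
        fix(findings)                   # send every finding back to the harness
        findings = review()             # re-check the updated PR
        re_reviews += 1
    return re_reviews
```

A return value of 0 corresponds to a PR that cleared review on the first pass; 1 means a single fix pass was enough.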
How long it took.
Wall-clock time, split into the first pass and the fix pass needed to reach LGTM; the last column shows the total. The spread is roughly 10× end-to-end.
| Model | 1st pass | 2nd pass | Total time |
|---|---|---|---|
| Composer 2.5 | 4.5 min | 1 min | 5.5 min |
| GPT 5.5 | 14 min | 8 min | 22 min |
| Opus 4.8 high | 38 min | 4 min | 42 min |
| Opus 4.8 max | 54.5 min | — | 54.5 min |
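The totals and the roughly 10× spread can be re-derived from the two pass columns. The snippet below is only an arithmetic check on the table's figures; it treats the missing second pass for Opus 4.8 max as zero minutes.

```python
# (first pass, second pass) in minutes, copied from the table above.
timings = {
    "Composer 2.5": (4.5, 1.0),
    "GPT 5.5": (14.0, 8.0),
    "Opus 4.8 high": (38.0, 4.0),
    "Opus 4.8 max": (54.5, 0.0),   # no second pass was needed
}

totals = {model: first + second for model, (first, second) in timings.items()}
spread = max(totals.values()) / min(totals.values())

for model, total in totals.items():
    print(f"{model}: {total} min")
print(f"slowest / fastest = {spread:.1f}x")
```

The slowest total (54.5 min) divided by the fastest (5.5 min) is about 9.9, which is where the "roughly 10×" figure comes from.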
The token bill.
Total spend per PR — every pass combined, not just the first draft. Units differ by provider, so these aren't directly comparable.
| Model | Total token / quota cost |
|---|---|
| Composer 2.5 | ~2% of Cursor Pro plan monthly Composer usage · ~75% of 200k context |
| GPT 5.5 | 77% of Plus plan 5-hr window |
| Opus 4.8 high | 32% of 1M context + ~245k subagent tokens |
| Opus 4.8 max | 45% of 1M context + ~416k subagent tokens |
How many tries to pass?
Re-reviews needed before reaching LGTM — and how much code each touched. One model cleared review on the first pass with zero blocking findings.
| Model | Re-reviews to LGTM | First-pass verdict | Files | Net lines |
|---|---|---|---|---|
| Opus 4.8 max #858 | 0 | LGTM + 4 optional | 15 | +1408 / −4 |
| Composer 2.5 #856 | 1 | 3 blocking + 2 optional | 15 | +1121 / −6 |
| GPT 5.5 #857 | 1 | 3 findings + 1 nit | 19 | +1544 / −11 |
| Opus 4.8 high #859 | 1 | 2 findings + 1 optional | 15 | +1093 / −12 |
But which one was actually best?
Cost and review rounds tell you about effort. They don't tell you if the code is right. We scored all four against the 11 acceptance criteria — verifying every claim against the codebase.
Counting down, from fourth place…
Most reliable generator: Opus 4.8 max
The only PR to reach LGTM with zero fix cycles — no blocking findings on the first review. If you want it right the first time, this was it.
Best cost-to-quality: Composer 2.5
A correct, manual-supporting draft in 4.5 min at ~2% quota — an order of magnitude cheaper than the field. The value pick for a fast first draft to harden.
GPT 5.5
The only PR that was simultaneously feature-complete against the literal criteria (including manual support), fully tested with a verified-green backtester regression, and accurately described. Its only deductions were cosmetic.
The asterisk: it was also the most quota-hungry to produce, using 77% of the 5-hour Plus window in a single attempt. Best result, steepest bill. If manual support can wait for a follow-up, Opus 4.8 max (91) is the co-best merge candidate and the more dependable one-shot.
All four, at a glance.
The winner became the base.
The bake-off wasn't academic. Issue #844 shipped to main as PR #860, a best-of-4 synthesis built on the winner, GPT 5.5's #857, with the strongest parts of the other three folded in. The individual contestant PRs were closed; the winning implementation is what's now running.