Codex's retry backoff design is ridiculous: after a 403, the more it retries, the more it looks like botting

Recently, while using Codex CLI against a not-so-stable upstream, I ran into a pretty funny but very real problem.

Codex now reconnects dropped streams using a hard-coded exponential backoff. The first few delays look normal, but later ones balloon to something absurd:

  • The 1st retry waits about 0.2 seconds
  • The 5th retry waits about 3.2 seconds
  • By the 10th retry the wait is already over 1 minute
  • After that, a single wait can grow past ten minutes, and then to tens of minutes
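
The figures above are consistent with a delay that starts at 0.2 seconds and doubles on every attempt. Here is a minimal sketch of that arithmetic. The 0.2 s base and the factor of 2 are inferred from the numbers in this post, not read from the Codex source, and the real implementation may add jitter or a cap:

```python
def exponential_delay_s(attempt: int, base_s: float = 0.2, factor: float = 2.0) -> float:
    """Delay before the given 1-based retry attempt.

    Assumes pure doubling with no jitter and no upper cap (an inference
    from the figures in the post, not the actual Codex code).
    """
    return base_s * factor ** (attempt - 1)


# Print the wait before a few selected attempts, in seconds and minutes.
for attempt in (1, 5, 10, 13, 14):
    delay = exponential_delay_s(attempt)
    print(f"attempt {attempt:>2}: {delay:8.1f} s ({delay / 60:5.1f} min)")
```

Under these assumptions, attempt 10 waits about 102 seconds and attempt 13 about 13.7 minutes, which matches the pattern described above.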

The problem is that this design implicitly assumes: “the longer it keeps failing, the slower you should retry.” But in reality, many upstreams don’t work that way:

  • The gateway occasionally hiccups
  • Backend routing is unstable
  • Some OpenAI-compatible relays briefly return 403, or their quota state has not refreshed yet
  • In practice, if you try a few more times, it often recovers quickly

In other words, what we actually need is:

  • Let users decide the retry frequency themselves
  • At minimum, allow fixed-interval retries, e.g., once every 500ms
  • Stop holding users hostage to a hard-coded exponential backoff

What’s even more ridiculous is that Codex currently gives users stream_max_retries but no way to configure the retry delay or backoff strategy. So you can set the count to 100, but past the 10th attempt each wait becomes unreasonably long, which defeats the whole “try a few more times and it’ll work” scenario.
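
To make the gap concrete, here is a rough comparison of the total time spent waiting under the two strategies. It assumes a 0.2 s base that doubles every attempt with no cap (inferred from the figures in this post, not taken from the Codex source) against a fixed 500 ms interval. The function names are made up for illustration:

```python
from typing import Callable


def exponential_s(attempt: int) -> float:
    # Assumed behaviour: 0.2 s base, doubling each attempt, no jitter or cap.
    return 0.2 * 2 ** (attempt - 1)


def fixed_s(attempt: int) -> float:
    # Fixed 500 ms between attempts, regardless of how many have failed.
    return 0.5


def total_wait_s(retries: int, delay_for_attempt: Callable[[int], float]) -> float:
    """Sum of the waits before retries 1..retries."""
    return sum(delay_for_attempt(n) for n in range(1, retries + 1))


for retries in (5, 10, 15):
    exp_total = total_wait_s(retries, exponential_s)
    fixed_total = total_wait_s(retries, fixed_s)
    print(f"{retries:>3} retries: exponential {exp_total:9.1f} s, fixed {fixed_total:5.1f} s")
```

With these assumed parameters, ten retries cost roughly 205 seconds of waiting under doubling versus 5 seconds at a fixed interval, and the gap keeps widening with every additional attempt.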

I’ve already raised this to upstream:

I feel the core issue isn’t just that the parameters were chosen poorly, but that the design is too presumptuous and doesn’t give users the freedom to choose.

If the TOML config explicitly supported something like the following, it would at least return the choice to users:

[model_providers.custom]
stream_max_retries = 100
stream_retry_delay_ms = 500
stream_retry_backoff = "fixed"
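
As a rough illustration of how a client could interpret such settings, here is a small sketch. Only stream_max_retries exists today according to this post. stream_retry_delay_ms and stream_retry_backoff are the keys proposed above, and the StreamRetryPolicy class is purely hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamRetryPolicy:
    # Field names mirror the TOML keys proposed in the post; the last two
    # are a proposal, not existing Codex options.
    stream_max_retries: int = 100
    stream_retry_delay_ms: int = 500
    stream_retry_backoff: str = "fixed"  # "fixed" or "exponential"

    def delay_ms(self, attempt: int) -> int:
        """Wait before the given 1-based retry attempt, in milliseconds."""
        if not 1 <= attempt <= self.stream_max_retries:
            raise ValueError(f"attempt {attempt} is outside 1..{self.stream_max_retries}")
        if self.stream_retry_backoff == "fixed":
            return self.stream_retry_delay_ms
        if self.stream_retry_backoff == "exponential":
            return self.stream_retry_delay_ms * 2 ** (attempt - 1)
        raise ValueError(f"unknown backoff strategy: {self.stream_retry_backoff!r}")


policy = StreamRetryPolicy()
print([policy.delay_ms(n) for n in (1, 10, 100)])  # [500, 500, 500]
```

With "fixed", the 100th retry waits exactly as long as the first, which is the behaviour being asked for here. Keeping "exponential" as an option would preserve the current behaviour for anyone who prefers it.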

This kind of need is actually very common:
“This upstream flakes out a lot, but a few quick consecutive retries usually bring it back—please don’t automatically stretch it into once every tens of minutes.”