Five mistakes we made building a multi-provider gateway
We didn't get the shape of ByteSpike right on the first try. Here are five decisions we walked back after they hit production — each one cheap to write down, each one expensive to live through.
We've been running ByteSpike in some form since 2025 Q4. The current shape (Anthropic-compatible default, dual-protocol shim, failures-don't-bill, per-endpoint rate transparency) looks obvious in retrospect. None of it was obvious going in. These are five turns we took, then walked back, written down so the next builder gets to skip them.
1. We tried to invent a neutral abstraction layer
Early ByteSpike had a custom intermediate request shape, call it `bs_request`, that was meant to look like all providers and like none of them. We rebuilt `tool_use`, we rebuilt `cache_control`, we rebuilt thinking blocks. Three months in, every new model launch required mapping work in both directions, and our "neutral" types were just one provider's types with the names changed. Walked back: pick the provider whose API ships fastest, mirror their wire format verbatim, shim everyone else into that. We picked Anthropic Messages.
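To make "shim everyone else into that" concrete, here is a minimal sketch of one direction of such a shim: an OpenAI-style chat request translated into the Anthropic Messages shape. It is an illustration, not ByteSpike's actual shim. The function name `openai_chat_to_anthropic` and the `DEFAULT_MAX_TOKENS` fallback are assumptions, and it covers plain-text messages only (no tools, images, or model-name mapping).

```python
from typing import Any

# Assumption: the Messages format requires max_tokens, so a request that
# omits it needs some fallback. The value here is arbitrary.
DEFAULT_MAX_TOKENS = 1024


def openai_chat_to_anthropic(req: dict[str, Any]) -> dict[str, Any]:
    """Translate an OpenAI-style chat request into Anthropic Messages shape.

    Hypothetical helper for illustration; handles plain-text content only.
    """
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []

    for msg in req["messages"]:
        role = msg["role"]
        if role == "system":
            # The Messages format carries the system prompt as a top-level
            # field rather than as a message with role "system".
            system_parts.append(msg["content"])
        elif role in ("user", "assistant"):
            messages.append({"role": role, "content": msg["content"]})
        else:
            raise ValueError(f"role not handled by this sketch: {role!r}")

    out: dict[str, Any] = {
        "model": req["model"],  # passed through unchanged; no name mapping here
        "max_tokens": req.get("max_tokens", DEFAULT_MAX_TOKENS),
        "messages": messages,
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)

    # Parameters that carry the same name in both request formats.
    for key in ("temperature", "top_p", "stream"):
        if key in req:
            out[key] = req[key]
    return out
```

The point of the design is that the translation runs in one direction per provider, toward a format someone else maintains, instead of in both directions through a format we would have to maintain ourselves.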
2. We billed on submit, refunded on failure
The first version of the credit ledger charged at request submit and refunded asynchronously on 5xx / NSFW / connection drop. Technically it worked, but every customer's monthly statement had a refund column that needed explaining. Walked back to a two-phase commit (reservation at submit, debit only on successful delivery), and now the refund column is empty. The line item simply doesn't exist for failures. (Full write-up in our "how failures don't bill actually works" piece.)
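The reserve-then-debit flow can be sketched as a small in-memory ledger. This is a simplified illustration under assumptions, not the production ledger: the class and method names (`CreditLedger`, `reserve`, `commit`, `release`) are hypothetical, amounts are integers in the smallest billing unit, and persistence and concurrency are ignored.

```python
import uuid
from dataclasses import dataclass, field


class InsufficientCredits(Exception):
    """Raised when a reservation would exceed the available balance."""


@dataclass
class CreditLedger:
    balance: int  # settled credits
    holds: dict[str, int] = field(default_factory=dict)  # reservation id -> amount
    statement: list[tuple[str, int]] = field(default_factory=list)  # customer-visible lines

    @property
    def available(self) -> int:
        # Reserved credits cannot be spent twice, but they are not yet debited.
        return self.balance - sum(self.holds.values())

    def reserve(self, amount: int) -> str:
        """Phase 1, at submit: hold credits without writing a statement line."""
        if amount > self.available:
            raise InsufficientCredits(f"need {amount}, have {self.available}")
        reservation_id = uuid.uuid4().hex
        self.holds[reservation_id] = amount
        return reservation_id

    def commit(self, reservation_id: str, description: str) -> None:
        """Phase 2, on successful delivery: turn the hold into a debit."""
        amount = self.holds.pop(reservation_id)
        self.balance -= amount
        self.statement.append((description, amount))

    def release(self, reservation_id: str) -> None:
        """On failure: drop the hold. Nothing is written to the statement."""
        self.holds.pop(reservation_id, None)
```

A failed request goes through `reserve` and then `release`, so it never produces a charge or a refund line. Only `commit` writes to the statement.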
3. We tried to ship one app that did everything
Marketing landing page, customer console, API docs, all in the same Next.js process. Cute for two weeks; agony for two months. Every marketing copy fix required redeploying the auth-handling code path. Every console iteration needed a marketing-side smoke test. Walked back to three apps (marketing / console / docs) with shared workspace packages for design tokens and locales. Three deploy logs, three blast radii, only one of which can blow up at a time. (Full write-up in the three-app architecture piece.)
4. We let pricing live in a separate database from the gateway
The first pricing source of truth was a Notion table that ops maintained, exported to a static config the gateway read at startup. When ops changed a number, the gateway didn't notice until the next restart. Sometimes the marketing site showed one price and the gateway charged another for two hours. Walked back to a single admin `/channels` endpoint in the gateway as the canonical source, with the marketing `/pricing` table joining against an export of that same endpoint. Same source on both sides; the marketing page either matches the gateway or it's a build away from matching.
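A minimal sketch of that join, assuming the export is a JSON array and that each channel has an `id` and a `unit_price` field. Both field names, the `marketing_copy` structure, and the function `build_pricing_rows` are hypothetical; the real `/channels` payload is not shown in this post. The idea it illustrates is that marketing owns the copy and never owns a number.

```python
import json
from typing import Any


def build_pricing_rows(
    channels_export: str,
    marketing_copy: dict[str, dict[str, str]],
) -> list[dict[str, Any]]:
    """Join marketing copy against the gateway's channel export at build time.

    Prices come only from the export. Marketing copy that points at a
    channel the gateway does not know fails the build instead of shipping.
    """
    channels = json.loads(channels_export)
    known_ids = {ch["id"] for ch in channels}

    unknown = set(marketing_copy) - known_ids
    if unknown:
        raise ValueError(f"marketing copy references unknown channels: {sorted(unknown)}")

    rows = []
    for ch in channels:
        copy = marketing_copy.get(ch["id"])
        if copy is None:
            continue  # channel exists in the gateway but is not advertised
        rows.append(
            {
                "id": ch["id"],
                "title": copy["title"],
                # Price kept as the exported string so it is displayed verbatim.
                "unit_price": ch["unit_price"],
            }
        )
    return rows
```

Because the price is read at build time and never typed into the marketing repo, the worst case is a stale build, which is exactly the "a build away from matching" property described above.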
5. We launched without a deploy pipeline
Our marketing site sat on the same lisahost container for six days while we merged twenty PRs of cache / SEO / performance fixes, none of which reached prod until we ran the manual `docker build → scp → ssh load → restart` sequence once. We walked into mid-week with no CI/CD because, frankly, we hadn't needed it before. We only noticed the day the Lighthouse reports started reading flat. Walking back is in progress: a GitHub Action that triggers the build chain on merge to main, mirroring the manual steps we now know by heart.
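For reference, the manual chain can be written down as a small script that a CI job could call. This is a sketch under assumptions, not our actual pipeline. The image name, host, and tarball path are placeholders, the `docker save` step is inferred from the `scp` and `load` steps, and the restart command is supplied by the caller because this post does not say how the container is restarted.

```python
import shlex
import subprocess


def deploy_commands(
    image: str,
    host: str,
    restart_cmd: str,
    tarball: str = "image.tar",
) -> list[list[str]]:
    """Return the build -> save -> scp -> load -> restart chain as argv lists."""
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "save", "-o", tarball, image],  # assumed step: export image to a file
        ["scp", tarball, f"{host}:{tarball}"],
        ["ssh", host, f"docker load -i {shlex.quote(tarball)}"],
        ["ssh", host, restart_cmd],  # caller-supplied; not specified in the text
    ]


def run_deploy(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(cmd, check=True)
```

A dry run such as `run_deploy(deploy_commands("site:latest", "deploy@example.invalid", "echo restart"))` prints the five commands without touching Docker or the network, which makes the chain easy to review before wiring it to a merge trigger.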
“Every decision that turned out to be wrong took less than a week to write and several weeks to walk back. The leverage of getting the shape right early is enormous.”
If you're building something adjacent (a gateway, a multi-vendor router, an aggregation product), none of these five is a universal rule. The first four are the shape we converged on after walking other shapes. The fifth is one we wish we'd noticed before the cache-debug rabbit hole. Your mistakes might be different. But one is almost certainly there, waiting to be walked back.