DOSIA × ByteSpike: one agent, every capability

You shouldn't have to keep three apps open to get a marketing kit done. DOSIA's main brain now picks up image generation, image analysis, video, and 'use a different LLM' as native capabilities — one OAuth connect to ByteSpike, then you just type what you want.

May 16, 2026KL7 min read

If you've ever tried to ship a campaign that needs a hero image, a 5-second product clip, a headline written in a different model's voice, and a paragraph of copy for the landing page, you know the shape of the problem: four tools, four logins, four paste-buffers, and the cognitive overhead of remembering which one is good at what today. The work isn't writing the prompt — the work is the switching.

DOSIA is our answer to that. It's a desktop agent — Mac-native today — built around the idea that you should have one main brain, and that main brain should pick up tools as you give it permission, without you ever having to switch panels.

One OAuth connect, then capability

Open DOSIA → Settings → Account → Connect ByteSpike. A browser tab opens, you approve, the toast says "Connected · N main models + M tool capabilities", and that's the whole onboarding. No API key pasted into a field, no manual model list, no "which provider hosts gpt-image-2 again" panic.

Behind the scenes DOSIA asks ByteSpike for your account's capability set, partitions it against a built-in model registry, and loads the tools your account is actually permissioned for. A user without video models in their plan won't see a greyed-out generate_video tool — they won't see one at all. The main brain genuinely has the surface area your account paid for, no more, no less.

Three plugins, one chat

The capabilities show up as three plugins, but you don't think about them as plugins — you just talk to the main brain. Here's what's actually wired up:

image-tools — text-to-image, image-to-image edits, vision-on-image ("is there a person in this photo?")
video-tools — text-to-video and image-to-video generation, returns a task id, you poll it, the clip comes back
text-writing-tools — write with a non-primary model (GPT, Gemini, DeepSeek, Doubao) when you want a different voice or different strengths for one specific paragraph

You say "draw a red apple in flat style", the main brain calls image-tools. You say "have GPT-5.5 rewrite this paragraph in a more formal tone", the main brain calls text-writing-tools with model=gpt-5.5. You say "make a 5-second product video out of this photo", the main brain calls video-tools, polls the task, and shows you the clip. You don't see the routing. You don't pick the panel. You keep typing.

What the data flow actually looks like

When you ask DOSIA to draw something, this is what happens: your message goes to the main brain, the main brain picks the right tool (generate_image), the tool resolves to a model from the registry (say gpt-image-2), the request goes to ByteSpike with your token, ByteSpike routes to the provider, the provider generates the image, ByteSpike bills your wallet against the per-image rate card — failures don't bill, by the way — and the image comes back into the chat. The whole thing is one conversation turn from your end.

And because ByteSpike speaks both Anthropic Messages and OpenAI Chat Completions, the same account is the same key for whatever your main brain happens to be — Claude today, possibly something else tomorrow, definitely a mix the day after that.

What ByteSpike brings to the deal

One token, 23+ frontier models across Anthropic / OpenAI / Google / DeepSeek / Doubao / ByteDance image and video stacks.
Public per-model rate card, no surprise charges, failures never bill.
Two protocols (Anthropic Messages, OpenAI Chat Completions) so the same key works for whatever code you already have.
Org-level wallets and per-member quotas — IT can hand DOSIA to ten people without ten provider accounts.

How to actually try it

Two pieces. First, grab DOSIA from bytespike.ai/dosia — signed macOS DMG, Apple Silicon native, Windows tracked for a future cycle. Second, sign up at bytespike.ai for free credits (no card required) and connect from DOSIA's Settings. Total time from "never heard of this" to "main brain just generated an image for me" is under five minutes if your Mac isn't busy.

If you've been juggling three tools to ship one piece of content, the One Agent setup is — at the very least — worth the five minutes.

One OAuth connect, then capability

Three plugins, one chat

The capabilities show up as three plugins, but you don't think about them as plugins — you just talk to the main brain. Here's what's actually wired up:

image-tools — text-to-image, image-to-image edits, vision-on-image ("is there a person in this photo?")

video-tools — text-to-video and image-to-video generation, returns a task id, you poll it, the clip comes back

text-writing-tools — write with a non-primary model (GPT, Gemini, DeepSeek, Doubao) when you want a different voice or different strengths for one specific paragraph

What the data flow actually looks like

What ByteSpike brings to the deal

One token, 23+ frontier models across Anthropic / OpenAI / Google / DeepSeek / Doubao / ByteDance image and video stacks.

Public per-model rate card, no surprise charges, failures never bill.

Two protocols (Anthropic Messages, OpenAI Chat Completions) so the same key works for whatever code you already have.

Org-level wallets and per-member quotas — IT can hand DOSIA to ten people without ten provider accounts.

How to actually try it

If you've been juggling three tools to ship one piece of content, the One Agent setup is — at the very least — worth the five minutes.