Choosing the right image and video model: a practical guide

One ByteSpike key gives you nine image models and eight video models. They aren't interchangeable: each has a brief it owns and a brief it loses on. Here's the call-it-by-name decision tree.

May 8, 2026 · KL · 7 min read

Image: pick by brief, not by tier

If your brief reads `studio lighting, glass surfaces, anatomical correctness`, reach for **Nano Banana Pro** at `quality: hd` for hero / 2048² output. **Nano Banana 2** is the right step-up only when in-image text legibility or hand / face anatomy at small scale matters — that's where it visibly improves over Pro.

If your prompt reads `the kettle on the LEFT, mug on the RIGHT, both steaming`, reach for **GPT-Image 2**. It respects spatial relationships better than the photoreal alternatives.

If your prompt is in Chinese / Japanese / Korean and the brief is illustrative or aesthetic (not photoreal), reach for **Seedream V4** at 1024² or **Seedream V4.5** at 2048². **Seedream V5 lite** is the iteration tier: use it for cheap drafts before committing to a V4 or V4.5 final.
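The image rules above can be written down as a small routing function. This is a minimal sketch, not a ByteSpike API: the `ImageBrief` fields, the order in which the rules are checked, and the use of display names as return values are all assumptions made for illustration. Briefs the guide doesn't cover return `None` rather than a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageBrief:
    """Hypothetical description of a brief; field names are illustrative."""
    photoreal: bool = False              # studio lighting, glass, anatomy
    small_text_or_anatomy: bool = False  # in-image text, small hands / faces
    spatial_layout: bool = False         # "kettle on the LEFT, mug on the RIGHT"
    cjk_prompt: bool = False             # prompt in Chinese / Japanese / Korean
    draft: bool = False                  # cheap iteration, not the final
    size_px: int = 1024                  # square edge length: 1024 or 2048


def pick_image_model(brief: ImageBrief) -> Optional[str]:
    # Assumption: spatial instructions take priority over everything else.
    if brief.spatial_layout:
        return "GPT-Image 2"
    # CJK prompt with an illustrative (non-photoreal) brief: Seedream family.
    if brief.cjk_prompt and not brief.photoreal:
        if brief.draft:
            return "Seedream V5 lite"
        return "Seedream V4.5" if brief.size_px >= 2048 else "Seedream V4"
    # Photoreal briefs: Pro by default, step up only for small text / anatomy.
    if brief.photoreal:
        return "Nano Banana 2" if brief.small_text_or_anatomy else "Nano Banana Pro"
    return None  # the guide has no rule for this brief
```

For example, `pick_image_model(ImageBrief(cjk_prompt=True, draft=True))` routes to the draft tier, and the same brief with `draft=False, size_px=2048` routes to Seedream V4.5.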

Video: pick by latency budget, then by aesthetic

If you need 1080p hero quality and have 90+ seconds of latency budget, reach for **Sora 2 Pro** for cinematic / studio shots, **Veo 3.1** for natural-world footage (water, weather, wildlife), or **Seedance 2.0 Pro** for character motion in CJK contexts.

If you have a 30-second SLO and 720p is acceptable, **Veo 3.1 Fast** or **Seedance Fast** are the right call. Both are tuned for prompt iteration and time-bound UX where the user is watching the clock.

If you need Pro fidelity but the deadline isn't full-Pro friendly, reach for **Seedance 2.0 Pro Fast**: about half the wait of full Pro on 5-8s clips, with output that holds up against Pro on most A/B tests.
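The video rules follow the same shape: latency budget first, then aesthetic. The sketch below is an illustration under stated assumptions. The function name, the `style` keys, and the 90-second cutoff as a hard threshold are hypothetical, and the fast tier returns both candidates because the guide doesn't say how to choose between them.

```python
from typing import List

# Hero-tier pick per aesthetic, as described in the text.
# The dictionary keys are made-up labels for this sketch.
HERO_BY_STYLE = {
    "cinematic": "Sora 2 Pro",
    "natural": "Veo 3.1",
    "cjk_character": "Seedance 2.0 Pro",
}


def pick_video_models(latency_budget_s: float,
                      need_pro_fidelity: bool,
                      style: str = "cinematic") -> List[str]:
    """Return candidate models for a latency budget and fidelity need."""
    if need_pro_fidelity:
        if latency_budget_s >= 90:
            # Full budget: choose the hero model by aesthetic.
            # An unknown style raises KeyError.
            return [HERO_BY_STYLE[style]]
        # Pro fidelity on a tighter deadline.
        return ["Seedance 2.0 Pro Fast"]
    # 720p is acceptable and the user is watching the clock.
    return ["Veo 3.1 Fast", "Seedance Fast"]
```

With a 120-second budget and `style="natural"`, this returns `["Veo 3.1"]`; with a 30-second budget and no Pro requirement, it returns both fast models for the caller to A/B.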

Cost / latency cheat sheet

Per the live rate card at docs.bytespike.ai/pricing, image generation runs from $0.012 (Seedream V5 lite 1024²) to $0.250 (GPT-Image 2 high 2048²). Video runs from $0.03/s (Seedance Fast 720p) to $0.45/s (Sora 2 Pro 1080p). Synchronous image calls return in 4-30s; all video calls go through `tasks/submit` and complete in 15-180s depending on tier.
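To see what the per-second rates mean for a real clip, here is a quick worked estimate. It assumes cost is simply duration times rate, with no minimum charge or rounding rule, which the rate card may or may not confirm. Only the two endpoint rates quoted above are included, and the function name is illustrative.

```python
from decimal import Decimal

# The two endpoint rates quoted in the text, in USD per second of video.
VIDEO_RATE_USD_PER_S = {
    "Seedance Fast 720p": Decimal("0.03"),
    "Sora 2 Pro 1080p": Decimal("0.45"),
}


def clip_cost_usd(tier: str, seconds: int) -> Decimal:
    """Estimated clip cost, assuming linear per-second billing."""
    return VIDEO_RATE_USD_PER_S[tier] * seconds


print(clip_cost_usd("Seedance Fast 720p", 8))  # 0.24
print(clip_cost_usd("Sora 2 Pro 1080p", 8))    # 3.60
```

At these rates an 8-second clip spans $0.24 to $3.60, a 15x spread, so iterating on the fast tier before a hero render adds little to the total.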

Failed image and video tasks don't bill. Cancelling a task while it is `queued` is free; cancelling while it is `running` bills partially. Estimated credits ship in the submit response, so you can preview the cost before paying for the GPU work.
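One way to use the estimate is as a budget guard: submit, read the estimate, and cancel while the task is still `queued` if it is over budget. The sketch below is not the ByteSpike client. `submit` and `cancel` are caller-supplied functions, and the response fields `task_id`, `status`, and `estimated_credits` are hypothetical names; check the API reference for the real ones.

```python
from typing import Any, Callable, Dict, Optional

Task = Dict[str, Any]


def submit_within_budget(submit: Callable[[Dict[str, Any]], Task],
                         cancel: Callable[[str], Any],
                         payload: Dict[str, Any],
                         max_credits: float) -> Optional[Task]:
    """Submit a task; cancel it for free if the estimate exceeds the budget.

    Returns the task if it is kept, or None if it was cancelled.
    """
    task = submit(payload)
    # Hypothetical field name for the estimate in the submit response.
    if task["estimated_credits"] <= max_credits:
        return task
    if task["status"] == "queued":
        # Cancelling while queued is free, per the billing rule above.
        cancel(task["task_id"])
        return None
    # Already running: cancelling now would partial-bill, so leave the
    # decision to the caller instead of cancelling automatically.
    return task
```

Passing the two functions in keeps the sketch independent of any particular HTTP client and makes it easy to exercise with stubs.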