
Inference from a Python function to a billion devices.
@compile AI models into native binaries. Deploy on cloud GPU fleets, personal devices, and everything in between.
0b00 Open-weight & Proprietary Models
Any open model.
Every modality.
One drop-in SDK.
We provide an OpenAI-compatible client across every framework you build in: Python, JavaScript, Kotlin, Unity, and Swift.
Large Language Models
Audio and Voice
Embeddings
0b01 Inference Placement and Cost
Pick where each inference runs.
Spend 3× less.
Decide where each inference runs at call time. Pay per second for cloud GPU inference, or nothing for on-device inference.
Price · vs hosted inference
embedding = muna.beta.openai.embeddings.create(
    input="I can choose where each and every inference runs?",
    model="@nomic/nomic-embed-text-v1.5",
    acceleration="..."
)
Traditional Provider · H100 · $6.50/hr
Muna
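To make the billing difference concrete, here is a rough back-of-the-envelope sketch. It takes the $6.50/hr H100 figure from the comparison above and contrasts paying for a GPU that stays on for the whole hour with paying only for the seconds an inference actually runs. Reusing the same hourly rate for per-second billing is an assumption made purely for illustration, and the function names are hypothetical; this is not Muna's actual price list.

```python
# Sketch only: always-on hourly billing vs. per-second billing vs. on-device.
# ASSUMPTION: per-second billing is modeled at the same $6.50/hr rate,
# prorated by the second. Real per-second prices are not stated in this page.

SECONDS_PER_HOUR = 3600
H100_HOURLY_USD = 6.50  # traditional provider figure quoted above


def always_on_cost(hours_reserved: float, hourly_usd: float = H100_HOURLY_USD) -> float:
    """Cost of keeping a GPU reserved for the given hours, busy or idle."""
    return hours_reserved * hourly_usd


def per_second_cost(busy_seconds: float, hourly_usd: float = H100_HOURLY_USD) -> float:
    """Cost when only the seconds spent running inference are billed."""
    return busy_seconds * hourly_usd / SECONDS_PER_HOUR


def on_device_cost(busy_seconds: float) -> float:
    """On-device inference is not billed, regardless of duration."""
    return 0.0


if __name__ == "__main__":
    busy = 90  # seconds of actual inference within one hour (hypothetical workload)
    print(f"always-on, 1 hour reserved: ${always_on_cost(1):.4f}")
    print(f"per-second, {busy}s busy:     ${per_second_cost(busy):.4f}")
    print(f"on-device, {busy}s busy:      ${on_device_cost(busy):.4f}")
```

The gap between the first two numbers depends entirely on how idle the GPU is: the lower the utilization, the more per-second billing saves, and a fully busy hour costs the same under both models in this sketch.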
0b10 Cold starts
No containers.
No cold starts.
Boot 45× faster.
By removing everything between your model and the GPU, there's nothing left to cold-start. The first call lands as fast as the millionth.
Cold start · container vs binary (Traditional vs Muna)

0b11 Quickstart
From pip install to your first prediction in one minute.
Literally two commands. No sign-up required to start.
terminal
# Install the Muna CLI and Python client
$ pip install muna
# Create speech with Kokoro TTS
$ muna predict @hexgrad/kokoro-tts \
    --text "What a time to be alive" \
    --voice "af_bella"