Inference from a Python function to a billion devices.

@compile AI models into native binaries. Deploy on cloud GPU fleets, personal devices, and everything in between.

0b00 · Open-weight & Proprietary Models

Any open model.
Every modality.
One drop-in SDK.

We provide an OpenAI-compatible client across every framework you build in: Python, JavaScript, Kotlin, Unity, and Swift.

0b01 · Inference Placement and Cost

Pick where each inference runs.
Spend 3× less.

Decide where each inference runs at call time. Pay per second for cloud GPU inference, or nothing for on-device inference.

Price · vs hosted inference
embedding = muna.beta.openai.embeddings.create(
    input="I can choose where each and every inference runs?",
    model="@nomic/nomic-embed-text-v1.5",
    acceleration="..."
)
[Chart: price ($0 to $6) over time, Traditional Provider (H100 · $6.50/hr) vs Muna]
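
To make the pricing comparison concrete, here is a rough sketch of the arithmetic in Python. The $6.50/hr H100 rate is the one quoted for the traditional provider above. Everything else is an assumption for illustration: the function names are hypothetical, the per-second rate is assumed to be the hourly rate divided by 3600 (Muna's actual rates are not stated here), and the workload (busy for 20 minutes out of each hour) is made up.

```python
# Illustrative cost arithmetic only; not Muna's pricing model.
H100_HOURLY_USD = 6.50  # hourly H100 rate quoted for the traditional provider


def always_on_cost(hours: float, hourly_rate: float = H100_HOURLY_USD) -> float:
    """Cost of renting a GPU for the whole window, busy or idle."""
    return hours * hourly_rate


def per_second_cost(busy_seconds: float, hourly_rate: float = H100_HOURLY_USD) -> float:
    """Cost when billed only for the seconds spent running inference.

    ASSUMPTION: the per-second rate is the hourly rate divided by 3600.
    """
    return busy_seconds * hourly_rate / 3600


def on_device_cost(busy_seconds: float) -> float:
    """On-device inference is described as costing zero."""
    return 0.0


if __name__ == "__main__":
    # Hypothetical workload: a one-hour window with 20 minutes of actual inference.
    window_hours = 1.0
    busy_seconds = 20 * 60

    print(f"always-on:  ${always_on_cost(window_hours):.2f}")
    print(f"per-second: ${per_second_cost(busy_seconds):.2f}")
    print(f"on-device:  ${on_device_cost(busy_seconds):.2f}")
```

Under these assumptions the saving depends entirely on how idle the GPU would otherwise be: the lower the utilization, the larger the gap between always-on and per-second billing.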
0b10 · Cold starts

No containers.
No cold starts.
Boot 45× faster.

By removing everything between your model and the GPU, there's nothing left to cold-start. The first call lands as fast as the millionth.

[Chart: cold start, container (Traditional) vs binary (Muna)]
0b11 · Quickstart

From pip install to your first prediction in one minute.

Literally two commands. No sign-up required to start.

terminal
# Install the Muna CLI and Python client
$ pip install muna
# Create speech with Kokoro TTS
$ muna predict @hexgrad/kokoro-tts \
    --text "What a time to be alive" \
    --voice "af_bella"