Compile before you deploy.

Muna optimizes LLMs and other AI models before they hit production; shrinking size, boosting performance, and cutting cold starts by up to 45×. Serve the compiled model on Muna, on your own GPUs, or through an OpenAI-compatible endpoint.

Start with an API key Deploy on your compute

0b000Why compile inference?

Portable

Move the same model from Muna GPUs to your own infrastructure with one CLI command.

Optimized

Compile models into hardware-aware inference servers for the target device.

Compatible

Use the OpenAI SDK, streaming, model aliases, and familiar request shapes.

0b001Open-weight & Proprietary Models

Any open model.
Every modality.
Your OpenAI client.

Point the official OpenAI SDK at our endpoint — no new concepts to learn. Compiled models serve chat, embeddings, transcription, and speech behind the API you already use.

0b010Inference Placement and Cost

Tune latency & cost per request.
Serve 3× more.

Compiled models run wherever you point them. Decide where each inference runs at call-time, and prioritize latency, throughput, or cost with extremely fine control.

Price · vs hosted inference

embedding = muna.beta.openai.embeddings.create(
    input="I can choose where each and every inference runs?",
    model="@nomic/nomic-embed-text-v1.5",
    acceleration="..."
)

Traditional ProviderH100 · $6.50/hr

Muna

0b011Cold starts

No containers.
No cold starts.
Boot 45× faster.

Compilation removes everything between your model and the GPU, so cold starts disappear. The first call lands as fast as the millionth.

Cold start · container vs binary

Traditional

Muna

See methodology and raw data →

0b100Bring your own compute

From hosted to your compute
in one command.

Start on Muna GPUs, then deploy the same compiled model to Modal, Baseten, or on-prem. Take full ownership of your AI inference stack.

terminal

# Deploy a compiled model to your own GPUs
$ muna deploy @openai/gpt-oss-20b     \
    --provider modal                  \
    --gpu h100

Ready to compile?

Grab an API key and make your first request with the OpenAI client you already have.

Get Started Read the docs

Compile before you deploy.

Portable

Optimized

Compatible

Any open model. Every modality. Your OpenAI client.

Large Language Models

Audio and Voice

Vision

Embeddings

Tune latency & cost per request.Serve 3× more.

No containers. No cold starts. Boot 45× faster.

From hosted to your compute in one command.

Ready to compile?

Any open model.
Every modality.
Your OpenAI client.

Tune latency & cost per request.
Serve 3× more.

No containers.
No cold starts.
Boot 45× faster.

From hosted to your compute
in one command.