Providers and Inference Types
Updated: Dec 11, 2025
- Understand how Providers define where AI inference runs (Cloud, Local,
or On-Device).
- Learn how to configure and authenticate Cloud Providers like OpenAI, Llama
API, Hugging Face, and ElevenLabs.
- Connect Local Providers such as Ollama for LAN-based, private inference.
- Set up On-Device Providers with the Unity Inference Engine for offline
on-device execution.
- Learn about the Provider Installation Routine and
RemoteProviderProfileRegistry to import Providers during setup.
AI Building Blocks separate what you run (the Agent) from where it
runs (the Provider). A Provider is a Unity ScriptableObject that
performs inference through a Cloud, Local Server, or On-Device
backend using the Unity Inference Engine.
A Provider encapsulates provider-specific input/output handling and
formatting so the Agent can focus on core logic. Agents should therefore be
Provider-agnostic and work with any Provider that supports the same task
through a task interface, for example IObjectDetectionTask.
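To make this concrete, here is a hedged sketch of how a task interface can decouple an Agent from any particular Provider. Only IObjectDetectionTask is named on this page; the result type, method signature, and agent class below are hypothetical and will differ from the actual SDK.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using UnityEngine;

// Hypothetical task interface: any Provider implementing it can serve the Agent.
public interface IObjectDetectionTask
{
    Task<IReadOnlyList<DetectedObject>> DetectAsync(Texture2D image);
}

// Illustrative result type (not from the SDK).
public struct DetectedObject
{
    public string Label;
    public float Confidence;
    public Rect BoundingBox;
}

// The Agent depends only on the interface, so Cloud, Local, and
// On-Device Providers are interchangeable at configuration time.
public class DetectionAgent : MonoBehaviour
{
    [SerializeField] private ScriptableObject providerAsset; // assign a Provider asset in the Inspector

    public async Task RunAsync(Texture2D frame)
    {
        if (providerAsset is IObjectDetectionTask task)
        {
            var results = await task.DetectAsync(frame);
            foreach (var d in results)
                Debug.Log($"{d.Label} ({d.Confidence:P0}) at {d.BoundingBox}");
        }
    }
}
```

Because the Agent only sees the interface, swapping a Cloud Provider for an On-Device one is a configuration change, not a code change.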
| Inference Type | Description | Typical Use |
|---|---|---|
| Cloud | Sends text, audio, or image payloads to a hosted model over HTTPS and returns results. | Fastest way to prototype using the latest models (LLMs, TTS/STT, DETR). |
| Local | Communicates with a model running on a local machine on the same Wi-Fi network (for example, Ollama). | Low latency, private demos, exhibitions. |
| On-Device | Runs the model directly on the headset via the Unity Inference Engine. | Lowest latency, full privacy, and no network dependency. Significant performance impact. |
Cloud Providers are the easiest way to get started: Just create a Provider
asset, paste your API key, and enter the endpoint/model you want to
use.
Always check provider and model availability
Providers and models may not always be available on the provider's servers. Always check provider and model availability before using them in your experience.

| Provider (Asset) | Common Models / Capabilities | Editor Features |
|---|---|---|
| LlamaApiProvider | Official Meta Llama family (chat and multimodal variants) | Curated model list with automatic vision toggling. |
| OpenAIProvider | gpt-5, gpt-4o (chat/vision), whisper-1 (STT), tts-1, tts-1-hd (TTS) | Model picker, Chat/Vision toggle, STT/TTS configuration foldouts. |
| HuggingFaceProvider | Any Hugging Face-hosted model (for example, facebook/detr-resnet-101, Llama family) | Token validator, endpoint health checker, image inlining options. |
| ReplicateProvider | Community-hosted models (owner/model[:version]) | Endpoint override, base64/data URI support, inline byte cap. |
| ElevenLabsProvider | Text-to-Speech and Speech-to-Text (Scribe) | Fetches voices, models, and metadata directly from your ElevenLabs account. |
Setting Up a Cloud Provider
- Create a Provider Asset: Create → Meta → AI → Provider Assets → <Cloud>/<Your Provider>
- Enter API Key: Use the Get Key… button in the Inspector to open your provider’s developer portal.
- Set Endpoint and Model: Copy these from the provider’s example curl request.
- Click Validate / Check: Confirm authentication and connectivity on your provider’s website or in the asset itself.
- (Optional) Configure Vision Options: Adjust settings like Inline Remote Images, Resolve Redirects, and Max Inline Bytes.
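As an illustration of the Validate / Check step, the sketch below performs a minimal connectivity test from Unity. It assumes an OpenAI-style endpoint with Bearer-token authentication; the endpoint URL, header scheme, and class name are assumptions, and your actual Provider asset performs this check for you.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Minimal connectivity check, assuming an OpenAI-style endpoint and
// Bearer-token auth. Endpoint and key come from your Provider asset.
public class ProviderHealthCheck : MonoBehaviour
{
    public string endpoint = "https://api.openai.com/v1/models"; // provider-specific
    public string apiKey;  // never commit keys; load from secure storage

    public IEnumerator Check()
    {
        using var req = UnityWebRequest.Get(endpoint);
        req.SetRequestHeader("Authorization", $"Bearer {apiKey}");
        yield return req.SendWebRequest();

        if (req.result == UnityWebRequest.Result.Success)
            Debug.Log("Provider reachable and key accepted.");
        else
            Debug.LogWarning($"Validation failed: {req.responseCode} {req.error}");
    }
}
```

A 401 response here usually means the key is wrong or expired; a timeout points at the endpoint URL or network.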
| Provider (Asset) | Backend | Description |
|---|---|---|
| OllamaProvider | Ollama daemon (http://localhost:11434) | Discovers installed models via /api/tags and lets you select local tags (for example, llama3, llava:latest, gemma3). |
- Run Ollama on your local machine:

```
ollama pull llama3
ollama serve
```
- In Unity, open your OllamaProvider asset and configure it.
- Click Refresh Models to fetch available tags, then press Play to test
the connection.
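For reference, the Refresh Models step corresponds to querying the Ollama daemon's /api/tags endpoint, which lists installed model tags as JSON. The sketch below issues that query directly; the class name is hypothetical and the default local endpoint is assumed.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Queries the Ollama daemon for installed model tags, mirroring what
// the Refresh Models button does. Assumes the default local endpoint.
public class OllamaTagLister : MonoBehaviour
{
    public string baseUrl = "http://localhost:11434";

    public IEnumerator ListTags()
    {
        using var req = UnityWebRequest.Get($"{baseUrl}/api/tags");
        yield return req.SendWebRequest();

        if (req.result == UnityWebRequest.Result.Success)
            // Response is JSON like: {"models":[{"name":"llama3:latest", ...}]}
            Debug.Log(req.downloadHandler.text);
        else
            Debug.LogWarning($"Is 'ollama serve' running? {req.error}");
    }
}
```

If the request fails, confirm that `ollama serve` is running and that your headset and machine are on the same network.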
| Provider (Asset) | Runtime | Highlights |
|---|---|---|
| UnityInferenceEngineProvider | Unity Inference Engine (Sentis) | Supports GPU/CPU backends, optional Split Over Frames for smoother performance, and GPU-based NMS compute shader integration. |
- Create the Provider asset: Create → Meta → AI → Provider Assets → On-Device → Unity Inference Engine
- Assign your model file (.onnx or .sentis)
- Configure the Backend: GPUCompute or CPU
- Adjust Split Over Frames / Layers Per Frame for performance tuning
- (Optional) Add Class Labels via a .txt file
- (Optional) Enable GPU Non-Max Suppression (NMS) using the provided compute shader
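The steps above can be sketched in code as follows. This is a rough illustration using the Sentis-era API (namespace, ModelLoader, WorkerFactory); newer Unity Inference Engine releases have renamed some of these types, and the class here is hypothetical — the UnityInferenceEngineProvider asset wraps this work for you.

```csharp
using Unity.Sentis; // namespace varies by Inference Engine version
using UnityEngine;

// Loads an .onnx/.sentis model and runs it on the chosen backend.
// Sketch only: type names follow the Sentis-era API and may differ
// in newer Unity Inference Engine releases.
public class OnDeviceRunner : MonoBehaviour
{
    [SerializeField] private ModelAsset modelAsset;   // your .onnx or .sentis file
    [SerializeField] private BackendType backend = BackendType.GPUCompute;

    private IWorker worker;

    private void Start()
    {
        var model = ModelLoader.Load(modelAsset);
        worker = WorkerFactory.CreateWorker(backend, model);
    }

    public void Run(Texture2D input)
    {
        using var tensor = TextureConverter.ToTensor(input);
        worker.Execute(tensor);
        // Read back results with worker.PeekOutput(); spreading execution
        // over frames (Split Over Frames) helps avoid frame hitches.
    }

    private void OnDestroy() => worker?.Dispose();
}
```

Running everything in a single frame, as above, is the simplest setup; enable Split Over Frames on the Provider asset when a large model causes visible hitches.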
Provider Selection During Building Block Installation
The RemoteProviderProfileRegistry automatically retrieves configuration
files from Meta’s CDN containing official Provider profiles and defaults
(endpoints, model names, and so on). When adding a new AI Building Block from
Meta Hub → Building Blocks:
- The installer detects all available inference types for that block.
- It loads compatible Providers from the RemoteProviderProfileRegistry.
- You choose your preferred inference type (Cloud, Local, or On-Device).
- The selected Provider asset is saved with your prefab or component.
- You can later edit it directly in the Inspector to update models, endpoints,
or keys.