This blog post focuses on new features and improvements. For a complete list, including bug fixes, please see the release notes.
Introducing Local Runners: Run Models on Your Own Hardware
Building AI models often starts locally. You experiment with architectures, fine-tune on small datasets, and validate ideas on your own machine. But the moment you want to test that model inside a real-world pipeline, things get complicated.
You usually have two options:
Upload the model to a remote cloud environment, even for early-stage testing
Build and expose your own API server, handling authentication, security, and infrastructure just to test locally
Neither path is ideal, especially if you're:
Working on personal or resource-limited projects
Developing models that need access to local files, OS-level tools, or restricted data
Managing edge or on-prem environments where the cloud isn't viable
Local Runners solve this problem.
They let you develop, test, and run models on your own machine while still connecting to Clarifai's platform. You don't have to upload your model to the cloud. You simply run it where it is, whether that's your laptop, workstation, or server, and Clarifai takes care of routing, authentication, and integration.
Once registered, the Local Runner opens a secure connection to Clarifai's control plane. Any requests sent to your model's Clarifai API endpoint are securely routed to your local runner, processed, and returned. From a user's perspective, it works like any other model hosted on Clarifai, but behind the scenes it is running entirely on your machine.
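To make that concrete, here is a minimal sketch of calling such a model through Clarifai's standard predict endpoint, using only the Python standard library. The user, app, and model IDs are placeholders, and the request is only sent when a `CLARIFAI_PAT` environment variable is set; check the documentation for the exact request shape your model expects.

```python
# Sketch: calling a model served by a Local Runner through Clarifai's
# standard predict endpoint. The IDs below are placeholders.
import json
import os
import urllib.request

USER_ID, APP_ID, MODEL_ID = "your-user-id", "local-dev", "my-model"
url = (f"https://api.clarifai.com/v2/users/{USER_ID}"
       f"/apps/{APP_ID}/models/{MODEL_ID}/outputs")
payload = {"inputs": [{"data": {"text": {"raw": "Hello from the cloud API"}}}]}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Key {os.environ.get('CLARIFAI_PAT', '')}",
             "Content-Type": "application/json"},
    method="POST",
)

# Only send the request when a PAT is actually configured.
if os.environ.get("CLARIFAI_PAT"):
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
```

The caller never needs to know whether the model runs in Clarifai's cloud or on your laptop; the endpoint is the same either way.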
Here's what you can do with Local Runners:
Streamlined model development
Develop and debug models without deployment overhead. Watch real-time traffic, inspect inputs, and test outputs interactively.
Leverage your own compute
If you have a powerful GPU or a custom setup, use it to serve models. Your machine does the heavy lifting, while Clarifai handles the rest of the stack.
Private data and system-level access
Serve models that interact with local files, private APIs, or internal databases. With support for MCP (Model Context Protocol), you can securely expose local capabilities to agents without making your infrastructure public.
Getting Started
Before starting a Local Runner, make sure you've done the following:
Built or downloaded a model – You can use your own model or pick a compatible one from a repository like Hugging Face. If you're building your own, check the documentation on how to structure it using the Clarifai-compatible project format.
Installed the Clarifai CLI – run
pip install --upgrade clarifai
Generated a Personal Access Token (PAT) – from your Clarifai account's settings page under "Security."
Created a context – this stores your local environment variables (such as user ID, app ID, and model ID) so the runner knows how to connect to Clarifai.
You can set up the context simply by logging in through the CLI, which will walk you through entering all the required values:
clarifai login
Starting the Runner
Once everything is set up, you can start your Local Dev Runner from the directory containing your model (or provide a path):
clarifai model local-runner [OPTIONS] [MODEL_PATH]
MODEL_PATH is the path to your model directory. If you leave it blank, it defaults to the current directory.
This command launches a local server that mimics a production Clarifai deployment, letting you test and debug your model live.
If the runner doesn't find an existing context or config, it will prompt you to generate one with default values. This will create:
A dedicated local compute cluster and nodepool.
An app and model entry in your Clarifai account.
A deployment and runner ID that ties your local instance to the Clarifai platform.
Once launched, it also auto-generates a client code snippet to help you test the model.
Local Runners give you the flexibility to build and test models exactly where your data and compute live, while still integrating with Clarifai's API, workflows, and platform features. Check out the full example and setup guide in the documentation here.
You can try Local Runners for free. There's also a $1/month Developer Plan for the first year, which lets you connect up to five Local Runners to the cloud API with unlimited runner hours.
Compute UI
- We've launched a new Compute Overview dashboard that gives you a clear, unified view of all your compute resources. From a single screen, you can now manage Clusters, Nodepools, Deployments, and the newly added Runners.
- This update also includes two major additions: Connect a Local Runner, which lets you run models directly on your own hardware with full privacy, and Connect your own cloud, which lets you integrate external infrastructure such as AWS, GCP, or Oracle for dynamic, cost-efficient scaling. It's now easier than ever to control where and how your models run.
- We've also redesigned the cluster creation experience to make provisioning compute even more intuitive. Instead of selecting each parameter step by step, you now get a unified, filterable view of all available configurations across providers such as AWS, GCP, Azure, Vultr, and Oracle. You can filter by region, instance type, and hardware specs, then select exactly what you need with full visibility into GPU, memory, CPU, and pricing. Once selected, you can spin up a cluster instantly with a single click.
Published New Models
We published the Gemma-3n-E2B and Gemma-3n-E4B models. We've added both the E2B and E4B variants, optimized for text-only generation and suited to different compute needs.
Gemma 3n is designed for real-world, low-latency use on devices like phones, tablets, and laptops. These models leverage Per-Layer Embedding (PLE) caching, the MatFormer architecture, and conditional parameter loading.
You can run them directly in the Clarifai Playground or access them via our OpenAI-compatible API.
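As a rough sketch of the OpenAI-compatible route, a standard chat-completions request needs nothing beyond the Python standard library. The base URL and model identifier below are assumptions for illustration; verify both in the documentation before use.

```python
# Sketch: a chat-completions request against Clarifai's OpenAI-compatible
# API. BASE_URL and the model name are assumed values, not confirmed ones.
import json
import os
import urllib.request

BASE_URL = "https://api.clarifai.com/v2/ext/openai/v1"  # assumed
body = {
    "model": "gemma-3n-e2b",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Authorization": f"Bearer {os.environ.get('CLARIFAI_PAT', '')}",
             "Content-Type": "application/json"},
    method="POST",
)

# Only send the request when a PAT is configured.
if os.environ.get("CLARIFAI_PAT"):
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI wire format, the official `openai` client should also work by pointing its `base_url` at the same address.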
Token-Based Billing
We've started rolling out token-based billing for select models on our Community platform. This change aligns with industry standards and more accurately reflects the cost of inference, especially for large language models.
Token-based pricing applies only to models running on Clarifai's default Shared compute in the Community. Models deployed on Dedicated compute will continue to be billed based on compute time, with no change. Legacy vision models will still follow per-request billing for now.
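To make the difference concrete, here is a toy cost calculation with made-up per-token rates (not Clarifai's actual prices): under token-based billing, a request's cost tracks how much text the model reads and generates, which flat per-request billing cannot distinguish.

```python
# Toy illustration of token-based billing. The per-million-token rates
# below are hypothetical, chosen only to show the arithmetic.
def token_cost(input_tokens, output_tokens,
               in_per_million=0.10, out_per_million=0.40):
    """Return the dollar cost of one request at the given (made-up) rates."""
    return (input_tokens * in_per_million
            + output_tokens * out_per_million) / 1_000_000

# A short prompt with a long completion costs more than the reverse,
# even though both would look identical under per-request billing.
print(round(token_cost(500, 2_000), 6))   # 0.00085
print(round(token_cost(2_000, 500), 6))   # 0.0004
```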
Playground
- The Playground page is now publicly accessible, with no login required. However, certain features remain available only to logged-in users.
- Added model descriptions and predefined prompt examples to the Playground, making it easier for users to understand model capabilities and get started quickly.
- Added Pythonic support in the Playground for consuming the new model specification.
- Improved the Playground user experience with enhanced inference parameter controls, restored model version selectors, and clearer error feedback.
More Changes
Python SDK: Added per-output token tracking, async endpoints, improved batch support, code validation, and build optimizations. Check out all SDK updates here.
Platform Updates: Improved billing accuracy, added dynamic code snippets, made UI tweaks to Community Home and Control Center, and improved privacy defaults. Find all platform changes here.
Clarifai Organizations: Made invitations clearer, improved token visibility, and added persistent invite prompts for better onboarding. See the full org improvements here.
Ready to start building?
With Local Runners, you can now serve models, MCP servers, or agents directly from your own hardware without uploading model weights or managing infrastructure. It's the fastest way to test, iterate, and securely run models from your laptop, workstation, or on-prem server. You can read the documentation or watch the demo video to get started.