Run Models on Your Own Hardware
Most AI development starts locally. You experiment with model architectures, fine-tune them on small datasets, and iterate until the results look promising. But when it's time to test the model in a real-world pipeline, things quickly become complicated.
You usually have two choices: upload the model to the cloud even for simple testing, or set up your own API, managing routing, authentication, and security just to run it locally.
Neither approach works well if you're:
- Working on smaller or resource-limited projects
- Needing access to local files or private data
- Building for edge or on-prem environments where cloud access isn't practical
Introducing Local Runners – ngrok for AI models.
Local Runners let you serve AI models, MCP servers, or agents directly from your laptop, workstation, or internal server, securely and seamlessly via a public API. You don't need to upload your model or manage any infrastructure. Simply run it locally, and Clarifai takes care of the API handling, routing, and integration.
Once running, the Local Runner establishes a secure connection to Clarifai's control plane. Any API request sent to your model is routed to your machine, processed locally, and returned to the caller. From the outside, it behaves like a Clarifai-hosted model, while all computation happens on your local hardware.
With Local Runners, you can:
- Run models on your own hardware – Use laptops, workstations, or on-prem servers to serve models directly, with full access to local GPUs or system tools.
- Keep data and compute private – Avoid uploading anything. Useful for regulated environments, internal tools, or projects involving sensitive information.
- Skip infrastructure setup – No need to build and host your own API. Clarifai provides the endpoint, routing, and authentication.
- Prototype and iterate quickly – Test models in real-world pipelines without deployment delays. Watch requests flow through and inspect outputs live.
- Connect to local files and private APIs – Let models access your file system, internal databases, or OS-level resources without exposing your environment.
Now that you understand the benefits and capabilities of Local Runners, let's see how to run Hugging Face models locally and expose them securely.
Running Hugging Face Models Locally
The Hugging Face Toolkit in the Clarifai CLI lets you download, configure, and run Hugging Face models locally while exposing them securely through a public API. You can test, integrate, and iterate on models directly from your local environment without managing any external infrastructure.
Step 1: Prerequisites
First, install the Clarifai package. This also provides the Clarifai CLI:
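```bash
pip install clarifai
```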
Next, log in to Clarifai to link your local environment to your account. This allows you to manage and expose your models.
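A minimal sketch of the login step (assuming the CLI exposes a `login` subcommand; the exact name may vary across versions):

```bash
clarifai login
```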
Follow the prompts to enter your User ID and Personal Access Token (PAT). If you need help obtaining these, refer to the documentation.
If you plan to access private Hugging Face models or repositories, generate a token from your Hugging Face account settings and set it as an environment variable:
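For example, assuming your setup reads the common `HF_TOKEN` variable (check which variable name your tooling expects):

```bash
export HF_TOKEN="your_huggingface_token"
```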
Finally, install the Hugging Face Hub library to enable model downloads and integration:
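```bash
pip install huggingface_hub
```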
With these steps complete, your environment is ready to initialize and run Hugging Face models locally with Clarifai.
Step 2: Initialize a Model
Use the Clarifai CLI to initialize and configure any supported Hugging Face model locally with the Toolkit:
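A sketch of the command, assuming the Toolkit is selected via a `--toolkit` flag (flag names may differ across CLI versions):

```bash
clarifai model init --toolkit huggingface
```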
By default, this command downloads and sets up the unsloth/Llama-3.2-1B-Instruct model in your current directory.
If you want to use a different model, you can specify it with the --model-name flag and pass the full model name from Hugging Face. For example:
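For instance (the repository name below is only an illustration):

```bash
clarifai model init --toolkit huggingface --model-name Qwen/Qwen2.5-1.5B-Instruct
```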
Note: Some models can be very large and require significant memory or GPU resources. Make sure your machine has enough compute capacity to load and run the model locally before initializing it.
Now, once you run the above command, the CLI will scaffold the project for you. The generated directory structure will look like this:
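Roughly, the scaffold contains the following (recent CLI versions nest model.py inside a versioned 1/ folder; your layout may differ slightly):

```text
├── 1/
│   └── model.py
├── config.yaml
└── requirements.txt
```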
model.py – Contains the logic for loading the model and running predictions.
config.yaml – Holds model metadata, compute resources, and checkpoint configuration.
requirements.txt – Lists the Python dependencies required for your model.
Step 3: Customize model.py
Once your project scaffold is ready, the next step is to configure your model's behavior in model.py. By default, this file includes a class called MyModel that extends ModelClass from Clarifai. Inside this class, you'll find four main methods ready to use:
- load_model() – Loads checkpoints from Hugging Face, initializes the tokenizer, and sets up streaming for real-time output.
- predict() – Handles single-prompt inference and returns responses. You can modify parameters such as max_tokens, temperature, and top_p.
- generate() – Streams outputs token by token, useful for live previews.
- chat() – Manages multi-turn conversations and returns structured responses.
You can use these methods as-is, or customize them to fit your specific model's behavior; a rough skeleton follows below. The scaffold ensures that all core functionality is already implemented, so you can get started with minimal setup.
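As a sketch of that class's shape (the import path and the @ModelClass.method decorator are assumptions based on Clarifai's model interface; your generated file may differ):

```python
from typing import Iterator

from clarifai.runners.models.model_class import ModelClass


class MyModel(ModelClass):
    def load_model(self):
        # Load the Hugging Face checkpoint and tokenizer once at startup.
        ...

    @ModelClass.method
    def predict(self, prompt: str, max_tokens: int = 256,
                temperature: float = 0.7, top_p: float = 0.9) -> str:
        # Single-prompt inference: run the model and return the full response.
        ...

    @ModelClass.method
    def generate(self, prompt: str) -> Iterator[str]:
        # Stream the response token by token for live previews.
        ...
```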
Step 4: Configure config.yaml
The config.yaml file defines model metadata and compute requirements. For Local Runners, most defaults work, but it's important to understand each section:
- checkpoints – Specifies the Hugging Face repository and token for private models.
- inference_compute_info – Defines compute requirements. For Local Runners, you can typically use the defaults. When deploying on dedicated infrastructure, you can customize accelerators, memory, and CPU based on the model's requirements.
- model – Contains metadata such as app_id, model_id, model_type_id, and user_id. Replace YOUR_USER_ID with your own Clarifai user ID.
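A hedged sketch of what the file might contain (the keys follow the sections above; the exact names and defaults in your generated config.yaml may differ):

```yaml
model:
  id: llama-3-2-1b-instruct
  user_id: YOUR_USER_ID
  app_id: local-runner-app        # illustrative app name
  model_type_id: text-to-text

inference_compute_info:
  cpu_limit: "1"
  cpu_memory: 8Gi
  num_accelerators: 0             # Local Runners can usually keep the defaults

checkpoints:
  type: huggingface
  repo_id: unsloth/Llama-3.2-1B-Instruct
  hf_token: your_hf_token         # only needed for private repos
```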
Finally, the requirements.txt file lists all Python dependencies required by your model. You can add any additional packages your model needs to run.
Step 5: Start the Local Runner
Once your model is configured, you can launch it locally using the Clarifai CLI:
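Assuming the subcommand name used by recent Clarifai CLI releases (run this from the model directory; the exact name may vary by version):

```bash
clarifai model local-runner
```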
This command starts a Local Runner instance on your machine. The CLI automatically handles all the necessary setup, so you don't need to manually configure infrastructure.
After the Local Runner starts, you'll receive a public Clarifai URL. This URL acts as a secure gateway to your locally running model. Any requests made to this endpoint are routed to your local environment, processed by your model, and returned through the same endpoint.
Run Inference with the Local Runner
Once your Hugging Face model is running locally and exposed via the Clarifai Local Runner, you can send inference requests to it from anywhere, using either the OpenAI-compatible endpoint or the Clarifai SDK.
Using the OpenAI-Compatible API
Use the OpenAI client to send a request to your locally running Hugging Face model:
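A sketch under stated assumptions: the base URL follows Clarifai's OpenAI-compatible endpoint convention, and the model is addressed by its full Clarifai URL; substitute your own PAT and IDs:

```python
from openai import OpenAI

# Clarifai exposes an OpenAI-compatible endpoint; authenticate with your PAT.
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key="YOUR_CLARIFAI_PAT",
)

response = client.chat.completions.create(
    # The model is addressed by its full Clarifai model URL.
    model="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```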
Using the Clarifai Python SDK
You can also interact with the model directly through the Clarifai SDK, which provides a lightweight interface for inference:
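A minimal sketch, assuming the SDK's Model client exposes the scaffold's predict method by name (parameter names mirror model.py above and may differ in your setup):

```python
from clarifai.client import Model

# Point the client at your model; the Local Runner serves requests behind this URL.
model = Model(
    url="https://clarifai.com/YOUR_USER_ID/YOUR_APP_ID/models/YOUR_MODEL_ID",
    pat="YOUR_CLARIFAI_PAT",
)

# predict() maps to the predict method defined in model.py.
result = model.predict(prompt="What is the capital of France?")
print(result)
```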
You can also experiment with the other scaffold methods, such as generate() for streaming and chat() for multi-turn conversations.
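For instance, a sketch of streaming output, assuming the SDK exposes generate() under the same name as in model.py (reusing the model client from the previous snippet):

```python
# Stream tokens as they are produced by the generate() method.
for chunk in model.generate(prompt="Write a haiku about local compute."):
    print(chunk, end="", flush=True)
```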
With this setup, your Hugging Face model runs entirely on your local hardware, yet remains accessible via Clarifai's secure public API.
Conclusion
Local Runners give you full control over where your models run, without sacrificing integration, security, or flexibility.
You can prototype, test, and serve real workloads on your own hardware while still using Clarifai's platform to route traffic, handle authentication, and scale when needed.
You can try Local Runners for free with the Free Tier, or upgrade to the Developer Plan at $1/month for the first year to connect up to 5 Local Runners with unlimited hours. Read more in the documentation here to get started.
