11.1 C
New York
Sunday, October 26, 2025

Run Ollama Fashions Domestically and make them Accessible by way of Public API


Blog thumbnail - Expose Local Ollama Models with a Public API

Introduction

Operating Giant Language Fashions (LLMs) and different open-source fashions domestically provides vital benefits for builders. That is the place Ollama shines. Ollama simplifies the method of downloading, establishing, and working these highly effective fashions in your native machine, supplying you with higher management, enhanced privateness, and lowered prices in comparison with cloud-based options.

Whereas working fashions domestically provides immense advantages, integrating them with cloud-based tasks or sharing them for broader entry generally is a problem. That is exactly the place Clarifai Native Runners are available in. Native Runners allow you to show your domestically working Ollama fashions by way of a public API endpoint, permitting seamless integration with any challenge, anyplace, successfully bridging the hole between your native atmosphere and the cloud.

On this publish, we’ll stroll via the right way to run open-source fashions utilizing Ollama and expose them with a public API utilizing Clarifai Native Runners. This makes your native fashions accessible globally whereas nonetheless working completely in your machine.

Native Runners Defined

Native Runners allow you to run fashions by yourself machine, whether or not it is your laptop computer, workstation, or on-prem server, whereas exposing them via a safe, public API endpoint. You needn’t add the mannequin to the cloud. The mannequin stays native however behaves prefer it’s hosted on Clarifai.

As soon as initialized, the Native Runner opens a safe tunnel to Clarifai’s management airplane. Any requests to your mannequin’s Clarifai API endpoint are routed to your machine, processed domestically, and returned to the caller. From the surface, it capabilities like every other hosted mannequin. Internally, every part runs in your {hardware}.

Native Runners are particularly helpful for:

  • Quick native growth: Construct, take a look at, and iterate on fashions in your individual atmosphere with out deployment delays. Examine visitors, take a look at outputs, and debug in actual time.
  • Utilizing your individual {hardware}: Benefit from native GPUs or customized {hardware} setups. Let your machine deal with inference whereas Clarifai manages routing and API entry.
  • Personal and offline knowledge: Run fashions that depend on native recordsdata, inner databases, or personal APIs. Maintain every part on-prem whereas nonetheless exposing a usable endpoint.

Native Runners provides you the flexibleness of native execution together with the attain of a managed API, all with out giving up management over your knowledge or atmosphere.

Expose Native Ollama Fashions by way of Public API

This part will stroll you thru the steps to get your Ollama mannequin working domestically and accessible by way of a Clarifai public endpoint.

Stipulations

Earlier than we start, guarantee you’ve gotten:

Step 1: Set up Clarifai and Login

First, set up the Clarifai Python SDK:

Subsequent, log in to Clarifai to configure your context. This hyperlinks your native atmosphere to your Clarifai account, permitting you to handle and expose your fashions.

Observe the prompts to enter your Consumer ID and Private Entry Token (PAT). When you need assistance acquiring these, consult with the documentation right here.

Step 2: Set Up Your Native Ollama Mannequin for Clarifai

Subsequent, you’ll put together your native Ollama mannequin so it may be accessed by Clarifai’s Native Runners. This step units up the required recordsdata and configuration to show your mannequin via a public API endpoint utilizing Clarifai’s platform.

Use the next command to initialize the setup:

This generates three key recordsdata inside your challenge listing:

  • mannequin.py

  • config.yaml

  • necessities.txt

These outline how Clarifai will talk along with your domestically working Ollama mannequin.

You can too customise the command with the next choices:

  • --model-name: Identify of the Ollama mannequin you need to serve. This pulls from the Ollama mannequin library (defaults to llama3:8b).

  • --port: The port the place your Ollama mannequin is working (defaults to 23333).

  • --context-length: Units the mannequin’s context size (defaults to 8192).

For instance, to make use of the gemma:2b mannequin with a 16K context size on port 8008, run:

After this step, your native mannequin is able to be uncovered utilizing Clarifai Native Runners.

Step 3: Begin the Clarifai Native Runner

As soon as your native Ollama mannequin is configured, the following step is to run Clarifai’s Native Runner. This exposes your native mannequin to the web via a safe Clarifai endpoint.

Navigate into the mannequin listing and run:

As soon as the runner begins, you’ll obtain a public Clarifai URL. This URL is your gateway to accessing your domestically working Ollama mannequin from anyplace. Requests made to this Clarifai endpoint will probably be securely routed to your native machine, permitting your Ollama mannequin to course of them.

Operating Inference on Your Uncovered Mannequin

Together with your Ollama mannequin working domestically and uncovered by way of Clarifai Native Runner, now you can ship inference requests to it from anyplace utilizing the Clarifai SDK or an OpenAI-compatible endpoint.

Inference utilizing OpenAI suitable methodology

Set your Clarifai PAT as an atmosphere variable:

Then, you need to use the OpenAI consumer to ship requests:

For multimodal inference, you may embrace picture knowledge:

Inference with Clarifai SDK

You can too use the Clarifai Python SDK for inference. The mannequin URL will be obtained out of your Clarifai account.

Customizing Ollama Mannequin Configuration

The clarifai mannequin init --toolkit ollama command generates a mannequin file construction:

ollama-model-upload/
├── 1/
│ └── mannequin.py

├── config.yaml
└── necessities.txt

You may customise the generated recordsdata to regulate how your mannequin works:

  • 1/mannequin.py – Customise to tailor your mannequin’s habits, implement customized logic, or optimize efficiency.

  • config.yaml – Outline settings akin to compute necessities, particularly helpful when deploying to devoted compute utilizing Compute Orchestration.

  • necessities.txt – Record any required Python packages on your mannequin.

This setup provides you full management over how your Ollama mannequin is uncovered and used by way of API. Confer with the documentation right here.

Conclusion

Operating open-source fashions domestically with Ollama provides you full management over privateness, latency, and customization. With Clarifai Native Runners, you may expose these fashions by way of a public API with out counting on centralized infrastructure. This setup makes it straightforward to plug native fashions into bigger workflows or agentic techniques, whereas preserving compute and knowledge totally in your management. If you wish to scale past your machine, take a look at Compute Orchestration to deploy fashions on devoted GPU nodes.



Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles