Introduction
Zhipu AI has launched GLM-4.6, the most recent model in its General Language Model (GLM) series. Unlike many proprietary frontier systems, the GLM family remains open-weight and is licensed under permissive terms such as MIT and Apache, making it one of the only frontier-scale models that organizations can self-host.
GLM-4.6 builds on the reasoning and coding strengths of GLM-4.5 and introduces several major upgrades.
The context window expands from 128k to 200k tokens, enabling the model to process entire books, codebases or multi-document analysis tasks in a single pass.
It retains the Mixture-of-Experts architecture with 355 billion total parameters and roughly 32 billion active per token, but improves reasoning quality, coding accuracy and tool-calling reliability.
A new thinking mode improves multi-step reasoning and complex planning.
The model supports native tool calls, allowing it to decide when to invoke external functions or services.
All weights and code are openly available, allowing self-hosting, fine-tuning and enterprise customization.
These upgrades make GLM-4.6 a strong open alternative for developers who need high-performance coding assistance, long-context analysis and agentic workflows.
Model Architecture and Technical Details
Mixture-of-Experts Core
GLM-4.6 is built on a Mixture-of-Experts (MoE) Transformer architecture. Although the full model contains 355 billion parameters, only around 32 billion are active per forward pass due to sparse expert routing. A gating network selects the appropriate experts for each token, reducing compute overhead while preserving the benefits of a large parameter pool.
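The routing step can be sketched in a few lines of NumPy. This is an illustrative toy, not GLM-4.6's actual gating code: the hidden size, expert count and top-k value below are invented for the example.

```python
import numpy as np

def top_k_routing(token: np.ndarray, expert_weights: np.ndarray, k: int = 2):
    """Toy MoE gating: score every expert, keep the top-k, renormalize."""
    logits = expert_weights @ token              # one score per expert
    top = np.argsort(logits)[-k:]                # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                         # softmax over the selected experts
    return top, gates                            # only these experts run for this token

rng = np.random.default_rng(0)
token = rng.normal(size=64)                      # a single token's hidden state
expert_weights = rng.normal(size=(8, 64))        # 8 experts, one scoring row each
experts, gates = top_k_routing(token, expert_weights, k=2)
print(experts, gates.sum())                      # 2 experts selected; gates sum to 1
```

Because each token activates only k of the experts, the compute per token scales with the active subset (roughly 32B parameters for GLM-4.6) rather than the full 355B pool.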
Key architectural features carried over from GLM-4.5 and refined in version 4.6 include:
Grouped Query Attention, which improves long-range interactions by using a large number of attention heads and partial RoPE for efficient scaling.
QK-Norm, which stabilizes attention logits by normalizing query–key interactions.
The Muon optimizer, which enables larger batch sizes and faster convergence.
A Multi-Token Prediction (MTP) head, which predicts multiple tokens per step and enhances the performance of the model's thinking mode.
Hybrid Reasoning Modes
GLM-4.6 supports two reasoning modes.
The standard mode provides fast responses for everyday interactions.
The thinking mode slows down decoding, uses the MTP head for multi-token planning and generates an internal chain of thought. This mode improves performance on logic problems, longer coding tasks and multi-step agentic workflows.
Extended Context Window
One of the most important upgrades is the expanded context window. Moving from 128k tokens to 200k tokens allows GLM-4.6 to process large codebases, full legal documents, long transcripts or multi-chapter content without chunking. This capability is especially valuable for engineering tasks, research analysis and long-form summarization.
Training Data and Fine-Tuning
Zhipu AI has not disclosed the full training dataset, but GLM-4.6 builds on the foundation of GLM-4.5, which was pre-trained on trillions of diverse tokens and then fine-tuned heavily on code, reasoning and alignment tasks. Reinforcement learning strengthens its coding accuracy, reasoning quality and tool-usage reliability. GLM-4.6 appears to include additional data for tool-calling and agentic workflows, given its improved planning abilities.
Tool-Calling and Agentic Capabilities
GLM-4.6 is designed to function as the control system for autonomous agents. It supports structured function calling and decides when to invoke tools based on context. Its internal reasoning improves argument validation, error rejection and multi-tool planning. In coding-assistant evaluations, GLM-4.6 achieves high tool-call success rates and approaches the performance of top proprietary models.
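As a sketch of what structured function calling looks like in practice, the snippet below defines one tool in the OpenAI-style schema that OpenAI-compatible GLM-4.6 endpoints accept via the tools parameter. The get_weather tool and its fields are invented for illustration.

```python
import json

# A hypothetical tool, described in the JSON-Schema-based format used by
# OpenAI-compatible function calling.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# When the model decides the tool is needed, its response carries a tool call
# whose arguments arrive as a JSON string, e.g.:
raw_arguments = '{"city": "Berlin"}'
arguments = json.loads(raw_arguments)
print(arguments["city"])  # the value your own get_weather function would receive
```

Your application executes the function, appends the result as a tool message, and lets the model continue; the model's planning quality determines how reliably those argument strings validate against the schema.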
Efficiency and Quantization
Although GLM-4.6 is large, its MoE architecture keeps the active parameter count manageable. Public weights are available in BF16 and FP32, and community quantizations in 4- to 8-bit formats allow the model to run on more affordable GPUs. It is compatible with popular inference frameworks such as vLLM, SGLang and LMDeploy, giving teams flexible deployment options.
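For teams self-hosting, a vLLM offline-inference run might look like the sketch below. This is a rough outline only: serving a 355B-parameter MoE needs a multi-GPU node, and the Hugging Face repo id and parallelism degree shown are assumptions to check against the official model card before use.

```python
# Sketch: loading GLM-4.6 with vLLM's offline LLM API.
# Assumptions: repo id "zai-org/GLM-4.6" and an 8-GPU node; verify both.
from vllm import LLM, SamplingParams

llm = LLM(model="zai-org/GLM-4.6", tensor_parallel_size=8)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Write a function that reverses a linked list."], params
)
print(outputs[0].outputs[0].text)
```

vLLM can also expose the same model behind an OpenAI-compatible HTTP server, which keeps client code identical whether you self-host or use a hosted endpoint.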
Benchmark Efficiency
Zhipu AI evaluated GLM-4.6 on a spread of benchmarks masking reasoning, coding and agentic duties. Throughout most classes, it exhibits constant enhancements over GLM-4.5 and aggressive efficiency in opposition to high-end proprietary fashions equivalent to Claude Sonnet 4.
In real-world coding evaluations, GLM-4.6 achieved near-parity outcomes with proprietary fashions whereas utilizing fewer tokens per activity. It additionally demonstrates improved efficiency in tool-augmented reasoning and multi-turn coding workflows, making it one of many strongest open fashions at the moment out there.

Licensing and Openness
GLM-4.6 is released under permissive licenses such as MIT and Apache, allowing unrestricted commercial use, self-hosting and fine-tuning. Developers can download both base and instruct versions and integrate them into their own infrastructure. This openness stands in contrast to proprietary models like Claude and GPT, which can only be used through paid APIs.
Accessing GLM-4.6 by way of API
GLM-4.6 is on the market on the Clarifai Platform, and you may entry it by way of API utilizing the OpenAI-compatible endpoint.
Step 1: Create a Clarifai Account and Get a Private Entry Token(PAT)
Enroll, and generate a Private Entry Token. You may also take a look at GLM-4.6 within the Clarifai Playground by choosing the mannequin and attempting coding, reasoning or agentic prompts.
Step 2: Set Up Your Environment
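A minimal setup sketch: install the OpenAI Python SDK (pip install openai), since the Clarifai endpoint is OpenAI-compatible, and keep your PAT in an environment variable rather than in source code. The variable name CLARIFAI_PAT used here is a convention for this example, not a platform requirement.

```python
import os

def load_pat() -> str:
    """Read the Clarifai Personal Access Token from the environment.

    Export it in your shell first, e.g.:
        export CLARIFAI_PAT="your-personal-access-token"
    """
    pat = os.environ.get("CLARIFAI_PAT", "")
    if not pat:
        print("Warning: CLARIFAI_PAT is not set; API calls will fail.")
    return pat

pat = load_pat()
```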
Step 3: Call GLM-4.6 via the API
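A sketch of the call using the OpenAI Python SDK against Clarifai's OpenAI-compatible endpoint. The model identifier below is illustrative: copy the exact base URL and model path from the GLM-4.6 model page on Clarifai.

```python
import os
from openai import OpenAI

# Assumptions: Clarifai's OpenAI-compatible base URL and an illustrative
# model path; verify both on the GLM-4.6 model page.
client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key=os.environ["CLARIFAI_PAT"],
)

response = client.chat.completions.create(
    model="https://clarifai.com/zai/completion/models/glm-4_6",  # assumed path
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that checks "
                                    "whether a string is a palindrome."},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the OpenAI chat-completions protocol, existing OpenAI-based code only needs the base_url, api_key and model fields changed.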
Step 4: Using TypeScript or JavaScript
You can also access GLM-4.6 via the API using other languages and tools such as Node.js and cURL. Check out all the examples here.
Use Cases for GLM-4.6
Advanced Coding Assistance
GLM-4.6 shows strong improvements in code-generation accuracy and efficiency. It produces high-quality code while using fewer tokens than GLM-4.5. In human-rated evaluations, its coding ability approaches that of proprietary frontier models. This makes it suitable for full-stack development assistants, automated code review, bug-fixing agents and repository-level analysis.
Agentic Workflows and Tool Orchestration
GLM-4.6 is built for tool-augmented reasoning. It can plan multi-step tasks, call external APIs, check results and maintain state across interactions. This enables autonomous coding agents, research assistants and complex workflow-automation systems that rely on structured tool calls.
Long-Context Document Analysis
With a 200k-token window, the model can read and reason over entire books, legal documents, technical manuals or multi-hour transcripts. It supports compliance review, multi-document synthesis, long-form summarization and codebase understanding.
Bilingual Development and Creative Writing
The model is trained on both Chinese and English and delivers strong performance on bilingual tasks. It is useful for translation, localization, bilingual code documentation and creative-writing tasks that require natural style and voice.
Enterprise-Grade Deployment and Customization
Thanks to its open license and flexible MoE architecture, organizations can self-host GLM-4.6 on private clusters, fine-tune it on proprietary data and integrate it with their internal tools. Community quantizations also enable lighter deployments on limited hardware. Clarifai provides an alternative cloud-hosted path for teams that want API access without managing infrastructure.
Conclusion
GLM-4.6 is a major milestone in open AI development. It combines a large MoE architecture, a 200k-token context window, hybrid reasoning modes and native tool-calling to deliver performance that rivals proprietary frontier models. It improves on GLM-4.5 across coding, reasoning and tool-augmented tasks while remaining fully open and self-hostable.
Whether you are building autonomous coding agents, analyzing large document sets or orchestrating complex multi-tool workflows, GLM-4.6 provides a flexible, high-performance foundation without vendor lock-in.
