
Liquid AI’s LFM2-2.6B-Exp Uses Pure Reinforcement Learning (RL) And Dynamic Hybrid Reasoning To Tighten Small Model Behavior


Liquid AI has released LFM2-2.6B-Exp, an experimental checkpoint of its LFM2-2.6B language model that is trained with pure reinforcement learning on top of the existing LFM2 stack. The goal is simple: improve instruction following, knowledge tasks, and math for a small 3B-class model that still targets on-device and edge deployment.

Where LFM2-2.6B-Exp Fits in the LFM2 Family

LFM2 is the second generation of Liquid Foundation Models. It is designed for efficient deployment on phones, laptops, and other edge devices. Liquid AI describes LFM2 as a hybrid model that combines short-range LIV convolution blocks with grouped query attention blocks, controlled by multiplicative gates.

The family includes four dense sizes: LFM2-350M, LFM2-700M, LFM2-1.2B, and LFM2-2.6B. All share a context length of 32,768 tokens, a vocabulary size of 65,536, and bfloat16 precision. The 2.6B model uses 30 layers, with 22 convolution layers and 8 attention layers. Each size is trained on a 10 trillion token budget.

LFM2-2.6B is already positioned as a high-efficiency model. It reaches 82.41% on GSM8K and 79.56% on IFEval. This places it ahead of several 3B-class models such as Llama 3.2 3B Instruct, Gemma 3 4B it, and SmolLM3 3B on these benchmarks.

LFM2-2.6B-Exp retains this architecture. It reuses the same tokenization, context window, and hardware profile. The checkpoint focuses solely on changing behavior through a reinforcement learning stage.

https://huggingface.co/LiquidAI/LFM2-2.6B-Exp
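
As a quick orientation, the sketch below shows one way to load the checkpoint from the Hugging Face page above with the standard Transformers API, assuming a Transformers version that includes LFM2 support. The prompt and sampling settings are illustrative choices, not recommendations from the model card.

```python
# Minimal sketch: load LFM2-2.6B-Exp with Transformers and run one chat turn.
# The prompt and the sampling settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-2.6B-Exp"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "List three constraints to check before running a 3B model on a phone."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.3)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```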

Pure RL on Top of a Pretrained, Aligned Base

This checkpoint is built on LFM2-2.6B using pure reinforcement learning. It is specifically trained on instruction following, knowledge, and math.

The underlying LFM2 training stack combines several stages. It includes very large scale supervised fine-tuning on a mix of downstream tasks and general domains, custom Direct Preference Optimization with length normalization, iterative model merging, and reinforcement learning with verifiable rewards.

But what exactly does ‘pure reinforcement learning’ mean? LFM2-2.6B-Exp starts from the existing LFM2-2.6B checkpoint and then goes through a sequential RL training schedule. It starts with instruction following, then extends RL training to knowledge-oriented prompts, math, and a small amount of tool use, without an additional SFT warm-up or distillation step in that final phase.

The important point is that LFM2-2.6B-Exp does not change the base architecture or pre-training. It changes the policy through an RL stage that uses verifiable rewards, on a targeted set of domains, on top of a model that is already supervised and preference aligned.
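
To make the verifiable-rewards idea concrete, here is a minimal, runnable sketch of rule-based reward functions of the kind such a stage optimizes against: the output is checked programmatically, with no learned reward model. The specific rules, task framing, and examples are assumptions for illustration, not Liquid AI’s training code.

```python
# Illustrative verifiable rewards: each function checks a completion against a rule
# or a known answer and returns 0.0 or 1.0. These are assumptions, not the actual
# reward functions used to train LFM2-2.6B-Exp.
import re

def instruction_reward(completion: str, n_bullets: int) -> float:
    """1.0 if the answer contains exactly the requested number of bullet points."""
    return float(len(re.findall(r"^- ", completion, flags=re.M)) == n_bullets)

def math_reward(completion: str, gold_answer: str) -> float:
    """1.0 if the final line of the answer matches the known ground truth."""
    final_line = completion.strip().splitlines()[-1] if completion.strip() else ""
    return float(final_line.replace(",", "") == gold_answer)

# During RL, completions are sampled from the current policy, scored with rules like
# these, and the policy is updated (for example with a PPO- or GRPO-style objective)
# to raise expected reward, stage by stage: instruction following, knowledge, math,
# and a small amount of tool use.
print(instruction_reward("- a\n- b\n- c", n_bullets=3))   # 1.0
print(math_reward("Reasoning...\n42", gold_answer="42"))  # 1.0
```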

Benchmark Signal, Especially On IFBench

The Liquid AI team highlights IFBench as the main headline metric. IFBench is an instruction following benchmark that tests how reliably a model follows complex, constrained instructions. On this benchmark, LFM2-2.6B-Exp surpasses DeepSeek R1-0528, which is reported as 263 times larger in parameter count.

LFM2 models show strong performance across a standard set of benchmarks such as MMLU, GPQA, IFEval, GSM8K, and related suites. The 2.6B base model already competes well in the 3B segment. The RL checkpoint then pushes instruction following and math further, while staying within the same 3B parameter budget.

Architecture and Capabilities that Matter

The architecture uses 10 double-gated short-range LIV convolution blocks and 6 grouped query attention blocks, arranged in a hybrid stack. This design reduces KV cache cost and keeps inference fast on consumer GPUs and NPUs.
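
For intuition about why the convolution blocks avoid a KV cache, here is a rough PyTorch sketch of a double-gated, short-range causal convolution block. Dimensions, the gating layout, and the residual structure are assumptions for illustration and do not reproduce Liquid AI’s LIV operator.

```python
# Rough sketch of a double-gated short-range convolution block (illustrative only).
import torch
import torch.nn as nn

class GatedShortConvBlock(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.in_gate = nn.Linear(dim, dim)    # first input-dependent multiplicative gate
        self.proj_in = nn.Linear(dim, dim)
        # depthwise causal convolution with a short kernel: only a small rolling
        # window of past tokens is needed, so there is no growing KV cache
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim, padding=kernel_size - 1)
        self.out_gate = nn.Linear(dim, dim)   # second gate applied after the convolution
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        h = torch.sigmoid(self.in_gate(x)) * self.proj_in(x)
        h = self.conv(h.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)  # causal crop
        h = torch.sigmoid(self.out_gate(x)) * self.proj_out(h)
        return x + h  # residual connection

x = torch.randn(1, 16, 64)
print(GatedShortConvBlock(64)(x).shape)  # torch.Size([1, 16, 64])
```

Because each position only looks back a fixed, small number of steps, the state carried during generation is a short rolling buffer rather than a key-value cache that grows with sequence length.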

The pre-training mixture uses roughly 75% English, 20% multilingual data, and 5% code. The supported languages include English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

LFM2 models expose a ChatML-like template and native tool use tokens. Tools are described as JSON between dedicated tool list markers. The model then emits Python-like calls between tool call markers and reads tool responses between tool response markers. This structure makes the model suitable as the agent core for tool calling stacks without custom prompt engineering.
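
Assuming the bundled chat template accepts the standard Transformers `tools` argument, a call like the one below renders a tool schema into those markers. The `get_weather` function and the resulting prompt are illustrative, not part of the model card.

```python
# Sketch of rendering a tool-use prompt through the model's chat template.
# get_weather is a made-up example tool; the template is responsible for
# serializing its JSON schema between the dedicated tool list markers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-2.6B-Exp")

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "12 C, cloudy"

messages = [{"role": "user", "content": "What is the weather in Zurich right now?"}]

prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
print(prompt)
# The model is expected to reply with a Python-like call between tool call markers;
# the application executes it and appends the result as a tool-role message, which
# the template wraps between tool response markers for the next turn.
```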

LFM2-2.6B, and by extension LFM2-2.6B-Exp, is also the only model in the family that enables dynamic hybrid reasoning through special think tokens for complex or multilingual inputs. That capability remains available because the RL checkpoint does not change tokenization or architecture.

Key Takeaways

  1. LFM2-2.6B-Exp is an experimental checkpoint of LFM2-2.6B that adds a pure reinforcement learning stage on top of a pretrained, supervised and preference aligned base, targeted at instruction following, knowledge tasks, and math.
  2. The LFM2-2.6B backbone uses a hybrid architecture that combines double-gated short-range LIV convolution blocks and grouped query attention blocks, with 30 layers (22 convolution layers and 8 attention layers), a 32,768 token context length, and a 10 trillion token training budget at 2.6B parameters.
  3. LFM2-2.6B already achieves strong benchmark scores in the 3B class, around 82.41% on GSM8K and 79.56% on IFEval, and the LFM2-2.6B-Exp RL checkpoint further improves instruction following and math performance without changing the architecture or memory profile.
  4. Liquid AI reports that on IFBench, an instruction following benchmark, LFM2-2.6B-Exp surpasses DeepSeek R1-0528 even though the latter has many more parameters, which shows strong performance per parameter for constrained deployment settings.
  5. LFM2-2.6B-Exp is released on Hugging Face with open weights under the LFM Open License v1.0 and is supported through Transformers, vLLM, llama.cpp GGUF quantizations, and ONNXRuntime, making it suitable for agentic systems, structured data extraction, retrieval augmented generation, and on-device assistants where a compact 3B model is needed.



