Gemma4 12b released with "unified" approach to multi-modality

From the model card, sounds interesting:

The "Unified" in Gemma 4 12B Unified refers to its encoder-free architecture. Other Gemma 4 models use dedicated encoders to process multimodal data before passing it to the LLM. Gemma 4 12B eliminates these encoders entirely, projecting raw image patches and audio waveforms directly into the LLM's embedding space through lightweight linear layers. This unified approach means all modalities flow straight into a single decoder-only transformer, reducing multimodal latency and allowing the entire model to be fine-tuned in one pass.
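
To make the quoted idea a bit more concrete, here is a toy sketch of what "projecting raw image patches through a lightweight linear layer" could look like. This is not from the model card and is not Gemma's actual code: the patch size, the embedding width, the random weights, and the helper names (`patchify`, `make_linear`, `project`) are all made up for illustration, and audio is left out.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches

def make_linear(in_dim, out_dim, seed=0):
    """Hypothetical projection weights; a real model would learn these."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(out_dim)] for _ in range(in_dim)]

def project(vectors, weight):
    """Apply x @ W to each vector, giving one embedding per patch."""
    out_dim = len(weight[0])
    return [[sum(x * weight[i][j] for i, x in enumerate(vec)) for j in range(out_dim)]
            for vec in vectors]

EMBED_DIM = 8  # illustrative only; real embedding widths are far larger
image = [[(r * 4 + c) / 15.0 for c in range(4)] for r in range(4)]  # toy 4x4 "image"
text_embeddings = [[0.0] * EMBED_DIM for _ in range(3)]  # stand-in for 3 text tokens

patches = patchify(image, patch=2)  # 4 patches of 4 pixels each
image_embeddings = project(patches, make_linear(4, EMBED_DIM))

# One sequence for a decoder-only transformer: image "tokens", then text tokens.
sequence = image_embeddings + text_embeddings
print(len(sequence), len(sequence[0]))
```

The point is only that there is no separate vision encoder in between: pixels go through a single matrix multiply and land in the same embedding space as the text tokens.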

The benchmarks put it closer to the 26b MoE than to the E variants of the Gemma4 series, but mostly below Qwen3.5 9b.

Looking forward to giving it a shot.

https://huggingface.co/google/gemma-4-12B

Comments (13)

lemmysmash

piefed.social

Are there already any uncensored models based on it? Asking for a friend...

robber reply

lemmy.ml

A bunch

Schilling2304 reply

thelemmy.club

Did you try any? I tried the ones from iglors and mradermacher, and I got a refusal when I asked how to make a pipe bomb. Their answers are funny because they tell you to study academic engineering instead, lol. Still a refusal. I will try this one.