LocalLLaMA · by robber

Gemma4 12b released with "unified" approach to multi-modality

From the model card, sounds interesting:

The "Unified" in Gemma 4 12B Unified refers to its encoder-free architecture. Other Gemma 4 models use dedicated encoders to process multimodal data before passing it to the LLM. Gemma 4 12B eliminates these encoders entirely, projecting raw image patches and audio waveforms directly into the LLM's embedding space through lightweight linear layers. This unified approach means all modalities flow straight into a single decoder-only transformer, reducing multimodal latency and allowing the entire model to be fine-tuned in one pass.
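To make the "lightweight linear layers" idea concrete, here is a rough toy sketch of what encoder-free input could look like: cut an image into patches, flatten each one, and push it through a single linear layer so it lands in the same embedding space as the text tokens. This is not Gemma's actual code. The function names, the 4x4 grayscale image, the patch size of 2, and the embedding width of 8 are all made up for illustration, and the weights are random rather than trained.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patches.

    Assumes H and W are multiples of `patch` (toy simplification).
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches

def make_linear(in_dim, out_dim, rng):
    """Create random weights and zero bias for one linear layer (hypothetical)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def linear(vec, weight, bias):
    """Apply y = W x + b to a single vector."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

D_MODEL = 8   # toy embedding width, not the real model's
PATCH = 2     # toy patch size

rng = random.Random(0)

# A 4x4 "image" with pixel values in [0, 1].
image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]

# No vision encoder: raw patches go straight through one linear projection.
patches = patchify(image, PATCH)
weight, bias = make_linear(PATCH * PATCH, D_MODEL, rng)
image_tokens = [linear(p, weight, bias) for p in patches]

# Stand-in for three already-embedded text tokens.
text_tokens = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(3)]

# The decoder would see one mixed sequence of same-width vectors.
sequence = image_tokens + text_tokens
print(len(sequence), len(sequence[0]))  # prints "7 8"
```

The point of the sketch is only that image patches and text tokens end up as vectors of the same width in a single sequence, which is what would let one decoder-only transformer handle both without a separate encoder stage.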

The benchmarks put it closer to the 26b MoE than to the E variants of the Gemma4 series, but mostly below Qwen3.5 9b.

Looking forward to giving it a shot.

https://huggingface.co/google/gemma-4-12B
piefed.social

Are there already any uncensored models based on it? Asking for a friend...

3
thelemmy.club

Did you try any? I tried the ones from iglors and mradermacher, and both refused to explain how to make a pipe bomb. Their answers are funny because they tell you to study engineering academically instead, lol. Still a refusal. I will try this one.

1
Mwa
thelemmy.club

Same, I guess I will wait for a high-quality uncensored model (i.e., HauhauCS).

1
robber
lemmy.ml

You might want to check out heretic or similar tools. I have not tried it, but there are a lot of heretic finetunes available on HF.

3
Mwa
thelemmy.club

HauhauCS said he's working on an uncensored version of Gemma 4 12b.

2

So Qwen 9b is for asking questions (and getting good responses), and Gemma 12b is for audio and video input as well as roleplay and creative writing?

1
