Spyke

Status symbols could also be called symbols of inequality

Some days ago I saw people who attended a Fridays for Future demonstration excitedly put political stickers on a shiny blue Lamborghini which was obviously parked at the wrong point in spacetime.

When discussing this with a friend, we concluded that there was quite strong symbolism in that situation: direct payback for the unnecessary pollution of the planet, with the car as the canvas onto which the activists could project their anger.

We also talked about luxury cars being a symbol of social inequality.

Only later did it hit me that luxury cars, among other things, are usually called status symbols, and that they could just as well be called symbols of inequality.

View original on lemmy.ml

Comments: 3

localllama·LocalLLaMA by robber

Gemma4 12b released with "unified" approach to multi-modality

From the model card, sounds interesting:

The "Unified" in Gemma 4 12B Unified refers to its encoder-free architecture. Other Gemma 4 models use dedicated encoders to process multimodal data before passing it to the LLM. Gemma 4 12B eliminates these encoders entirely, projecting raw image patches and audio waveforms directly into the LLM's embedding space through lightweight linear layers. This unified approach means all modalities flow straight into a single decoder-only transformer, reducing multimodal latency and allowing the entire model to be fine-tuned in one pass.
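To make the quoted idea a bit more concrete, here is a minimal pure-Python sketch of what "projecting raw image patches through a lightweight linear layer" could look like. It is not Gemma's actual code: the image size, patch size, embedding width, and random weights are toy assumptions, and the function names are made up for illustration.

```python
import random

def patchify(image, patch):
    """Split a 2D grayscale image (list of rows) into flattened patch vectors.

    Assumes height and width are exact multiples of `patch`.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [image[top + r][left + c]
                    for r in range(patch) for c in range(patch)]
            patches.append(flat)
    return patches

def linear(vec, weight, bias):
    """Plain linear layer: `weight` has one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]

def project_patches(image, patch, weight, bias):
    """Map every raw patch straight into the embedding space, no encoder."""
    return [linear(p, weight, bias) for p in patchify(image, patch)]

# Toy sizes (assumptions): 4x4 image, 2x2 patches, embedding width 8.
rng = random.Random(0)
EMBED_DIM, PATCH = 8, 2
image = [[rng.random() for _ in range(4)] for _ in range(4)]
weight = [[rng.gauss(0, 0.02) for _ in range(PATCH * PATCH)]
          for _ in range(EMBED_DIM)]
bias = [0.0] * EMBED_DIM

image_tokens = project_patches(image, PATCH, weight, bias)
print(len(image_tokens), len(image_tokens[0]))  # 4 patches, 8 dims each
```

Each resulting vector would sit in the same sequence as the text token embeddings, which is presumably what lets a single decoder-only transformer handle all modalities.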

The benchmarks put it closer to the 26b MoE than to the E variants of the Gemma4 series, but mostly below Qwen3.5 9b.

Looking forward to giving it a shot.

https://huggingface.co/google/gemma-4-12B

Comments: 13

firefox·Firefox by robber

FIFA World Cup ad in Firefox on Android

WTF? Do you see this as well?

216

Comments: 77

localllama·LocalLLaMA by robber

llama.cpp: don't sleep on --split-mode tensor

In case you missed it, 2-3 weeks ago, experimental tensor-parallelism support was merged into llama.cpp.

In a nutshell, this lets multi-GPU setups combine not only the VRAM of the cards but also their compute power. The results depend a lot on the specific setup and model, but on my 3x RTX 2000e Ada rig running Qwen3.6-35b it almost doubled generation throughput (these are low-power cards that are not very fast on their own).

The option to turn it on is --split-mode tensor.

It's not yet officially documented, I assume because it's still experimental. But since #22362 was merged yesterday, it now also works for the latest Qwen3.6 models in my case.
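For anyone who launches llama.cpp from a script, here is a minimal sketch of how the option might be passed. The model path and the layer count are placeholders (assumptions), and since the mode is experimental, the accepted values may change between builds.

```python
import shlex

def llama_server_command(model_path, n_gpu_layers=99):
    """Build a llama-server argv with tensor-parallel splitting enabled."""
    return [
        "llama-server",
        "-m", model_path,             # path to the GGUF file
        "-ngl", str(n_gpu_layers),    # offload layers to the GPUs
        "--split-mode", "tensor",     # experimental mode described above
    ]

cmd = llama_server_command("models/model.gguf")  # hypothetical path
print(shlex.join(cmd))
# To actually start the server: subprocess.run(cmd, check=True)
```

Swapping `tensor` for `layer` in the same command gives a quick before/after comparison on the same hardware.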

https://github.com/ggml-org/llama.cpp/pull/19378

Comments

localllama·LocalLLaMA by robber

Gemma 4 is here

https://huggingface.co/collections/google/gemma-4

Comments: 2

localllama·LocalLLaMA by robber

Smaller qwen3.5 models released

The wait is over, and most GGUFs are already up. Nice to see there are models for many different hardware configurations.

https://huggingface.co/collections/Qwen/qwen35

Comments: 8

localllama·LocalLLaMA by robber

Qwen3-Coder-Next

It's out!

https://huggingface.co/Qwen/Qwen3-Coder-Next

Comments: 10

localllama·LocalLLaMA by robber

Relevance of GPU driver version for inference performance

Hey everyone! I was just skimming through some inference benchmarks from other people and noticed that the driver version is usually mentioned. It made me wonder how relevant this is. My prod server runs Debian 12, so the packaged NVIDIA drivers are rather old, but I'd prefer not to mess with the drivers if it won't bring a benefit. Do any of you have experience with this, or have you done some testing?

Comments: 11

localllama·LocalLLaMA by robber

Magistral-Small-2509 by Mistral has been released

https://huggingface.co/mistralai/Magistral-Small-2509

Comments: 2

localllama·LocalLLaMA by robber

Qwen3-Next with 80b-a3b parameters is out

https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d

Comments: 5

localllama·LocalLLaMA by robber

ExLlamaV3 adds tensor parallelism support

Title says it all: it's been 10 days already, but I didn't catch the release. This might be huge for those of us running on multiple GPUs. At least for Gemma3, I was able to double inference speed by using vLLM with tensor parallelism vs. ollama's homegrown parallelism. Support in ExLlamaV3 could additionally make it possible to pair TP with lower-bit quants. Haven't tested this yet, but I'm very much looking forward to it.

https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.6

Comments

localllama·LocalLLaMA by robber

New, promising MoE model "Hunyuan" by Tencent

Tencent recently released a new MoE model with ~80b parameters, 13b of which are active at inference. Seems very promising for people with access to 64 gigs of VRAM.
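A rough back-of-the-envelope check of the 64 GB figure, as a sketch: with an MoE model all experts have to be resident in memory even though only 13b parameters are active per token, so the estimate uses the full ~80b. It counts weights only and ignores KV cache, activations, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(total_params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return total_params_billion * bits_per_weight / 8

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(80, bits):.0f} GB")
# 16-bit: ~160 GB
# 8-bit: ~80 GB
# 4-bit: ~40 GB
```

By this estimate a roughly 4-bit quant leaves some headroom for context in 64 GB, while 8-bit would not fit.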

https://huggingface.co/tencent/Hunyuan-A13B-Instruct

Comments: 1

localllama·LocalLLaMA by robber

Do you quantize models yourself?

Hey fellow llama enthusiasts! Great to see that not all of Lemmy is AI-sceptical.

I'm in the process of upgrading my server with a bunch of GPUs. I'm really excited about the new Mistral / Magistral Small 3.2 models and would love to serve them for me and a couple of friends. My research led me to vLLM, with which I was able to double inference speed compared to ollama, at least for qwen3-32b-awq.

Sadly, the most common quantization methods (GGUF, EXL, BNB) are either not fully (GGUF) or not at all (EXL) supported in vLLM, or multi-GPU inference through tensor parallelism is not supported (BNB). And especially for new models, it's hard to find pre-quantized versions in other, more broadly supported formats (AWQ, GPTQ).
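For context, this is roughly the kind of launch I mean, sketched as a small Python helper that builds the `vllm serve` command line. The model ID is a placeholder (an assumption, not a real repository), and the GPU count is just an example.

```python
import shlex

def vllm_serve_command(model_id, num_gpus, quantization="awq"):
    """Build a `vllm serve` argv for a pre-quantized model split across GPUs."""
    return [
        "vllm", "serve", model_id,
        "--tensor-parallel-size", str(num_gpus),  # shard each layer across GPUs
        "--quantization", quantization,           # format of the checkpoint
    ]

# Hypothetical AWQ checkpoint served on two GPUs.
cmd = vllm_serve_command("your-org/your-model-AWQ", num_gpus=2)
print(shlex.join(cmd))
```

This only works when a checkpoint in a supported format exists, which is exactly the gap for freshly released models.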

Do any of you face a similar problem? Do you quantize models yourself? Are there any up-to-date guides you would recommend? Or did I completely overlook another, obvious solution?

It feels like when I've researched something yesterday, it's already outdated again today, since the landscape is so rapidly evolving.

Anyways, thank you for reading and sharing your thoughts or experience if you feel like it.

Comments: 8

linux·Linux by robber

Well, that's offensive

Text: Allows you to determine whether to limit CPUID maximum value. Set this to enabled for legacy operating systems such as Linux or Unix.

Found this in the BIOS of a Gigabyte Z97X-UD3H mobo.

187

Comments: 10

selfhosted·Selfhosted by robber

Any experience with Pangolin?

Hi fellow homelabbers! I hope your day / night is going great.

Just stumbled across this self-hosted Cloudflare Tunnel alternative called Pangolin.

Does anyone use it for exposing their homelab? It looks awesome, but I've never heard of it before.
Should I be reluctant since it's developed by a US-based company? I mean security-wise. (I'll remove this question if it's too political.)
Does anyone know of alternative tools or software stacks that achieve the same without relying on Cloudflare?

Your insights are highly appreciated!

Comments: 22

technology·Technology by robber

More than 140 Kenya Facebook moderators diagnosed with severe PTSD

https://www.theguardian.com/media/2024/dec/18/kenya-facebook-moderators-sue-after-diagnoses-of-severe-ptsd

272

Comments: 21

selfhosted·Selfhosted by robber

[Solved] Chaining routers and GUA IPv6 addresses

Hey fellow self-hosting lemmoids

Disclaimer: not at all a network specialist

I'm currently setting up a new home server in a network where I'm given GUA IPv6 addresses in a 64-bit subnet (which means, if I understand correctly, that I can set up many devices in my network that are reachable from the outside world via a fixed IP). Everything works so far, and my services are reachable.
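To illustrate the "many devices" part, here is a small sketch using Python's standard `ipaddress` module. The prefix is taken from the 2001:db8::/32 documentation range and merely stands in for whatever prefix the ISP hands out; the host addresses are made up as well.

```python
import ipaddress

# Placeholder /64 from the documentation range (assumption, not a real ISP prefix).
prefix = ipaddress.IPv6Network("2001:db8:1234:5678::/64")

# A /64 leaves 64 bits for hosts, i.e. 2**64 possible addresses.
print(prefix.num_addresses == 2**64)  # True

# Any device addressed inside the prefix belongs to the same subnet.
server = ipaddress.IPv6Address("2001:db8:1234:5678::10")
print(server in prefix)  # True

# An address with a different 64-bit prefix is outside the subnet.
other = ipaddress.IPv6Address("2001:db8:1234:9999::10")
print(other in prefix)  # False
```

Since every device can hold its own address from that space, what matters is which firewall sits in front of it.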

Now my problem is that I need to use the router provided by my ISP, and it's (big surprise here) crap. The biggest concern for me is that I don't have fine-grained control over firewall rules. I can only open ports in groups (e.g. "Web", "All other ports"), and I can only do this network-wide and not for specific IPs.

I'm thinking about getting a second router with a better IPv6 firewall and only use the ISP router as a "modem". Now I'm not sure how things would play out regarding my GUA addresses. Could a potential second router also assign addresses to devices in that globally routable space directly? Or would I need some sort of NAT? I've seen some modern routers with the capability of "pass-through" IPv6 address allocation, but I'm unsure if the firewall of the router would still work in such a configuration.

In IPv4 I used to have a similar setup, where router 1 would just forward all packets for some ports to router 2, which then would decide which device should receive them.

Do any of you have experience with a similar setup? And if so, could you even recommend a router?

Many thanks!

Edit: I was able to achieve what I wanted by using OpenWrt and their IPv6 relay mode. Now my ISP router handles all IPv6 addresses directly, but I'm still able to filter the packets using the OpenWrt firewall. For IPv4 I didn't figure out how to, at the same time, use the ISP's DHCP server, so I just went with double NAT. Everything works like a charm. Thank you guys for pointing me in the right direction.

Comments: 8

lemmyshitpost·Lemmy Shitpost by robber

USA to be renamed to XXX

Most relevant section, translated to English:

If he (Trump) wins the election on November 5, his billionaire supporter Musk will chair the new board. This is to implement a full financial and performance audit of the entire government and make recommendations for drastic reforms.

Source: Swiss state media article

140

Comments: 21

mildlyinfuriating·Mildly Infuriating by robber

Modern online banking

A couple of years ago, QR-bills were introduced in Switzerland as a means to make payments easier. My bank provides an app to scan the QR codes, which I prefer not to install. The only other option they provide to scan the codes is to use the webcam. Am I supposed to print my digital bills to have my webcam scan them again? Just let me upload a goddamn screenshot.

Comments: 13

selfhosted·Selfhosted by robber

Any of you have a self-hosted AI "hub"? (e.g. for LLM, stable-diffusion, ...)

I've been looking into self-hosting LLMs or stable diffusion models using something like LocalAI and / or Ollama and LibreChat.

Some questions to get a nice discussion going:

Any of you have experience with this?
What are your motivations?
What are you using in terms of hardware?
Considerations regarding energy efficiency and associated costs?
What about renting a GPU? Privacy implications?

Comments: 21