Recommended Hardware

Ollama · Ollama - Local LLMs for everyone! · by 𝕨𝕒𝕤𝕒𝕓𝕚

What hardware is recommended for adequate performance and low power consumption? I've seen the Mac mini recommended a few times. Would the 16 GB model be enough, or do I need the 24 GB model? What about the Framework Desktop? It should be very good as well, but how much power will it draw at idle?

Comments (4)

wise_pancake

lemmy.ca

I run an M1 Mac Pro with 32 GB of RAM, and I would recommend going for more RAM if you can. I have no idea how that compares with dedicated GPU setups.

The M4 mini with 16 GB is going to be too small for most models to run well. The models I run are Phi4 (8.5 GB), Gemma 3n (12-15 GB), Magistral Small (12 GB), and DeepSeek R1 Qwen3 8B (4 GB).

The 8 GB models are going to be the smallest relatively useful ones, and the 12-16 GB models are much more reliable than them.
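To make the sizing concrete, here is a rough sketch that checks which of the models listed above would fit on a machine with a given amount of RAM. The model sizes come from this comment (Gemma 3n uses the upper bound of 15 GB); the 6 GB reserve for the OS, other apps, and the context cache is purely an assumption, so adjust it for your own setup.

```python
# Model sizes in GB as listed in the comment (Gemma 3n uses the upper bound).
MODEL_SIZES_GB = {
    "Phi4": 8.5,
    "Gemma 3n": 15.0,
    "Magistral Small": 12.0,
    "DeepSeek R1 Qwen3 8B": 4.0,
}

# Assumption: memory kept free for the OS, other apps, and the context cache.
RESERVED_GB = 6.0


def models_that_fit(total_ram_gb, reserved_gb=RESERVED_GB):
    """Return the names of models whose size fits in RAM after the reserve."""
    usable = total_ram_gb - reserved_gb
    return [name for name, size in MODEL_SIZES_GB.items() if size <= usable]


if __name__ == "__main__":
    for ram in (16, 24, 32):
        print(f"{ram} GB: {models_that_fit(ram)}")
```

With that reserve, a 16 GB machine is left with the two smallest models, while 24 GB and up can hold all four.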

I do have a preorder on the Framework Desktop, which I think is going to be good value for money, but there aren't comprehensive reviews out yet. The Mac does have a lot of good performance reviews out.

catty

lemmy.world

Yep, I keep reading that 32 GB is considered a minimum. I also see that Ollama is capable of splitting a model between GPU RAM and system RAM, so the more RAM you have the better, starting with GPU RAM.

humanspiral

lemmy.ca

Hardware decisions are a complete nightmare, because the best models will surely change the day after you decide. For coding/text, better large models are more important than fast, dumber models, IMO.

16 GB: 5060ti
20 GB: 7900
24 GB: 7900xtx

For GPUs, I'd decide based on $/GB. They will all be fast enough. Nvidia is generally faster, and so it's my pick for 16 GB if there are a lot of options within $100 of each other.
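A quick sketch of the $/GB comparison, in case it helps. The prices below are placeholders made up for illustration, not real quotes, so plug in whatever the cards actually cost where you live. The function name is hypothetical too.

```python
def rank_by_cost_per_gb(cards):
    """Sort (name, vram_gb, price_usd) tuples by price per GB of VRAM, cheapest first."""
    return sorted(
        ((name, price / vram) for name, vram, price in cards),
        key=lambda item: item[1],
    )


# Placeholder prices for illustration only; substitute current local prices.
EXAMPLE_CARDS = [
    ("5060ti", 16, 480.0),
    ("7900", 20, 700.0),
    ("7900xtx", 24, 900.0),
]

if __name__ == "__main__":
    for name, dollars_per_gb in rank_by_cost_per_gb(EXAMPLE_CARDS):
        print(f"{name}: ${dollars_per_gb:.2f}/GB")
```

The ranking only says which card is cheapest per GB; it says nothing about whether the model you want fits in that much memory, so check capacity first.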

64-128 GB: AMD iGPU or ARM Mac (the latter has much better performance). LPDDR speeds are increasing next year, and LPDDR GPUs may arrive next year too. Nvidia Spark and Thor are coming soon. Wait for benchmarks on both.

256 GB: M4 Max. A four-DIMM-slot 8700G is the only PC desktop option with an iGPU. Older Threadripper/Xeon systems running CPU-only could also work.
512 GB: M3 Ultra

Even if you need high memory, software design/evolution could end up supporting dual cooperating models, or you may use a GPU for retraining/fine tuning.

You can also have multiple computers with shared mouse/keyboard software that switches based on which monitor the mouse is active on, and cascading network connections, but giant models are pretty slow when scaled this way. Other options are GPUs on older hardware, or a pure minimal 128 GB Linux system for LLMs only, with the option of an exclusive gaming mode, as opposed to your 300-browser-tab main computer.

For me, I could expand my recent 7835hs with 128 GB of DDR5 and/or get an OCuLink GPU, and/or get a Mac/Strix Halo or other fairly expensive computer as a dedicated "HPC/AI expansion platform". Or wait for better options.

CyberSeeker

discuss.tchncs.de

The more memory, the better. On a discrete GPU, you want to focus on the VRAM, but the Mac platforms have the benefit of integrated memory, which is shared between the system and graphics processor, so it can hold much larger models in memory than in VRAM alone.

For comparison, an RTX 3080, which has 12 GB of VRAM, gives a pretty decent token rate on models that fit in its memory (typically 110 tok/s on a 7-11B parameter model like Mistral 7B).

However, a Mac mini with 32 GB could run a more advanced model like DeepSeek R1 with 32B parameters, at about 11 tok/s. When running the same 7B parameter model as the RTX 3080, though, it would still only generate 14-18 tok/s.

So you really need to balance capability/more advanced models with speed. If you’re okay submitting a prompt and walking away, the Mac mini is great value due to its integrated memory. But if you’re just getting started, you may find it frustratingly slow.
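To get a feel for what those token rates mean in practice, here is a back-of-the-envelope sketch that converts them into waiting time. The rates are the ones quoted above; the 500-token reply length is an assumption picked for illustration, and the function name is hypothetical.

```python
def seconds_to_generate(num_tokens, tokens_per_second):
    """Time to generate num_tokens at a steady rate, ignoring prompt processing."""
    return num_tokens / tokens_per_second


# Token rates (tok/s) quoted in this comment.
RATES = {
    "RTX 3080, 7B model": 110,
    "Mac mini 32GB, 7B model (low)": 14,
    "Mac mini 32GB, 7B model (high)": 18,
    "Mac mini 32GB, 32B model": 11,
}

# Assumption: a medium-length reply of 500 tokens.
REPLY_TOKENS = 500

if __name__ == "__main__":
    for setup, rate in RATES.items():
        print(f"{setup}: {seconds_to_generate(REPLY_TOKENS, rate):.1f} s")
```

At these rates the same reply takes roughly 5 seconds on the GPU and 28 to 45 seconds on the Mac mini, which is the gap between chatting interactively and submitting a prompt and walking away.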

Finally, for comparison, most of the closed cloud models like Opus, ChatGPT, and MiniMax are closer to 700B parameters, an order of magnitude larger than what a layperson could run locally. These models are finally getting to the point where they’re ‘useful’ for complex tasks like unattended coding, but good luck getting 1TB of VRAM outside of a datacenter.
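The memory figure follows from simple arithmetic: the weights alone need roughly the parameter count times the bytes per parameter. A minimal sketch, assuming the weights are stored at a uniform precision and ignoring the context cache and runtime overhead (the function name is hypothetical):

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # A 700B parameter model at 16-bit, 8-bit, and 4-bit precision.
    for bits in (16, 8, 4):
        print(f"700B at {bits}-bit: {weight_memory_gb(700e9, bits):.0f} GB")
    # A 32B parameter model at 4-bit, for comparison with a 32 GB machine.
    print(f"32B at 4-bit: {weight_memory_gb(32e9, 4):.0f} GB")
```

Under these assumptions a 700B model needs about 1.4 TB at 16-bit and 700 GB at 8-bit, while a 32B model at 4-bit comes to about 16 GB, which is why it can run on a 32 GB Mac mini.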