I've been learning about using AI locally for a few months now. First, I learned about quantization, llama.cpp, and the GGUF format. I managed to get some models running on my Steam Deck, though they were heavily quantized and not very useful. Models smaller than 3GB just aren't there yet...
Now I've finally got a Zen 4 device with a decent amount of RAM, and I went wild. I downloaded a bunch of models and set out to test the limits of what I can do with this new computer.
I dabbled with LM Studio, and I'm thinking of adding Open WebUI too, but right now I'm using KoboldCpp as both the server and the frontend. I like that it provides an API as well as web-based UIs I can reach from my tablet and other devices connected to the same router.
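To give you an idea of what "provides an API" means in practice, here's a minimal sketch of calling KoboldCpp's KoboldAI-compatible text endpoint from another device on the network. The host address is made up, 5001 is KoboldCpp's default port, and the sampler settings are just illustrative assumptions, so adjust everything to your own setup:

```python
# Minimal sketch: call a KoboldCpp server's /api/v1/generate endpoint
# from another machine on the LAN. Host IP and sampler values below
# are assumptions; 5001 is KoboldCpp's default port.
import json
import urllib.request


def build_request(prompt: str, max_length: int = 80) -> dict:
    """Build the JSON payload for KoboldCpp's /api/v1/generate."""
    return {
        "prompt": prompt,
        "max_length": max_length,  # number of tokens to generate
        "temperature": 0.7,        # illustrative sampler setting
    }


def generate(host: str, prompt: str) -> str:
    """Send the prompt and return the generated text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}:5001/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # KoboldCpp responds with {"results": [{"text": "..."}]}
    return body["results"][0]["text"]


if __name__ == "__main__":
    # Hypothetical LAN address of the machine running KoboldCpp.
    print(generate("192.168.1.50", "Explain GGUF in one sentence:"))
```

The same server also exposes the web UI on that port, which is how my tablet connects to it.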
HuggingFace is full of models I want to try, but my daily internet data is limited, so I can only download one big model or a few smaller ones per day...
I managed to get Qwen 30B A3B running. It's the most useful model I've run locally so far! The fastest generation speed I got was 18 tokens/s, with RAM bandwidth probably being the bottleneck. Surprisingly, I got lower generation speed on the iGPU than on the CPU.
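A quick back-of-envelope check suggests the RAM-bandwidth theory is plausible. Every number here is an assumption rather than a measurement: roughly 3B active parameters for a 30B-A3B MoE model, a ~4.5-bit quant, and ~60 GB/s of effective dual-channel DDR5 bandwidth. Your model file and machine will differ:

```python
# Back-of-envelope estimate: if every active weight must be read from
# RAM once per generated token, memory bandwidth caps tokens/s.
# All inputs are assumptions (active params, quant bits, bandwidth).
def tokens_per_second(active_params: float, bits_per_weight: float,
                      bandwidth_gb_s: float) -> float:
    """Upper bound on t/s when weight reads saturate RAM bandwidth."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# ~3B active params, ~4.5-bit quant, ~60 GB/s effective bandwidth
ceiling = tokens_per_second(3e9, 4.5, 60)
print(f"theoretical ceiling: {ceiling:.0f} t/s")  # ~36 t/s
```

Getting 18 t/s in practice, about half of that ceiling, seems reasonable once you account for KV-cache reads, activations, and other overhead.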
On dense models like Ministral 8B and GLM4.6V Flash, I got around 10 tokens/s at best. They also slow down faster than smaller models as context grows, so I'll stick to models with 4B or fewer active parameters at a time.
Another limitation I found is the context window. All the models I tried slow down noticeably around 4,000 tokens of context, and the drop gets much worse at 8,000. I haven't tested contexts larger than 10,000 tokens yet, but if generation falls below 4 t/s, the model is basically useless to me.
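The slowdown with longer context makes sense when you remember that each new token has to read the entire KV cache. Here's a rough sketch with a hypothetical model shape (32 layers, 8 GQA key/value heads, head dimension 128, fp16 cache). Real models differ, but the linear growth is the point:

```python
# Rough sketch of why generation slows as context grows: attention
# reads the whole KV cache for every new token, and that cache grows
# linearly with context length. Model shape below is hypothetical.
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       context: int, bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache read per generated token (K and V, fp16)."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem


for ctx in (4_000, 8_000):
    gb = kv_bytes_per_token(32, 8, 128, ctx) / 1e9
    print(f"{ctx:>5} ctx: ~{gb:.2f} GB read per token")
```

At 8,000 tokens of context, this hypothetical model reads about 1 GB of cache per token on top of the weights, which lines up with the speed dropping sharply as context doubles.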
Anyway, this experience convinced me that if I want to use local AI for coding, I WILL need a dedicated GPU, or at least a specialized AI machine. I'll stick to Venice.ai's inference for coding for now... I hope these machines become affordable in 2026, but current hardware price trends are eating away at that hope...
So, What Do You Think?
I'd love to keep you guys updated on my AI journey. See you in another article.~
Related Threads
@ahmadmanga/re-leothreads-2mc8ooqca
@ahmadmanga/re-leothreads-gyriwmnl