LocalLLaMA
Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.
Rules:
Rule 1 - No harassment or personal character attacks of community members. I.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.
Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency. I.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to that of maintaining a blockchain/mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.
Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms. I.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms from <over 10 years ago>."
Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.
Hey, me too :) As my school teachers used to tell me, "Great minds think alike (but fools seldom differ)." :)
For me, I'm thinking of having an LLM as one layer / one container in a homelab that does some specific stuff.
I want to take a screenshot of something, drop it into Syncthing from my phone, then later ask "did I fuck the pins on this?"... and have it look up the schematics, eyeball the pins and tell me. Or I say "hey, can you grab a copy of X for me, usual params" and have the LLM instruct Sonarr/Radarr/SABnzbd to do that. (That is, make your OWN "Alexa" with an ESP32, stick it in a room and then call it when you need it.)
So instead of asking a 70B model to “know” why your media server is down, the system checks service status, logs, last config changes, prior notes, Docker state, network state, etc., then the LLM explains the result in human language. You can probably do that with a 4B (I'm testing that assumption now).
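That split can be sketched in a few lines. Everything below is hypothetical illustration, not a real implementation: the checks are stubs standing in for real `systemctl`/`journalctl`/`docker` calls, and the function names are made up. The point is that plain code produces the structured evidence, and the model's only job is to narrate it.

```python
# Sketch of "gather evidence first, let the LLM narrate last".
# All checks here are stubs; a real version would shell out to
# systemctl / journalctl / docker and collect real output.
import json

def check_service_status(name):
    # Stub: a real version might run `systemctl is-active <name>`.
    return {"service": name, "active": False}

def tail_logs(name, n=3):
    # Stub: a real version might run `journalctl -u <name> -n <n>`.
    return ["error: bind 0.0.0.0:8096: address already in use"]

def gather_evidence(service):
    # Plain code does the "knowing"; the model never has to guess state.
    return {
        "status": check_service_status(service),
        "recent_logs": tail_logs(service),
    }

def build_prompt(service, evidence):
    # The small model gets facts to explain, not a trivia question.
    return (
        f"Explain in plain language why '{service}' might be down, "
        "using only this evidence:\n" + json.dumps(evidence, indent=2)
    )

prompt = build_prompt("jellyfin", gather_evidence("jellyfin"))
print(prompt)
```

A 4B model handed that prompt only has to paraphrase "the port is already taken", which is a much easier job than diagnosing from nothing.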
Same for “find that motherboard note,” “summarize this email thread,” “turn this into a task,” “compare this Ebay listing to my saved hardware notes,” “what did I do last time this broke,” or “run the smoke test and tell me the first real failure.”
I think small models are the shit for this because if the model only has to classify intent, route the request, render structured evidence, and talk like a normal human... then it doesn't need to be a giant oracle. The expensive (time-wise) part becomes less "make the model smarter" and more "build a better control plane around it."
Basically: local LLM as semantic HID; expert system/tool router underneath; user owns the data and the machine.
As always, ICBW....but fuck it, I'm gonna try.
PS: I have an idea of how to apply that to coding too... but that's a project for much later. I've been cooking this shit for far too long. The next thing I wanna do is a fun project for myself (that is: ROM hack a parachute and grappling gun into Super Mario Sunshine, so I can basically play "What if Super Mario Sunshine but actually Just Cause 2" on my Wii with the kids).