[-] SuspiciousCarrot78@aussie.zone 1 points 13 hours ago* (last edited 13 hours ago)

On the broader topic of "the llm is the mouth, not the brain", I just stumbled across this.

https://www.atomelm.com/index.html#what

https://www.atomelm.com/index.html#prototypes

Might turn out to be something yet, dunno. Web demo is a bit meh.

[-] SuspiciousCarrot78@aussie.zone 1 points 16 hours ago

Thanks for that. I've been meaning to (re)disable Google Play Services. I have a few older phones too that never had it to begin with. I wonder how/if Aurora Store will be impacted. Presumably, if you don't have Google Play Services functioning, you don't get the poison pill. But...given that Big Evil likes to just ... do shit (cf. the recent 4GB forced ingestion of their LLM with Chrome), I dunno.

In any case, step 1 is probably nuking that.

[-] SuspiciousCarrot78@aussie.zone 2 points 16 hours ago

Granite is much more strait-laced. Qwen is more expressive. Honestly, it reminds me a lot of the early days with GPT-4 class models (and the benchmarks show it about matches that, too).

[-] SuspiciousCarrot78@aussie.zone 1 points 19 hours ago* (last edited 19 hours ago)

Cool. So what happens if I run a version of Android that doesn't inherit Google's security-theater cruft? That is to say...what if the user simply...does not...upgrade the Android version to be affected by this (e.g. uses an old phone, or blocks the OS version update)?

My phone is going on 7yrs old. Perfectly happy with it. When it breaks, I will get a phone of the same era (2nd hand or new-old stock) or investigate other options.

So, it seems to me, the winning move is not to play the game (in any one of 100 diff ways).

Or am I missing something here? Is there something that will prevent older tech from working? Because if so, I am happy to YOLO my phone and switch to a dumbphone if I have to.

[-] SuspiciousCarrot78@aussie.zone 3 points 22 hours ago* (last edited 22 hours ago)

Good man/woman. Nerd Valhalla awaits you :)

[-] SuspiciousCarrot78@aussie.zone 6 points 23 hours ago

Hey, me too :) As my school teachers used to tell me, "Great minds think alike (but fools seldom differ)" :)

For me, I'm thinking of having an LLM as one layer / one container in a homelab that does some specific stuff:

  • queries against local docs / notes / manuals / PDFs / wiki material as the trusted knowledge layer
  • uses tools for search, file lookup, shell, git, Docker, Home Assistant, calendar, etc.
  • a local “Codex” / wiki layer that turns my own source material into an inspectable knowledge base
  • provenance and audit trails

I want to take a screenshot of something, drop it into Syncthing from my phone, then later ask "did I fuck the pins on this?" ... and for it to look up the schematics, eyeball the pins and tell me. Or I say "hey, can you grab a copy of X for me, usual params" and have the LLM instruct Sonarr/Radarr/SABnzbd to do that. (That is, make your OWN "Alexa" with an Arduino ESP32, stick it in a room and then call it when you need it).

So instead of asking a 70B model to “know” why your media server is down, the system checks service status, logs, last config changes, prior notes, Docker state, network state, etc., then the LLM explains the result in human language. You can probably do that with a 4B (I'm testing that assumption now).
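To make that concrete, here's a rough sketch of the "check first, explain second" loop. Everything here is invented for illustration (the check names, the fake outputs); a real version would shell out to docker, systemctl, journalctl etc. and hand the collected evidence to the model, whose only job is to narrate it.

```python
# Sketch: gather structured evidence with boring deterministic code,
# then render it into a prompt the small LLM merely has to explain.
# All checks and outputs below are fake stand-ins for real probes.

def gather_evidence(checks):
    """Run each diagnostic check and collect its result by name."""
    return {name: check() for name, check in checks.items()}

def build_prompt(question, evidence):
    """Render evidence into a prompt; the LLM narrates, it doesn't guess."""
    lines = [f"- {k}: {v}" for k, v in sorted(evidence.items())]
    return (f"Question: {question}\nEvidence:\n" + "\n".join(lines)
            + "\nExplain the likely cause in plain language.")

# Fake checks standing in for docker ps, df -h, ping, etc.
checks = {
    "jellyfin_container": lambda: "exited (code 137) 2h ago",
    "disk_space": lambda: "93% used on /srv/media",
    "network": lambda: "gateway reachable, DNS ok",
}

prompt = build_prompt("why is my media server down?", gather_evidence(checks))
print(prompt)
```

The point being: the 4B never has to "know" anything about your homelab, it just has to read the evidence back to you like a competent sysadmin would.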

Same for “find that motherboard note,” “summarize this email thread,” “turn this into a task,” “compare this Ebay listing to my saved hardware notes,” “what did I do last time this broke,” or “run the smoke test and tell me the first real failure.”

I think small models are the shit for this because if the model only has to classify intent, route the request, render structured evidence, and talk like a normal human...then it doesn't need to be a giant oracle. The expensive (time-wise) part becomes less "make the model smarter" and more "build a better control plane around it."

Basically: local LLM as semantic HID; expert system/tool router underneath; user owns the data and the machine.
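A minimal sketch of that routing idea (intent names and handlers are all invented; in a real build the small model would emit one of the route keys instead of the keyword matching I'm faking here):

```python
# Sketch of "LLM as semantic HID": the model's only job is to map
# free text to an intent; deterministic handlers do the real work.
# ROUTES and the classify() heuristic are made-up placeholders.

ROUTES = {
    "media.fetch":  lambda args: f"queued download: {args}",
    "notes.search": lambda args: f"searching notes for: {args}",
    "sys.status":   lambda args: "all services nominal",
}

def classify(text):
    """Stand-in for a small LLM doing intent classification.
    A real system asks the model to pick one of ROUTES' keys."""
    t = text.lower()
    if "grab" in t or "download" in t:
        return "media.fetch", text
    if "find" in t or "note" in t:
        return "notes.search", text
    return "sys.status", text

def handle(text):
    """Route the utterance to its handler and return the result."""
    intent, args = classify(text)
    return ROUTES[intent](args)

print(handle("hey, can you grab a copy of X for me"))
```

Swap the keyword soup for an actual model call and you've got the control plane; the LLM never touches Sonarr directly, it just picks the route.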

As always, ICBW....but fuck it, I'm gonna try.

PS: I have an idea of how to apply that to coding too...but that's a project for much later. I've been cooking this shit for far too long. The next thing I wanna do is a fun project for myself (that is: ROM hack a parachute and grappling gun into Super Mario Sunshine, so I can basically play "What if Super Mario Sunshine but actually Just Cause 2" on my Wii with the kids).

[-] SuspiciousCarrot78@aussie.zone 6 points 1 day ago* (last edited 1 day ago)

I'm actually thinking of pivoting my router/orchestrator entirely. I think the way forward is to look at expert systems (yes, those ancient things from the long, long ago of...1980) but with modern tooling (that can be user updated), with a small LLM in the middle that the user can talk to. That is, de-emphasize the central role of the LLM entirely; rather, make it the user-facing NLP input/output and let the real programs, running on real silicon, do the work. I might have a different use case than most, but I bet not so different (that is to say, online LLM discussions seem to gravitate around users who use LLMs for coding; Anthropic and OAI internal reports say otherwise).
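For anyone who missed the 1980s: the expert-system half can be as dumb as forward chaining over a fact base. A toy sketch (facts and rules invented, obviously; real rules would come from your own notes and runbooks):

```python
# Toy forward-chaining inference engine, 1980s expert-system style.
# The LLM sits in front as NLP in/out; this bit is plain old code.
# Facts and rules below are invented homelab examples.

rules = [
    ({"container_exited", "oom_killer_logged"}, "cause: out_of_memory"),
    ({"cause: out_of_memory", "swap_disabled"}, "fix: enable_swap_or_raise_limit"),
]

def infer(facts):
    """Keep firing rules until no new conclusions appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = infer({"container_exited", "oom_killer_logged", "swap_disabled"})
print(sorted(f for f in result if f.startswith(("cause", "fix"))))
```

Deterministic, inspectable, user-editable - and the small LLM's only job is translating "my media box died again" into those fact atoms and translating the conclusions back into English.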

Ironically, I'm writing the blurb now while waiting for smoke test #90238472398 to finish.

30
submitted 1 day ago* (last edited 1 day ago) by SuspiciousCarrot78@aussie.zone to c/localllama@sh.itjust.works

I was browsing Reddit (yetch) while waiting for some stuff to finish when I came across this post

https://old.reddit.com/r/LocalLLM/comments/1tek00h/why_is_llm_is_so_expensive/

The author makes a (very) interesting claim: if table stakes are $6K (they're not...but go with it for now), then most folks are cooked from the get-go.

Personally, I have been figuring out how to get more from less. For example, people have found ways to run Qwen3.6 35B on a 6GB VRAM GTX 1060 at ~20 tok/s (--ctx 64K IIRC, but go check the vids yourself)

https://youtu.be/8F_5pdcD3HY

I think there's a lot of juice to squeeze by turning LLMs from "all seeing sages" into basically mouth pieces for shit that actually runs fast on regular silicon - but that's just me and my crazy brain. YMMV.
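The napkin maths behind that squeeze, for the curious. Big assumption on my part (not figures from the video): a MoE model where only ~3B of ~30B params are active per token, quantised at roughly 4.5 bits/weight (Q4-ish):

```python
# Napkin maths for "more from less": with a MoE model, only the
# *active* parameters form the per-token hot set, so the bulk can
# sit in system RAM while a small card keeps up. The 30B/3B split
# and ~4.5 bits/weight are my illustrative assumptions.

def weight_gb(params_billion, bits_per_weight=4.5):
    """Approximate weight footprint in decimal GB."""
    return params_billion * bits_per_weight / 8

total = weight_gb(30)   # whole model: nowhere near fitting in 6GB VRAM
active = weight_gb(3)   # active experts per token: fits easily
print(f"total ~{total:.1f} GB, active ~{active:.1f} GB")  # → total ~16.9 GB, active ~1.7 GB
```

Which is why a 6GB 1060 that has no business running a 30B-class model can still push usable tok/s: the hot set is tiny.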

[-] SuspiciousCarrot78@aussie.zone 4 points 1 day ago* (last edited 1 day ago)

I'd ask why...but "because I fucking wanted to" is an entirely cromulent (and 100% valid) response. Just wish it had some screenshots or videos of it in action that we could geek out over.

EDIT: I need reading glasses, clearly

https://www.youtube.com/watch?v=eGS9su_inBY

The next step for the dev (are you here?) - get IE running and post from your N64 onto this Lemmy thread. I double dog dare you :)

[-] SuspiciousCarrot78@aussie.zone 2 points 1 day ago* (last edited 1 day ago)

What I did was this -

  • Lenovo M93P tiny (i7-4785T, 8GB, no GPU; cost $50. I can do up to PS2 at 1.5x, AAA games up to 2014/15, and later indies)
  • Offline (once art scraped by the below, etc.)
  • Windows 8.1 install (era appropriate, correct drivers, offline, yadda yadda) + ClassicShell
  • Installed Xbox 360 dongle with drivers
  • Installed games I wanted / emulators (eg: Dolphin for Wii and GC, PCSX2 for PS2 etc)
  • Installed Playnite, set it to launch full screen
  • Defined scripts / launch conditions (e.g., getting AntiMicroX to launch when Luanti launches, so it can be played with controllers instead of keyboard, then shut down cleanly on return to Playnite)
  • Replaced Explorer.exe as the default shell in Regedit

End result: turn on the PC, it boots into Windows (in about 2 seconds) and launches Playnite (which is fully controller / couch-mode compatible). Additionally, I can fine-tune things like EDID (fine-grained control of display modes) and ReShade (per-game sharpening and other effects), to say nothing of the extra Win programs I can run.

With a bit of skill, you can make games look way better than they have any right to, even on low-end hardware. I can dig up some screenshots of Just Cause 2 and Firewatch running in 540p for you if you'd like...you'd be hard pressed to tell it wasn't much higher resolution (viewed on a 75" TV from 8 feet away).

Reason I did it this way:

People will tell you Batocera is awesome (and it is) but...there are just some things that run better natively (e.g., Fallout 3 GOG Game of the Year Edition, Just Cause 2, etc). Windows lets you play Windows shit natively and the emulation scene (Dolphin, PCSX2 etc) is mature. No need for Wine, Proton, blah blah. It just ... runs.

Playnite lets you "hide" games you don't want the kiddies to run. Once you're done with it, you can exit and return to desktop - you have a normal PC (though if you do the shell replacement I mentioned, you will have to exit, CTRL-ALT-DEL to get Task Manager, then run explorer.exe. I only set Playnite as the default shell because I wanted ZERO flashes or indication this was a normal Windows PC on boot; if a small 2-3 second desktop flash doesn't annoy you, just set Playnite to launch at start, black-screen the desktop and go from there. It's much easier for something that is multi-use). Also, because it's just a front end, you should in theory be able to make a shortcut to "Jellyfin.exe" and launch it as needed from Playnite (haven't explored that myself tho).

Win 8.1 (with Classic Shell) launches fast, is lightweight, and doesn't need hacking to get around login permissions and shell replacement the way Win 10 and later might. You wouldn't want to leave it hooked up to the net unsupervised, but on an HTPC being treated mostly as an offline appliance, the so-called security trade-offs are worth it to me (plus, I have firewall and other isolation in place).

PS: Controller-wise: Xbox 360 wireless + dongle for me. One $30 dongle can host up to 4 controllers and I already had two controllers :)

PPS: Can I be honest with you? After all this - the kids decided they just prefer the Wii. I had to laugh. Fine...we'll use the Wii (even though I replicated everything on the M93P - INCLUDING upscaling, making Wii controllers etc. work in Dolphin, bought a Dolphin Bar etc. I even put the fucking Wii music as the background in Playnite!). So much work ... ignored LOL. Eh, I learned a lot doing it :)

PPPS: We have a Google Chromecast with Google TV dongle attached to the TV, so it can stream Jellyfin from the media server just fine. I really recommend those (not the new one, the old hockey-puck style one) or the off-label one you can get now (ONN, I think?). Actually, come to think of it, I'm pretty sure the Wii can stream Jellyfin now in glorious 480p too lol

8
Token Speed visualiser (mikeveerman.github.io)
submitted 1 day ago* (last edited 1 day ago) by SuspiciousCarrot78@aussie.zone to c/localllama@sh.itjust.works

https://mikeveerman.github.io/tokenspeed/?rate=20&mode=agent&think=15

Exactly what it says on the tin :)

Pretty good simulator this. May it cause you to reconsider your expensive GPU upgrade :)
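If you'd rather do the sums than watch the animation, the maths behind it is just division. Parameter names below mirror the URL's query string; the token counts are made-up examples:

```python
# Back-of-envelope version of what the visualiser animates: how long
# a response takes to stream at a given tok/s, with an optional
# "thinking" delay up front.

def response_time(num_tokens, rate_tps, think_seconds=0.0):
    """Seconds until the full response has finished streaming."""
    return think_seconds + num_tokens / rate_tps

# A 500-token answer at 20 tok/s, after 15 s of 'thinking':
print(f"{response_time(500, rate_tps=20, think_seconds=15):.0f} s")  # → 40 s
```

40 seconds of staring at a spinner is the real cost people should weigh before dropping GPU money - or before deciding they even need to.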

[-] SuspiciousCarrot78@aussie.zone 2 points 2 days ago* (last edited 2 days ago)

Yeah, transcoding entirely off - directly streaming stored 720/1080p files (downloaded like that, although I did use HandBrake on the Pi once to transcode Space: 1999 season 1. Took about 2 days, I think).

Someone else was just talking about Wyse thin clients. I'm fairly sure that a $40 Wyse thin client outperforms even the best Pi 4 (maybe the 5, sometimes). If I can't find a way to fix mine, I may have to buy a few for uh...science. IIRC, they idle at about the same as the Pi.

[-] SuspiciousCarrot78@aussie.zone 1 points 2 days ago* (last edited 2 days ago)

Oh man, I love those Wyse thin clients. They can't go for much more than $40 these days.

I hope people keep sleeping on em - I could use a Raspberry Pi replacement or two

[-] SuspiciousCarrot78@aussie.zone 2 points 2 days ago* (last edited 2 days ago)

I actually (just last night) abliterated a Qwen3.5-2B for this sort of purpose (well, more specifically, to fit neatly into a socket for a project). It's fast and light, cooked for edge devices, and should have inherited all of base Qwen's tricks (~200 languages, vision etc): polaris-heretic-Q4_K_M-GGUF

Try it and see if it works? I inadvertently made it really fucking love dotpoints (GPT-OSS 20B disease) so am trying to unfuck it right now.

Else - I can recommend something like Granite-4H or the old Qwen3-4B 2507 instruct

granite-4.1-3b-heretic.i1-Q4_K_M

Qwen3-4B 2507 instruct
