Hell yeah. Love HomeAssistant even without the LLM. Local control forever.
This was the greatest thing for my home.
Privacy concerns abound with Google/Amazon/Apple, obviously, but the bigger thing from a day-to-day perspective was the lack of control.
I used Siri for a while and it was frustrating because it lagged behind tremendously. I tried Alexa and Google at friends' and family's houses, and while they were notably better they were still awful. Ads were obviously the worst part; with Alexa especially there was so much fluff trying to sell shit. A voice assistant should be as unobtrusive as possible, and the second it says extra bullshit it’s doing a shitty job. Poor query handling happened with all three to varying degrees, but the worse issue was that queries weren't processed locally.
This meant three deal-breaking issues. First, the assistants needed a constant internet connection, so an outage meant your “smart” home was now objectively worse than before. Second, even with the connection up, bandwidth saturation, DNS issues, etc. would noticeably increase latency or time out query processing. Third, because the processing is server-side and not controlled by you, query responses can suddenly change: a macro would work one day and stop the next, or a question might get answered one week and not the next.
With Whisper and Qwen I am entirely local and have control of the entire process. If a query is processed poorly or not at all, I can create intent scripts to shift the behavior as I please. I can have the entire smart home cordoned off into an isolated VLAN or separate switch with only intranet access, so data collection isn't possible and external intrusion is extremely unlikely. I can bridge key components around this with something like Headscale so I can still check my cameras or whatever remotely.
It’s not all roses: latency can still occur depending on demand and hardware. If you’re like me and have 100+ entities that the LLM can interact with, things can slow down a bit. VRAM matters more than raw GPU speed in my experience. Speech transcription is fairly lightweight and can finish in under a second, but query processing can take a few moments if the hardware isn’t beefy enough. This also means a server that is more of a power hog, sometimes significantly so (relative to something like an RPi 4).
Biggest hurdle is that because it is so controllable, so open, and relatively new, setup can be a pain. There are several guides on GitHub and the HA forums that are very helpful, but this is definitely in the space of “tinkerer who is comfortable with YAML and Docker”. Additionally, the other parts of the hardware can be a pain. In other ecosystems you can just spend $80-100 on an Echo or HomePod or whatever; here you have to figure out a speaker and microphone yourself. This can be very cheap (literally an ESP32 with a cheap mic and speaker attached), but that typically doesn’t give you the “smart speaker” functionality, which can be nice to have. If you like the idea of music wherever you are in your house, those devices have really improved the concept of “multi room audio”, which can be an absurdly expensive nightmare otherwise.
The other pain point is that if you truly want local-only for IoT devices, you either have to accept that there will be some unknowns (mainly new and time-sensitive data, e.g. “what was the score of the game”, stock prices, etc.) or use another system as a bridge to run something like SearXNG, then set it up so that the LLM invokes this bridge when it hits an unknown. This adds several seconds of latency, but honestly about 95% of queries can be answered from what the model already knows, so asking for general info is usually entirely local (e.g. “who wrote Flowers for Algernon”).
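To make the "invoke the bridge on unknowns" idea concrete, here's a rough Python sketch of the routing, not my actual config. Everything named here is an assumption for illustration: `UNKNOWN_MARKER` is a made-up sentinel you'd prompt the model to emit, and `ask_local` / `search_web` are placeholder callables for your LLM and your search bridge. The URL helper assumes a SearXNG instance with the JSON output format enabled.

```python
from typing import Callable, List
from urllib.parse import urlencode

# Hypothetical sentinel: the local model is prompted to reply with this
# when it does not know the answer (new or time-sensitive data).
UNKNOWN_MARKER = "I_DONT_KNOW"


def build_search_url(base_url: str, query: str) -> str:
    """Build a SearXNG search URL asking for JSON results.

    Assumes the instance has the 'json' format enabled in its settings.
    """
    params = urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def answer(
    query: str,
    ask_local: Callable[[str], str],
    search_web: Callable[[str], List[str]],
) -> str:
    """Try the local model first; only hit the search bridge on unknowns."""
    reply = ask_local(query)
    if UNKNOWN_MARKER not in reply:
        return reply  # the common case: answered entirely locally

    # Slow path: fetch a few snippets and let the model answer from them.
    snippets = search_web(query)
    if not snippets:
        return "Sorry, I couldn't find that."
    context = "\n".join(snippets[:3])
    return ask_local(f"Using these search results:\n{context}\n\nAnswer: {query}")
```

The point of the design is that the search bridge (and its extra seconds of latency) only gets touched when the local pass gives up, so general-knowledge questions never leave the box.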
I know there can be a lot of hate for smart home stuff, which is often justified, but if you set it up correctly there’s no internet access and it just kind of works. With presence sensors I don’t even do much voice command of devices. Lighting and HVAC are all automated. The voice stuff is for incidentals: playing music, searching for info, setting timers, etc. It’s handy most of the time and it’s especially helpful when my hands are full, when my elderly family members visit, when I’m working on something and can’t move away to adjust lighting, etc., but it is frivolous for sure.
That was an interesting read. I've been against personal assistants/smarthome/etc., but your comment might have brought me around. Thank you for sharing your setup.
Integrate a rear facing camera. Might help you to look back /s
I am most interested in how to set up and integrate the local LLM for this, particularly the hardware requirements. Sadly, the article doesn't have any details on that.
https://community.home-assistant.io/t/local-llm-for-dummies/769407 For the llm part
https://www.home-assistant.io/voice_control/voice_remote_local_assistant/ For the speech transcription part
VRAM is more important. If the model can fit in VRAM, a slower GPU is workable. The GPU determines the speed of token generation, but if the model is too big for VRAM and is partially offloaded to RAM/CPU, performance drops considerably. A 13-billion-parameter LLM will run significantly better on a slower GPU with 12GB VRAM than on a faster GPU with 6GB. 7B models are pretty capable (6-8GB VRAM), and going to 13B (8-10GB) or 32B (20+GB) is each a notable step up in capability, though 32B is largely impractical for this use case (unless you don’t mind dropping $1300-$2500 on a 24GB 3090 or 4090 and paying to power it).
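If you want to sanity-check a model against a card before buying anything, here's a back-of-the-envelope Python sketch. It's my own rough rule of thumb, not an exact formula: weights take roughly parameters × bits per weight / 8, and the flat `overhead_gb` allowance for context/KV cache and buffers is an assumed number that grows with context length in practice.

```python
def estimate_vram_gb(
    params_billion: float,
    bits_per_weight: float,
    overhead_gb: float = 1.5,  # assumed flat allowance for KV cache and buffers
) -> float:
    """Very rough VRAM estimate in GB for a quantized model."""
    # 1 billion parameters at 8 bits per weight is about 1 GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float,
    overhead_gb: float = 1.5,
) -> bool:
    """True if the whole model should fit on the card with no CPU offload."""
    return estimate_vram_gb(params_billion, bits_per_weight, overhead_gb) <= vram_gb


if __name__ == "__main__":
    for label, params in [("7B", 7), ("13B", 13), ("32B", 32)]:
        for bits in (4, 8):
            need = estimate_vram_gb(params, bits)
            print(f"{label} at {bits}-bit: ~{need:.1f} GB")
```

By this math a 13B model at 4-bit squeezes onto a 12GB card but not a 6GB one, which is the situation where the slower card with more VRAM wins.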
Thanks. I saved your comment to look into this. Running a local Alexa is a dream of mine.
Local LLM hardware requirements depend heavily on the model. You should try out Ollama and see which models work well on whatever hardware you're testing it on.
You will also want to look up what quantized models are.
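For "see which models work well", one rough way to compare is tokens per second. Here's a minimal Python sketch, assuming a default Ollama install listening on `localhost:11434`; it uses the `/api/generate` endpoint and the `eval_count` / `eval_duration` (nanoseconds) fields from its non-streaming response. The `benchmark` helper and any model tags you pass in are just placeholders for whatever you've pulled.

```python
import json
from typing import Dict, List
from urllib.request import Request, urlopen


def tokens_per_second(result: dict) -> float:
    """Generation speed from an Ollama response (eval_duration is in ns)."""
    seconds = result["eval_duration"] / 1e9
    return result["eval_count"] / seconds


def generate(model: str, prompt: str, host: str = "http://localhost:11434") -> dict:
    """Send one non-streaming generate request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=300) as resp:
        return json.load(resp)


def benchmark(models: List[str], prompt: str) -> Dict[str, float]:
    """Run the same prompt through each model and report tokens/second."""
    return {m: tokens_per_second(generate(m, prompt)) for m in models}
```

Something like `benchmark(["qwen2.5:7b"], "turn off the kitchen lights")` gives you a number to compare across models and quantizations on your own hardware. If the speed falls off a cliff between two sizes, the bigger one probably isn't fitting in VRAM.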
Don't use Ollama. If you're on Windows, try LM Studio instead and look into HuggingFace models rather than relying on the Ollama repository. Much better.
Why is it better?
Dubious open-source practices on the part of the Ollama devs. Other than that, LM Studio uses the latest stable llama.cpp rather than the one developed by Ollama, which brings significant speed improvements. You also get a better understanding of what model you're deploying by skipping Ollama and looking at the HF repository instead. For example, Ollama states that they're serving DeepSeek-R1, but pulling it gives you a distilled 8B-parameter version, not the actual DeepSeek-R1 (671B parameters) that one would have expected.
I get that it might be easier to use, but you will not learn much by using it. Worse, the competition performs better and offers similar out-of-the-box capabilities.