I replaced Google Home with Home Assistant and a local LLM, and I'm not looking back
(www.xda-developers.com)
I am most interested in how to set up and integrate the local LLM for this, particularly the hardware requirements. Sadly, the article doesn't have any details on that.
https://community.home-assistant.io/t/local-llm-for-dummies/769407 for the LLM part
https://www.home-assistant.io/voice_control/voice_remote_local_assistant/ for the speech transcription part
VRAM matters more than raw GPU speed. If the model fits in VRAM, a slower GPU is workable. The GPU determines how fast tokens are generated, but if the model is too big for VRAM and is partially offloaded to RAM/CPU, performance drops considerably. A 13-billion-parameter LLM will run significantly better on a slower GPU with 12 GB of VRAM than on a faster GPU with 6 GB. 7B models are pretty capable (6-8 GB VRAM), and going to 13B (8-10 GB) or 32B (20+ GB) is a notable step up in capability each time, though 32B is largely impractical for this use case (unless you don't mind dropping $1300-$2500 on a 24 GB 3090 or 4090 and paying to power it).
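For a rough feel of whether a model fits, here is a minimal Python sketch of the arithmetic. It assumes weight size is roughly parameters × bits per weight / 8, plus a flat 20% for context cache and runtime overhead. That 20% and both function names are my own assumptions for illustration, not figures from any tool, and real usage varies with context length and backend.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_fraction: float = 0.2) -> float:
    """Rough VRAM estimate in GB (decimal) for a quantized model.

    overhead_fraction is an assumed allowance for KV cache and runtime
    buffers; it is a guess, not a measured value.
    """
    # 1 billion parameters at 8 bits per weight is about 1 GB of weights.
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * (1 + overhead_fraction)


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if the estimated footprint fits entirely in the given VRAM."""
    needed = estimate_vram_gb(params_billion, bits_per_weight, overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    # The 13B example from the comment above, assuming a 4-bit quantization.
    for vram in (6, 12):
        verdict = "fits" if fits_in_vram(13, 4, vram) else "would spill to RAM/CPU"
        print(f"13B @ 4-bit on a {vram} GB card: {verdict}")
```

Under these assumptions a 4-bit 13B model lands just under 8 GB, which is why the 12 GB card holds it and the 6 GB card has to offload.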
Thanks. I saved your comment to look into this. Running a local Alexa is a dream of mine.
Local LLM hardware requirements depend heavily on the model. You should try out Ollama and see which models work well on whatever hardware you're testing on.
You will also want to look up what quantized models are.
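In short, quantization stores each weight in fewer bits, which shrinks the model at some cost in quality. A small Python sketch of what that does to size, using nominal bit widths (real quantization formats usually carry a little extra per-weight overhead, so treat these as lower bounds). The labels, the 20% headroom, and the helper names are hypothetical, chosen only for this example.

```python
from typing import Optional

# Nominal bits per weight, ordered from highest precision to lowest.
# These labels are illustrative, not the names of specific file formats.
QUANT_BITS = {"FP16": 16, "8-bit": 8, "4-bit": 4}


def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (decimal)."""
    return params_billion * bits_per_weight / 8


def best_quant_for(params_billion: float, vram_gb: float,
                   headroom: float = 0.2) -> Optional[str]:
    """Pick the highest-precision option whose weights, plus an assumed
    headroom fraction for cache and buffers, fit in the given VRAM.
    Returns None if nothing fits."""
    for label, bits in QUANT_BITS.items():
        if weight_size_gb(params_billion, bits) * (1 + headroom) <= vram_gb:
            return label
    return None


if __name__ == "__main__":
    for size in (7, 13, 32):
        sizes = ", ".join(
            f"{label}: {weight_size_gb(size, bits):.1f} GB"
            for label, bits in QUANT_BITS.items()
        )
        print(f"{size}B -> {sizes}")
    print("13B on a 12 GB card:", best_quant_for(13, 12))
```

The actual quality trade-off between precisions is something you have to test on your own prompts; this only covers the size side.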
Don't use Ollama. If you're on Windows, try LM Studio instead and look at HuggingFace models rather than relying on the Ollama repository. Much better.
Why is it better?
Dubious open-source practices on the part of the Ollama devs. Beyond that, LM Studio uses the latest stable llama.cpp rather than the one developed by Ollama, which brings significant speed improvements. You also get a better understanding of what model you're deploying by skipping Ollama and looking at the HF repository instead. For example, Ollama states that they're serving DeepSeek-R1, but pulling it gives you a distilled 8B version, not the actual DeepSeek-R1 (671B parameters) that one would have expected.
I get that Ollama might be easier to use, but you will not learn much by using it. Worse, the competition offers better performance with similar out-of-the-box capabilities.