LLM with Web Search functionality
(lemmy.ml)
To avoid context switching on the GPU. Open WebUI, for example, uses it for memory and title generation.
Those are background tasks that aren't performance critical, so instead of slowing down Qwen, we just outsource them to the NPU.
Edit: see here for more details
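To make the idea concrete, here is a rough sketch of that routing, not how Open WebUI actually implements it. The endpoint URLs, model names, and task labels are all made up for illustration; the only point is that background tasks go to a second backend so the main model on the GPU is never interrupted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str      # hypothetical label for the device
    base_url: str  # hypothetical OpenAI-compatible endpoint
    model: str     # hypothetical model identifier


# Assumed setup: the main model is served from the GPU,
# a small helper model is served from the NPU.
GPU_BACKEND = Backend("gpu", "http://localhost:8080/v1", "qwen-main")
NPU_BACKEND = Backend("npu", "http://localhost:8081/v1", "small-task-model")

# Background jobs that are not latency critical (labels are illustrative).
BACKGROUND_TASKS = frozenset({"title_generation", "memory"})


def pick_backend(task: str) -> Backend:
    """Send background tasks to the NPU and everything else to the GPU."""
    return NPU_BACKEND if task in BACKGROUND_TASKS else GPU_BACKEND


if __name__ == "__main__":
    for task in ("chat", "title_generation", "memory"):
        backend = pick_backend(task)
        print(f"{task} -> {backend.name} ({backend.model})")
```

Since the helper model lives on a different device, the GPU keeps the main model loaded and never has to swap it out for a title or memory request.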
Oh, I see. Okay, this makes sense. I just throw Qwen 3.6 35B Q8 on 2 GPUs and use it for everything but the coding agent.