13
LLM with Web Search functionality
(lemmy.ml)
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Open WebUI + SearXNG + llama.cpp + Qwen 3.6 35B + 16-32GB GPU. Gives you 256K context and runs with 80-100tps on 3090. If you have less VRAM like 16GB it'll be slower but still probably tens of tps on anything recent. I run it on AMD Pro 9700 which is about as fast as 3090.