Conducting deep web searches and gathering sources is one of the main things I've been using LLMs for. How far away are we from being able to self-host something like Claude's web search capabilities? Or even just a service where I'd pay with my money instead of my data?

[-] avidamoeba@lemmy.ca 1 points 1 day ago* (last edited 1 day ago)

Curious why you swap between Qwen and E4B. On my hardware they run at similar speeds: Qwen 3.6 35B puts out 80-100 tps on an AMD 9700, and E4B gives me about the same.

[-] vapeloki@lemmy.world 2 points 1 day ago* (last edited 1 day ago)

To avoid context switching on the GPU. Open WebUI, for example, uses E4B for memory and title generation.

Those are background tasks that aren't performance-critical, so instead of slowing down Qwen, we just offload them to the NPU.

Edit: see here for more details
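
To make the split concrete, here is a minimal sketch of the routing idea described above: background tasks go to the small model on the NPU, everything else stays on the GPU model. This is not Open WebUI's actual code or configuration. The task names, model labels, and local endpoint URLs are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str      # model label (hypothetical)
    device: str    # "gpu" or "npu"
    base_url: str  # local inference endpoint (hypothetical)


# Assumed setup: one server per device, each on its own port.
GPU_BACKEND = Backend("qwen", "gpu", "http://localhost:8001/v1")
NPU_BACKEND = Backend("e4b", "npu", "http://localhost:8002/v1")

# Background jobs that are not latency-sensitive (names are illustrative).
BACKGROUND_TASKS = {"title_generation", "memory"}


def pick_backend(task: str) -> Backend:
    """Send background tasks to the NPU model, everything else to the GPU model."""
    return NPU_BACKEND if task in BACKGROUND_TASKS else GPU_BACKEND


if __name__ == "__main__":
    for task in ("chat", "title_generation", "memory"):
        backend = pick_backend(task)
        print(f"{task} -> {backend.name} on {backend.device} ({backend.base_url})")
```

The point of the design is that the GPU never has to swap models or interleave small background requests with the main chat workload. The NPU model only needs to be good enough for titles and memory summaries.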

[-] avidamoeba@lemmy.ca 1 points 1 day ago

Oh I see. Okay, this makes sense. I just throw Qwen 3.6 35B Q8 on 2 GPUs and use it for everything except the coding agent.

this post was submitted on 21 Jun 2026
13 points (100.0% liked)