I use Ollama and Open WebUI: Ollama on my gaming rig and Open WebUI as a frontend on my server.
It's been a really powerful combo!
Would you please talk more about it? I'd forgotten about Open WebUI, but I'm intending to start playing with it. Honestly, what do you actually do with it?
I have the same setup, but it's not very usable as my graphics card has 6 GB of VRAM. I want one with 20 or 24 GB, as the 6B models are a pain and the tiny ones don't give me much.
Ollama was pretty easy to set up on Windows, and it's easy to download and test the models Ollama has available.
Sounds like you and I are in a similar place of testing.
Possibly. I've been running it since last summer, but like I say, the small models don't do me much good. I've tried llama3.1, olmo2, deepseek-r1 in a few variants, qwen2, qwen2.5-coder, mistral, codellama, starcoder2, nemotron-mini, llama3.2, gemma2, and llava.
I use Perplexity and Mistral as paid services, with much better quality. Open WebUI is great though; my hardware is just lacking.
Edit: I saw that my mate is still using it a bit, so I'll update Open WebUI from 0.4 to 0.5.20 for him. He's a bit anxious about sending data to the cloud, so he doesn't mind the quality.
Scrap that: after upgrading, it went bonkers and will always use one of my "knowledge" collections no matter what I try. The web search fails even with DDG as the engine. It's always seemed like the UI was made by unskilled labour, but this is just horrible. 2/10, not recommended.
I have Linkwarden pointed at my Ollama deployment, so it auto-tags links that I archive, which is nice.
I've seen other people send images captured by their security cameras in Frigate to Ollama to get it to describe the image (roughly like the sketch below).
There's a bunch of other use cases I've thought of for coding projects, but I haven't started on any of them yet.
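For anyone curious, a minimal sketch of that camera-description idea, using the official Ollama Python client with a vision-capable model like llava. The snapshot path is a placeholder; however you export frames from Frigate is up to you.

```python
import ollama

# Placeholder: a frame exported from your camera system (e.g. a Frigate snapshot).
SNAPSHOT = "driveway_snapshot.jpg"

# llava is one of several vision-capable models in the Ollama library.
resp = ollama.chat(
    model="llava",
    messages=[{
        "role": "user",
        "content": "Describe what is happening in this security camera image.",
        "images": [SNAPSHOT],
    }],
)
print(resp["message"]["content"])
```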
I have this exact same setup. Open WebUI has more features than I've been able to use, such as functions and pipelines.
I use it to share my LLMs across my network. It has really good user management, so I can set up a user for my wife or brother-in-law and give them a general-use LLM, while my dad and I can take advantage of coding-tuned models.
The code formatting and code execution functions are great. Overall, it's a great UI.
I've used LLMs to rewrite code, help format PowerPoint slides, summarize my notes from work, create D&D characters, plan lessons, etc.
Sex chats. For other uses, simple searches are just better 99% of the time. And for the 1%, something like Kagi's FastGPT helps to find the correct keywords.
None currently. Wish I could afford a GPU to play with some stuff.
Well, let me know your suggestions if you wish. I took the plunge and am willing to test on your behalf, assuming I can.
Yeah. I have a mini PC with an AMD GPU. Even if I were to buy a big GPU I couldn't use it. That frustrates me, because I'd love to play around with some models locally. I refuse to use anything hosted by other people.
Your M.2 port can probably fit an M.2-to-PCIe adapter, and you can use a GPU with that. Ollama supports AMD GPUs just fine nowadays (well, as well as it can; ROCm is still very hit or miss).
Oh, then I need to give it another try.
Check ROCm's supported cards. Oh, and after you install ROCm, restart your computer; I made that mistake when I was doing it and couldn't figure out why it wasn't working.
Most LLM projects support Vulkan if you have enough VRAM.
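If you do get a card working, one quick sanity check that Ollama is actually using it: the /api/ps endpoint reports how much of a loaded model sits in VRAM. A minimal sketch, assuming a default local install on port 11434 (endpoint and field names per the Ollama REST API docs; verify against your version):

```python
import requests

# Load a model first (e.g. `ollama run llama3.2` in another terminal),
# then ask the server what's resident and where.
resp = requests.get("http://localhost:11434/api/ps").json()
for m in resp.get("models", []):
    # size_vram == size  -> fully offloaded to the GPU
    # size_vram == 0     -> running on CPU only
    print(m["name"], f'{m["size_vram"]}/{m["size"]} bytes in VRAM')
```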
I messed around with Home Assistant and the Ollama integration. I've passed on it and just use the default assistant with voice commands I set up. I couldn't really get Ollama to do or say anything useful. Like, I asked it what's a good time to run on a treadmill for beginners, and it told me it's not a doctor.
Kirkland-brand Meeseeks energy.
Hey now, Kirkland brand is respectable; it's usually premium brands repackaged, such as how Costco vodka was secretly ("secretly") Grey Goose.
There are some experimental models made specifically for use with Home Assistant, for example home-llm.
Even though they're tiny (1-3B), I've found them to work much better for this than even 14B general-purpose models. Obviously they suck for general-purpose questions, just by their size alone.
That being said, they're still LLMs. I like to keep the "prefer handling commands locally" option turned on and only use the LLM as a fallback.
Haha, that is hilarious. Sounds like it gave you some snark. AFAIK you have to clarify by asking again when it says such things: "I'm not asking for medical advice, but..."
I have Immich, which has AI search for my photos. It's pretty useful for finding stuff, actually.
Once I changed the default model, Immich search became amazing. I want to show it off to people, but alas, there are way too many NSFW pics in my library. I would create a second "clean" version to show people, but I've been too lazy.
Can't you tag the NSFW to filter it out?
LM Studio is pretty much the standard. I think it's open source except for the UI. Even if you don't end up using it long-term, it's great for getting used to a lot of the models.
Otherwise there's Open WebUI, which I'd imagine would work via Docker Compose, as I think there are ARM images for OWUI and Ollama.
Well, they're fully closed source except for the open-source project they're a wrapper around. The open-source part is llama.cpp.
Fair enough, but it's damn handy and simple to use. And I don't know how to do speculative decoding with Ollama, which massively speeds up the models for me.
Their software is pretty nice. That's what I'd recommend to someone who doesn't want to tinker. It's just a shame they don't want to open-source their software and we have to reinvent the wheel 10 times. If you're willing to tinker a bit, koboldcpp + Open WebUI/LibreChat is a pretty nice combo.
That koboldcpp is pretty interesting. Looks like I can load a draft model for speculative decoding, as well as a pile of other things.
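For reference, a minimal sketch of what that might look like. The launch flags are my assumption from recent koboldcpp builds (check --help for your version); once running, koboldcpp serves an OpenAI-compatible API on port 5001 by default, so you can query it with plain requests:

```python
import requests

# Assumed launch, pairing the main model with a small draft model:
#   python koboldcpp.py --model big-model.gguf --draftmodel small-draft.gguf
# (flag name from recent koboldcpp releases; verify with --help)

resp = requests.post(
    "http://localhost:5001/v1/chat/completions",
    json={
        "model": "koboldcpp",  # name is mostly cosmetic; it serves whatever it loaded
        "messages": [{"role": "user", "content": "Explain speculative decoding in two sentences."}],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```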
What local models have you been using for coding? I've been disappointed with things like deepseek-coder and qwen-coder; they're not even a patch on Claude, but that damn Anthropic cost has been killing me.
I run Ollama and Auto1111 on my desktop when it's powered on. Open WebUI runs in my homelab, always on, and is also connected to OpenRouter. This way I can always use Open WebUI with OpenRouter models; it's pretty cheap per query and a little more private than using a big tech chatbot. And if I want local, I turn on the desktop and have local Llama and Stable Diffusion.
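In case it helps anyone replicate the OpenRouter side: OpenRouter speaks the OpenAI-compatible API, which is why Open WebUI can talk to it directly, and why a plain script works too. A minimal sketch (the model ID is just an example; browse openrouter.ai/models for current ones):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenAI-compatible endpoint
    api_key="sk-or-...",                      # your OpenRouter key
)

resp = client.chat.completions.create(
    model="mistralai/mistral-7b-instruct",  # example model ID
    messages=[{"role": "user", "content": "Hello from my homelab!"}],
)
print(resp.choices[0].message.content)
```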
I also get bugger all benefit out of it; it's a cute toy.
I was able to run a distilled version of DeepSeek on Linux. I ran it inside a Podman container with ROCm support (I have an AMD GPU). It wasn't super fast, but for a locally deployed and self-hosted option the performance was okay. Apart from that, I have deployed Fooocus for image generation in a similar manner. Currently, I'm working on deploying Stable Diffusion with either ComfyUI or Automatic1111 inside a Podman container with ROCm support.
Didn't know about these image generation tools, besides Stable Diffusion. Thanks!
I've an old gaming PC with a decent GPU lying around, and I've thought of doing that (I currently use it for Linux gaming and GPU-related tasks like photo editing, etc.). However, I'm currently stuck using LLMs on demand locally with Ollama. The energy cost of having it powered on all the time for on-demand queries seems a bit overkill to me…
I spun up Ollama and paperless-gpt to add an AI OCR sidecar to paperless-ngx. It's okay. It can read handwritten stuff okayish, which is better than Tesseract (which doesn't read handwriting at all), so I throw handwritten stuff at it, but the difference on typed text is marginal, at least in the single day I spent testing 3 different models on a few different typed receipts.
Which specific models did you try, and how would you rank each one's usability?
I tried minicpm-v, granite3.2-vision, and mistral.
Granite didn't work with paperless-gpt at all. Mistral worked sometimes, but sometimes it also just kept running and didn't finish within a reasonable time (15 minutes for 2 pages). minicpm-v finishes every time, but I just looked at some of the results, and it seems as though it's not even worth keeping running either. I suppose the first result that gave me a good impression was a fluke.
To be fair, I'm a noob at local AI, and I also don't have a good GPU (a GTX 1650), so these failures could all be self-induced. I like the idea of AI-powered OCR, so I'll probably try again in the future...
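In case anyone wants to repeat the experiment, this is roughly how I'd script the bake-off outside paperless-gpt: send the same scan to each vision model through the Ollama Python client and compare output and latency. The model tags are the vision-capable ones from above; the image path is a placeholder:

```python
import time
import ollama

MODELS = ["minicpm-v", "granite3.2-vision"]  # vision-capable tags mentioned above
IMAGE = "receipt.jpg"  # placeholder scan

for model in MODELS:
    start = time.time()
    resp = ollama.chat(
        model=model,
        messages=[{
            "role": "user",
            "content": "Transcribe all text in this document exactly as written.",
            "images": [IMAGE],
        }],
    )
    print(f"--- {model} ({time.time() - start:.1f}s) ---")
    print(resp["message"]["content"])
```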
I find your experiments inspired. Thank you! I'm learning about this myself on an RTX card and am excited to discuss it on my little podcast.james.network one of these days. I've been using paperless minus the AI functionality so far. I'm about to start testing different AI services on an arm64 device with 16 GB of RAM that claims some level of AI support; we'll see how that goes. Let me know if there are any other specific services/models you'd recommend or are curious about.
Sure, and let me know how it goes for you. I'm on a Dell R720xd, about to upgrade my RAM from 128 to 296 GB... I don't want to spend the money on a new GPU right now.
I'll report back after I try again.
If Immich counts for its search system, then there's that.
Otherwise I've tried various things and found them lacking in functionality, and they would require leaving my PC on all the time to use.
What else did you try and what was lacking?
Can anyone suggest a model for light coding? I'm on a 3070 mobile.
Qwen Coder or the new Gemma 3.
But at this size, using a privacy-respecting API might be both cheaper and lead to better results.
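If you want a concrete starting point for the local route, here's a minimal sketch using the Ollama Python client. The qwen2.5-coder:7b tag is an example that should fit a 3070 mobile's 8 GB of VRAM at the default 4-bit quantization, but that's an assumption; check how it's loaded with `ollama ps`:

```python
import ollama

MODEL = "qwen2.5-coder:7b"  # example tag; ~7B at 4-bit quant targets 8 GB VRAM

ollama.pull(MODEL)  # one-time download

resp = ollama.generate(
    model=MODEL,
    prompt="Write a Python function that deduplicates a list while preserving order.",
)
print(resp["response"])
```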