submitted 1 year ago by SkySyrup@sh.itjust.works to c/localllama@sh.itjust.works


Hi, you've found this ~~subreddit~~ Community, welcome!

This Community is intended to be a replacement for r/LocalLLaMA, because I think that we need to move beyond centralized Reddit in general (although obviously also the API thing).

I will moderate this Community for now, but if you want to help, you are very welcome, just contact me!

I will mirror or rewrite posts from r/LocalLLaMA for this Community for now, but maybe we could eventually all move to this Community (or any Community on Lemmy, seriously, I don't care about being mod or "owning" it).


[-] SkySyrup@sh.itjust.works 1 points 1 year ago* (last edited 1 year ago)

Hi, sure, thank you so much for helping out! As for LLaMA, I would point you at llama.cpp (https://github.com/ggerganov/llama.cpp), which is the absolute bleeding edge but also has pretty useful instructions on its page (https://github.com/ggerganov/llama.cpp#usage). You could also use Kobold.cpp, but I don't have any experience with it, so I can't help you if you have issues.

[-] gh0stcassette@lemmy.world 2 points 1 year ago

Adding to this: text-generation-webui (https://github.com/oobabooga/text-generation-webui) works with the latest bleeding-edge llama.cpp via llama-cpp-python, and it has a nice graphical front-end. You do have to manually tell pip to install llama-cpp-python with the right compiler flags to get GPU acceleration working, but the llama-cpp-python GitHub and ooba GitHub explain how to do this.

You can even set up GPU acceleration through Metal on M1 Macs. I've seen some fucking INSANE performance numbers online for the higher-RAM MacBook Pros (20+ tokens/sec, I think with a 33B model, but it might have been 13B; either way, impressive).
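For reference, the pip step described above can be sketched in Python. This is only a rough sketch, not the official procedure: the `CMAKE_ARGS` and `FORCE_CMAKE` variables and the `-DLLAMA_CUBLAS=on` / `-DLLAMA_METAL=on` flag names are what the llama-cpp-python README documented around the time of this thread, and they are an assumption here. Later releases renamed the flags, so check the README for your version. The function names `build_install` and `install` are made up for illustration.

```python
import os
import subprocess
import sys

# Assumed flag names (mid-2023 llama-cpp-python README); verify for your version.
BACKEND_FLAGS = {
    "cuda": "-DLLAMA_CUBLAS=on",   # NVIDIA GPUs
    "metal": "-DLLAMA_METAL=on",   # Apple Silicon (M1 and later)
}


def build_install(backend):
    """Return (extra_env, command) for a from-source install with GPU support."""
    if backend not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend: {backend!r}")
    extra_env = {"CMAKE_ARGS": BACKEND_FLAGS[backend], "FORCE_CMAKE": "1"}
    command = [sys.executable, "-m", "pip", "install",
               "--no-cache-dir", "--force-reinstall", "llama-cpp-python"]
    return extra_env, command


def install(backend):
    """Run pip with the build flags set. Nothing runs until you call this."""
    extra_env, command = build_install(backend)
    subprocess.run(command, env={**os.environ, **extra_env}, check=True)
```

Calling `install("metal")` on a Mac (or `install("cuda")` on an NVIDIA machine) would rebuild llama-cpp-python from source with the chosen flag, provided the flag names still match your version.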

[-] pax@sh.itjust.works 0 points 1 year ago

llama.cpp is crashy on my computer; it didn't even compile.

[-] SkySyrup@sh.itjust.works 0 points 1 year ago

Huh, that's interesting. If llama.cpp doesn't work, try https://github.com/oobabooga/text-generation-webui which (tries to) provide a user-friendly(-ier) experience.

[-] pax@sh.itjust.works 0 points 1 year ago

It launches just fine, but when loading a model it says something like "successfully loaded none".

[-] SkySyrup@sh.itjust.works 0 points 1 year ago

Have you put your model in the "models" folder in the "text-generation-webui" folder? If you have, then navigate over to the "Model" section (button for the menu should be at the top of the page) and select your model using the box below the menu.
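A quick way to check the first question is to list what is actually sitting in the models folder. The snippet below is a minimal sketch: the helper name `list_model_files` is made up, and the `.bin` suffix is an assumption based on the GGML files linked in this thread, so adjust it for whatever format your model uses.

```python
from pathlib import Path


def list_model_files(models_dir, suffixes=(".bin",)):
    """Return sorted names of files in models_dir whose suffix matches."""
    folder = Path(models_dir)
    if not folder.is_dir():
        # Folder missing: nothing can be loaded from it.
        return []
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

For example, `list_model_files("text-generation-webui/models")` returning an empty list would mean there is no matching model file in that folder yet.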

[-] pax@sh.itjust.works 0 points 1 year ago

I tried to download an example one, because I don't have any model, but it failed.

[-] SkySyrup@sh.itjust.works 1 points 1 year ago

I'd recommend the model Wizard-Vicuna-7B-Uncensored (I know, the name is like a sentence): https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML. The direct download link is here: https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/blob/main/Wizard-Vicuna-7B-Uncensored.ggmlv3.q5_1.bin
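If you would rather fetch the file from a script, something like the following should work. It is a sketch under one assumption: that Hugging Face serves the file's web page under `/blob/` and the raw file under `/resolve/`, so the link is rewritten before downloading. Both function names are made up for illustration.

```python
import urllib.request


def direct_download_url(page_url):
    """Turn a Hugging Face file-page URL (/blob/) into a raw-file URL (/resolve/)."""
    return page_url.replace("/blob/", "/resolve/", 1)


def download_model(page_url, dest_path):
    """Download the raw file to dest_path. Not called here: the file is large."""
    urllib.request.urlretrieve(direct_download_url(page_url), dest_path)
```

You would then call `download_model(...)` with the link above and a destination inside the "models" folder of "text-generation-webui".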

this post was submitted on 08 Jun 2023

23 points (100.0% liked)

LocalLLaMA


Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 1 year ago

MODERATORS

SkySyrup@sh.itjust.works

pax@sh.itjust.works

noneabove1182@sh.itjust.works