submitted 2 years ago by serenitynot@beehaw.org to c/news@beehaw.org

It seems pretty clear that all of Huffman’s recent decisions are driven by Reddit’s hoped-for IPO. On one front is the ugly fact that Reddit’s valuation is sinking.

[-] Oinks@feddit.de 5 points 2 years ago* (last edited 2 years ago)

I am wondering if Reddit's content (and content on other social media sites too) is actually as valuable as everyone seems to think. OpenAI, Microsoft and Google already have enormous amounts of text data to train their LLMs, and I would honestly be surprised if the barrier to making these LLMs better is needing even more data. So who is actually going to pay for this? AI startups don't have the money. Maybe Amazon, if it wants to get into the business?

[-] blabboy@lemmy.ml 8 points 2 years ago

I work with LLMs, and yes, the barrier currently is needing more data. These models get better when they are larger and trained on more data, so you really need all the data you can get your hands on.

[-] Derproid@beehaw.org 7 points 2 years ago

The recent document leaked by Google kinda claims otherwise though. It's less about quantity now and more about quality.

https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

[-] adespoton@lemmy.ca 3 points 2 years ago

That’s actually where Reddit is useful as a training corpus, because different subreddits are at different levels of quality. It’s pretty easy to identify the high-quality ones for training answers, and the low-quality ones are excellent for training basic transforms (making sense out of an input that is niche and flawed in some way).

There are very few other sources of lightly structured training data that span all of humanity broken down into topics, graded to different levels of quality. Over time, the data will become less relevant as society moves on, so a living training set is important.

Having said that, Lemmy could prove to be an even better training source for expert system LLMs, as there could be curated instances of high quality with the ability to pull in more federated data as needed.

this post was submitted on 11 Jun 2023
66 points (100.0% liked)
