196
Community Rules
You must post before you leave
Be nice. Assume others have good intent (within reason).
Block or ignore posts, comments, and users that irritate you in some way rather than engaging. Report if they are actually breaking community rules.
Use content warnings and/or mark as NSFW when appropriate. Most posts with content warnings likely need to be marked NSFW.
Most 196 posts are memes, shitposts, cute images, or even just recent things that happened, etc. There is no real theme, but try to avoid posts that are very inflammatory, offensive, very low quality, or very "off topic".
Bigotry is not allowed. This includes (but is not limited to): Homophobia, Transphobia, Racism, Sexism, Ableism, Classism, or discrimination based on things like Ethnicity, Nationality, Language, or Religion.
Avoid shilling for corporations, posting advertisements, or promoting exploitation of workers.
Proselytization, support, or defense of authoritarianism is not welcome. This includes but is not limited to: imperialism, nationalism, genocide denial, ethnic or racial supremacy, fascism, Nazism, Marxism-Leninism, Maoism, etc.
Avoid AI generated content.
Avoid misinformation.
Avoid incomprehensible posts.
No threats or personal attacks.
No spam.
Moderator Guidelines
- Don’t be mean to users. Be gentle or neutral.
- Most moderator actions which have a modlog message should include your username.
- When in doubt about whether or not a user is problematic, send them a DM.
- Don’t waste time debating/arguing with problematic users.
- Assume the best, but don’t tolerate sealioning/just asking questions/concern trolling.
- Ask another mod to take over cases you struggle with, if you get tired, or when things get personal.
- Ask the other mods for advice when things get complicated.
- Share everything you do in the mod matrix, both so several mods aren't unknowingly handling the same issue and so you can receive feedback on what you intend to do.
- Don't rush mod actions. If a case doesn't need to be handled right away, consider taking a short break before getting to it. This is to say, cool down and make room for feedback.
- Don’t perform too much moderation in the comments, except if you want a verdict to be public or to ask people to dial a convo down/stop. Single comment warnings are okay.
- Send users concise DMs about verdicts concerning them, such as bans, except in cases where it is clear we don’t want them at all, such as obvious transphobes. No need to notify someone that they haven’t been banned, of course.
- Explain to a user why their behavior is problematic and how it is distressing others rather than engage with whatever they are saying. Ask them to avoid this in the future and send them packing if they do not comply.
- First warn users, then temp ban them, then finally perma ban them when they break the rules or act inappropriately. Skip steps if necessary.
- Use neutral statements like “this statement can be considered transphobic” rather than “you are being transphobic”.
- No large decisions or actions without community input (e.g. polls or meta posts).
- Large internal decisions (such as ousting a mod) might require a vote, needing more than 50% of the votes to pass. Also consider asking the community for feedback.
- Remember you are a voluntary moderator. You don’t get paid. Take a break when you need one. Perhaps ask another moderator to step in if necessary.
Subtitles is a perfect use case for LLMs.
No, what you are thinking of is speech-to-text software; it is much older than LLMs and works in a very different way.
Yeah speech to text models have nothing to do with LLMs and their use for captioning is perfectly fine imo
Nope, they still not good. I using YouTube auto gen subs and they 100% need LLM to fix mistakes.
Large language models are designed to generate text based on previous text. Translation from audio to text can be done via a neural net, but it isn’t a Large Language Model.
Now, you could combine the two to, say, reduce errors on mumbled words by having a generative model predict which words would fit better in the unclear sentence. However, you could likely get away with a much smaller and faster net than an LLM; in fact, you might be able to get away with plain-Jane Markov chains, no machine learning necessary.
Point is that there is a difference between LLMs and other neural nets that produce text.
In the case of audio to text translation, using an LLM would be very inefficient and slow (possibly to the point it isn’t able to keep up with the audio at all), and using a very basic text generation net or even just a probabilistic algorithm would likely do the job just fine.
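The plain-Jane Markov chain idea from that comment can be sketched in a few lines of Python. The toy corpus and function names here are purely illustrative, not anything from a real captioning pipeline; the point is just that a bigram table can guess a plausible word for an unclear spot with no machine learning at all:

```python
from collections import Counter, defaultdict

# Toy training text; a real system would use a large corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (bigram frequencies).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def most_likely_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = bigrams.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("the"))  # "cat" follows "the" most often here
```

A speech-to-text system could consult a table like this only for words it flagged as low-confidence, which is far cheaper than running every sentence through an LLM.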
How would an llm fix a mistake equivalent to something being misheard? I feel like you're misunderstanding something and could probably also use some help with your English.
Be nice (Rule 2).
Yeah, fair enough. I really did a bad job pointing that out politely.
In hindsight, I think I was trying to connect two thoughts I had about the other comment in a way that wasn’t discernible to anyone other than me.
what the actual fluff is up with lemmy.world accounts in this thread acting like jerks?
many such cases
While speech-to-text software indeed predates LLMs, LLMs can do it as well. I've only tried a few basic (aka free) options, so no idea how well they do en masse, but the generated results were at least on par with, if not better than, YouTube's auto captions.
It might not technically be LLMs, though. It could be a different type of "ai". I just can't stand the "ai" marketing when nothing they are making is actually ai, so until they pull their heads out of their asses, all "ai" models are LLMs to me.
Understandable, AI marketing right now is a shitshow, but they are not even AI, I think. People just forget that tech used to do magic before "AI" existed.
It's kind of the other way around, we've always had AI, it used to just basically mean a computer making some decision based on data. Like a thermostat changing the heating in response to a temperature change.
Then we got LLMs and because they are good at pretending to have complex reasoning ability, AI as a term started to always mean "computer with near human level intelligence" which of course they are absolutely not.
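The thermostat example above can be made concrete in a few lines; the function name and thresholds are made up for illustration, not any real HVAC logic. In the older, broad sense of "AI", even a rule like this (a decision made from data) qualified:

```python
# Toy "classic AI": a rule-based decision from a sensor reading.
# Target and hysteresis values are arbitrary illustrations.
def thermostat(temp_c, target_c=20.0, hysteresis=0.5):
    """Decide heater state from the current temperature."""
    if temp_c < target_c - hysteresis:
        return "heat_on"
    if temp_c > target_c + hysteresis:
        return "heat_off"
    return "hold"

print(thermostat(18.0))  # heat_on
```

No learning, no neural net, just a threshold, yet for decades this kind of automated decision-making sat comfortably under the "AI" umbrella.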
There was a book I can't remember, and the whole thesis was exactly that: "AI is whatever automates the decision-making process," not any particular group of algos.
This is a big part of it. Back when ai was first becoming big, my manager said they needed to run all my kb articles through an ai to generate link clouds or some such.
I was like umm.. that’s a service this platform has always offered..? Like just because you don’t know what the kb tools do, or what our rock bottom subscription gets us, doesn’t mean I haven’t looked into it.. but that also isn’t worth doing because now we only have a handful of articles in any given category because I’m good at my job..
As someone who uses a screen reader daily, absolutely the fuck not.
LLMs will invent things out of thin air and ruin any comprehension. It wastes my time rather than helping me.
If you use any generic LLM, then yes, but there are LLMs (like I said in another reply, it's probably not an LLM, but as there is no 'real' ai, that's what I'm calling all this ai bullshit) that are trained specifically for captioning/transcripts, just not necessarily done in real time.
Doing it "live" is what increases the error rate.
LLMs are large language models, they're a specialized category of artificial neural network, which are a way of doing machine learning. All of those topics are under the academic computer science discipline of artificial intelligence.
AI, neural net, or ML model are all way more accurate to say than LLM in this case.
I have to disagree with you. AI is never a more accurate way to describe what we have now. Not until they call true AI something different.
I know it's a weird hill to die on, but die on it I will. Calling one artificial intelligence and one virtual intelligence could work.
Also, it's my understanding that LLMs are considered a type of neural net, so I don't see it being more accurate to call it a neural net vs an LLM.
And they are all subsets of machine learning, so calling it an ML model leads me back to the same issue I have with "ai" (and the same reason those loser USB fucks can suck a bag of dildos): lack of clarity about what it actually can do.
Then call it ML or a neural net. Using the term LLM like you are for other forms of machine learning is just going to cause needless confusion, like it has in this thread.
No. "Machine learning" is the root of the tree.
Or, to steal another commenter's attempt to have me call it that: that would be like calling a chihuahua a wolf.
Machine learning -> neural net -> LLM. That's the basic "path". I don't CARE if "LLM" is technically wrong when "machine learning" or "neural net" is also inaccurate.
If anything, y'all should be arguing for me to call it ASR 2.0.
I just use ML to describe everything that isn't overhyped "AI" instead of making a big deal out of it but to each their own ig
A dog is a kind of animal but that doesn't mean you can describe every animal as a dog.
The term for "true" AI is artificial general intelligence.
You need to spend less time watching movies and more time watching computer science lectures. We had AI back in the 1960s.
I will frame it another way: you cannot automate subtitles or captions. And I always find reviewing automated output harder than doing it yourself.
To clarify, we are talking about a post caption, not closed captions.
That is, the text you put in the description of an image or video post.
Thanks for the clarification. Lol kinda feeds into my whole let's call things more accurately debate.
Automatic subtitles like on YouTube use Machine Learning, NOT a Large Language Model.
I used YouTube only as a basic comparison, as that's the one everybody has some experience with.
Crunchyroll really messed up their subs with AI. Not sure if they mean LLMs and are just calling it AI but still:
https://www.animenewsnetwork.com/news/2024-02-27/crunchyroll-confirms-testing-a.i-for-subtitling/.208086
Kept wondering why subtitles were so obviously off when I was watching some stuff. It was horrid.
subtitles have a hard enough time getting the words right without llms.
Fuck no.
Yes and no. There are specialized models that perform better than general purpose LLM with vastly lower resource use. But… the output part is essentially a language model too, so it’s prone to a lot of the same issues.
They perform A LOT better than traditional models, though. So much better it’s not even funny.