Repelle@lemmy.world 5 points 5 months ago

I don’t think that’s the full explanation, though, because there are examples of models that correctly spell out the word first (i.e., they know the component letter tokens) and still miscount the letters afterward.

vivendi@programming.dev 2 points 5 months ago

No, this literally is the explanation. The model understands the concept of "Strawberry". It can express that concept (which is itself very complicated) in English as Strawberry, in Persian as توت فرنگی, and so on.

But the model does not understand how many Rs exist in Strawberry, or how many ت exist in توت فرنگی.

Repelle@lemmy.world 4 points 5 months ago (last edited 5 months ago)

I’m talking about models printing out the component letters first, not just the full word. As in “S - T - R - A - W - B - E - R - R - Y”, then getting the answer wrong. You’re absolutely right that it reads in whole words at a time, encoded as vectors, but if it holds a relationship from that encoding to the component spelling (which it seems it must, given that it outputs the letters individually), then something else is wrong. I’m not saying all models fail this way, and I’m sure many fail in exactly the way you describe, but I have seen this failure mode (which is what I was trying to describe), and in that case an alternative explanation would be necessary.

vivendi@programming.dev 5 points 5 months ago (last edited 5 months ago)

The model ISN'T outputting the letters individually; binary models (as I mentioned) do, not transformers.

The model output is more like "Strawberry", not a sequence of separate letters.

Tokens can be a letter, part of a word, any single lexeme, a whole word, or even multiple words ("let be").
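To make the token point concrete, here is a toy sketch. It assumes a tiny made-up vocabulary and a simple greedy longest-match split; real tokenizers (BPE and friends) learn vocabularies of tens of thousands of entries, and how any particular model actually splits "strawberry" is an assumption here, not a fact about that model. The point is only that the model receives integer IDs, and an ID like `101` carries no spelling.

```python
# Hypothetical toy vocabulary mapping subword strings to integer IDs.
# These entries and IDs are made up for illustration; real vocabularies are learned.
VOCAB = {"straw": 101, "berry": 102,
         "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8}


def tokenize(text: str) -> list[str]:
    """Greedy longest-match split of `text` into entries of VOCAB."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return tokens


word = "strawberry"
pieces = tokenize(word)
ids = [VOCAB[p] for p in pieces]

print(pieces)  # ['straw', 'berry']
print(ids)     # [101, 102]

# The model only ever sees [101, 102]. The letter "r" is not present in that input,
# so counting it requires separately knowing how each token is spelled.
print(word.count("r"))  # 3, trivial from the characters, not directly from the IDs
```

With this vocabulary, a word that has no multi-letter entry, such as "stray", falls back to single-letter tokens (`['s', 't', 'r', 'a', 'y']`), which is the "a token can be a letter" case.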

Okay, I did a shit job demonstrating the time axis. The model doesn't know the underlying letters of the previous tokens, and this process goes forward in time.
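The "forward in time" part refers to autoregressive generation: each new token is picked based on the tokens already in the context, then appended, and the loop repeats. Below is a minimal sketch of that loop. The lookup table `NEXT` and all IDs are hypothetical stand-ins for a real model, which would score every vocabulary entry given the whole context; only the loop structure is meant to be representative.

```python
# Hypothetical stand-in for a model: maps the last token ID to the next one.
# A real transformer conditions on the entire context, not just the last ID.
NEXT = {0: 101, 101: 102, 102: 999}  # 0 = start marker, 999 = end marker (made up)


def generate(start_id: int = 0, end_id: int = 999, max_len: int = 10) -> list[int]:
    """Autoregressively extend the context one token ID at a time."""
    context = [start_id]
    while len(context) < max_len:
        # The "model" sees only previous IDs; no letters are involved at any step.
        next_id = NEXT[context[-1]]
        context.append(next_id)
        if next_id == end_id:
            break
    return context


print(generate())  # [0, 101, 102, 999]
```

Each step only looks backward at IDs that were already emitted, so nothing in the loop ever exposes the letters inside token 101 or 102.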

this post was submitted on 25 May 2025
492 points (100.0% liked)

Technology
