ViatorOmnium@piefed.social 14 points 1 month ago

"...a WhatsApp user and whose number appears on his company website."

That's where the LLM found the number. It being a valid WhatsApp number is a coincidence.

noxypaws@pawb.social 6 points 1 month ago

I hate Facebook with a furious, seething passion and have no vested interest in defending any of their shit, but this truly sounds like it made up a random number, and it was a total coincidence that the number actually belonged to another WhatsApp user.

jerakor@startrek.website 7 points 1 month ago

An LLM cannot generate random numbers. It has to pull from a list of numbers its model was built to include.

kogasa@programming.dev 10 points 1 month ago (last edited 1 month ago)

I mean, the latter statement is not true at all. I'm not sure why you think this. A basic GPT model reads a sequence of tokens and predicts the next one. Any sequence of tokens is possible, and each digit 0-9 is likely its own token, as is the case in the GPT-2 tokenizer.

An LLM can't generate random numbers in the sense of a proper PRNG simulating draws from a uniform distribution; the output will probably have some kind of statistical bias. But it doesn't have to produce sequences contained in the training data.
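To make the sampling argument concrete, here is a minimal sketch of autoregressive generation over single-digit tokens. The weight table `DIGIT_WEIGHTS` and the tiny `TRAINING_NUMBERS` set are hypothetical, and unlike a real model the toy ignores context entirely. It only shows that drawing one token at a time easily produces digit strings that were never in the "training data", while the output stays skewed rather than uniform.

```python
import random
from collections import Counter

# Hypothetical, deliberately skewed next-token weights for the digit tokens "0".."9".
# A real GPT-style model would compute a fresh distribution from the preceding context.
DIGIT_WEIGHTS = [1, 2, 3, 4, 5, 9, 5, 4, 2, 1]
DIGITS = [str(d) for d in range(10)]

# Hypothetical "training set" of numbers the toy model has seen.
TRAINING_NUMBERS = {"0123456789", "5555555555"}


def sample_number(length: int, rng: random.Random) -> str:
    """Build a digit string one token at a time by sampling the next token."""
    tokens = []
    for _ in range(length):
        # Toy simplification: the distribution does not depend on earlier tokens.
        tokens.append(rng.choices(DIGITS, weights=DIGIT_WEIGHTS, k=1)[0])
    return "".join(tokens)


rng = random.Random(42)
samples = [sample_number(10, rng) for _ in range(1000)]
novel = [s for s in samples if s not in TRAINING_NUMBERS]
digit_counts = Counter("".join(samples))

print(f"{len(novel)} of {len(samples)} sampled numbers are not in the training set")
print("most common digit:", digit_counts.most_common(1)[0][0])
```

The skewed weights stand in for the statistical bias; replacing them with equal weights would approximate the uniform draws a PRNG is designed to give.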

couldbealeotard@lemmy.dbzer0.com 1 points 1 month ago

It could have the whole phone number as a string.

kogasa@programming.dev 2 points 1 month ago

A full phone number could be in the tokenizer vocabulary, but any given one probably isn't in there.
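A toy tokenizer illustrates the difference. This sketch uses greedy longest-match tokenization, which is a simplification and not the byte-pair encoding GPT-2 actually uses, and the vocabulary (single digits plus one "memorized" number, `5550100`) is made up. A number that is in the vocabulary becomes one token; any other number falls back to one token per digit.

```python
# Hypothetical vocabulary: every single digit plus one memorized phone number.
VOCAB = {str(d) for d in range(10)} | {"5550100"}


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization (a simplification, not real BPE)."""
    max_len = max(len(tok) for tok in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i first.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


print(tokenize("5550100", VOCAB))  # → ['5550100']
print(tokenize("5550199", VOCAB))  # → ['5', '5', '5', '0', '1', '9', '9']
```

A real vocabulary has tens of thousands of entries, while the space of possible phone numbers runs into the billions, so only a vanishing fraction of numbers could ever get a dedicated token.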

ViatorOmnium@piefed.social 3 points 1 month ago

"...a WhatsApp user and whose number appears on his company website."

Either the number was in the training set, or it came up on the Web search somehow.

outhouseperilous@lemmy.dbzer0.com 2 points 1 month ago

Shhhhh, if someone doesn't know at this point, they can't be told.

this post was submitted on 18 Jun 2025
96 points (100.0% liked)

Privacy

3295 readers

Welcome! This is a community for all those who are interested in protecting their privacy.

Rules

PS: Don't be a smartass and try to game the system, we'll know if you're breaking the rules when we see it!

  1. Be civil and no prejudice
  2. Don't promote big-tech software
  3. No apathy and defeatism for privacy (i.e. "They already have my data, why bother?")
  4. No reposting of news that was already posted
  5. No crypto, blockchain, NFTs
  6. No Xitter links (if absolutely necessary, use xcancel)

founded 8 months ago