kogasa@programming.dev 10 points 1 month ago (last edited 1 month ago)

I mean the latter statement is not true at all. I'm not sure why you think this. A basic GPT model reads a sequence of tokens and predicts the next one. Any sequence of tokens is possible, and each digit 0-9 is likely its own token, as is the case in the GPT-2 tokenizer.
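
This is easy to check, e.g. with the tiktoken package (an assumption on my part; any GPT-2 tokenizer implementation would show the same thing):

```python
# Minimal sketch: confirm each single digit is its own token in GPT-2's
# BPE vocabulary. Assumes the tiktoken package is installed.
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # the GPT-2 BPE vocabulary

# Every single digit encodes to exactly one token id, so a model over
# this vocabulary can emit any digit sequence token by token.
for d in "0123456789":
    print(d, enc.encode(d))  # each digit -> a one-element id list
```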

An LLM can't generate random numbers in the sense of a proper PRNG simulating draws from a uniform distribution; the output will probably have some kind of statistical bias. But it doesn't have to produce sequences contained in the training data.
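
Here's a sketch of how you could test for that bias. The digit distribution below is made up (a hypothetical skew toward 7 standing in for model output); a real test would sample digits from the model itself:

```python
# Sketch of a uniformity test for "random" digits. The samples come
# from a made-up skewed distribution standing in for LLM output.
import random
from collections import Counter

from scipy.stats import chisquare

weights = [1, 1, 1, 1, 1, 1, 1, 3, 1, 1]  # hypothetical bias toward 7
samples = random.choices(range(10), weights=weights, k=10_000)

counts = [Counter(samples)[d] for d in range(10)]
stat, p = chisquare(counts)  # null hypothesis: uniform over 0-9
print(f"chi2 = {stat:.1f}, p = {p:.3g}")  # tiny p => reject uniformity
```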

couldbealeotard@lemmy.dbzer0.com 1 point 1 month ago

It could have the whole phone number as a string

kogasa@programming.dev 2 points 1 month ago

A full phone number could be in the tokenizer vocabulary, but any given one probably isn't in there
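
Same kind of check as above, with a made-up digit string (assuming tiktoken again):

```python
# Check whether a full 10-digit string is a single vocabulary entry.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
number = "5558675309"  # made-up example, not a real phone number
ids = enc.encode(number)
print(ids)             # expect multiple token ids for a string this long
print(len(ids) == 1)   # False: the whole number is not one token
```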
