That is like 10 years old. (I did not look it up.) Can we not stomp on tiny mistakes from 10 years ago?
I'm sorry, but I disagree - sometimes things just qualify as classics, and also serve to act as warnings to future generations.
"Can't we just ignore history?"
But this was arbitrary. It's not like "why are there only 16 colors on this video game" (because of space constraints). They could have made it 257 users and nothing would overflow. Given that, I think they should have made a human-comfortable number (multiple of 10) instead of a machine-comfortable number (power of 2).
It’s only arbitrary if you ignore the history of computing and the eventual settling on a standard of 8-bit bytes as the smallest addressable value in most programming languages and operating system libraries (though not always addressable in hardware).
Unless you’re making the very meta claim that it was arbitrary for us to settle on 8 bits instead of 10 or something. I think there are a lot of technical merits to 8 bit bytes (being a power of 2 is nice and 4 bits is just too small).
Yes, but this is not a historical piece of code, it is a 21st-century app. I very much doubt they are using a uint8 to represent the array size, it's probably a 64-bit int. They might as well have used 300 or 250, or 1000.
WhatsApp’s back-end is written in Erlang. Erlang is a very old language with weird limitations. For one thing, it doesn’t have different machine-sized (16, 32, 64 bit) integers the way C does. Arbitrary-precision integers are the only primitive integer type. This makes it quite a slow type to use for something like a group chat member ID.
However Erlang also has a type called a binary which is used for space-efficient storage of binary data (along with primitive operations on bits). These types are stored as sequences of bytes. I’m guessing this is how WhatsApp does group chat IDs, which would make the 256 user limit perfectly understandable (keep every ID contained within a byte).
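To make the one-byte idea concrete, here is a minimal sketch in Python (not Erlang, and not WhatsApp's actual code, which nobody in this thread has seen). The function name `member_slot_to_byte` is made up for illustration. It only shows that a single byte can hold the values 0 through 255, so 256 distinct IDs fit and the 257th does not.

```python
def member_slot_to_byte(slot: int) -> bytes:
    """Encode a hypothetical per-group member slot as exactly one unsigned byte."""
    # int.to_bytes raises OverflowError when the value does not fit in the length given.
    return slot.to_bytes(1, "big")


first = member_slot_to_byte(0)    # lowest slot
last = member_slot_to_byte(255)   # highest slot that still fits in one byte

try:
    member_slot_to_byte(256)      # the 257th distinct value
    fits = True
except OverflowError:
    fits = False

print(first, last, fits)
```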
I don't think every user would have an ID in the chat of 1 byte, that would be a nightmare when leaving and joining the group, reusing IDs, etc... each user needs to be identified with its uuid (or whatever else they chose).
Using a 32 or 64 bit size and limiting the value makes much more sense, any subsequent changes would be a config tweak instead of a major refactor. I would guess the limit was a fun "Easter egg" type of thing rather than a hard technical limit.
WhatsApp has billions of users. Scaling to that level and maintaining perfect real-time chatting with arbitrary user-created groups is not trivial. Storing 64 bit UUIDs for every single message and other interaction in a group chat would be inefficient, not to mention unidiomatic in Erlang (due to previously-mentioned lack of machine-sized integers).
The use-case of a group having <256 current users but >256 historical users and the desire to scroll back and read very old messages of people who left the group is very uncommon. It makes perfect sense to put a situation like that on a slow path while optimizing for the common case of <256 chatting right now.
I disagree for various reasons:
It's not very uncommon, it would be an issue as soon as it happens, without going back that far. Even if it was uncommon, it is possible and something to take care of, making for a super ugly "special case" code.
Plus you don't need to sort the user's ids to deliver messages, it's a foreach kind of operation.
And finally, given the underlying hardware, sorting 8 bit integers wouldn't be faster than sorting 64 bit ones (which we don't need to do, anyway), processors move all bits in parallel. Unless WhatsApp runs on 8 bit microcontrollers.
I didn’t say sorting, I said “storting” and must have corrected the typo while you were writing your reply. I meant storing. Having a 64-bit UUID attached to every single one of trillions of messages (per day) is a huge amount of wasted space (8 TB per trillion messages, just to store 64-bit UUIDs without any message contents).
As an annoying aside, my phone now thinks “storting” is a word and helpfully autocorrects storing to that now. Good grief!
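The storage figure above can be sanity-checked with a quick back-of-the-envelope calculation. This sketch assumes 8 bytes per sender ID (the 64-bit size used in this thread; a standard UUID is actually 128 bits), decimal terabytes, and a 1-byte table index as the alternative. All constant names are made up for illustration.

```python
MESSAGES = 10**12            # one trillion messages
BYTES_PER_64BIT_ID = 8       # 64 bits, as assumed in the thread
BYTES_PER_TABLE_INDEX = 1    # hypothetical 1-byte index into a member table
TB = 10**12                  # decimal terabyte


def terabytes(total_bytes: int) -> float:
    """Convert a byte count to decimal terabytes."""
    return total_bytes / TB


full_ids_tb = terabytes(MESSAGES * BYTES_PER_64BIT_ID)
index_tb = terabytes(MESSAGES * BYTES_PER_TABLE_INDEX)
saved_tb = full_ids_tb - index_tb

print(full_ids_tb, index_tb, saved_tb)
```

Under these assumptions the sender IDs alone come to 8 TB per trillion messages, against 1 TB for a one-byte index.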
I nominate storting to mean storing and sorting at the same time. Like in a binary heap, binary tree, sorted array, etc. It's a common thing and similar to other words like "upsert".
I don't see how a message uuid is related to the group membership storage...
I haven't seen the code of WhatsApp, obviously, but I use a similar question to interview candidates. There's a few ways of implementing groups, and you have to store group membership somehow, but just once per group.
When a message is sent, it can be stored with a foreign key that relates it to the group, a message ID that should be unique for whatever DB is in use, plus a timestamp. When checking new messages, a client provides the timestamp of the last retrieved message and the server provides all messages since then (per group). Even read confirmations can be implemented using timestamps. There's no need to store all group members for every message (not that you claimed it is, just making sure).
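A rough sketch of the timestamp-based retrieval described above, using an in-memory list in place of a real database. The `Message` fields and the `messages_since` helper are hypothetical names for illustration only. The record holds exactly what the comment lists (group key, message ID, timestamp) plus a body.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Message:
    message_id: int   # unique within the store
    group_id: int     # "foreign key" to the group
    timestamp: float  # when the server received the message
    body: str


def messages_since(store: Iterable[Message], group_id: int, last_seen: float) -> List[Message]:
    """Return the group's messages newer than last_seen, oldest first."""
    newer = [m for m in store if m.group_id == group_id and m.timestamp > last_seen]
    return sorted(newer, key=lambda m: m.timestamp)


store = [
    Message(1, 10, 100.0, "hello"),
    Message(2, 10, 200.0, "anyone there?"),
    Message(3, 11, 150.0, "other group"),
]

# A client that last saw timestamp 100.0 in group 10 only gets message 2.
print([m.message_id for m in messages_since(store, 10, 100.0)])
```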
Sounds like you’re not storing who sent each message to the group, so how is anyone supposed to follow a group conversation between multiple participants if all you’re storing is the group ID and a time stamp?
Oh, you mean the sender id. I would definitely store the uuid, but I understand the tradeoff storing something smaller. However due to code complexity of reusing ids and small relative savings (even less with compression) I would definitely prefer the uuid solution.
This is how I would do it (and I think how it’s done but can’t confirm):
There’s really no complexity at all because you can just store a table of group members with 256 entries and send the index into that table with each message to each user. The users have a copy of the table on their client and when they receive the message the client looks it up in the table and stores it in the local message history.
You would not store message history on the server. Only messages which have not been delivered to all group members would be stored on the server. When people leave/join the group, you send group membership notices to all members and their clients update their tables accordingly.
Since you don’t store message histories on the server, new people who join the group can’t see messages that were sent before they joined. This eliminates the need to send UUIDs with every message and furthermore it eliminates the need to send large message histories all at once when someone joins a group. Since clients store their own histories with UUIDs attached to messages (not table indices) there is no issue with table index reuse.
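A minimal sketch of the member-table scheme described above, assuming a fixed 256-slot table and a one-byte sender index on the wire. `GroupTable` and its methods are invented for illustration and are a guess at the idea, not WhatsApp's protocol.

```python
from typing import List, Optional, Tuple


class GroupTable:
    """Hypothetical 256-slot member table shared by the server and every client."""

    MAX_MEMBERS = 256

    def __init__(self) -> None:
        self.slots: List[Optional[str]] = [None] * self.MAX_MEMBERS

    def join(self, user_id: str) -> int:
        """Put user_id in the first free slot and return its index."""
        for index, occupant in enumerate(self.slots):
            if occupant is None:
                self.slots[index] = user_id
                return index
        raise OverflowError("group is full")

    def leave(self, index: int) -> None:
        """Free a slot so a later joiner can reuse it."""
        self.slots[index] = None

    def encode(self, sender_index: int, text: str) -> bytes:
        """Wire format: one byte of sender index, then the UTF-8 text."""
        return bytes([sender_index]) + text.encode("utf-8")

    def decode(self, payload: bytes) -> Tuple[Optional[str], str]:
        """Resolve the index back to the full user ID on the receiving client."""
        return self.slots[payload[0]], payload[1:].decode("utf-8")


table = GroupTable()
alice = table.join("alice-full-id")
bob = table.join("bob-full-id")
wire = table.encode(bob, "hi")
print(wire, table.decode(wire))
```

Because the client resolves the index to the full ID before saving the message locally, a slot can be handed to a new member later without rewriting old history.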
Disclaimer: I don't use WhatsApp, mostly slack at work and signal personally.
Then the tradeoff is that you can't rely on the server for replay. What if you have two clients, desktop and mobile for instance? A message is delivered to the desktop while the phone is offline, I shut down the computer, turn on the phone and I won't see the message on there. All to save 7 bytes on a message of potentially hundreds? Weird tradeoff. Even less than 7, given compression.
I think it’s helpful to bring up a bit about WhatsApp’s history.
WhatsApp was developed in 2009 (for the iPhone) to provide status notifications (Away, Busy, At Work, etc) back when SMS was the only way to message people on phones and SMS did not have such statuses. It soon morphed into a drop-in replacement for SMS messaging which helped it take off in many countries around the world where SMS delivery fees were extremely expensive but small (<1 GB) data plans were cheap (or relatively cheap, on a per-byte basis).
For most of that early history and rapid growth there was no desktop app, only the phone app. You didn’t need to create an account either: your phone number was your account. This model meant that you didn’t want someone else receiving all your messages just because they inherited your phone number, so server-side history was a non-starter to begin with. I think at some point they added the ability to back up your chat history from your phone to a cloud account such as iCloud or Google Drive.
When they launched the desktop version of WhatsApp they tied it to your phone. You had to use your phone to sign in and if your desktop lost connection to the server it could not reconnect by itself.
Anyway, if you think about users in countries like India or Brazil where SMS messages were either unavailable or cost a fortune and data plans were expensive (but still much cheaper than SMS per-byte) then it makes total sense to save as many bytes as possible over the wire. Also consider that WhatsApp’s killer feature, its group chats, are a perfect match for larger families to keep in touch. However, I think even the largest families have the need for fewer than 256 people in one group chat.
The situation for chat history may have changed more recently under the stewardship of Meta / Facebook. I think they have begun to target Slack by marketing WhatsApp Business.
They could have made it 257 users and nothing would overflow
It might if the people writing the software are extremely old school about their approach to memory management
Dev here. Just because CPUs don't directly use 8-bit numbers anymore doesn't magically mean 257 wouldn't overflow. If you're storing the 8 bits in part of something else that's 32 or 64 bits (or whatever), like maybe the ID of the chat, then you only have 8 bits. A lot of the time this comes down to making compact data representations of things to make uploads/downloads quicker. JSON is the most popular data format to transfer data in (probably), but other more compact binary formats like Avro, Protobuf, and even application-specific custom formats exist.
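To illustrate the point about an 8-bit field living inside a wider integer, here is a small bit-packing sketch. The layout (member index in the low 8 bits, chat ID in the bits above) and the names `pack` and `unpack` are assumptions made up for this example, not a known WhatsApp format.

```python
from typing import Tuple

MEMBER_BITS = 8
MEMBER_MASK = (1 << MEMBER_BITS) - 1  # 0xFF, the largest value the field can hold


def pack(chat_id: int, member_index: int) -> int:
    """Store member_index in the low 8 bits and chat_id in the bits above it."""
    if not 0 <= member_index <= MEMBER_MASK:
        raise ValueError("member_index does not fit in 8 bits")
    return (chat_id << MEMBER_BITS) | member_index


def unpack(packed: int) -> Tuple[int, int]:
    """Split a packed value back into (chat_id, member_index)."""
    return packed >> MEMBER_BITS, packed & MEMBER_MASK


packed = pack(5, 255)
print(packed, unpack(packed))

# Without the range check, index 256 would be masked down to 0 and
# silently collide with the first member.
print(256 & MEMBER_MASK)
```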
FF in the chat
I see what you did
0x0BA5ED
0o377
Even if the number was chosen completely arbitrarily, why would it warrant a "yikes"?
Pretty sure the "Yikes" was because the number was obviously not arbitrary and the tech reporter didn't know that.
If it were truly an arbitrary number it likely wouldn't warrant a "yikes."
I get the impression this is the non tech crowd commenting here.
...what? Lol
Say the chat size increased to 317. Why would the tech writer say "yikes"? Just because it's not divisible by 5 or 10?
The tech writer didn't say yikes. The first person to post it did, then someone else reposted while keeping the original poster's reaction.
Ah yeah I see that now. Still a bizarre reaction from the random tumblr user, but that's just typical tumblr stuff.
I had the same reaction, because someone who doesn't understand the significance of the number 256 isn't qualified to be a tech journalist.
2^8 = 256
Computers operate with base-2 calculations, making 256 as "normal" a number to computers and those who work closely with them as 100 is to most humans.
256 is not arbitrary. The author thought it was arbitrary. The commenter said "Yikes" in response to the author not knowing the thing in the field that they report in was actually completely planned and not remotely arbitrary.
If they had increased the chat size to 317, which is neither a round number in base 10 or base 2 nor significant in general communication, it could safely be classified as "arbitrary", meaning the original headline would be appropriate and the commenter likely wouldn't have said "Yikes."
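For anyone who wants to check the 2^8 = 256 point, a few lines of Python show why 256 reads as a round number in base 2, and why 0xFF and 0o377 from the jokes upthread are the same value (255, the largest thing that fits in 8 bits). The variable names are just for illustration.

```python
BITS = 8
distinct_values = 2 ** BITS      # how many different values 8 bits can represent
largest = distinct_values - 1    # counting starts at 0, so the top value is one less

print(distinct_values)                          # 256
print(hex(largest), oct(largest), bin(largest)) # 0xff 0o377 0b11111111
print(bin(distinct_values))                     # 0b100000000, a 1 followed by eight 0s
```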
The tech writer did not say "Yikes."
256 = 200 + 56, initially they only wanted 200 people in a chatroom but decided 56 more was even better, so it's very oddly specific indeed.
It's like the tech writer that didn't know what the shift key did.
I wonder why it's so hard for journalism institutions to find someone with an appropriate background to cover certain high-volume beats.
It's particularly egregious with military and science stuff, where it becomes painfully obvious that whoever is doing the reporting just has no clue how any of the stuff they're reporting on actually works. Seems to me it'd be a worthwhile investment to hire someone with at least some sort of science degree or military training to cover beats that are that high volume.
It's not like they go with completely ignorant randos to do their sports reporting, the sports reporters usually know some stuff about the rules of the sport and how it's played.
Because they won't pay what someone actually qualified for the task would require. They still get their clicks because people want to stay informed, and yet also do so for cheap. Late stage capitalism has everybody attempting to wring value out of every last penny in order just to keep afloat in a world where the absurdly wealthy see average people as just pawns in a game the one-percenters are driven to "win" no matter what.
You say military because that is what you know about.