when digging around I happened to find this thread, which has some benchmarks for a different model
it's apples to square fenceposts, of course, since one llm is not another. but it gives us something to extrapolate from. if g4dn.2xl gave them 214 tok/s, and if we make the extremely generous assumption that tok == word (which, well, no; cf. strawberry), then any Use Deserving Of o3 (let's say 5~15k words) would mean you need a tok-rate of 1000~3000 tok/s for a "reasonable" response latency ("5-ish seconds"). so you'd need something like 5x g4dn.2xl instances just to shit out 5000 words with dolphin-llama3 in "quick" time. which, again, isn't even whatever the fuck people are doing with openai's garbage.
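for the lazy, here's the back-of-the-envelope arithmetic as a python sketch. the 214 tok/s figure is from the linked thread; the word counts, latency target, tok == word shortcut, and the naive assumption that throughput scales linearly across boxes are all just the hand-waving from above, not measurements:

```python
# back-of-the-envelope: how many g4dn.2xlarge boxes for a "snappy" response?
# assumptions (all from the comment above, none measured here):
#   - 214 tok/s per instance, per the linked benchmark thread
#   - 1 token == 1 word (generous; real tokenizers emit more tokens than words)
#   - an o3-worthy answer is 5k-15k words
#   - "reasonable" latency is about 5 seconds
#   - rate scales linearly across instances (it won't, but let's pretend)

import math

TOK_PER_SEC_PER_INSTANCE = 214   # dolphin-llama3 on g4dn.2xlarge
LATENCY_TARGET_S = 5.0           # "5-ish seconds"

for words in (5_000, 15_000):
    required_rate = words / LATENCY_TARGET_S  # tok/s needed overall
    instances = math.ceil(required_rate / TOK_PER_SEC_PER_INSTANCE)
    print(f"{words:>6} words -> {required_rate:>6.0f} tok/s -> {instances} instances")

# prints:
#   5000 words ->   1000 tok/s -> 5 instances
#  15000 words ->   3000 tok/s -> 15 instances
```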
utter, complete, comprehensive clownery. era-redefining clownery.
but some dumb motherfucker in a bar will keep telling me it's the future. and I get to not boop 'em on the nose. le sigh.