Oof. I tried to tell a manager why a certain technical thing wouldn't work, and he pulled out his phone and started reading the Google AI summary: "no, look, you just need to check the network driver and restart the router". It was two devices that were electrically incompatible, and there was no IP infrastructure involved at all.
I think average consumers are easily hitting that with current models... in part because the collapse of search engine functionality and of basic computer skills means the tech gets used for extremely basic/common requests, common enough that the answer was in the training data a thousand times over.
Like, it might get 80% of answers correct because 85% of questions it is asked nowadays could have just been answered by what was the top answer on google 6 years ago, and that's already in the training data. Think "why is the sky blue?"
It's only the "super users" who routinely ask it for rare or complex information synthesis (y'know, the key selling point of an LLM as an info source over a search engine) who push it up against the "make shit up" wall more than 20% of the time.
80% is generous. Half of that is the user simply not realizing that the information is wrong.
This becomes very obvious if you see anything generated for a field you know intimately.
i think this is why i've never really had a good experience with an LLM - i'm always asking it for more detail about stuff i already know.
it's like chatgpt is pinocchio and users are just sitting on his face screaming "lie to me! lie to me!"
See now that sounds fun
Woah!
Yeah, the research says it's actually closer to 50-60% of the time that it's correct.