        this post was submitted on 27 May 2024
        
  
      
  
      Technology
So you have a product that you've made into a system for getting answers. And then you couldn't be bothered to sanitize the training data enough to keep your answer system's new headline feature from spreading blatantly incorrect information? If it doesn't work, maybe don't ship it.
The worst part is they don't seem to realize their responsibility in this as the leading search engine that the majority of the world uses. They seem to have the mindset "our answers are potentially dangerous for users, but it's OK, we have an army of lawyers."
I think the problem they are facing is data quantity. Sanitizing what may be terabytes of text data is a humongous task. They have probably used an AI to do the cleanup, but the more subtle errors have passed through the filter.
Yeah, the problem is how to sanitise effectively. You've gotta be able to find a way to automatically strip out "bad" things from your training data (via an "oracle"). But if you already had that oracle, you could just slap it on your final product (e.g. Search) and make all the "bad" things disappear before they hit the user (via some sort of filter).
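To make that concrete, here's a rough Python sketch of the point. Everything in it is made up for illustration: `toy_oracle`, the `[bad]` marker, and both function names are hypothetical stand-ins, and a real oracle is exactly the part nobody has. It only shows that the same predicate could sit in either place.

```python
from typing import Callable, Iterable, List

# Hypothetical oracle type: returns True when a piece of text is "bad".
Oracle = Callable[[str], bool]


def toy_oracle(text: str) -> bool:
    # Toy stand-in for illustration only: flags text carrying a "[bad]" marker.
    # Building a real oracle is the hard, unsolved part.
    return "[bad]" in text


def sanitise_training_data(docs: Iterable[str], is_bad: Oracle) -> List[str]:
    """Option 1: strip bad documents before training."""
    return [d for d in docs if not is_bad(d)]


def filter_answers(answers: Iterable[str], is_bad: Oracle) -> List[str]:
    """Option 2: strip bad answers before they reach the user."""
    return [a for a in answers if not is_bad(a)]


texts = ["[bad] some blatantly incorrect claim", "some harmless claim"]

# The same oracle does the same job at either end of the pipeline.
cleaned_docs = sanitise_training_data(texts, toy_oracle)
safe_answers = filter_answers(texts, toy_oracle)
print(cleaned_docs)
print(safe_answers)
```

Both calls keep only the harmless item, which is the whole argument: if you had an oracle good enough for the first function, you could use it for the second.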
I'm pretty sure Google's final solution will be using mechanical turks.