BBC will block ChatGPT AI from scraping its content (deadline.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

64 comments

ChatGPT will be blocked by the BBC from scraping content in a move to protect copyrighted material.

[-] Hubi@feddit.de 92 points 2 years ago

Makes sense, OpenAI will probably have to apply for a TV-license first.

[-] FlyingSquid@lemmy.world 7 points 2 years ago

I don't live in the UK, but I would gladly pay the TV license fee, or even a premium on top of it, if I had unlimited access to iPlayer. My only option right now is BritBox, which is not great and not really worth the money.

[-] jaackf@lemm.ee 4 points 2 years ago

Just VPN to the UK and then tick the box which says you have a TV license? Or there are other ways to get the content most likely! 🏴‍☠️

[-] FlyingSquid@lemmy.world 3 points 2 years ago

VPNs are always blocked in my experience.

[-] csm10495@sh.itjust.works 75 points 2 years ago

I wonder if anyone actually thinks robots.txt is binding, or that it isn't ignored by anyone who wants to.

[-] lemmyvore@feddit.nl 46 points 2 years ago

OpenAI will have to deal with a lot of lawsuits in the future. Robots.txt may not be legally binding but disobeying it after claiming otherwise would go a long way towards establishing intent.

[-] andrew@lemmy.stuart.fun 16 points 2 years ago

I mean, under the CFAA you could probably pretty easily pursue charges when explicitly deauthorizing certain agents from accessing your data. Plenty of people have been threatened and prosecuted for less.

https://www.nacdl.org/Landing/ComputerFraudandAbuseAct

[-] totallynotfbi@lemm.ee 6 points 2 years ago

I mean, you could just block OpenAI's crawlers' IP addresses, if you wanted to
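As a rough illustration of what an IP-level block amounts to, here is a minimal Python sketch. The ranges below are placeholders taken from the reserved documentation blocks, not OpenAI's actual crawler addresses, and `BLOCKED_RANGES` and `is_blocked` are hypothetical names; a real deployment would more likely do this in a firewall or reverse proxy using the ranges the crawler operator publishes.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks (assumption:
# these stand in for whatever ranges a crawler operator actually publishes).
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in BLOCKED_RANGES)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

The obvious weakness is that this only works while the crawler keeps using the addresses on the list.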

[-] Noite_Etion@lemmy.world 61 points 2 years ago

Big businesses won't lift a finger to halt global warming, but the second their precious copyrights are attacked they go into full force.

[-] Moneo@lemmy.world 4 points 2 years ago

I mean, yeah? Corporations are always going to act in their best interest, that's why regulation exists.

[-] netchami@sh.itjust.works 40 points 2 years ago

Kinda late

[-] porkins@sh.itjust.works 34 points 2 years ago

I’d rather have ChatGPT know about news content than not. I appreciate the convenience. The news shouldn’t have barriers.

[-] netchami@sh.itjust.works 48 points 2 years ago

But ChatGPT often takes correct and factual sources and adds a whole bunch of nonsense and then spits out false information. That's why it's dangerous. Just go to the fucking news websites and get your information from there. You don't need ChatGPT for that.

[-] echodot@feddit.uk 14 points 2 years ago

So they have automated Fox then.

[-] netchami@sh.itjust.works 6 points 2 years ago

Yeah, pretty much.

[-] guacupado@lemmy.world 11 points 2 years ago

More data fixes that flaw, not less.

[-] netchami@sh.itjust.works 19 points 2 years ago

Not too long ago, ChatGPT didn't know what year it is. You're telling me it needs more data than it already has to figure out the current year? I like AI for certain things (mostly some programming/scripting stuff) but you definitely don't need it to read the news.

[-] CurlyMoustache@lemmy.world 18 points 2 years ago

It is not "a flaw"; it is the way large language models work. They try to replicate how humans write by guessing based on a language model. They have no knowledge of what is a fact or not, and that is why using LLMs to do research, or using them as a search engine, is both stupid and dangerous.

[-] Apollo@sh.itjust.works 26 points 2 years ago* (last edited 2 years ago)

Who gets their news from ChatGPT lol

[-] FlyingSquid@lemmy.world 5 points 2 years ago

A disturbing number of people.

[-] spez_@lemmy.world 4 points 2 years ago

I do

[-] Apollo@sh.itjust.works 9 points 2 years ago

Why?

[-] prashanthvsdvn@lemmy.world 12 points 2 years ago

It’s funny seeing Apollo and spez_ fighting on a topic regarding ChatGPT.

[-] Apollo@sh.itjust.works 4 points 2 years ago

Natural enemies must fight

[-] abhibeckert@lemmy.world 7 points 2 years ago* (last edited 2 years ago)

Because ChatGPT doesn't do clickbait headlines or have auto-play video ads, auto play video news that follows me if I try to scroll past it, or a house ad that tries to convince me to stop reading the news and instead read a puff piece about how to clean my water bottle. Which I'd bet fifty bucks will result in me seeing ads for new water bottles every day for the next month. No thanks.

With the "Web Browsing" plugin, which essentially does a Bing search then summarises the result, ChatGPT is a far better experience if you want to find out what's going on in Israel today for example.

[-] Ad4mWayn3@lemmy.world 4 points 2 years ago

Neither does Lemmy. Here (and on other instances) there are plenty of communities for news, with better control of misinformation.

[-] C4d@lemmy.world 10 points 2 years ago* (last edited 2 years ago)

The pure ChatGPT output would probably be garbage. The dataset will be full of all manner of sources (together with their inherent biases) together with spin, untruths and outright parody and it’s not apparent that there is any kind of curation or quality assurance on the dataset (please correct me if I’m wrong).

I don’t think it’s a good tool for extracting factual information from. It does seem to be good at synthesising prose and helping with writing ideas.

I am quite interested in things like this where the output from a “knowledge engine” is paired with something like ChatGPT - but it would be for eg writing a science paper rather than news.

[-] C4d@lemmy.world 3 points 2 years ago

Exactly. The data harvest has been years in the making.

[-] patawan@lemmy.world 25 points 2 years ago

Curious what the mechanism for this will be. CAPTCHA can sometimes be relatively easy to pass and at worst can be farmed out to humans.

[-] Cqrd@lemmy.dbzer0.com 33 points 2 years ago

ChatGPT took down its Internet search to implement a robots.txt rule it would obey and allow content providers time to add it to their lists. This was done because they were being used to get around paywalls. So it’s actually very easy for them to do this for ChatGPT, specifically, which makes articles like this ridiculous.
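For anyone wondering what such a rule looks like, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content is a made-up example, and the user-agent names `GPTBot` and `ChatGPT-User` are my assumption about the tokens OpenAI documents for its crawler and browsing agent; `is_allowed` and the example URL are hypothetical. It only shows how a well-behaved crawler would interpret the file, since nothing in robots.txt enforces anything.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt a publisher might serve. The agent names are assumptions
# about OpenAI's documented tokens, not taken from the BBC's actual file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/some-article"
    for agent in ("GPTBot", "ChatGPT-User", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(agent, url) else "disallowed")
```

A crawler that honours the file would run a check like this before every request; one that doesn't simply skips it.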

[-] Snowplow8861@lemmus.org 20 points 2 years ago

When the horses have all bolted, BBC is the one to close the barn door.

[-] callmepk@lemmy.world 16 points 2 years ago

Also FYI, you can see some of the most popular websites that have already blocked ChatGPT: https://wayde.gg/websites-blocking-openai

[-] xenomor@lemmy.world 9 points 2 years ago

It should be illegal for entities like BBC to do this. Copyright is meant to be a temporary, limited construct that carves out an opportunity for creators to profit from their works. It is not perpetual legal dominion over specific ideas. Entities that harvest content to train LLMs should pay for access like everyone else, but after that, they can use the information they learn however they see fit. Now, if their product plagiarizes, or doesn’t properly attribute authorship, that is a problem. But it’s a different issue from what the BBC is fighting here.

I think there are some content creators that believe they are owed royalties if you even think about a piece they wrote or drew. That is, of course, absurd in terms of human minds. It’s also absurd in terms of other kinds of minds.

[-] hazelnot 17 points 2 years ago

Counter-point: everyone should block AI shit, fuck the laws

[-] regbin_@lemmy.world 5 points 2 years ago

You got that backwards. Fuck copyright. Nothing should be copyrighted.

[-] hazelnot 3 points 2 years ago

I agree. Nothing should be copyrighted. But everyone should try their hardest to stop "AI" scammers and the surveillance apparatus as a whole

[-] NightLily@lemmy.basedcount.com 9 points 2 years ago

Good!

[-] Immersive_Matthew@sh.itjust.works 3 points 2 years ago

Why good?

[-] NightLily@lemmy.basedcount.com 5 points 2 years ago

These things should not be scraping at all without the express permission of the author or the company that owns the work. It’s just completely wrong for them to do so.

[-] uriel238 4 points 2 years ago

Not for long. AI knows how to lie.

[-] flossdaily@lemmy.world 4 points 2 years ago

This is a bit like companies blocking Google from their websites.

You're only hurting yourself.

[-] vidarh@lemmy.stad.social 3 points 2 years ago* (last edited 2 years ago)

It won’t really matter, because there will continue to be other sources.

Taken to an extreme, there are indications OpenAI’s market cap is already higher than Thomson Reuters ($80bn-$90bn vs <$60bn), and it will go far higher. Getty, also mentioned, has a market cap of “only” $2.4bn. In other words: If enough important sources of content start blocking OpenAI, they will start buying access, up to and including, if necessary, buying original content creators.

As it is, while BBC is clearly not, some of these other content providers are just playing hard to get and hoping for a big enough cash offer either for a license or to get bought out.

The cat is out of the bag, whatever people think about it, and sources that block themselves off from AI entirely (to the point of being unwilling to sell licenses or sell themselves) will just lose influence accordingly.

This also presumes OpenAI remains the only contender, which is clearly not the case in the long run. Alternative models are on the rise, and while most are still not good enough, they are good enough that it’s clearly just a matter of time before anyone (at least, for the time being, for sufficiently rich instances of “anyone”, with the cost threshold dropping rapidly) can fine-tune their own models using their own scraped data.

In other words, it may make them feel better, but in the long run it’s a meaningless move.

EDIT: What a weird thing to downvote without replying to. I've taken no stance on whether BBC's decision is morally right or not, just addressed that it's unlikely to have any effect, and you can dislike that it won't have any effect but thinking it will is naive.

this post was submitted on 08 Oct 2023

516 points (100.0% liked)
