scraperule (lemmy.zip)
submitted 2 years ago by Monologue@lemmy.zip to c/196
top 6 comments
R00bot 16 points 2 years ago

I tried to get access to Facebook's API to mess around (as a student), but they declined my request. I ended up making a bot that ran in a headless browser, wasting far more of Facebook's resources, and I used it to create shitposts that updated themselves with their number of reactions lmao.

b3nsn0w@pricefield.org 12 points 2 years ago

Fun fact: on the r-site you can still append .json to the end of any path (before the query params) to get the page's data as JSON.
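A minimal Python sketch of that URL rewrite, using only the standard library: it appends `.json` to the path and keeps the query string after it. The function name `to_json_url` and the `example.com` host are hypothetical; the function only builds the URL, and whether a given site still honors the suffix is up to that site.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url: str) -> str:
    """Append '.json' to the end of the URL's path, before any query string."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path becomes "/", so the result ends in "/.json"
    if not path.endswith(".json"):
        path += ".json"
    # Reassemble with the original query string and fragment after the new path.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_json_url("https://example.com/r/196/top?t=day"))
    # prints https://example.com/r/196/top.json?t=day
```

Working on the split URL rather than doing string concatenation is what keeps `.json` in front of the `?t=day` part, which is the detail the comment calls out.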

Fun fact 2: on the same site you get similar JSON if you grab the script with id="data" (trivial with jsdom if you run Node.js), eval it in a sandbox (Node's built-in vm module), and look for the $.___r property on the global object you passed in.
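The first step of fun fact 2, pulling the raw text out of the `<script id="data">` element, can be sketched with Python's standard `html.parser` as a stand-in for jsdom. This is not the commenter's Node setup: evaluating the extracted script in a sandbox (Node's `vm` module) and reading `$.___r` is not shown, and the helper name `extract_script` is hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _ScriptById(HTMLParser):
    """Collects the raw text of the first <script> element with a given id."""

    def __init__(self, script_id: str) -> None:
        super().__init__()
        self.script_id = script_id
        self.found = False    # True once the matching <script> has been seen
        self._inside = False  # True while between its opening and closing tags
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and not self.found and dict(attrs).get("id") == self.script_id:
            self.found = True
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # html.parser hands script contents over unparsed, so this is the raw source.
        if self._inside:
            self.chunks.append(data)


def extract_script(html: str, script_id: str = "data") -> Optional[str]:
    """Return the source text of <script id=script_id>, or None if it is absent."""
    parser = _ScriptById(script_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks) if parser.found else None
```

The returned string is what you would then hand to a JavaScript sandbox; Python itself cannot evaluate it.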

Fun fact 3: also on the same site, the old interface is full of data-* attributes intended for CSS, so jsdom goes brrr.
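A rough sketch of harvesting those `data-*` attributes, again with Python's `html.parser` standing in for jsdom. The class and function names are hypothetical, and any attribute names you test it with are illustrative; check the actual markup before relying on any of them.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


class DataAttrCollector(HTMLParser):
    """Records the data-* attributes of each element that has any."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Record] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases attribute names; drop the "data-" prefix from each key.
        data = {name[len("data-"):]: value for name, value in attrs if name.startswith("data-")}
        if data:
            self.records.append(data)


def collect_data_attrs(html: str) -> List[Record]:
    """Return one dict per element carrying data-* attributes, in document order."""
    parser = DataAttrCollector()
    parser.feed(html)
    parser.close()
    return parser.records
```

For example, an element like `<div data-score="5">` yields `{"score": "5"}`. Values always come back as strings (or `None` for a bare attribute), so numeric fields need converting yourself.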

Fun fact 4: even if they stopped all of this, you could use a headless browser and grab the data in flight from the API calls (virgin DOM scrubber vs. chad API capturer).

I don't know much about the t-site and can't check right now, because you can't even access it the normal way, lol.

SubWoofer@catgirl.pub 7 points 2 years ago

Scraping my beloved... Using more resources from a company's server makes me drool.

Shit@sh.itjust.works 4 points 2 years ago

This cracked me up, especially the 10-minute delay and the rate limiting that make it better to just scrape.

Jackolantern@lemmy.world 2 points 2 years ago

Can someone ELI5 this for me? What's scraping and how does it work? For example, with Twitter's current limitations, will scraping still work?

1rre@discuss.tchncs.de 9 points 2 years ago

Scraping is fetching a webpage as if you were a normal user visiting it in Firefox/Chrome, then extracting the bits you want from it. If Twitter makes you sign in to view tweets (which I guess it will now?), scraping won't help much; otherwise it probably will, though it may take a fair bit of trickery to get working.
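To make the ELI5 concrete, here is a minimal Python sketch of the two halves: download the page with a browser-like `User-Agent` header, then pull out the piece you care about (here, just the page `<title>`). The names `fetch_html` and `extract_title` are hypothetical, and sites that require sign-in, JavaScript rendering, or other trickery are not handled.

```python
import urllib.request
from html.parser import HTMLParser
from typing import List, Optional


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page, sending a browser-like User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _TitleParser(HTMLParser):
    """Keeps only the text that appears inside <title>...</title>."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if the page has none."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.parts).strip()
    return title or None
```

Usage would look like `extract_title(fetch_html("https://example.com/"))`. Keeping the download and the parsing separate means the extraction logic can be tested on saved HTML without hitting the site at all.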

this post was submitted on 04 Jul 2023
188 points (100.0% liked)

196

16778 readers
2596 users here now

Be sure to follow the rule before you head out.

Rule: You must post before you leave.

^other^ ^rules^

founded 2 years ago