[-] MrSoup@lemmy.zip 28 points 2 years ago

I doubt Google respects any robots.txt

[-] DaGeek247@fedia.io 26 points 2 years ago

My robots.txt has been respected by every bot that visited it in the past three months. I know this because I wrote a page that IP-bans anything that visits it, and I also put it as a disallowed path in the robots.txt file.

I've only gotten like, 20 visits in the past three months though, so, very small sample size.
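
For anyone who wants to try the same trap, here is a rough sketch of the idea in Python. It is not the actual setup described above: the trap path `/juicy-content`, the `handle_request` function, and the in-memory `banned_ips` set are all made up for illustration, and a real site would more likely persist the bans or push them to a firewall.

```python
# Hypothetical trap path; any URL that no real page links to would do.
TRAP_PATH = "/juicy-content"

# robots.txt tells every crawler to stay away from the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# In-memory ban list (assumption: a real setup would persist this).
banned_ips = set()


def handle_request(ip, path):
    """Return (status_code, body) for a request from `ip` for `path`."""
    if ip in banned_ips:
        return 403, "Forbidden"
    if path == "/robots.txt":
        # Reading robots.txt is never punished; polite crawlers need it.
        return 200, ROBOTS_TXT
    if path == TRAP_PATH:
        # Only a client that ignored the Disallow rule ends up here.
        banned_ips.add(ip)
        return 403, "Forbidden"
    return 200, "Hello"
```

A bot that fetches `/robots.txt` and then stays out of `TRAP_PATH` keeps getting served; one that requests `TRAP_PATH` gets a 403 on that request and on everything after it.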

[-] mozz@mbin.grits.dev 14 points 2 years ago

I know this because I wrote a page that IP-bans anything that visits it, and I also put it as a disallowed path in the robots.txt file.

This is fuckin GENIUS

[-] Moonrise2473@feddit.it 8 points 2 years ago

Only if you don't want any visits except from yourself, because this removes your site from every search engine.

You should write a "Disallow: /juicy-content" rule and then block anything that tries to access that page (only bad bots would follow that path).

[-] Miaou@jlai.lu 23 points 2 years ago

That's exactly what was described..?

[-] Moonrise2473@feddit.it 3 points 2 years ago

Oops. As a non-native English speaker I misunderstood what he meant. I wrongly understood that he had set the server to ban everything that requested robots.txt.

[-] Zoop@beehaw.org 2 points 2 years ago

Just in case it makes you feel any better: I'm a native English speaker who always aced the reading comprehension tests back in school, and I read it the exact same way. Lol! I'm glad I wasn't the only one. :)

[-] mozz@mbin.grits.dev 5 points 2 years ago

You need to read again the thing that was described, more carefully. Imagine for example that by “a page,” the person means a page called /juicy-content or something.

[-] thingsiplay@beehaw.org 2 points 2 years ago* (last edited 2 years ago)

Interesting way of testing this. Another would be to query the search engines with site:your.domain added (Edit: Typo corrected. Of course without the - in front, as in -site:, otherwise you will exclude your site instead of limiting results to it.) to show results from your site only. Not an exhaustive check, but another tool to test this behavior.

[-] MrSoup@lemmy.zip 2 points 2 years ago

Thank you for sharing

[-] Moonrise2473@feddit.it 10 points 2 years ago

For ordinary site owners they do respect it, and they even warn a webmaster who submits a sitemap containing paths that are disallowed in robots.txt.
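
You can run a similar sanity check yourself before submitting a sitemap. This is only a sketch using Python's standard `urllib.robotparser`; the helper name `disallowed_sitemap_urls` and the example URLs are made up, and it says nothing about how Google implements its own warning.

```python
from urllib.robotparser import RobotFileParser


def disallowed_sitemap_urls(robots_txt, sitemap_urls, user_agent="*"):
    """Return the sitemap URLs that robots.txt forbids for `user_agent`."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and sitemap entries for illustration.
robots = "User-agent: *\nDisallow: /juicy-content\n"
urls = [
    "https://example.com/",
    "https://example.com/juicy-content",
]
conflicts = disallowed_sitemap_urls(robots, urls)
```

Any URL that ends up in `conflicts` is listed in the sitemap but blocked by robots.txt, which is the kind of contradiction a webmaster would get warned about.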

this post was submitted on 01 Aug 2024
109 points (100.0% liked)
