Forum Moderators: open


Experibot v1


keyplyr

11:47 am on Jan 18, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: Experibot_v1
Protocol: HTTP/1.1
Robots.txt: Yes, but only after requesting 2 other pages, and more than 2 hours later.
Host: BEZEQINT-BROADBAND (bezeqint.net)
79.182.0.0 - 79.182.255.255
79.182.0.0/16
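As a quick sanity check, the /16 shorthand covers exactly the listed range. A minimal sketch with Python's standard `ipaddress` module:

```python
import ipaddress

# The CIDR block 79.182.0.0/16 spans 79.182.0.0 through 79.182.255.255
net = ipaddress.ip_network("79.182.0.0/16")

print(net[0], "-", net[-1])   # first and last address in the block
print(net.num_addresses)      # 65536 addresses (2^16)

# Membership test for any address from the ISP's range:
print(ipaddress.ip_address("79.182.123.45") in net)  # True
```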

My site gets measurable traffic from this Israeli ISP, with a few pests mixed in: broadband users pulling down my entire site, presumably to save on bandwidth fees.

keyplyr

9:23 pm on Feb 1, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I was able to "take" the robots.txt file of the aforementioned site with my current agent (Experibot_v1). Are you sure it's blocked?
Per web standards, all bots (allowed, blocked or otherwise) are allowed to "take" robots.txt. Why would I block a robot from getting the one file that is intended for robots?
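The policy described above, serve robots.txt to everyone but 403 an unwanted bot everywhere else, can be sketched as a small gatekeeper function. This is a hypothetical illustration (the name `should_block` and the token list are made up; in practice this logic usually lives in server config, not application code):

```python
def should_block(user_agent: str, path: str,
                 blocked_tokens=("Experibot",)) -> bool:
    """Return True if the request should get a 403.

    robots.txt is always served, even to otherwise-blocked bots,
    since it is the one file intended for robots.
    """
    if path == "/robots.txt":
        return False
    ua = user_agent.lower()
    return any(tok.lower() in ua for tok in blocked_tokens)


print(should_block("Experibot_v1", "/page.html"))   # True  -> 403
print(should_block("Experibot_v1", "/robots.txt"))  # False -> served
print(should_block("Mozilla/5.0", "/page.html"))    # False -> served
```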

lucy24

10:01 pm on Feb 1, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No "no-crawl list" needed

Of course we can always keep out unwanted visitors. But unwanted visitors who obey robots.txt are even better, because then they never make more than that one initial request. (This situation arises when, for example, a law-abiding robot lives in a bad neighborhood and isn't big enough to merit poking a hole in the block.) Even the most minimalist 403 is more work for the server than not getting the request in the first place.
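That is the robot's side of the bargain: a compliant crawler reads the rules once and then stays away on its own. A minimal sketch with Python's standard `urllib.robotparser` (the bot name matches the thread; the URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Rules a site might publish to turn this particular bot away politely
robots_txt = """\
User-agent: Experibot_v1
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
rp.modified()  # record that the rules have been loaded

# A compliant bot checks before every fetch, so after the one
# robots.txt request it never touches the rest of the site:
print(rp.can_fetch("Experibot_v1", "https://example.com/page.html"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/page.html"))  # True
```

No 403 handling is needed for a bot like this; the server never sees a second request at all.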

keyplyr

10:23 pm on Feb 1, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FYI - as a general rule, I don't allow private bots from a home broadband account (and I don't know anyone else who does), especially for some personal research project. This is a symbiotic deal: if your bot benefits me, then you get my stuff :)

Oblivious

12:33 pm on Feb 2, 2016 (gmt 0)

10+ Year Member



That's only fair.
Regardless, thanks for pointing out the issues to be fixed.