Search37

keyplyr

7:56 pm on Aug 27, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




UA: Search37/1.2 (http://www.search37.com; info@search37.com)
Protocol: HTTP/1.1
Robots.txt: Yes
Host: ovh.net
91.121.0.0 - 91.121.255.255
91.121.0.0/16
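The dotted range and the CIDR line above describe the same block: a /16 fixes the first two octets and leaves 16 host bits, or 65,536 addresses. Here is a minimal sketch for checking whether a logged IP falls inside it, using Python's standard `ipaddress` module. The sample addresses are hypothetical.

```python
import ipaddress

# The OVH block quoted above, in CIDR notation.
OVH_BLOCK = ipaddress.ip_network("91.121.0.0/16")

def in_block(ip: str, network: ipaddress.IPv4Network = OVH_BLOCK) -> bool:
    """Return True if the dotted-quad address falls inside the network."""
    return ipaddress.ip_address(ip) in network

# First and last addresses of the /16 match the dotted range above.
print(OVH_BLOCK.network_address, OVH_BLOCK.broadcast_address)  # 91.121.0.0 91.121.255.255
print(OVH_BLOCK.num_addresses)  # 65536
print(in_block("91.121.34.7"))  # True  (hypothetical address)
print(in_block("91.122.0.1"))   # False (just outside the block)
```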

keyplyr

12:16 am on Aug 28, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They need an info page that tells site owners & webmasters:
1.) Who are they?
2.) What are they after at our sites?
3.) What will they do with the data they retrieve?
4.) Why should we allow them to take our property? How does it benefit us?

lucy24

7:56 pm on Aug 28, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To refresh my memory: in these posts, does "Robots.txt: Yes" mean only that they asked for it, not necessarily that they honor its contents, or even that they ask for it before their first page request?

keyplyr

8:16 pm on Aug 28, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, robots.txt was requested, period.
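lucy24's second question, whether the bot asks for robots.txt *before* its first page request, can be checked in your own access log. Here is a minimal sketch. It assumes Apache/Nginx "combined" log lines, matches the agent by a plain substring of the line, and uses hypothetical sample entries. A real log would also need grouping by IP and handling of other request methods.

```python
import re
from typing import Iterable, List, Tuple

# Request line inside a "combined" log entry, e.g. "GET /robots.txt HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')

def paths_for_agent(lines: Iterable[str], ua_token: str) -> List[str]:
    """Request paths, in log order, for lines containing ua_token.
    Crude: the token could also match a referrer field."""
    paths = []
    for line in lines:
        if ua_token in line:
            m = REQUEST_RE.search(line)
            if m:
                paths.append(m.group(1))
    return paths

def robots_status(paths: List[str]) -> Tuple[bool, bool]:
    """Return (asked_for_robots_txt, asked_for_it_before_any_other_request)."""
    requested = "/robots.txt" in paths
    first = bool(paths) and paths[0] == "/robots.txt"
    return requested, first

# Hypothetical log extract: the page is fetched before robots.txt.
UA = '"Search37/1.2 (http://www.search37.com; info@search37.com)"'
SAMPLE_LOG = [
    '91.121.10.20 - - [28/Aug/2017:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" ' + UA,
    '91.121.10.20 - - [28/Aug/2017:10:00:05 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" ' + UA,
]

paths = paths_for_agent(SAMPLE_LOG, "Search37/")
print(paths)                 # ['/', '/robots.txt']
print(robots_status(paths))  # (True, False)
```

A "Robots.txt: Yes" line in these reports corresponds only to the first element being True.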

As far as User Agent Documentation goes, IMO anything further would get into vague scenarios and interpretations. So whether the bot followed the intent of the robots.txt directives would have to be validated case by case.
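One way to do that case-by-case validation is to replay the paths a bot fetched against your robots.txt and list the ones it should have skipped. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the fetched paths are hypothetical. Note that this parser applies the first matching rule in file order, which may not match how every crawler interprets overlapping rules.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; substitute your own file's contents.
ROBOTS_TXT = """\
User-agent: Search37
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""

def disallowed_fetches(robots_txt: str, user_agent: str, paths: List[str]) -> List[str]:
    """Return the fetched paths that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]

ua = "Search37/1.2 (http://www.search37.com; info@search37.com)"
# Hypothetical log extract. /cgi-bin/ is only blocked for "*", and the
# Search37 group matches first, so only /private/ counts as a violation.
fetched = ["/robots.txt", "/", "/private/report.html", "/cgi-bin/x"]
print(disallowed_fetches(ROBOTS_TXT, ua, fetched))  # ['/private/report.html']
```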

This is why I personally think robots.txt is outdated. The web has moved on, and most agents have no use for it since it doesn't apply to them. It was never a standard despite the efforts to make it one.