Forum Moderators: goodroi


“google sent me”


lucy24

3:43 am on Jan 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Has anyone else noticed random robots requesting robots.txt and giving “https://www.google.com/” as referer?

It started pretty suddenly in late September and has been ongoing since then. Random robots from assorted AWS neighborhoods, sporting humanoid but wildly antiquated UAs. My favorite--possibly theirs too, because it shows up a lot--is
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080219 Firefox/2.0.0.12 Navigator/9.0.0.6
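
If you want to confirm the "AWS neighborhoods" part against your own logs, AWS publishes its address ranges as a JSON file (ip-ranges.json), with IPv4 blocks under "prefixes" and IPv6 blocks under "ipv6_prefixes". A minimal Python sketch, assuming you've already downloaded and parsed that file; the function name aws_prefix_for is just illustrative:

```python
import ipaddress


def aws_prefix_for(ip: str, ranges: dict):
    """Return the first AWS prefix entry whose block contains ip, or None.

    `ranges` is assumed to be the parsed contents of AWS's ip-ranges.json.
    """
    addr = ipaddress.ip_address(ip)
    # IPv4 and IPv6 blocks live under different keys in the published file.
    if addr.version == 4:
        entries, key = ranges.get("prefixes", []), "ip_prefix"
    else:
        entries, key = ranges.get("ipv6_prefixes", []), "ipv6_prefix"
    for entry in entries:
        if addr in ipaddress.ip_network(entry[key]):
            return entry  # includes AWS's own "region" and "service" labels
    return None
```

Load the file with json.load() and pass the result in; anything that comes back non-None is at least sitting in an AWS block.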

Logged headers tell me they're especially concerned with getting a fresh copy:
Cache-Control: max-age=60

Just robots.txt. Never anything else. Do they not realize that sending a bogus referer with a robots.txt request is more likely to attract attention? Won't get them blocked, because I've got an ironclad policy of letting everyone see robots.txt, no exceptions, but honestly. This is silly.

keyplyr

5:37 am on Jan 12, 2018 (gmt 0)




They likely are just filling in the referrer field to appear more legit. What else would they put in it?

lucy24

6:14 am on Jan 12, 2018 (gmt 0)




They likely are just filling in the referrer field to appear more legit.

The phrase “worse than nothing” comes to mind. Requests for robots.txt aren't supposed to have a referer.
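
For anyone wanting to pull these out of their own logs, here's a minimal Python sketch that flags robots.txt requests carrying any referer at all. It assumes the Apache/nginx "combined" log format; the regex and the name suspicious_robots_hits are illustrative, not from any particular tool.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed "combined" format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    path: str
    referer: str
    ua: str


def suspicious_robots_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield robots.txt requests that send a referer (they normally shouldn't)."""
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not a combined-format line; skip it
        path = m["path"].split("?", 1)[0]  # ignore any query string
        referer = m["referer"]
        if path == "/robots.txt" and referer not in ("", "-"):
            yield Hit(m["host"], m["path"], referer, m["ua"])
```

Feed it an open log file, e.g. `for hit in suspicious_robots_hits(open("access.log")): print(hit.host, hit.referer, hit.ua)` (the file name is hypothetical).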

keyplyr

6:25 am on Jan 12, 2018 (gmt 0)




Really? Oh yeah... guess you're right.

lucy24

7:27 pm on Jan 12, 2018 (gmt 0)




Oh yeah

You'd have to have a pretty phenomenal robots.txt for its contents to show up on a Google search :) Maybe if you filled it with ASCII art?

Another thing I've seen pretty often over the years is the auto-referer on all requests, including robots.txt, even from some reasonably legitimate robots.
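
Taking "auto-referer" to mean a referer that is simply the URL being requested, a rough Python check might look like this. It assumes combined-format log lines and a hypothetical OWN_HOSTS set standing in for your own host names:

```python
import re
from urllib.parse import urlsplit

# Hypothetical: replace with your own site's host name(s).
OWN_HOSTS = {"example.com", "www.example.com"}

# Pulls the request path and the referer out of a combined-format line (assumed).
REQUEST_AND_REFERER = re.compile(
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def is_auto_referer(line: str) -> bool:
    """True if the referer points at the very URL being requested on this site."""
    m = REQUEST_AND_REFERER.search(line)
    if not m or m["referer"] in ("", "-"):
        return False
    ref = urlsplit(m["referer"])
    ref_path = ref.path or "/"
    if ref.query:
        ref_path += "?" + ref.query
    # hostname is lowercased by urlsplit, or None if the referer has no host.
    return ref.hostname in OWN_HOSTS and ref_path == m["path"]
```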