Asks for robots.txt some of the time BUT ignores it ALL of the time.
{ long string of robots.txt + 403 pairs from assorted AWS IPs, ending in }
54.158.59.24 - - [28/Jul/2015:13:38:38 -0700] "GET /robots.txt HTTP/1.1" 200 569 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
54.158.59.24 - - [28/Jul/2015:13:38:39 -0700] "GET /ebooks/abbey/ HTTP/1.1" 403 1716 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
184.72.191.180 - - [28/Jul/2015:15:06:53 -0700] "GET /robots.txt HTTP/1.1" 200 580 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 (FlipboardProxy/1.1; +http://flipboard.com/browserproxy)"
{ long string of robots.txt ONLY from the same assorted IPs}
What happened between 13:38 and 15:06? Answer: I decided it couldn't hurt to try, because the only thing better than a blocked request is no request at all, and so I added a comprehensive FlipboardProxy Disallow to robots.txt. Astonishingly, it seems to work.
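For anyone wanting to try the same thing, here is a rough sketch of what such a rule might look like and a quick way to sanity-check it with Python's standard `urllib.robotparser`. The exact lines added are not shown in the post, so the `User-agent: FlipboardProxy` / `Disallow: /` pair below is an assumption, and `is_allowed` and `ExampleBot` are hypothetical names used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: the post does not show the actual lines that were added.
ROBOTS_TXT = """\
User-agent: FlipboardProxy
Disallow: /
"""


def is_allowed(agent_token: str, path: str) -> bool:
    """Return True if robots.txt permits agent_token to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent_token, path)


# Pass the bare product token, not the full Mozilla/5.0 (...) header string.
flipboard_ok = is_allowed("FlipboardProxy", "/ebooks/abbey/")
other_ok = is_allowed("ExampleBot", "/ebooks/abbey/")
print("FlipboardProxy allowed:", flipboard_ok)
print("ExampleBot allowed:", other_ok)
```

This only shows how a compliant parser reads the rule. Whether the robot actually honours it is something only the access logs can tell you, as above.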