hope they get their crawl tactics in better order
[edited by: lucy24 at 1:46 am (utc) on May 5, 2017]
Yes, they read Robots.txt, but then ignore it.
148.251.244.204[29/Apr/2017:15:58:27GET /robots.txt HTTP/1.1 200 709 - Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
148.251.244.204[29/Apr/2017:15:58:28GET /wp/?p=4176&buy-cephalexin-no-prescription HTTP/1.1 403 13 - Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
... User-agent: BLEXBot
Disallow: /
...but the bot goes on to make requests anyway?
we of course take any request to desist crawling any site... If this is the case for you please don't hesitate to contact us at customercare@webmeup.com
So if you don't want them to crawl your site, try emailing them. They will need your site's IP address in addition to your domain name.
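For anyone who wants to double-check what that rule actually forbids, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the second bot name are made up for illustration. As far as I can tell the stdlib parser matches on the product token, so the sketch passes "BLEXBot" rather than the full User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted in the post above.
ROBOTS_TXT = """\
User-agent: BLEXBot
Disallow: /
"""


def is_allowed(robots_txt: str, agent_token: str, path: str) -> bool:
    """Return True if robots_txt lets agent_token fetch path.

    is_allowed is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, path)


# The request from the log excerpt: disallowed for BLEXBot.
blexbot_ok = is_allowed(ROBOTS_TXT, "BLEXBot", "/wp/?p=4176&buy-cephalexin-no-prescription")

# A hypothetical bot with no matching rule is allowed by default.
other_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "/wp/")

print(blexbot_ok, other_ok)
```

So by the file's own terms the second request in the log should never have been made.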
I really find robots.txt a nearly useless, totally ineffective and outdated guideline. Most bots either ignore it, or read it to find out where they are not supposed to go and then purposely go there. Robots.txt is the stop sign that everyone does a rolling stop through.
Well, that's pretty much always been the case. That's why it works so well for filtering. Good bots support it, bad bots don't. It's just that there were fewer bad bots a few years ago.
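The filtering idea can be sketched roughly like this: take the requests you have already pulled out of your logs and flag any client that asked for a path robots.txt told it to stay away from. Everything here is an assumption for illustration: the function name find_violators, the (ip, agent token, path) tuple layout, and the 192.0.2.10 "ExampleBot" entry, which is a made-up well-behaved client.

```python
from urllib.robotparser import RobotFileParser


def find_violators(robots_txt, requests):
    """Return the set of (ip, agent_token) pairs that requested a disallowed path.

    requests is an iterable of (ip, agent_token, path) tuples; this layout
    is an assumption, not something a log format gives you directly.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violators = set()
    for ip, agent_token, path in requests:
        if path == "/robots.txt":
            continue  # reading robots.txt itself is never a violation
        if not parser.can_fetch(agent_token, path):
            violators.add((ip, agent_token))
    return violators


robots_txt = "User-agent: BLEXBot\nDisallow: /\n"
requests = [
    ("148.251.244.204", "BLEXBot", "/robots.txt"),
    ("148.251.244.204", "BLEXBot", "/wp/?p=4176&buy-cephalexin-no-prescription"),
    ("192.0.2.10", "ExampleBot", "/index.html"),  # hypothetical client
]

violators = find_violators(robots_txt, requests)
print(violators)
```

What you do with the flagged set (a 403, a firewall rule, nothing) is a separate decision; the sketch only does the sorting into good and bad.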
Since they haven't taken any measures to correct their crawl,
148.251.244.204 - - [08/May/2017:04:10:27 -0700] "GET /ebooks/hhtravel/023183.html HTTP/1.1" 404 1462 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"
148.251.244.204 - - [08/May/2017:04:11:06 -0700] "GET /ebooks/hhtravel/023201.html HTTP/1.1" 404 1463 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"
148.251.244.204 - - [08/May/2017:04:11:28 -0700] "GET /ebooks/hhtravel/023216.html HTTP/1.1" 404 1463 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"
148.251.244.204 - - [08/May/2017:04:11:36 -0700] "GET /ebooks/hhtravel/023217.html HTTP/1.1" 404 1463 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"
148.251.244.204 - - [08/May/2017:04:11:40 -0700] "GET /ebooks/hhtravel/023234.html HTTP/1.1" 404 1463 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"
148.251.244.204 - - [08/May/2017:04:11:44 -0700] "GET /ebooks/hhtravel/023236.html HTTP/1.1" 404 1463 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"
... which, incidentally, suggests that somebody out there has some really, really useless URLs. (They're not dates, which would have been the only possible justification.)
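If you want to pull numbers out of lines like the May 8 excerpt above, a rough sketch of a parser follows. It assumes the usual "combined" access log layout those lines appear to use; the regex, the parse_line name and the field names are my own choices, and the two sample lines are copied from the excerpt.

```python
import re
from datetime import datetime

# Regex for the "combined" layout: ip, two ignored fields, [time],
# "request", status, size, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


lines = [
    '148.251.244.204 - - [08/May/2017:04:10:27 -0700] "GET /ebooks/hhtravel/023183.html HTTP/1.1" 404 1462 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"',
    '148.251.244.204 - - [08/May/2017:04:11:06 -0700] "GET /ebooks/hhtravel/023201.html HTTP/1.1" 404 1463 "-" "Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)"',
]

records = [parse_line(line) for line in lines]
not_found = [r["path"] for r in records if r["status"] == 404]
gap_seconds = (records[1]["time"] - records[0]["time"]).total_seconds()
print(not_found, gap_seconds)
```

From there it is easy to count 404s per user agent or measure how quickly a given bot is hitting you.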