OK, so someone made a Twitter post pointing to a file I have on the web. Not a big deal, but the file is a professional one, and I'd rather not encourage access to it from social media.
So no sweat, I assume. In my robots.txt I have
User-agent: Twitterbot/1.0
Disallow: /
I also have Twitter IP addresses denied in my .htaccess file. As in
deny from 199.59.148.0/22
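(For what it's worth, since IP ranges can change, I've also considered matching on the User-Agent string instead, so the block holds regardless of address. A minimal mod_rewrite sketch, assuming mod_rewrite is enabled for the .htaccess:)

```
RewriteEngine On
# Match any request whose User-Agent contains "Twitterbot", case-insensitive
RewriteCond %{HTTP_USER_AGENT} Twitterbot [NC]
# Refuse every matching request with 403 Forbidden
RewriteRule ^ - [F]
```

Swapping the [F] flag for [G] would return 410 Gone instead, which some crawlers treat as a permanent signal and stop retrying.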
But Twitterbot keeps hammering me with
199.59.148.209 - - [26/Mar/2015:08:46:01 -0500] "GET /robots.txt HTTP/1.1" 200 454 "-" "Twitterbot/1.0"
199.59.148.209 - - [26/Mar/2015:08:46:01 -0500] "HEAD /myfile.htm HTTP/1.1" 403 - "-" "Twitterbot/1.0"
As in: I read your robots.txt file, which tells me to go away, but I'll still poke at your file anyway, even if I can't get to it. Over and over and over and over. At least it's not demanding a lot of bandwidth, but it's getting kind of crazy. Go away already!
Is there a more compelling way to tell Twitterbot to go away?