|Twitter for Testing Bot Blocking|
Here's a simple Twitter tip for bot blockers.
Thanks to the swarm of link-chasing bots that use Twitter's API, you can test your bot blocker instantly with a single tweet. Just tweet a full URL to the site where you want to test your bot blocking; within seconds bots will start knocking on your door, and they will continue to trickle in for a while. Many seem to cache the results, so getting them to come to your site repeatedly in a short period of time requires directing them to a different page with each tweet.
Hope this little trick helps some people when they're testing new blocking filters, because I've found it to be an invaluable tool to be able to summon bots on demand.
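Since the bots seem to cache by URL, one way to get a fresh wave with every tweet is to generate a throwaway path each time. A minimal sketch in Python, assuming your blocker acts before the page lookup so the path does not need to exist; the function name and the "bot-test" prefix are made up for illustration:

```python
import secrets
from urllib.parse import urlsplit, urlunsplit


def unique_test_url(base_url: str, prefix: str = "bot-test") -> str:
    """Return base_url with a fresh random path segment appended.

    The prefix is a hypothetical marker that makes the test hits easy
    to spot in the access log afterwards.
    """
    parts = urlsplit(base_url)
    token = secrets.token_hex(4)  # 8 random hex characters
    path = parts.path.rstrip("/") + f"/{prefix}-{token}"
    # Drop any query string and fragment so the tweeted URL stays short.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    # www.example.com is a placeholder; substitute the site under test.
    print(unique_test_url("http://www.example.com/"))
```

Tweet the printed URL, then search your logs for the prefix to see who came calling.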
Not only that, they've exposed some new hosts I didn't have blocked, a bonus! :)
Not that anyone seemed to be interested in the best tip for testing your bot blocker on demand, but Twitter also claims that Twitterbot honors robots.txt!
|URL crawling|
Twitter's crawler will respect robots.txt when scanning URLs. If a page with card markup is blocked, no card will be shown. If an image URL is blocked, no thumbnail or photo will be shown.
Twitter uses the User-Agent of Twitterbot/1.0, which can be used to create an exception in your robots.txt file. For example, here is a robots.txt which disallows crawling for all robots except Twitter's fetcher:
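The example itself did not survive in the quote. What follows is a reconstruction under standard robots.txt semantics, not necessarily Twitter's exact text: an empty Disallow lets Twitterbot fetch everything, and the wildcard record shuts everyone else out. The sketch wraps that file in a small Python check using the standard library's urllib.robotparser, so you can confirm the rules do what you expect before deploying them; the helper name and the example.com URL are assumptions:

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: an empty Disallow permits everything for Twitterbot,
# while every other robot is denied the whole site.
ROBOTS_TXT = """\
User-agent: Twitterbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"  # placeholder URL
    print("Twitterbot/1.0:", allowed("Twitterbot/1.0", page))
    print("TweetmemeBot:  ", allowed("TweetmemeBot", page))
```

Bear in mind that robots.txt only restrains bots that choose to honor it; the parasites still need a real block.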
Side note: I also use Twitter Bootstrap [twitter.github.com], which I highly recommend for building responsive design sites in validated HTML5. It was easy to learn and I deployed it the same day.
Never needed to do any testing. Just identified the parasites and blocked as needed. Human traffic from Twitter varies from double-digit to occasional triple-digit daily uniques. Twitter and Facebook have grown into very nice traffic sources yielding an increasing ROI.
Cheers for the tip - just tweeted a unique URL on a very quiet domain and noted there were a few IPs that hit the server immediately, all within the space of 5 seconds. (agents: TweetmemeBot, UnwindFetchor, Twitterbot, Butterfly, "JS-Kit URL Resolver")
I presume these bots must subscribe to the real-time Twitter stream, with the follow-on stragglers periodically querying the public API to find new URLs to munch on.
I've never used Twitter before, was just interested to see the bot activity.
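For anyone wanting to repeat this, a list of agents like the one above can be pulled out of the access log with a few lines of Python. This is only a sketch: it assumes the server writes Apache-style Combined Log Format and that user-agent strings contain no embedded double quotes. The function name and the access.log and /bot-test-1a2b3c4d values are placeholders:

```python
import re

# Combined Log Format:
#   ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def agents_for_path(lines, test_path):
    """Map each user agent that requested test_path to the IPs it used.

    Agents appear in order of first hit; lines that do not parse are skipped.
    """
    seen = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("path") != test_path:
            continue
        ips = seen.setdefault(match.group("agent"), [])
        if match.group("ip") not in ips:
            ips.append(match.group("ip"))
    return seen


if __name__ == "__main__":
    # Placeholder log file and path; use the URL you actually tweeted.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for agent, ips in agents_for_path(log, "/bot-test-1a2b3c4d").items():
            print(f"{agent}: {', '.join(ips)}")
```

Any agent or IP in the output that your blocker let through is a candidate for the blocklist.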
Also, the more followers you have, the more retweets you'll get. This generates a wider reach to all their followers, etc. etc. which in turn generates more parasite bot hits.
With my 21k+ followers, when I post a link I'll immediately see over two dozen non-human UAs, and another dozen or so within the next 20 minutes, all blocked.
Every once in a while I see a new one.