Twitter's Real Time URL Fetcher: SpiderDuck


Pfui - 11:42 am on Nov 16, 2011 (gmt 0)


I've seen Twitter's "spiderduck" subdomain/bot since at least the beginning of August. Here's what it looked like on Nov. 12th, with two different UAs from two different domains hitting simultaneously, as they always do --

spiderduck01.dmz1.twitter.com [projecthoneypot.org...]
Twitterbot/1.0

09:57:34 /robots.txt

-- BUT --

User-agent: *
Disallow: /

-- is promptly ignored by its fellow traveler(s):

r-199-59-149-10.twttr.com [projecthoneypot.org...]
Twitterbot/0.1

09:57:34 /filename.html
10:34:57 /filename.html

Three-plus months of hits show the exact same one-two punch pattern: Twitterbot/1.0 only requests robots.txt, and Twitterbot/0.1 never does (& always ignores the Disallow).
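
For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch. It assumes the hits have already been pulled out of the access log into (time, host, UA, path) tuples; the three rows are the ones quoted above, and the function names are made up for illustration, not part of any tool.

```python
from collections import defaultdict

# Hits transcribed from the post above: (time, reverse-DNS host, user agent, path).
# The tuple layout is an assumption; adapt it to your own log format.
HITS = [
    ("09:57:34", "spiderduck01.dmz1.twitter.com", "Twitterbot/1.0", "/robots.txt"),
    ("09:57:34", "r-199-59-149-10.twttr.com", "Twitterbot/0.1", "/filename.html"),
    ("10:34:57", "r-199-59-149-10.twttr.com", "Twitterbot/0.1", "/filename.html"),
]


def summarize(hits):
    """Count, per user agent, robots.txt requests versus all other requests."""
    summary = defaultdict(lambda: {"robots": 0, "other": 0})
    for _time, _host, ua, path in hits:
        kind = "robots" if path == "/robots.txt" else "other"
        summary[ua][kind] += 1
    return dict(summary)


def never_checks_robots(hits):
    """Return user agents that fetched content but never requested /robots.txt."""
    return sorted(
        ua
        for ua, counts in summarize(hits).items()
        if counts["robots"] == 0 and counts["other"] > 0
    )


if __name__ == "__main__":
    for ua, counts in summarize(HITS).items():
        print(ua, counts)
    print("never checks robots.txt:", never_checks_robots(HITS))
```

Run against a longer log, a UA that shows up in the second list is one that pulls pages without ever asking for robots.txt itself.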

FWIW: I'm content to leave my Disallows and bot-blocks as-is because I've yet to see any benefit from Twitter crawling/extracting/whatevering my content "to improve Twitter products."
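
As a sanity check on the Disallow itself, the sketch below uses Python's standard urllib.robotparser to show what a compliant crawler would conclude from the two-line robots.txt quoted earlier. The helper name is_allowed is just for illustration, and this says nothing about how Twitter's fetcher actually evaluates the file.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule quoted earlier in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler with this UA may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed("Twitterbot/0.1", "/filename.html"))
```

With "User-agent: *" and "Disallow: /", the check comes back False for /filename.html, so a crawler honoring the file would not have made either of the page requests shown above.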


Thread source: http://www.webmasterworld.com/twitter/4387539.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com