Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Twitter Chasing Bots
How many bots are chasing your tweets?

 1:32 am on May 8, 2010 (gmt 0)

You think people are really reading those tweets of yours?

Maybe, or maybe not, but there's a whole herd of bots that jump on them right away!

Here are just a few of the things that followed a link back to my site within a few hours of my tweeting it.

184.73.85.* "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: Gecko/20070914 Firefox/"

From Amazon AWS

67.202.7.* "HEAD /..." "PycURL/7.18.2"

Something else from Amazon AWS validating URIs with a HEAD request

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Et tu, Googlebot?

217.144.236.* "HEAD /..." 301 252 "-" "ytndemo bergum@yahoo-inc.com"

Something from Yahoo validating the URIs with a HEAD request

128.242.241.* "HEAD /..." "Twitterbot/0.1"

It appears Twitter actually checks the URIs with their own HEAD request

38.113.234.* "Voyager/1.0"

Good old Voyager poking around

204.236.153.* "HEAD /..." "JS-Kit URL Resolver, [js-kit.com...]

Guess what, it's checking the URI too...

85.114.136.* "Mozilla/5.0 (compatible; Windows NT 6.0) Gecko/20090624 Firefox/3.5 NjuiceBot"

Pfui posted about the NjuiceBot

216.24.142.* "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20091221 Firefox/3.5.7 OneRiot/1.0 (http://www.oneriot.com)"

Another social parasite

89.151.116.* "Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot"

And another social parasite

65.52.29.* "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

Something from MS...

70.37.65.* "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

Something else from MS...

64.13.147.* "Mozilla/5.0 (compatible; abby/1.0; +http://www.ellerdale.com/crawler.html)"

Another social leech from SVCOLO

174.129.151.* "HEAD /... " "@hourlypress"

Yet another AWS process checking URIs

184.73.204.* "HEAD /..." "Firefox"

Even more crap from Amazon AWS, yeah right, Firefox <snort>

67.202.5.* "kame-rt (support@backtype.com)"

Yet even more junk using Amazon AWS, it just keeps coming

74.112.128.* "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8"

Tweet powered search engine? oh gag...

173.13.167.* "Mozilla/5.0 (Windows; U; Windows NT 6.0; ru; rv: Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"

Something using a Comcast business connection

174.129.119.* "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: Gecko/20070914 Firefox/"

More things from the septic tank of Amazon AWS

This is probably just the tip of the iceberg collected over just a few hours.

Who knows just how much junk is really chasing your tweets but ENOUGH already with all the leeches.
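For anyone who wants to run the same tally on their own logs, here is a minimal Python sketch. It assumes lines in the abbreviated shape quoted in this post (masked IP, an optional quoted request, then a quoted user agent); real access logs carry more fields, and the names `parse` and `summarize` are made up for this illustration.

```python
import re
from collections import Counter

# Sample lines copied in the abbreviated shape used in the post
# (masked IP, optional quoted request, quoted user agent).
SAMPLE = [
    '67.202.7.* "HEAD /..." "PycURL/7.18.2"',
    '128.242.241.* "HEAD /..." "Twitterbot/0.1"',
    '38.113.234.* "Voyager/1.0"',
    '184.73.204.* "HEAD /..." "Firefox"',
]

QUOTED = re.compile(r'"([^"]*)"')


def parse(line):
    """Return (ip, method, user_agent); method is None if no request field."""
    ip = line.split()[0]
    fields = QUOTED.findall(line)
    if not fields:
        return ip, None, ""
    user_agent = fields[-1]  # assumption: the last quoted field is the UA
    method = None
    if len(fields) > 1:
        parts = fields[0].split()
        method = parts[0] if parts else None
    return ip, method, user_agent


def summarize(lines):
    """Count HEAD requests per masked IP and hits per user agent."""
    heads = Counter()
    agents = Counter()
    for line in lines:
        ip, method, user_agent = parse(line)
        agents[user_agent] += 1
        if method == "HEAD":
            heads[ip] += 1
    return heads, agents


if __name__ == "__main__":
    heads, agents = summarize(SAMPLE)
    for ip, count in heads.most_common():
        print(ip, count)
```

Counting HEAD requests separately is only a rough signal: it picks out the link validators, but says nothing about bots that issue a normal GET.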




 3:55 pm on May 8, 2010 (gmt 0)


I know MS uses Twitter for geolocation on Bing Maps as well as integrating it into Bing search. I'm sure others do the same.

What's with all the Amazon hate? Some popular stuff runs on Amazon EC2, from my Reddit addiction to my wife's Foursquare habit :)


 4:05 pm on May 8, 2010 (gmt 0)

just fyi: Amazon runs the backend of Twitter and bit.ly.


 4:21 pm on May 8, 2010 (gmt 0)

@ ByronM



 5:27 pm on May 8, 2010 (gmt 0)

@ByronM The web has changed a lot since this discussion: [webmasterworld.com...] and many of the AWS maggots seem to be oblivious to the existence of robots.txt. On a small website, a few pages fetched here or there costs very little. But on a large website with thousands or millions of pages, it is a big problem when the operators of some of these maggots decide to download the entire site.
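For reference, this is roughly the check a well-behaved crawler is expected to make before fetching anything. It is a sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleBot` name, and the example.com URLs are all made up for illustration. robots.txt is purely advisory, so nothing here stops a bot that never bothers to look.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot entirely and keep
# everyone else out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("ExampleBot", "http://example.com/page.html"))
    print(allowed("OtherBot", "http://example.com/page.html"))
```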



 5:47 pm on May 8, 2010 (gmt 0)

Your first two to four visitors following any bit.ly or tiny.cc link that you post to Twitter will almost certainly be bots of some sort - often arriving within tens of seconds after posting the link.
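If you want to check this against your own logs, a small sketch like the following picks out the hits that land within a short window after the link was posted. The timestamps and the 60-second window are invented for the example, and `early_hits` is a hypothetical helper, not something from any log tool.

```python
from datetime import datetime, timedelta


def early_hits(posted_at, hits, window_seconds=60):
    """Return the (timestamp, user_agent) hits within the window after posting."""
    window = timedelta(seconds=window_seconds)
    return [
        (ts, ua)
        for ts, ua in hits
        if timedelta(0) <= ts - posted_at <= window
    ]


if __name__ == "__main__":
    # Made-up times, for illustration only.
    posted = datetime(2010, 5, 8, 1, 0, 0)
    hits = [
        (datetime(2010, 5, 8, 1, 0, 12), "Twitterbot/0.1"),
        (datetime(2010, 5, 8, 1, 0, 40), "PycURL/7.18.2"),
        (datetime(2010, 5, 8, 1, 30, 0), "Mozilla/5.0"),
    ]
    for ts, ua in early_hits(posted, hits):
        print(ts.isoformat(), ua)
```

Arriving early does not prove a visitor is a bot, so treat the result as a shortlist to inspect rather than a verdict.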


 7:18 pm on May 8, 2010 (gmt 0)

Isn't that a good thing, though? A big chunk of the webmasters who use Twitter just automate it all anyway; I know I do. I tie my RSS feeds to it. There's not much point doing that if the bots don't lap it up.


 7:49 pm on May 8, 2010 (gmt 0)

Most Twitter parasites I block by IP range, a couple by UA, and I allow several that benefit me; just like everything else online, it's a case-by-case thing.
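A rough sketch of that kind of policy in Python, to make the order of checks concrete. The lists are placeholders, not my actual rules: the /24 networks are an assumption read off the masked addresses quoted earlier in the thread, and which UAs to allow or block is a per-site choice.

```python
import ipaddress

# Placeholder lists for illustration only; every site will differ.
BLOCKED_NETS = [
    ipaddress.ip_network(net)
    for net in ("184.73.85.0/24", "67.202.7.0/24")
]
BLOCKED_UA_TOKENS = ("tweetmemebot", "njuicebot")
ALLOWED_UA_TOKENS = ("twitterbot",)


def decide(ip, user_agent):
    """Return "allow" or "block" for one request."""
    ua = user_agent.lower()
    # Explicitly allowed agents win over everything else.
    if any(token in ua for token in ALLOWED_UA_TOKENS):
        return "allow"
    # Most blocking is done by IP range.
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETS):
        return "block"
    # A couple are blocked by user-agent token.
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return "block"
    return "allow"


if __name__ == "__main__":
    print(decide("184.73.85.10", "Mozilla/5.0"))
    print(decide("128.242.241.5", "Twitterbot/0.1"))
```

Keep in mind that a user agent is trivially spoofed, so an allow rule based on UA alone is only as trustworthy as the IP range the request comes from.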


 12:22 am on May 9, 2010 (gmt 0)

The 174. and 173. ranges are pretty new (I don't mean AWS), but they are on the *&$% list to start with. I am looking at a report covering 50+ sites at this point and it ain't pretty.


 5:52 am on May 9, 2010 (gmt 0)

I christen these new findings "Recursive Twitter Disease", very dangerous to a website without a healthy immune system.

Question is, now that we've got a disease... is there a cure that DOESN'T involve cutting something off?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved