

Mozilla/5.0 (compatible; Twitturls; http://twitturls.com)

         

GaryK

5:40 pm on Oct 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mozilla/5.0 (compatible; Twitturls; http://twitturls.com)
174.143.213.nn
174-143-213-nn.static.cloud-ips.com
-----
OrgName: Rackspace.com, Ltd.
OrgID: RSPC
NetRange: 174.143.0.0 - 174.143.255.255
-----
ROBOTS.TXT? No
-----
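
WHOIS reports the range as a start/end pair, but most block rules want CIDR notation. A minimal Python sketch of the conversion and the membership check, using only the standard `ipaddress` module. The function names are hypothetical, and so is the test address, since the last octet is masked as "nn" in the log above:

```python
import ipaddress

def netrange_to_cidrs(start: str, end: str) -> list:
    """Convert a WHOIS 'NetRange: start - end' pair into a list of CIDR networks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)))

# NetRange from the Rackspace.com, Ltd. (OrgID RSPC) record above.
RACKSPACE = netrange_to_cidrs("174.143.0.0", "174.143.255.255")

def is_blocked(ip: str, networks=RACKSPACE) -> bool:
    """True if the client address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

Here the range collapses to a single block, 174.143.0.0/16. Blocking the whole /16 shuts out every Rackspace cloud customer in it, not just this bot, which may or may not be what you want.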

This claims to be a site that scours Twitter for links posted in tweets. So how come it was crawling one of my sites? I don't know. I just know it won't be doing that again!

Pfui

4:08 am on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In addition to going where they're not supposed to, and relentlessly, Twitturls-related UAs/Hosts log-spam for their site, using its URL as both the referer and the UA:

Host: 74.126.19.nn.static.a2webhosting.com
UA: http://twitturls.com
robots.txt? NO
referer log-spam? YES: http://twitturls.com
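
One way to spot this pattern in raw logs is to flag hits whose referer is some other site that the UA string also advertises. This is a sketch that assumes Apache's "combined" log format. The heuristic and the `is_referer_spam` name are illustrative, not a standard tool:

```python
import re
from urllib.parse import urlsplit

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def is_referer_spam(line: str, own_host: str) -> bool:
    """Flag hits whose off-site referer hostname also appears in the User-Agent."""
    m = COMBINED.match(line)
    if not m:
        return False                      # not a combined-format line
    ref_host = urlsplit(m["referer"]).hostname
    if not ref_host or ref_host == own_host:
        return False                      # no referer ("-") or an internal one
    # Self-promotion: the bot names the same site in both fields.
    return ref_host in m["ua"]
```

This catches both variants quoted here: the bare `http://twitturls.com` UA and the `Mozilla/5.0 (compatible; Twitturls; +http://twitturls.com)` UA, each paired with a `http://twitturls.com` referer.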

Twitturls-related UAs/Hosts also misbehave in other ways. Here's the above, erm, twit on the same July day, hitting the same file rapid-fire despite six-plus 403s, but this time using the UA in the OP and below:

07/08 12:24:22
07/08 12:24:22
07/08 12:24:22
07/08 12:24:22
07/08 12:24:22
07/08 12:24:22
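
A burst like this, many requests for one file within a single second and every one refused, is easy to count once the log is parsed. A sketch, assuming hits arrive as `(second, host, path, status)` tuples. The threshold of 5 is an arbitrary choice, not a figure from the post:

```python
from collections import Counter

def rapid_fire(hits, threshold=5):
    """Return (host, path, second) keys with >= threshold requests, all answered 403.

    hits: iterable of (second, host, path, status); 'second' is any timestamp
    string at one-second resolution, e.g. "07/08 12:24:22".
    """
    total = Counter()
    refused = Counter()
    for second, host, path, status in hits:
        key = (host, path, second)
        total[key] += 1
        if status == 403:
            refused[key] += 1
    # Only keep bursts where the client kept hammering despite every refusal.
    return {key for key, n in total.items() if n >= threshold and refused[key] == n}
```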

And from last May (exact same Host; ditto as far back as December 2008):

74.126.19.nn.static.a2webhosting.com
Mozilla/5.0 (compatible; Twitturls; +http://twitturls.com)
robots.txt? NO
referer log-spam? YES: http://twitturls.com

Note: Twitturls is related to Twitturly and the latter UA misbehaves the same way, from no-robots to going where no Tweet has gone before to log-spamming for its site. E.g.:

UA: Twitturly / v0.5 (from: .algx.net)
UA: Twitturly / v0.6 (from: .amazonaws.com)
robots.txt? NO
referer log-spam? YES: http://twitturly.com

Yep. Been watching -- and blocking -- these guys for a while:)

GaryK

4:19 am on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I love your sense of humor. Thanks for the info.

Will you please post the full UA for twitturly?

keyplyr

4:50 am on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



With the rise of Twitter, it seems that a new, immature breed of self-serving UAs has emerged. If your company tweets and sees enough potential benefit, then certain concessions may be worth making. Sometimes the deciding factor for me is whether it's worth the code bloat to create a whitelist for yet another already-blocked UA/method/IP range.

Pfui

7:15 am on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Gary: Thanks:) From my notes, I've seen these two UAs (each line is the complete UA):

Twitturly / v0.5
Twitturly / v0.6
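
Because of the odd spaces around the slash, a pattern written for the usual `Product/version` form would miss these UAs. A sketch of a regex that accepts both spellings and pulls out the version. Allowing the unspaced `Twitturly/v0.6` form is an assumption, since only the spaced form has been seen here:

```python
import re

# Matches "Twitturly / v0.5" and "Twitturly / v0.6" as quoted above;
# \s* also accepts a hypothetical unspaced "Twitturly/v0.6".
TWITTURLY = re.compile(r"^Twitturly\s*/\s*v(\d+(?:\.\d+)*)$")

def twitturly_version(ua: str):
    """Return the version string (e.g. '0.5') for a Twitturly UA, else None."""
    m = TWITTURLY.match(ua.strip())
    return m.group(1) if m else None
```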

@keyplyr: Thus far, I'd say almost ALL of the Twitter-related UAs I see either come from already-blocked hosts, typically server farms with long-standing histories of bad bot-running, or already-blocked bots.

Since the first of the year, I've found that 403'ing bots with "twit" in the UA doesn't stop real people from following the URLs mined from tweets by the bots. (YMMV) At first I tried a separate whitelist, but it got too unwieldy and time-consuming, and the overlaps too confusing.
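
The "twit in the UA" rule is a case-insensitive substring match. Posters here would normally do this in server config. The sketch below uses a Python WSGI wrapper only to make the logic explicit, and the `block_twit_bots` name is hypothetical. Note how broad the rule is: it refuses any UA containing "twit", including ones you might later want to whitelist.

```python
def block_twit_bots(app):
    """Wrap a WSGI app so any User-Agent containing 'twit' (any case) gets a 403."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if "twit" in ua.lower():
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)  # everyone else passes through
    return middleware
```

Usage would be `application = block_twit_bots(application)` in whatever module your WSGI server loads.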

And on the plus side, if I'm eyeballing logs when the Twitter bot pack attacks -- usually 10-15 Hosts and/or bots w/in 5 minutes to 1 filename; none with legit referers, of course -- I know someone's tweeted that page/link. Then it's easy enough to check search.twitter.com and see who said what.
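
The "pack" signature described above (many hosts, one file, a few minutes, no referers) can be detected with a sliding window per path. A sketch, assuming time-sorted `(datetime, host, path, referer)` tuples. The 5-minute window and the 10-host minimum are taken from the low end of the figures in the post, and `find_packs` is a hypothetical name:

```python
from collections import defaultdict, deque
from datetime import timedelta

WINDOW = timedelta(minutes=5)
MIN_HOSTS = 10  # low end of the "10-15 Hosts" figure above

def find_packs(hits, window=WINDOW, min_hosts=MIN_HOSTS):
    """Return {path: time} for the first moment each path was hit by >= min_hosts
    distinct referer-less hosts within `window`.

    hits: iterable of (datetime, host, path, referer), sorted by time.
    """
    recent = defaultdict(deque)  # path -> (time, host) pairs inside the window
    first_seen = {}
    for ts, host, path, referer in hits:
        if referer not in ("", "-"):
            continue             # hits with a real referer don't count
        q = recent[path]
        q.append((ts, host))
        while ts - q[0][0] > window:
            q.popleft()          # drop hits older than the window
        if path not in first_seen and len({h for _, h in q}) >= min_hosts:
            first_seen[path] = ts
    return first_seen
```

A flagged path is a hint that the page has just been tweeted, which you can then confirm with a Twitter search.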

GaryK

11:24 am on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. I wonder why they include a leading and trailing space around the slash.