Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Pfui




msg:4373033
 11:16 pm on Oct 10, 2011 (gmt 0)

Quoting dstiles: "MSIE 6 is now deprecated by MS and rightly so. They no longer support it so any holes are exploitable." [webmasterworld.com...]

FWIW, I've got it blocked (as should we all :) but I wanted to pass along a same-second, one-two punch pattern that's cropped up at least three times in two days: twice from rarely-exploited Swedish ISPs, and once from always-hazardous telecomitalia.it.

The pattern's a mash-up of site-specific search keywords, fake URIs, and fake REFs with bad hex (the percent-encoded tail decodes to -- http://keywordA:keywordB@ --) via:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

HIT ONE:

URI:
//www.example.com/dir/filename.html+http%3A//keywordA%3AkeywordB%40
REF:
/dir/filename.html+http%3A//keywordA%3AkeywordB%40

HIT TWO:

URI:
//www.example.com/dir/filename.html
REF:
/dir/filename.html

Three different pairs of site-specific keywords routed to different, keyword-appropriate files by a faked URI+REF combo -- anyone else seeing this stuff?
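
For anyone who wants to flag these in their own logs, here is a rough Python sketch of one way to spot the suffix. It assumes the junk always decodes to "+http://wordA:wordB@" at the very end of the string, which is all the samples above show. The function names and the regex are made up for illustration, not taken from any tool.

```python
import re
from urllib.parse import unquote

# Assumption: after percent-decoding, the junk looks like "+http://wordA:wordB@"
# and sits at the very end of the URI or referer.
SUFFIX_RE = re.compile(r"\+https?://([^:/@\s]+):([^:/@\s]+)@$")

def split_fake_suffix(value):
    """Return (clean_value, (keyword_a, keyword_b)), or (value, None) if no suffix."""
    decoded = unquote(value)  # turns %3A into ":" and %40 into "@", leaves "+" alone
    match = SUFFIX_RE.search(decoded)
    if match is None:
        return value, None
    return decoded[:match.start()], (match.group(1), match.group(2))

def looks_like_pattern(uri, referer):
    """True when the URI and the referer both carry the same fake keyword pair."""
    _, uri_keywords = split_fake_suffix(uri)
    _, ref_keywords = split_fake_suffix(referer)
    return uri_keywords is not None and uri_keywords == ref_keywords
```

Fed the HIT ONE pair it returns True, and for the clean HIT TWO pair it returns False, so a log filter could use it to pick out the first half of each one-two.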

 

dstiles




msg:4373380
 8:44 pm on Oct 11, 2011 (gmt 0)

Not specifically, but a lot of bad stuff that comes in is simply rejected with a 403 if I already have the IP range in my database, so I would not usually see it unless it was very persistent.

By now I have a lot of server ranges (though by no means all) plus a lot of "bad" dynamic IP ranges, i.e. ranges that have hit my sites with ill intentions, been blocked and then added to the database. I also record "good" ranges: those that have hit from a compromised (botnet) machine but belong to a range that is usually a good dynamic one (e.g. roadrunner dynamic, BT dynamic etc).

I do not reject all IE6 hits; it depends on the headers. Certain combinations, I have found, presage scrapes or similar, and those are rejected and ultimately added to the database as either "always bad" or "usually good" ranges.

In actuality bad hits on new IP ranges are recorded in the database anyway and used to block further immediate activity until, a few minutes or hours later, I get around to "ranging" the IPs for either a server block or a dynamic "accept unless" listing of the range.
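
As a rough illustration only, the workflow above might be sketched in Python like this. The class, its method names, and the 403/200 return values are hypothetical, and the addresses are documentation ranges rather than real offenders. The actual database and header tests are not described in enough detail to reproduce, so "suspicious_headers" is just a flag the caller supplies.

```python
import ipaddress

class RangeDatabase:
    """Hypothetical in-memory stand-in for the IP range database described above."""

    def __init__(self):
        self.ranges = []      # list of (network, label) pairs
        self.pending = set()  # single IPs blocked until someone "ranges" them

    def add_range(self, cidr, label):
        if label not in ("always bad", "usually good"):
            raise ValueError("unknown label: " + label)
        self.ranges.append((ipaddress.ip_network(cidr), label))

    def decide(self, ip, suspicious_headers=False):
        """Return 403 to reject the hit or 200 to let it through."""
        addr = ipaddress.ip_address(ip)
        if addr in self.pending:
            return 403
        for network, label in self.ranges:
            if addr in network:
                if label == "always bad":
                    return 403
                # "usually good" range: accept unless this hit itself looks bad
                return 403 if suspicious_headers else 200
        if suspicious_headers:
            # bad hit from an unknown range: record it and block for now
            self.pending.add(addr)
            return 403
        return 200

db = RangeDatabase()
db.add_range("192.0.2.0/24", "always bad")       # e.g. a server range
db.add_range("198.51.100.0/24", "usually good")  # e.g. a dynamic ISP range
```

A hit from an unknown range with bad headers gets a 403 and lands in "pending", so later hits from the same IP are refused until it has been sorted into one list or the other.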

I think a bare IE6 UA such as the one you give is suspect anyway: at some time in its life cycle it should have acquired MS upgrade info, I would have thought, especially on XP and later.
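
A minimal Python sketch of a "bare" check, for what it's worth. It treats any token beyond "compatible", "MSIE 6.0" and the Windows NT version as added info. That reading of "MS upgrade info" is an assumption, and the second UA string in the usage lines is a made-up example of a non-bare IE6 string, not one from anyone's logs.

```python
import re

BARE_IE6 = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

def ie6_extra_tokens(user_agent):
    """Return the tokens beyond the basic three, or None if this is not an MSIE 6.0 UA."""
    match = re.fullmatch(r"Mozilla/4\.0 \((.*)\)", user_agent.strip())
    if match is None:
        return None
    tokens = [token.strip() for token in match.group(1).split(";")]
    if "MSIE 6.0" not in tokens:
        return None
    basic = {"compatible", "MSIE 6.0"}
    return [t for t in tokens if t not in basic and not t.startswith("Windows NT")]

def is_bare_ie6(user_agent):
    """True only for an MSIE 6.0 UA that carries no extra tokens at all."""
    return ie6_extra_tokens(user_agent) == []

print(is_bare_ie6(BARE_IE6))
print(is_bare_ie6("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"))
```

Being bare is only one signal; as said above, the decision still depends on the rest of the headers.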

Pfui




msg:4373442
 11:25 pm on Oct 11, 2011 (gmt 0)

Here's another too-bare one, from a source that should do better...

urlc1.mail.mud.yahoo.com [209.191.87.214]
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

14:16:12 /
14:16:27 /

robots.txt? NO

dstiles




msg:4373748
 6:34 pm on Oct 12, 2011 (gmt 0)

The rDNS name suggests that it is a mailbox for Yahoo accounts, so it's feasible that someone clicked on a link in an email.

On the other hand, I've always been suspicious of mud.yahoo. :(

Pfui




msg:4373796
 8:43 pm on Oct 12, 2011 (gmt 0)

Me, too. And I think it's more likely a Yahoo thing because of its sibling subdomains:

From 09-19-11:

urlc3.mail.mud.yahoo.com [209.191.87.216]
07:49:35 /
07:49:57 /

From 02-04-11:

urlc4.mail.mud.yahoo.com [209.191.87.217]
18:53:42 /
18:53:43 /

Always double-hits. Never robots.txt. Always the block-worthy:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
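
A quick Python sketch of how the double-hit signature could be pulled out of a log. The (host, time, path) tuple layout, the function name, and the 30-second window are all assumptions for illustration. The sample rows reuse the hosts and times quoted above and ignore the dates, since each pair came from a different day.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def find_double_hits(hits, window_seconds=30):
    """hits: iterable of (host, "HH:MM:SS", path) tuples.

    Returns hosts that asked for the same path twice in a row within the
    window and never fetched /robots.txt.
    """
    by_host = defaultdict(list)
    for host, clock, path in hits:
        by_host[host].append((datetime.strptime(clock, "%H:%M:%S"), path))

    window = timedelta(seconds=window_seconds)
    flagged = []
    for host, entries in by_host.items():
        if any(path == "/robots.txt" for _, path in entries):
            continue  # it asked for robots.txt, so it does not fit the signature
        entries.sort()
        for (first_time, first_path), (next_time, next_path) in zip(entries, entries[1:]):
            if first_path == next_path and next_time - first_time <= window:
                flagged.append(host)
                break
    return flagged

sample = [
    ("urlc1.mail.mud.yahoo.com", "14:16:12", "/"),
    ("urlc1.mail.mud.yahoo.com", "14:16:27", "/"),
    ("urlc3.mail.mud.yahoo.com", "07:49:35", "/"),
    ("urlc3.mail.mud.yahoo.com", "07:49:57", "/"),
    ("urlc4.mail.mud.yahoo.com", "18:53:42", "/"),
    ("urlc4.mail.mud.yahoo.com", "18:53:43", "/"),
]
print(find_double_hits(sample))
```

All three urlc hosts get flagged with the default window; a real log would need the date in the timestamp as well.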

Pfui




msg:4373803
 8:59 pm on Oct 12, 2011 (gmt 0)

By the way, the curious, fake pattern from the OP, which decodes as --

http://keywordA:keywordB@

-- appeared again, but this time one t-dialin.net machine repeated each faked URI=REF pattern shown by two prior machines. The odds of this being happenstance are nil. But what is it, I wonder?

HIT ONE:
URI:
//www.example.com/dir/filename1.html+http%3A//keywordA%3AkeywordB%40
REF:
/dir/filename1.html+http%3A//keywordA%3AkeywordB%40

HIT TWO:
URI:
//www.example.com/dir/filename1.html
REF:
/dir/filename1.html

HIT THREE:
URI:
//www.example.com/dir/filename2.html+http%3A//keywordC%3AkeywordD%40
REF:
/dir/filename2.html+http%3A//keywordC%3AkeywordD%40

HIT FOUR:
URI:
//www.example.com/dir/filename2.html
REF:
/dir/filename2.html

Again, all via:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

dstiles




msg:4373841
 10:26 pm on Oct 12, 2011 (gmt 0)

Only thing I can think of is a link checker, possibly running for a search directory? Or possibly getting its source URLs from a traditional SE?

Pfui




msg:4404157
 7:49 pm on Jan 5, 2012 (gmt 0)

R.I.P. --

MS declares IE6 Dead (tangor) [webmasterworld.com...]
U.S. IE6 Usage Drops Below 1 pct (engine) [webmasterworld.com...]

dstiles




msg:4404168
 8:41 pm on Jan 5, 2012 (gmt 0)

They stopped updating IE6 a couple of years or more ago. Around the time Google got trojanned through it. :)

I still get a fair number of IE6 hits, some from real humans. Three machines I have here cannot use any MS browser higher than 6: they run Windows 2000, which is itself obsolete and no longer updated. Thank goodness for Firefox and Linux! :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved