Forum Moderators: DixonJones


Microsoft URL Control - 6.00.8169

Help! We are getting hammered by this visitor!


guitarslinger

8:21 am on Sep 2, 2004 (gmt 0)

10+ Year Member



Hi guys - just been looking through the logs for one of our sites and I nearly fell off my chair!

We normally get around 1,000 - 1,700 page views a day on one of our web sites - imagine my shock when I looked at the stats this morning and found that yesterday we had over 11,000 page views in one day!

9,700 of these were attributed to:
Microsoft URL Control - 6.00.8169

I've seen some other posts by searching the forum on this topic - but nothing telling me what I should do about it. Is it a good thing? Should I let it keep crawling my site like this - or should I set up a robots.txt exclusion? What entry would I need to put in robots.txt (if indeed I should be blocking this bot)?

Any thoughts would be greatly received.

Best wishes

Richard Thomas

guitarslinger

8:10 am on Sep 3, 2004 (gmt 0)

10+ Year Member



No ideas guys?

Lord Majestic

8:42 am on Sep 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It could be spammers fishing for emails. I'd say put a robots.txt up, but I wouldn't hold my breath that this robot will follow it, because the user agent "Microsoft URL Control - 6.00.8169" implies it's someone's amateurish VB app doing the crawling. Any bot writer who keeps the library's default user agent deserves to get it from both barrels.

So I think this might be a case where banning the bot is fully justified. Where is wilderness when you need him? ;)

guitarslinger

9:23 am on Sep 3, 2004 (gmt 0)

10+ Year Member



Thanks for the reply mate.

I think I've got the right line - I just save this as robots.txt in the root dir?

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

All other bots will just ignore this and index the rest ok?

If it turns out that the URL Control bot ignores robots.txt - what can I do to stop it? I've had yet another day of 10,000 page views. It appears that one page in particular is being targeted - a product listing page with ten products on it - no e-mail addresses to be found. All other pages appear to have a normal level of activity.

Cheers

Richard

Lord Majestic

9:25 am on Sep 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd ban all "Microsoft URL Control" user agents - matching an exact version would be an oversight, as versions change.
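A version-agnostic entry would look like the sketch below - note that well-behaved crawlers match the User-agent value loosely (case-insensitive, ignoring version numbers), while a rogue bot will simply ignore the file:

```
# robots.txt - intended to match any version of the default
# "Microsoft URL Control" user agent
User-agent: Microsoft URL Control
Disallow: /
```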

I highly doubt that this code supports robots.txt by default, however - I believe (but could be wrong) it's just a simple library that fetches URLs as instructed - extra code would be necessary to support robots.txt. If they are spammers then they wouldn't care, or even know what to do.

In fact - check your logs: if this control never even TRIED to request robots.txt (the request will show as a 404 error if the file isn't present), then it doesn't support it!
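One quick way to run that check from a shell, sketched here against a couple of stand-in log lines (the real log path, IPs and format will vary - substitute your own access log):

```shell
# Create two stand-in access-log lines (illustrative only - use your
# real Apache access log instead; 192.0.2.x is a documentation range)
cat > /tmp/access_sample.log <<'EOF'
192.0.2.1 - - [03/Sep/2004:09:00:00 +0000] "GET /robots.txt HTTP/1.0" 404 209 "-" "Googlebot/2.1"
192.0.2.2 - - [03/Sep/2004:09:00:05 +0000] "GET /products.html HTTP/1.0" 200 5120 "-" "Microsoft URL Control - 6.00.8169"
EOF

# Count robots.txt requests per user agent; if "Microsoft URL Control"
# never shows up in this list, the bot never even asked for robots.txt
grep '"GET /robots.txt' /tmp/access_sample.log | awk -F'"' '{print $6}' | sort | uniq -c
```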

guitarslinger

9:37 am on Sep 3, 2004 (gmt 0)

10+ Year Member



Hi - just checked the logs again. Out of around 49,000 hits in the last 48 hours, only 99 were code 404.

G and Slurp have already been through - several hundred hits apiece - and I've never bothered with robots.txt before, so I'm assuming it's mainly them that created the 404s.

So I guess Microsoft URL Control doesn't bother with robots.txt. If that's the case - how do I stop this bot from hammering me? It's completely skewing my analysis, sales from this site have dropped right off in the last 48 hours, and I was wondering whether the bot is slowing page load times, as the server is busy dealing with its requests rather than real people's.

Arrrgh

Lord Majestic

9:41 am on Sep 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can deny requests by IP - have a look at the "Search Engine Spider Identification" forum - [webmasterworld.com...] There should be plenty of posts there with instructions on how to ban clients by IP.
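As a sketch, an Apache .htaccess rule denying a single address might look like the following (192.0.2.10 is a placeholder from the documentation range, not the bot's real IP - substitute the address found in your raw logs):

```apache
# .htaccess - block one abusive IP, allow everyone else
# (placeholder address - replace with the IP from your logs)
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
```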

guitarslinger

9:43 am on Sep 3, 2004 (gmt 0)

10+ Year Member



Hi - just checking the logs again - 20,000 of the hits have come from a host called "Pacific Bell".

Ring any Bells...forgive the pun

But there doesn't appear to be any IP address for this.

I'm using webalizer

dcrombie

10:12 am on Sep 3, 2004 (gmt 0)



You'll need access to the raw logs to get the IP address - unless the full domain name appears in webalizer.

You also can't block 'rogue' robots using robots.txt - by definition they will ignore it.

You should block "Microsoft URL Control" using .htaccess on Apache or (I think) browscap.ini on Windoze. Plenty of examples in WW.
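One classic sketch of the Apache approach uses SetEnvIfNoCase with mod_access (assuming those modules are enabled; the pattern is intended to catch every version of the default VB user agent):

```apache
# .htaccess - tag any client whose User-Agent starts with
# "Microsoft URL Control", regardless of version number
SetEnvIfNoCase User-Agent "^Microsoft URL Control" bad_bot

# Refuse tagged clients with 403 Forbidden, allow everyone else
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Unlike robots.txt, this is enforced server-side, so it works even though the bot never reads robots.txt - but a determined operator can still dodge it by changing the user agent string, which is why banning by IP is the usual fallback.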