WebmasterWorld

Forum Moderators: mack

MSNBot scans incorrectly

Wrong links are followed, and wrong servers indexed

   
12:46 pm on Nov 22, 2005 (gmt 0)

10+ Year Member



I recently discovered what looks like a fairly major error in MSNBot in my log files. The situation is as follows:

I have links on my site in the form of:
out.asp?URL=http://www.externalsite.com/index.asp

First, MSNBot tries to index "/www.externalsite.com/index.asp" on my server.

Second, and worse, only seconds later it tries to fetch pages that www.externalsite.com links to, but from my server. For example, the external site has this link:
www.externalsite.com/index.asp?id=123
and MSNBot tries to index this on my server:
/index.asp?id=123

Oh, and MSN advised me to change the URLs of my links, because they confuse their bot...

Has anyone seen this before, and am I right in concluding that these are errors in MSNBot?
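
If you do follow MSN's advice to change the link URLs, one option is to percent-encode the target so the query string no longer contains a literal "http://". This is a minimal sketch in Python rather than ASP, assuming (this is a guess, not something MSN confirmed) that the bot is being misled by the unencoded URL in the query string; the function names are hypothetical. Whether it actually changes MSNBot's behaviour is untested.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def tracking_link_raw(target: str, script: str = "out.asp") -> str:
    # The form used in the original post: the target URL is pasted in unencoded
    return f"{script}?URL={target}"

def tracking_link_encoded(target: str, script: str = "out.asp") -> str:
    # urlencode() percent-encodes the value (quote_plus with safe=""),
    # so ":" becomes %3A and "/" becomes %2F
    return f"{script}?{urlencode({'URL': target})}"

target = "http://www.externalsite.com/index.asp"
print(tracking_link_raw(target))      # out.asp?URL=http://www.externalsite.com/index.asp
print(tracking_link_encoded(target))  # out.asp?URL=http%3A%2F%2Fwww.externalsite.com%2Findex.asp

# Standard query-string parsing decodes the value back to the original target
link = tracking_link_encoded(target)
assert parse_qs(urlsplit(link).query)["URL"][0] == target
```

The redirect script then has to read the decoded parameter; server-side query-string parsers normally decode percent-encoding automatically, but that is worth checking on your own setup.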

2:20 pm on Nov 22, 2005 (gmt 0)

10+ Year Member



You may well be right that it is a bot problem, but I have noticed many, many sites that have other sites' URLs indexed on them. Be careful with the .asp?www.notmyurl.com stuff; you may well end up getting banned as a scraper. If it is trying to lift information from another server, it may well find that information and PUT it on your server.

8:26 am on Nov 23, 2005 (gmt 0)

10+ Year Member



I see many other sites using the same principle. I think that as long as my page is not absolutely loaded with these links I'll be fine; I have fewer than 10 links in my portfolio.

I don't understand your last sentence, though. Could you elaborate, please?

5:19 pm on Nov 23, 2005 (gmt 0)

10+ Year Member



>If it is trying to lift information from another server it may well find the other information and PUT it on your server.

OK, I really am not 100% sure about this, but... even if the method you are using is innocent, the problem may simply be that many others use this method to grab pages from other people's sites onto their own server.

I have found a number of .asp?myurl pages in search engines that are basically my content, with my URL in their URL after a question mark, but... the content has been cached and thus indexed as part of their site. Enter duplicate content problems.

To be honest, I don't fully understand this issue, so I will leave it there. If you or anyone else knows how this ?myurl thing works, it may explain what I am trying to describe. What is the .asp page doing with the ?urlhere part? Is it caching it locally?

9:31 am on Nov 24, 2005 (gmt 0)

10+ Year Member



No, it's a method I use to detect outgoing clicks from my site. All it does is redirect to the page specified in the URL parameter, but along the way the click gets recorded in my log files.
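
A click-tracking redirect of this kind boils down to three steps: read the URL parameter, write a log line, and answer with a redirect. Below is a minimal sketch of that logic in Python rather than ASP; the function name `handle_outbound`, the 302 status (the post doesn't say which redirect status out.asp sends), and the rejection of non-http(s) targets are all assumptions for illustration.

```python
import logging
from urllib.parse import parse_qs, urlsplit

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("outbound")

def handle_outbound(request_target: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request like '/out.asp?URL=...'."""
    params = parse_qs(urlsplit(request_target).query)
    target = params.get("URL", [""])[0]
    parts = urlsplit(target)
    # Refuse anything that isn't an absolute http(s) URL (e.g. a missing
    # parameter or a javascript: URL) instead of redirecting to it
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return 400, {}
    # Record the outgoing click; this is the "recorded in my log files" step
    log.info("outbound click: %s", target)
    # Temporary redirect to the external page
    return 302, {"Location": target}

status, headers = handle_outbound("/out.asp?URL=http://www.externalsite.com/index.asp")
print(status, headers)  # 302 {'Location': 'http://www.externalsite.com/index.asp'}
```

Nothing in this flow fetches or stores the external page, which matches the poster's point that the script only redirects and does not cache anything locally.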

BTW, the bot in question has no reverse DNS entry, and since I emailed MSN about it, it has been scanning my site roughly once an hour; more specifically, it has been scanning only those asp?URL= links. I'm starting to suspect it's a new bot they're testing.

The IP is 65.55.246.35.
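
Since the bot has no reverse DNS entry, one way to decide whether such a visitor really is the search engine's crawler is a forward-confirmed reverse DNS check: look up the PTR name for the IP, check that it is under the crawler's domain, then resolve that name and confirm it points back to the same IP. This is a minimal sketch; the function names are hypothetical, and the suffix `search.msn.com` used below is an assumption, not something stated in the thread. The resolver functions are injectable so the logic can be tried without live DNS.

```python
import socket
from typing import Callable

def _matches_suffix(hostname: str, suffixes: tuple[str, ...]) -> bool:
    host = hostname.lower().rstrip(".")
    # Match the domain itself or a subdomain of it, not just any string ending
    return any(host == s or host.endswith("." + s) for s in suffixes)

def verify_crawler_ip(
    ip: str,
    allowed_suffixes: tuple[str, ...],
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
    forward: Callable[[str], tuple] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, then A lookup."""
    try:
        hostname = reverse(ip)[0]        # PTR record; raises if there is none
    except OSError:                      # socket.herror / gaierror subclass OSError
        return False
    if not _matches_suffix(hostname, allowed_suffixes):
        return False
    try:
        addresses = forward(hostname)[2]  # IPv4 addresses the hostname resolves to
    except OSError:
        return False
    # Anyone controlling an IP's reverse zone can set an arbitrary PTR name,
    # so the name must resolve back to the same IP
    return ip in addresses
```

Called as `verify_crawler_ip("65.55.246.35", ("search.msn.com",))`, it performs live DNS lookups, so a result obtained today says nothing about what that IP resolved to when the thread was written. With no PTR record at all, as described above, it returns `False`.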
