
Bing Search Engine News Forum

    
MSNBot scans incorrectly
Wrong links are followed, and wrong servers indexed
Zaphod Beeblebrox




msg:1532400
 12:46 pm on Nov 22, 2005 (gmt 0)

Recently I discovered a pretty major error in MSNBot in my logfiles. The situation is as follows:

I have links on my site in the form of:
out.asp?URL=http://www.externalsite.com/index.asp

First of all MSNBot tries to index "/www.externalsite.com/index.asp" on my server.

Second, and worse, only seconds later it tries to index pages linked from www.externalsite.com as if they were on my server. For example, this link exists on the external site:
www.externalsite.com/index.asp?id=123
and MSNBot tries to index this on my server:
/index.asp?id=123
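One plausible explanation for the second symptom, which MSN has not confirmed, is a base-URL mistake. A crawler that fetches the tracker link, follows the redirect, and then resolves the external page's relative links against the URL it originally requested (the tracker) instead of the URL the content was actually served from would produce exactly /index.asp?id=123 on the wrong host. The sketch below shows that difference with Python's standard `urljoin`. The host www.mysite.example and the assumption that the external page uses a relative link are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical tracker URL on "my" site, modelled on the example in the post.
tracker_url = "http://www.mysite.example/out.asp?URL=http://www.externalsite.com/index.asp"
# The URL the redirect actually lands on.
final_url = "http://www.externalsite.com/index.asp"

# Assumption: the external page links to itself with a relative href.
relative_link = "index.asp?id=123"


def resolve_link(base_url: str, href: str) -> str:
    """Resolve an href against the URL of the page it was found on."""
    return urljoin(base_url, href)


# Correct: resolve against the URL the content was served from.
correct = resolve_link(final_url, relative_link)
# Suspected bug: resolve against the URL originally requested.
buggy = resolve_link(tracker_url, relative_link)

print(correct)  # http://www.externalsite.com/index.asp?id=123
print(buggy)    # http://www.mysite.example/index.asp?id=123
```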

Oh, and MSN advised me to change the URLs of my links, because they confused their bot...

Has anyone seen this before, and am I right in concluding that these are errors in MSNBot?

 

stinkfoot




msg:1532401
 2:20 pm on Nov 22, 2005 (gmt 0)

You may well be right that it is a bot problem, but I have noticed many, many sites that have other URLs indexed on them. Be careful with the .asp?www.notmyurl.com stuff; you may well end up getting banned as a scraper. If it is trying to lift information from another server, it may well find the other information and PUT it on your server.

Zaphod Beeblebrox




msg:1532402
 8:26 am on Nov 23, 2005 (gmt 0)

I see many other sites using the same principle. I think that as long as my page is not absolutely loaded with these links I'll be fine; I have fewer than 10 links in my portfolio.

That last sentence I don't understand. Could you elaborate please?

stinkfoot




msg:1532403
 5:19 pm on Nov 23, 2005 (gmt 0)

>If it is trying to lift information from another server it may well find the other information and PUT it on your server.

OK, I really am not 100% sure about this, but... if the method you are using is innocent, it may just be that many others use this method to grab pages from other sites onto their own server.

I have found a number of .asp?myurl pages in search engines that are basically my content, with my URL in their URL after a question mark, but the content has been cached and thus indexed as part of their site. Enter duplicate content problems.

To be honest, I don't fully understand this issue, so I will leave it there. If you or anyone else knows how this ?myurl thing works, it may explain what I am trying to describe. What is the .asp page doing with the ?urlhere? Is it caching it locally?

Zaphod Beeblebrox




msg:1532404
 9:31 am on Nov 24, 2005 (gmt 0)

No, it's a method I use to detect outgoing clicks from my site. All it does is redirect to the page specified in the URL, but along the way it gets recorded in my logfiles.
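The original out.asp script is not shown, so here is a minimal sketch of the same idea in Python: read the URL parameter, log the click, and answer with a redirect. The allowlist `ALLOWED_HOSTS`, the function name `handle_out`, and the choice of a 302 status are hypothetical; the allowlist is there so the script cannot be used as an open redirect to arbitrary sites.

```python
import logging
from urllib.parse import parse_qs, urlsplit

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("outbound")

# Hypothetical allowlist: only the hosts this site actually links to.
ALLOWED_HOSTS = {"www.externalsite.com"}


def handle_out(query_string: str) -> tuple[int, dict[str, str]]:
    """Return (HTTP status, response headers) for a request to out?URL=..."""
    params = parse_qs(query_string)
    target = params.get("URL", [""])[0]
    parts = urlsplit(target)
    # Reject anything that is not a plain http(s) link to a known host.
    if parts.scheme not in ("http", "https") or parts.hostname not in ALLOWED_HOSTS:
        return 400, {}
    # Record the outgoing click; this is the whole point of the script.
    log.info("outbound click: %s", target)
    # Send the browser on to the target (302 here; the status code is a design choice).
    return 302, {"Location": target}


status, headers = handle_out("URL=http://www.externalsite.com/index.asp")
print(status, headers["Location"])  # 302 http://www.externalsite.com/index.asp
```

If a target URL carries its own query string (for example `index.asp?id=123&x=1`), it should be percent-encoded with `urllib.parse.quote(target, safe="")` before being placed after `URL=`; otherwise the `&` splits it into separate parameters.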

BTW - the bot in question has no reverse DNS specified, and since I emailed MSN about it, it has been scanning my site around once every hour; more specifically, it has been scanning only those asp?URL= links. I'm starting to suspect it's a new bot they're testing.
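A missing reverse DNS entry means the visitor cannot be confirmed as MSN's crawler this way. A common check is forward-confirmed reverse DNS: look up the IP's PTR hostname, require it to end in a domain the search engine uses, then resolve that hostname forward and confirm it maps back to the same IP. The sketch below uses Python's standard `socket` calls; the suffix `.search.msn.com` is an assumption for illustration and should be checked against the search engine's current documentation. The resolver functions are parameters so the logic can be tested without network access.

```python
import socket
from typing import Callable, Iterable


def is_verified_crawler(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable = socket.gethostbyaddr,
    forward: Callable = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for a crawler IP address."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:
        # No PTR record at all: the IP cannot be verified this way.
        return False
    suffixes = tuple(s.lower() for s in allowed_suffixes)
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    # The hostname must resolve back to the same IP; otherwise the PTR
    # record could simply have been set by whoever controls the IP block.
    return ip in addresses


# Assumed suffix, for illustration only.
# is_verified_crawler("65.55.246.35", [".search.msn.com"])  # needs network access
```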

The IP is 65.55.246.35.

 


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved