Recently I discovered what looks like a pretty major error in MSNBot while going through my logfiles. The situation is as follows:
I have links on my site of the form out.asp?URL=http://www.externalsite.com/index.asp
First, MSNBot tries to index "/www.externalsite.com/index.asp" on my server.
Second, and worse, only seconds later it tries to index the pages that www.externalsite.com links to, but on my server. For example, the external site contains the link www.externalsite.com/index.asp?id=123, and MSNBot tries to index /index.asp?id=123 on my server.
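Nobody here knows what the bot is actually doing internally, but both symptoms are consistent with a crawler resolving links against the wrong base URL. The Python sketch below illustrates two hypotheses with the standard library's urljoin (RFC 3986 reference resolution). The host www.mysite.com is a stand-in for the poster's own site, and the assumption that the external page uses the relative href "index.asp?id=123" is mine, not something confirmed by the logs or by MSN.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical host names; only the paths come from the log entries above.
REQUESTED = "http://www.mysite.com/out.asp?URL=http://www.externalsite.com/index.asp"
FINAL = "http://www.externalsite.com/index.asp"  # where out.asp redirects to


def resolve(base: str, href: str) -> str:
    """Resolve a link found on a page against that page's URL (RFC 3986)."""
    return urljoin(base, href)


# Hypothesis A: the "http://" prefix gets lost, so the remainder looks like
# a relative path and is resolved against the redirect script's own URL.
misread = resolve(REQUESTED, "www.externalsite.com/index.asp")
print(urlsplit(misread).path)  # /www.externalsite.com/index.asp

# Hypothesis B: after following the redirect, links on the external page are
# resolved against the URL that was requested instead of the final URL.
href_on_external_page = "index.asp?id=123"  # assumed relative href
wrong = resolve(REQUESTED, href_on_external_page)
right = resolve(FINAL, href_on_external_page)
print(wrong)  # http://www.mysite.com/index.asp?id=123
print(right)  # http://www.externalsite.com/index.asp?id=123
```

Both wrong results land on the poster's server with exactly the paths seen in the logs, which is why a base-URL mix-up seems a likely culprit, though only MSN could confirm it.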
Oh, and MSN advised me to change the URLs of my links because they confused their bot...
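MSN's advice is not specific, so this is only a guess at what "changing the URLs" could mean: percent-encode the target so the href no longer contains a literal "http://www..." string for a bot to pick up. A minimal Python sketch; the helper name tracking_link is hypothetical, and whether this actually stops MSNBot is untested.

```python
from urllib.parse import parse_qs, quote, urlsplit


def tracking_link(target: str) -> str:
    """Build an out.asp link with the target URL fully percent-encoded.

    safe="" makes quote() encode ':' and '/' as well, so no literal
    "http://" or path slashes from the target appear in the href.
    """
    return "out.asp?URL=" + quote(target, safe="")


link = tracking_link("http://www.externalsite.com/index.asp")
print(link)  # out.asp?URL=http%3A%2F%2Fwww.externalsite.com%2Findex.asp

# A standard query-string parser decodes the value back to the original URL.
print(parse_qs(urlsplit(link).query)["URL"][0])
```

The redirect script itself should not need to change as long as it reads the parameter through a parser that decodes percent-escapes (an assumption worth checking on your own setup).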
Has anyone seen this before, and am I right in concluding that these are errors in MSNBot?
You may well be right that it is a bot problem, but I have noticed many, many sites that have other sites' URLs indexed on them. Be careful with the .asp?www.notmyurl.com stuff; you may well end up getting banned as a scraper. If it is trying to lift information from another server it may well find the other information and PUT it on your server.
>If it is trying to lift information from another server it may well find the other information and PUT it on your server.
OK, I really am not 100% sure about this, but... if the method you are using is innocent, it may just be that many others use this method to grab pages from other sites onto their own server.
I have found a number of .asp?myurl pages in search engines that are basically my content, with my URL in their URL after a question mark, but... the content has been cached and thus indexed as part of their site. Enter duplicate content problems.
To be honest, I don't fully understand this issue, so I will leave it there. If you or anyone else knows how this ?myurl thing works, it may explain what I am trying to describe. What is the ASP page doing with the ?urlhere part? Is it caching it locally?
No, it's a method I use to track outgoing clicks from my site. All it does is redirect to the page specified in the URL parameter, and along the way the click gets recorded in my logfiles.
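The out.asp source isn't shown, so here is a minimal Python sketch of what such a click-tracking redirect does, written as a plain function rather than a real ASP page. The function name handle_out and the http/https check are my additions; in the poster's setup the request is already recorded by the web server's access log, so the explicit log call is just illustrative.

```python
import logging
from urllib.parse import parse_qs

log = logging.getLogger("outbound")


def handle_out(query_string: str) -> tuple[int, dict[str, str]]:
    """Record an outgoing click, then redirect the visitor to the target.

    Returns (HTTP status, response headers). Nothing is fetched or cached:
    the visitor's browser goes straight to the external site.
    """
    values = parse_qs(query_string).get("URL")
    # Added guard: only redirect to http/https targets.
    if not values or not values[0].startswith(("http://", "https://")):
        return 400, {}
    target = values[0]
    log.info("outbound click: %s", target)  # the "recorded in my logfiles" step
    return 302, {"Location": target}


print(handle_out("URL=http://www.externalsite.com/index.asp"))
```

Because the server only answers with a Location header, a well-behaved crawler following it should end up on www.externalsite.com and resolve that page's links against that host.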
BTW, the bot in question has no reverse DNS entry, and since I emailed MSN about it, it has been scanning my site about once every hour; more specifically, it has been scanning only those asp?URL= links. I'm starting to suspect it's a new bot they're testing.
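With no reverse DNS entry, it is hard to tell from the logs alone whether the visitor really is MSN's crawler. One common check is forward-confirmed reverse DNS: look up the IP's hostname, make sure it sits under the crawler's domain, then confirm that hostname resolves back to the same IP. A minimal Python sketch follows; the suffix ".search.msn.com" is an assumption about how MSN's crawler hosts were named, and the resolver parameters exist only so the check can be exercised without network access.

```python
import socket
from typing import Callable


def is_verified_crawler(
    ip: str,
    allowed_suffixes: tuple[str, ...],
    reverse: Callable = socket.gethostbyaddr,
    forward: Callable = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS check for a crawler's IP address."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # no PTR record, like the bot described above
        return False
    if not hostname.lower().endswith(allowed_suffixes):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    # The name must resolve back to the same IP, or the PTR could be forged.
    return ip in addresses


# Usage (performs real DNS lookups):
# is_verified_crawler("203.0.113.7", (".search.msn.com",))
```

A bot that fails this check may still be MSN testing something new, as suspected above, but it can't be told apart from an impostor using MSN's user-agent string.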