Forum Moderators: mack


msnbot does not crawl

URL too long?

         

taps

11:36 am on Apr 5, 2006 (gmt 0)

10+ Year Member



Msnbot does not seem to crawl pages in my self-written CMS. Yahoo and Google both crawl the pages without trouble and add them to their index.

I recently put a forum and a blog onto some pages. Both are crawled and indexed by MSN. Thus the problem seems to be somewhere in my CMS.

Some facts:
- .htaccess and robots.txt are fine (at least I think so)
- I'm using URLs like
www.widget.com/Keyword__kw.html
or
www.widget.com/this-is-the-title_123.html
- pages are validated - no HTML problems
- I have a sitemap containing links to all of my 3,000 articles. The sitemap can be reached from the index page
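Since the robots.txt is only assumed to be fine, it may be worth checking it against the msnbot user agent explicitly. Here is a rough sketch using Python's standard urllib.robotparser. The SAMPLE_ROBOTS text and the /admin/ path are made up for illustration; paste in the real robots.txt instead. The parser only approximates how a real crawler reads the file, so treat the result as a hint rather than proof.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, NOT the file from the site in question.
SAMPLE_ROBOTS = """\
User-agent: msnbot
Disallow: /admin/

User-agent: *
Disallow:
"""

# The first two URL patterns come from the post; the third is invented.
URLS = [
    "http://www.widget.com/Keyword__kw.html",
    "http://www.widget.com/this-is-the-title_123.html",
    "http://www.widget.com/admin/login.html",
]

if __name__ == "__main__":
    for url in URLS:
        for agent in ("msnbot", "Googlebot"):
            print(agent, url, is_allowed(SAMPLE_ROBOTS, agent, url))
```

If a rule blocks msnbot but not the other bots, this would show up as False for msnbot and True for Googlebot on the same URL.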

On my largest site, some URLs are indexed, about 90 out of 3,000. No idea why. For all my other sites, MSN shows only the index page and maybe one or two more entries from my CMS.

I think the problem lies with the URLs. Maybe they are too long, or maybe MSN does not like underscores.
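One way to test the URL hunch would be to compare the roughly 90 indexed URLs with the unindexed ones on simple features such as path length and underscore count. A minimal sketch, assuming the URLs are available as plain strings; url_stats is a hypothetical helper name, and nothing here says how MSN actually treats these features.

```python
from urllib.parse import urlsplit


def url_stats(url: str) -> dict:
    """Collect simple features of a URL's path for side-by-side comparison."""
    # The URLs in the post have no scheme, so add one before splitting.
    if "://" not in url:
        url = "http://" + url
    path = urlsplit(url).path
    return {
        "path": path,
        "path_length": len(path),
        "underscores": path.count("_"),
        "hyphens": path.count("-"),
    }


if __name__ == "__main__":
    # The two URL patterns quoted in the post.
    for u in (
        "www.widget.com/Keyword__kw.html",
        "www.widget.com/this-is-the-title_123.html",
    ):
        print(url_stats(u))
```

If the indexed and unindexed groups look the same on these numbers, URL length and underscores become a less likely explanation.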

Any help will be appreciated.

[edited by: engine at 11:42 am (utc) on April 5, 2006]
[edit reason] TOS [/edit]

taps

3:24 pm on Apr 5, 2006 (gmt 0)

10+ Year Member



Just to add:
I had massive dupe content problems in Feb 2005. I fixed that last year, and I don't see dupe entries in MSN, so that should not be the problem.

bumpaw

7:59 pm on Apr 11, 2006 (gmt 0)

10+ Year Member



I have a similar mystery. Out of 20 sites, there is one where MSN will only fetch /index.php and the robots.txt file. The bot checks these two files and nothing more. The site is in its sixth month, and Yahoo and Google spider and index it as expected.

The site is a fully dynamic PHP site, and /index.php links to all the product brands. There are way more brands than I would like, but the other two major SEs don't mind.
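To confirm exactly which files the bot requests, and with what status codes, the server access log can be filtered by user agent. A rough sketch, assuming an Apache-style "combined" log format; the sample log lines, the IP address, and the /brand.php path are invented for illustration, and bot_requests is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumes the Apache "combined" log format; adjust for other server setups.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bot_requests(lines, bot="msnbot"):
    """Count (path, status) pairs requested by user agents containing bot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and bot in match.group("agent").lower():
            counts[(match.group("path"), match.group("status"))] += 1
    return counts


# Invented sample lines, not taken from any real log.
SAMPLE_LOG = [
    '192.0.2.1 - - [11/Apr/2006:19:59:00 +0000] "GET /robots.txt HTTP/1.0" '
    '200 120 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.1 - - [11/Apr/2006:19:59:05 +0000] "GET /index.php HTTP/1.0" '
    '200 5120 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [11/Apr/2006:20:01:00 +0000] "GET /brand.php?id=7 HTTP/1.1" '
    '200 4096 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for (path, status), n in sorted(bot_requests(SAMPLE_LOG).items()):
        print(n, status, path)
```

Anything other than a 200 for the pages linked from /index.php (redirects, 403s, 500s) seen only by msnbot would be a good place to start looking.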