Forum Moderators: Robert Charlton & goodroi
index.html > index.asp
By changing the URL, you trip Google's duplicate content filter.
A comment from googleguy: So what's the problem with a session id, and why doesn't Googlebot crawl them? Well, we don't just have one machine for crawling. Instead, there are lots of bot machines fetching pages in parallel. For a really large site, it's easily possible to have many different machines at Google fetch a page from that site. The problem is that the web server would serve up a different session-id to each machine! That means that you'd get the exact same page multiple times--only the url would be different. It's things like that which keep some search engines from crawling dynamic pages, and especially pages with session-ids.
Google can do some smart stuff looking for duplicates, and sometimes inferring about the url parameters, but in general it's best to play it safe and avoid session-ids whenever you can.
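To make the session-id problem concrete, here is a rough Python sketch of two crawler machines receiving the same page under different URLs, and of how stripping the session parameter collapses them back into one. The parameter names in SESSION_PARAMS and the example.com URLs are made up for illustration; this is not how Google does it, just a picture of the idea.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; real sites use many different ones.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def strip_session_id(url):
    """Return the URL with any session-id query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Two crawler machines fetch the same page and are handed different session ids.
seen_by_bot_a = "http://example.com/page.asp?id=7&sid=abc123"
seen_by_bot_b = "http://example.com/page.asp?id=7&sid=xyz789"

# As raw strings the URLs differ, so the page looks like two documents.
urls_differ = seen_by_bot_a != seen_by_bot_b

# After stripping the session parameter, both collapse to a single URL.
same_page = strip_session_id(seen_by_bot_a) == strip_session_id(seen_by_bot_b)
```

A crawler cannot know in general which parameters are safe to drop, which is why it is safer not to put session ids in the URL in the first place.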
Google's Webmaster Technical Guidelines:
* Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
* Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the
By changing the URL, you trip Google's duplicate content filter.
I very highly doubt this is the duplicate content filter. Rather, it is the same as changing the path to every page on your site: your site is 'new' again.
See Patent App. for details on 'site age'
IOW, changing the extension has the same effect as changing the actual name of every page.
If you are not too far removed from the changes, you might try serving the new ASP content at the old .html URLs with mod_rewrite, so the addresses Google already knows keep working.
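The mapping such a rewrite has to perform is simple. Here is a small Python sketch of the logic only, not actual mod_rewrite configuration. It assumes every page kept the same path and only swapped .html for .asp; the function name resolve_old_url is made up for illustration.

```python
import re

# Assumption: each old page kept its path and only swapped .html for .asp.
OLD_PAGE = re.compile(r"^(?P<stem>/.*)\.html$")

def resolve_old_url(path):
    """Map a requested .html path to the .asp file that should be served."""
    match = OLD_PAGE.match(path)
    if match is None:
        # Not an old-style page URL; serve it unchanged.
        return path
    return match.group("stem") + ".asp"
```

The visitor (or bot) still requests /index.html, and the server quietly answers with the content of /index.asp, so the public URL never changes.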
Justin
[w3.org...]