RonnieG - 12:50 am on Nov 15, 2006 (gmt 0)
There is absolutely no way for G to know whether these URLs point to the same content (page) or to a separate copy of the same content (page).
True — during the crawl, it can't. When a cloned copy of the same content is crawled at a different absolute URL, a separate docID is assigned to it. For multiple relative URLs pointing to the same page/absolute URL, as long as the redirects are done properly and the absolute URL is not masked, the result should be the same absolute URL and docID for both, and G will simply replace the previous crawl results with the new results under the same docID.

However, not everything G does happens during the crawl. Recognizing that a page is cloned/duplicate content is a separate offline batch process. Only when G gets around to its offline batch processing of the crawl results does it see and recognize duplicate content under more than one docID, and only then will it decide what to do with one page or the other.
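The distinction above can be sketched in a few lines of Python. This is purely illustrative and in no way Google's actual pipeline: it just models the idea that docIDs are keyed by the canonical absolute URL at crawl time (so a proper, unmasked redirect collapses onto one docID, while a clone at a different URL gets its own), and that duplicate detection is a separate batch pass over content after the fact. All names (`crawl`, `offline_duplicate_pass`) are invented for the example.

```python
import hashlib
from collections import defaultdict

def crawl(index, url, content, redirects_to=None):
    """Assign a docID per absolute URL. A properly handled redirect
    resolves to its target URL, so both URLs share one docID and a
    recrawl simply overwrites the previous results."""
    canonical = redirects_to or url  # unmasked redirect -> same absolute URL
    doc_id = hashlib.md5(canonical.encode()).hexdigest()[:8]
    index[doc_id] = (canonical, content)  # recrawl replaces prior crawl results
    return doc_id

def offline_duplicate_pass(index):
    """Separate offline batch step: group docIDs whose content is identical.
    Only here do clones at different URLs become visible as duplicates."""
    by_content = defaultdict(list)
    for doc_id, (_, content) in index.items():
        by_content[hashlib.md5(content.encode()).hexdigest()].append(doc_id)
    return [ids for ids in by_content.values() if len(ids) > 1]

index = {}
a = crawl(index, "http://example.com/page", "same words")
b = crawl(index, "http://example.com/page/", "same words",
          redirects_to="http://example.com/page")  # redirect: same docID as a
c = crawl(index, "http://mirror.example.org/page", "same words")  # clone: new docID
assert a == b and a != c
print(offline_duplicate_pass(index))  # the clone only surfaces in the batch pass
```

Note how the redirected URL never creates a second docID, so there is nothing for the duplicate pass to find there; the cross-host clone, by contrast, sails through the crawl untouched and is only flagged later.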