Regent - 9:56 pm on Oct 11, 2010 (gmt 0)
First, thanks for all your input. Yet the thread is starting to look like so many others I have seen on this same subject: how to use <base> and what it does. After looking at what seemed like hundreds of posts, including debates in the Joomla forum, and reading W3C’s explanation, it seems pretty clear that there is no consensus. And it is this very point that concerns me. By the way, the <base> tag is included in Joomla’s header jdoc include. Joomla always sets the <base> to the path/page you are currently on. I could hack it, but I think it is correct.
The real issue here is how Google is interpreting the <base> tag.
It was the strange reports in my Google Webmaster Tools (WMT) account that prompted this thread. Here are some other meaningful notes on my observations:
* The site was a re-launch of a domain that has been around for many years (7+ years).
* The SEO pages were additions – brand new content written specifically for the site. These SEO pages do not have dup or near-dup content of any kind.
* The re-launch took place in the May-June time frame (shortly after the May-Day update as I recall).
* The sitemap.xml file has had the correct SEO page path and file name since its relaunch. The sitemap.xml file was included in the site’s root and registered with Google.
* As of last week, WMT reported no SEO pages indexed, yet did report SEO pages under an incorrect path.
* Links to the SEO pages exist on every site page – in the footer as a SEF drop-up, 100% CSS menu. Links to the SEO pages also exist on one sub-menu.
* Links to the SEO pages are also on two other pages of the site. These produce the right URLs in browsers, but I suspect they are where Google got the incorrect path. These 2 pages are:
As you can see, these are pages for resetting a password or requesting a password reminder for admin access. These are the only 2 pages on the site that could possibly produce what Google recorded in WMT. The base on each one of these pages is:
<base href="http://www.domain.com/component/user/reset.html" />
<base href="http://www.domain.com/component/user/remind.html" />
* The SEO page URLs that Google reported in WMT look like this example: www.domain.com/component/user/SEO_PAGE1.html. In the browser, the same link resolves to www.domain.com/SEO_PAGE1.html
* As Tedster rightly points out, Google could have gotten the incorrect SEO page URLs from some other source, perhaps an email. But it is pretty unlikely that these pages would have gotten out into Google’s spidering range, as they are not very popular.
* The correct URLs for the SEO pages did not appear anywhere in WMT’s internal links. Only the incorrect-path URLs did, and they return a 500 error.
* As of this writing, all SEO pages with correct URLs respond properly to an “info:” query. All have cache pages. But none of the SEO pages have PR value. The home page of the site is a PR4.
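For what it's worth, here is a minimal sketch of how a standards-following resolver treats the base values quoted above. I have not shown how the footer links are actually written in the page source, so both href forms below are assumptions, and Python's urllib.parse.urljoin is only standing in for whatever a browser or crawler does internally.

```python
from urllib.parse import urljoin

# Base URL as emitted on the password-reset page (quoted in this post).
BASE = "http://www.domain.com/component/user/reset.html"

# Assumption: two ways the link to an SEO page might be written in the HTML.
document_relative = "SEO_PAGE1.html"   # no leading slash
root_relative = "/SEO_PAGE1.html"      # leading slash

# Resolve each form against the <base> value.
resolved_document_relative = urljoin(BASE, document_relative)
resolved_root_relative = urljoin(BASE, root_relative)

print(resolved_document_relative)
# http://www.domain.com/component/user/SEO_PAGE1.html
print(resolved_root_relative)
# http://www.domain.com/SEO_PAGE1.html
```

If the links turn out to be document-relative, the /component/user/ URL is what strict resolution against that base gives, which matches the pattern WMT reported. If they are root-relative, the base should not change the path at all, and the bad URLs would have to come from somewhere else.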
To make a long story short, everything on the outside (info: query, browser resolving to the correct URL) looks right. But everything on the inside of WMT looks wrong (absence of the correct path/SEO_page URLs, reporting of wrong path/SEO_page URLs).
There are only a few possibilities:
1. Google WMT is mis-reporting
2. Google did find legitimate links to incorrect path/SEO_page URLs
3. Google has a canonicalization issue in the way it handles <base> compared to the way browsers do.
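One way to narrow these down is to audit the page source directly: read the <base> value, resolve every link against it, and see which ones land under /component/user/. The sketch below does that with Python's standard html.parser. The LinkAuditor class, the sample HTML, and SEO_PAGE2.html are all hypothetical stand-ins, since the real page markup is not reproduced here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAuditor(HTMLParser):
    """Collects <a href> values and resolves them against <base>, if present."""

    def __init__(self, page_url):
        super().__init__()
        self.base_url = page_url   # falls back to the page's own URL
        self.base_seen = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and not self.base_seen and attrs.get("href"):
            # Only the first <base href> counts.
            self.base_url = urljoin(self.base_url, attrs["href"])
            self.base_seen = True
        elif tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def resolved(self):
        # Resolve every collected href against the effective base URL.
        return [urljoin(self.base_url, h) for h in self.hrefs]


# Hypothetical markup: one document-relative link, one root-relative link.
SAMPLE_HTML = """
<html><head>
<base href="http://www.domain.com/component/user/reset.html" />
</head><body>
<a href="SEO_PAGE1.html">SEO page 1</a>
<a href="/SEO_PAGE2.html">SEO page 2</a>
</body></html>
"""

auditor = LinkAuditor("http://www.domain.com/component/user/reset.html")
auditor.feed(SAMPLE_HTML)

# Flag links that resolve into the /component/user/ path.
suspect = [u for u in auditor.resolved()
           if urlparse(u).path.startswith("/component/user/")]
print(suspect)
# ['http://www.domain.com/component/user/SEO_PAGE1.html']
```

If the real reset and remind pages produce no suspect links when run through something like this, possibility 3 looks less likely and possibilities 1 and 2 deserve a closer look.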
[edited by: tedster at 9:59 pm (utc) on Oct 11, 2010]