Regent - 11:37 pm on Oct 8, 2010 (gmt 0)
I don't post much; I mostly read. So be gentle, pls. I'm not sure whether I've found something critical or just an anomalous condition in GoogleLand, so I'm asking the community for a second opinion and corroboration. If what I observed is widespread, this could be a significant issue for many.
Google Webmaster Tools listed several strange URLs as 404 error pages, and an HTTP header checker confirmed the 404 status. These happened to be critical SEO pages on a site. The paths Google reported for them were wrong, which explains the 404 errors. But after checking the source code for the links to these SEO pages and running a Xenu crawl, I could not find where Google had picked up paths/pages that did not exist. Furthermore, the correct path to these SEO pages was in the sitemap.xml that had been registered with Google for months, yet the Webmaster Tools crawl logs made no mention of the correct path/page.
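For anyone who wants to repeat the header check without a third-party tool, here is a minimal sketch using only Python's standard `http.client`. It sends a HEAD request and returns the status code without following redirects. The function name `http_status` is hypothetical, and some servers answer HEAD differently from GET, so a surprising result is worth re-checking with GET.

```python
import http.client
from urllib.parse import urlsplit


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to `url`.

    Redirects are NOT followed, so a 301/302 is reported as-is,
    much like a typical HTTP header checker.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        # Rebuild the request target from the path and query string.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        return conn.getresponse().status
    finally:
        conn.close()
```

Usage is a single call per URL, e.g. `http_status("http://example.com/path1/path2/seo_page.html")` for the nonexistent path in the example.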
Google was reporting 404 errors for pages with paths that did not exist and was not reporting the correct path/page registered in the sitemap.xml file.
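The sitemap side can be checked mechanically as well. This sketch parses a sitemap.xml with Python's `xml.etree.ElementTree` and tests whether a URL is listed. It assumes the standard sitemaps.org namespace, and the helper name `sitemap_urls` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed here).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Return the set of <loc> URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/seo_page.html</loc></url>
</urlset>"""

urls = sitemap_urls(sample)
print("http://example.com/seo_page.html" in urls)              # True
print("http://example.com/path1/path2/seo_page.html" in urls)  # False
```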
You should also know that this is a Joomla website, and all Joomla websites have a <base> tag that is generated dynamically for every page. The path and file name in the <base> href are those of the current page, which is proper coding according to the W3C.
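To see which base a parser would apply to a page like this, here is a small sketch using Python's `html.parser`. It records the first `<base href>` (under the HTML rules only the first base element with an href counts) and falls back to the page's own URL when there is none. The names `BaseFinder` and `effective_base` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class BaseFinder(HTMLParser):
    """Collects the href of the first <base> element that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.base_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # A self-closing <base ... /> also arrives here, via the default
        # handle_startendtag implementation.
        if tag == "base" and self.base_href is None:
            href = dict(attrs).get("href")
            if href:
                self.base_href = href


def effective_base(page_url: str, html: str) -> str:
    """Return the URL that relative links on the page resolve against."""
    finder = BaseFinder()
    finder.feed(html)
    finder.close()
    if finder.base_href is None:
        return page_url
    # The base href is itself resolved against the page URL.
    return urljoin(page_url, finder.base_href)


page = '<head><base href="http://example.com/path1/path2/doc1234.html" /></head>'
print(effective_base("http://example.com/path1/path2/doc1234.html", page))
```

On a Joomla page where the base equals the current page, the effective base is simply the page's own URL, so the tag by itself does not change where links resolve.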
The best way to convey what I saw is through an example.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<base href="http://example.com/path1/path2/doc1234.html" />
Both FF and IE resolved the link to http://example.com/seo_page.html.
The source code shows the following:
FF3.6 ==>> <a href="http://example.com/seo_page.html">
IE8 ==>> <a href="/seo_page.html">
Google cache page viewed in FF3.6 ==>> <a href="/seo_page.html">
But Google tried to fetch http://example.com/path1/path2/seo_page.html, which does not exist. Further, Google never found http://example.com/seo_page.html, even though that URL was in the sitemap and plenty of other links on the site pointed to the target pages with the correct path/page. Technically, Google was right and the browsers, both IE and FF, were wrong (yet the code worked in the browsers just fine). The code even worked on Google's cache page, where the links reside. Even though there were 10 correctly coded links throughout the site for every 1 that Google saw with an incorrect path (based on the <base> tag), Google never reported the correct path/pages.
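For comparison, here is how a generic resolver that follows RFC 3986 treats the relevant hrefs against the <base> from the example. Python's `urllib.parse.urljoin` is used purely as a stand-in; it is not what Google or the browsers run. Under those rules a reference starting with "/" replaces the base's entire path, while a reference without the leading slash is merged with the base's directory. The slash-less `seo_page.html` case is a hypothetical added for contrast, and `resolve_all` is a hypothetical helper.

```python
from urllib.parse import urljoin

# <base href> taken from the example page.
BASE = "http://example.com/path1/path2/doc1234.html"


def resolve_all(base: str, hrefs: list[str]) -> dict[str, str]:
    """Resolve each href against base, RFC 3986 style."""
    return {href: urljoin(base, href) for href in hrefs}


results = resolve_all(BASE, [
    "/seo_page.html",                    # -> http://example.com/seo_page.html
    "seo_page.html",                     # hypothetical; -> http://example.com/path1/path2/seo_page.html
    "http://example.com/seo_page.html",  # -> http://example.com/seo_page.html
])
for href, url in results.items():
    print(href, "->", url)
```

Only the slash-less form lands under /path1/path2/, which is the URL Google requested.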
Given all of Google's canonicalization challenges, do we have a real issue here? Are browsers resolving this code differently than Google does? Do the browsers treat the leading "/" in each href as indicating a path/file from the domain root (as so many webmasters assume, without regard to <base>)?
Rather than pulling the trigger too fast and pronouncing a real problem, I am asking the WebmasterWorld community for corroboration. Has anyone else seen this issue - or were the stars aligned just right for me alone :-) ?