Regent - 3:27 pm on Oct 12, 2010 (gmt 0)
As Tedster suspected, there is more to this story. After several hours of further research and experimentation, I believe I have figured out what happened.
Two conditions led to my confusion. For the sake of others who follow, I will outline them below.
First, Joomla is a CMS framework. Most of you know that, but what may not be clear, and was not obvious to me, is that all URLs the framework constructs are built from paths and pages off the root domain. In other words, the framework outputs HTML links that look like /path1/path2/page.html regardless of which page the link is on. In the back end, however, in-content and navigation links are all relative (no leading "/"): Joomla editors produce a back-end link such as "../page.html" or "page.html", and the CMS engine converts either form to the proper front-end output.
The situation is further complicated by the SEF engine, which uses rewrite rules to convert URLs with parameters into nice-looking SEF (search-engine-friendly) URLs.
To make a long story short(er), the <base> tag is a Joomla front-end output tag that has no effect on normal <a> tag links in Joomla, because the CMS engine takes care of creating clean SEF links with a complete path. What threw me off was placing an <a> tag link like href="SEO_Page1.html" on a page whose URL looked like /path1/path2/page.html and which carried a <base href="www.example.com/path1/path2/page.html"> tag.
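For anyone who wants to see what a browser or crawler does with a relative link that the CMS has NOT rewritten, here is a rough sketch in Python. It only uses the standard library's urljoin. The http:// scheme on the base URL and the helper name resolve() are my own assumptions for illustration; they are not anything Joomla produces.

```python
from urllib.parse import urljoin

# Assumed base URL, mirroring the <base href> from the post.
# The "http://" scheme is an assumption added so the URL is absolute.
BASE = "http://www.example.com/path1/path2/page.html"


def resolve(href: str, base: str = BASE) -> str:
    """Resolve an href against a base URL, roughly as a browser or crawler would."""
    return urljoin(base, href)


if __name__ == "__main__":
    # A bare file name, a parent-relative link, and a root-relative link.
    for href in ("SEO_Page1.html", "../page.html", "/SEO_Page1.html"):
        print(href, "->", resolve(href))
```

The bare file name picks up /path1/path2/ from the base, while the link with a leading "/" resolves from the site root. That difference is the whole problem in a nutshell.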
Now, about the /path1/path2/SEO_pages.html URLs that Google picked up. I had to go back to some of the original code to figure this one out. It turns out the Joomla CMS engine handles <a> tags just fine and, for the most part, does a good job of making links SEF with some rewrite rules in the .htaccess file. What it does not do is clean up <option value="SEO_page.html"> links. Very early in the site's design, SEO page URLs were placed in a drop-down list in the footer. Well, you can see where this is going. The Joomla engine did not correct these <option> tag links, so the <base> tag OR the current page path took over. Bingo: URLs that led to pages that did not exist. I cannot explain how or why Google decided to crawl these obscure pages when it has not done a good job of crawling the pages listed in sitemap.xml.
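If you want to check your own rendered pages for this, something along these lines might help. It is only a sketch using Python's standard html.parser: it collects <option> values that have no scheme, no host, and no leading "/", then shows where each would resolve against the <base> tag (or the page URL if there is no <base>). The class name OptionLinkAudit and the function audit() are made up for this example, and it assumes every <option> value on the page is meant to be a URL, which will not be true for ordinary form drop-downs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class OptionLinkAudit(HTMLParser):
    """Collect the <base href> and any relative <option value> attributes."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.relative_values = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "base" and attr.get("href"):
            self.base = attr["href"]
        elif tag == "option":
            value = attr.get("value") or ""
            parts = urlsplit(value)
            # Assumption: the value is a URL. Flag it if it is not absolute
            # and not root-relative, since it will depend on the base URL.
            if value and not parts.scheme and not parts.netloc and not value.startswith("/"):
                self.relative_values.append(value)


def audit(html: str, page_url: str):
    """Return (value, resolved URL) pairs for relative <option> values."""
    parser = OptionLinkAudit()
    parser.feed(html)
    # A <base href> overrides the page URL as the resolution base.
    base = urljoin(page_url, parser.base) if parser.base else page_url
    return [(v, urljoin(base, v)) for v in parser.relative_values]
```

Feed it the rendered front-end HTML of a page (not the back-end source) together with that page's URL. Anything it returns is a candidate for a link that resolves differently depending on where the code is placed.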
Now I am writing "Tedster, you were right" over and over. :-)
The moral of this story is twofold. First, when dealing with a CMS, the back-end code gets manipulated by the CMS engine and rewrite rules. Although there has been a long-standing debate on the pros and cons of using <base> tags, in Joomla's case <a> tags are manipulated and rewritten such that its <base> tag has no effect on them. I am sure there is a good reason, as mentioned in earlier posts, that Joomla decided to add a <base> tag to its output. The second lesson is that a CMS may not handle <option> tags (or other link code) the same way as <a> tags. In Joomla's case they are handled differently, which caused output issues that depended on the page where the code was placed.