Forum Moderators: Robert Charlton & goodroi
<base href="http://www.-----.com/" />
<a href="news/">News</a>
<a href="contact/">Contact</a>
Note that the slash comes after the TLD in the base href, and there is no slash before the link URLs, so they are relative rather than root-relative.
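For reference, here's a quick sketch of how a spec-compliant user agent should resolve those relative links against the base href. This uses Python's urllib.parse.urljoin, with example.com standing in for the obfuscated domain:

```python
from urllib.parse import urljoin

# example.com is a placeholder for the obfuscated domain above.
base = "http://www.example.com/"

# With the base href in effect, both relative links resolve against
# the domain root, no matter which directory the current page is in.
for link in ("news/", "contact/"):
    print(urljoin(base, link))
# http://www.example.com/news/
# http://www.example.com/contact/
```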
Googlebot ignores the base href and spiders the website like this:
[-----.com...] and so on: [-----.com...] etc.
Is anybody else seeing this problem? It can cause a lot of bandwidth usage on some occasions, especially on sites with dynamic URLs that are all handled by one file that returns a 200 OK for every address.
I know that last setup isn't ideal, but it shouldn't cause a problem either. Other spiders and browsers handle it fine.
For example, if the page is in the /news/ directory and it has a relative link to another page in the /news/ directory, the base tag above is telling the user agent not to use the /news/ directory in the file path, but to calculate it from the domain root. The correct value for the base href tag is the fully qualified absolute address of the page itself.
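A small sketch of that point, again using urljoin and example.com as a stand-in domain (the page URL and filenames here are hypothetical). With the page's own absolute URL as the base, relative links inside /news/ keep resolving into /news/; with a site-root base, the same relative link resolves against the domain root instead:

```python
from urllib.parse import urljoin

# Correct: base is the fully qualified address of the page itself,
# so a relative link stays inside the /news/ directory.
page_url = "http://www.example.com/news/index.html"
print(urljoin(page_url, "other-article.html"))
# http://www.example.com/news/other-article.html

# With a site-root base href instead, the same relative link is
# calculated from the domain root and points at a different URL
# (which may still answer 200 OK on a catch-all site).
print(urljoin("http://www.example.com/", "other-article.html"))
# http://www.example.com/other-article.html
```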
What kind of error recovery a bot might have at that point is only guesswork for us, but technically it would have an error to cope with.
Sometimes it can recurse through a site and index URLs like this: www.-----.com/news/contact/news/contact/news/contact/ etc.
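One way such loops can arise (a sketch of a plausible mechanism, not a claim about Googlebot's internals): if the crawler ignores the base href and resolves each relative link against the current page's path, and every fetched path serves the same page with the same relative links, each hop appends another segment:

```python
from urllib.parse import urljoin

# Hypothetical catch-all site: every URL returns the same page with
# the same relative links. Resolving against the current path (i.e.
# ignoring the base href) nests the directories deeper each time.
url = "http://www.example.com/"
for link in ("news/", "contact/", "news/", "contact/"):
    url = urljoin(url, link)
    print(url)
# Ends at http://www.example.com/news/contact/news/contact/
```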
Extra info: this is not something that has been broken for ages; it's a new bug that I'm seeing more and more.
[edited by: NedProf at 2:22 pm (utc) on Jan. 3, 2007]