
Google SEO News and Discussion Forum

    
<base href> ignored by Googlebot?
NedProf
msg:3205743
2:20 pm on Jan 2, 2007 (gmt 0)

It looks like Googlebot has a problem with the base href element when this construction is used:


<base href="http://www.-----.com/" />
<a href="news/">News</a>
<a href="contact/">Contact</a>

Note that the slash comes after the TLD in the base href, not at the start of the link URLs.
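To make the two behaviours concrete, here is a small Python sketch using the standard library's urllib.parse.urljoin, which resolves relative references roughly as RFC 3986 describes. It is only an illustration, not a model of Googlebot's internals: www.example.com stands in for the elided domain, the page URL http://www.example.com/news/ is an assumed page the bot has reached, and resolve() is a hypothetical helper.

```python
from urllib.parse import urljoin

# Hypothetical values: example.com stands in for the elided domain in the post.
BASE_HREF = "http://www.example.com/"        # value of <base href>
PAGE_URL = "http://www.example.com/news/"    # URL the page was fetched from (assumed)
LINKS = ["news/", "contact/"]                # relative hrefs from the post

def resolve(links, base):
    """Resolve each relative href against the given base URL."""
    return [urljoin(base, href) for href in links]

# A user agent that honours <base href> resolves against the base:
print(resolve(LINKS, BASE_HREF))
# ['http://www.example.com/news/', 'http://www.example.com/contact/']

# Ignoring <base href> means resolving against the page URL instead:
print(resolve(LINKS, PAGE_URL))
# ['http://www.example.com/news/news/', 'http://www.example.com/news/contact/']
```

The second result is the kind of path described in this thread: the links get appended to the current directory instead of the domain root.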

Googlebot ignores the base href and spiders the website like this:
[-----.com...] and so on: [-----.com...] etc.

Is anybody else seeing this problem? It can cause a lot of bandwidth usage in some cases, especially on sites with dynamic URLs that are all handled by one file and return 200 OK for every address.

I know that last setup is not good practice, but it still shouldn't cause a problem. Other spiders and browsers handle it fine.

 

NedProf
msg:3206706
11:51 am on Jan 3, 2007 (gmt 0)

And I see another victim of this bug. Is anybody else seeing this?

Fox_Mulder
msg:3206729
12:34 pm on Jan 3, 2007 (gmt 0)

<base href="http://www.-----.com/" />

I have had it like this for years and never had a problem.

tedster
msg:3206757
1:11 pm on Jan 3, 2007 (gmt 0)

<base href="http://www.example.com/" /> can be a problem if you use it on a page that is in an interior directory.

For example, if the page is in the /news/ directory and it has a relative link to another page in the /news/ directory, the base tag above is telling the user agent not to use the /news/ directory in the file path, but to calculate it from the domain root. The correct value for the base href tag is the fully qualified absolute address of the page itself.
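A quick way to see the interior-directory problem is to resolve one relative link against both candidate bases. This Python sketch uses urllib.parse.urljoin; the page /news/index.html and the sibling link archive.html are hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page in an interior directory and a relative link to a sibling page.
PAGE_URL = "http://www.example.com/news/index.html"
SIBLING_LINK = "archive.html"

# Base set to the domain root: the /news/ directory is dropped from the path.
root_base = urljoin("http://www.example.com/", SIBLING_LINK)
# Base set to the page's own absolute address: the link stays in /news/.
page_base = urljoin(PAGE_URL, SIBLING_LINK)

print(root_base)  # http://www.example.com/archive.html
print(page_base)  # http://www.example.com/news/archive.html
```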

What kind of error recovery a bot might attempt at that point is only guesswork for us, but technically it would have an error to cope with.

NedProf
msg:3206793
2:13 pm on Jan 3, 2007 (gmt 0)

tedster: you are totally right, and that's exactly my point: all robots and browsers do it right except Googlebot. Googlebot just ignores the base href element and calculates the wrong path.

Sometimes it recurses through a site and indexes URLs like this: www.-----.com/news/contact/news/contact/news/contact/ etc.
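Why this recursion gets expensive on a catch-all site can be shown with a toy breadth-first crawl in Python. Assumptions: every page carries the same two relative links and a root base href, and a single handler answers 200 OK for any path, so every resolved URL counts as a new page. The crawl() function and the depth limit are hypothetical, not a description of how Googlebot actually schedules fetches.

```python
from collections import deque
from urllib.parse import urljoin

# Hypothetical site: every page carries the same two relative links and a root
# <base href>, and a catch-all handler answers 200 OK for any path.
BASE_HREF = "http://www.example.com/"
LINKS = ["news/", "contact/"]

def crawl(start, honour_base, max_depth):
    """Breadth-first crawl up to max_depth link hops; returns the set of URLs seen."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue
        # Resolve against <base href> if honoured, else against the page URL itself.
        base = BASE_HREF if honour_base else url
        for href in LINKS:
            target = urljoin(base, href)
            # Every target is treated as a real page (the catch-all answers 200 OK).
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

print(len(crawl(BASE_HREF, honour_base=True, max_depth=3)))   # 3
print(len(crawl(BASE_HREF, honour_base=False, max_depth=3)))  # 15
```

Honouring the base keeps the site at three URLs no matter how deep the crawl goes; ignoring it doubles the URL count at every level, which is where the bandwidth goes.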

Extra info: this is not something that has been broken for ages; it is a new bug that I'm seeing more and more.

[edited by: NedProf at 2:22 pm (utc) on Jan. 3, 2007]

tedster
msg:3206831
2:50 pm on Jan 3, 2007 (gmt 0)

I remember one report like this last summer, and I was not able to pin down any cause the site itself could be responsible for. Just make sure your server returns a true 404 status code for those bad URLs. Many .NET sites return a 302 to serve a custom error page; that can spell trouble, especially when combined with this issue.
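Here is a minimal sketch of the "true 404" advice, written as a Python WSGI app rather than .NET purely for illustration. KNOWN_PAGES, app() and status_of() are hypothetical names; a real site would check its own routing or database instead of a fixed set. The point is that the custom error page is served directly with a 404 status, not through a 302 redirect to a separate error URL.

```python
# Hypothetical set of paths that really exist on the site.
KNOWN_PAGES = {"/", "/news/", "/contact/"}

def app(environ, start_response):
    """Minimal WSGI app: serve known pages, answer everything else with a real 404."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [b"<p>Page content</p>"]
    # Send the custom error page body directly with a 404 status,
    # rather than redirecting (302) to a separate error URL.
    start_response("404 Not Found", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>Sorry, that page does not exist.</p>"]

def status_of(path):
    """Call the app directly and return the status line it sends."""
    captured = []
    app({"PATH_INFO": path}, lambda status, headers: captured.append(status))
    return captured[0]

print(status_of("/news/"))               # 200 OK
print(status_of("/news/contact/news/"))  # 404 Not Found
```

With a handler like this, the recursive URLs described above get a 404 instead of a fresh 200 OK page, so a crawler has nothing new to follow from them.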

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved