Googlebot is confused! - Appending static pages on dynamic dirs

Help me please :)

         

born2drv

9:19 am on Feb 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I recently switched my URLs from ugly ones like:
mysite.com/cgi-bin/search.cgi?directory=XXXXX
to
mysite.com/cat/subcat/subsubcat

There is a .htaccess file that does all this for me. The problem is that my header and footer have links to "about_us.html", "shipping.html", etc. I had my CGI programmer make this .htaccess file, so I don't know much about it; I just told him I wanted every CGI-generated page to be made into subcats instead.

Now Googlebot is trying to GET pages like this:

1889703: 64.68.82.35 - - [15/Feb/2003:20:46:56 -0500] "GET /cat/subcat/subsubcat/shipping.html HTTP/1.0" 200 62 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

This is bad: if it does this for every header/footer link on every single URL, I'll end up with a TON of duplicate content and hits (not good).

I don't want to change the links in the header/footer to absolute URLs like mysite.com/shipping.html, but would rather leave them as /shipping.html, both to keep file sizes down and to make it easier to copy content to other domain names if/when that is required.

So how do I fix this problem? Do I have to 301 redirect all requests for mysite.com/wrongcat/shipping.html to mysite.com/shipping.html? Will that stop Googlebot from trying to get those URLs in the future? Or do I just ignore the problem, because Google will realize it's all dupe content and stop indexing these files?

Why does Google even go to these incorrect URLs? You can obviously click through to the right ones on the site.

Thanks for your help :)

Brett_Tabke

9:28 am on Feb 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Gbot supports the "/foo.html" type of addressing with a simple / at the start. Why not use that?

born2drv

9:34 am on Feb 16, 2003 (gmt 0)




Thanks, Brett, for the quick reply. Is that all I need to do? Sounds simple enough... I will try it. If it works, my programmer will be happy because he won't have any work to do :)

Thanks again....

Brett_Tabke

9:55 am on Feb 16, 2003 (gmt 0)




If I understand correctly, that should be all you have to do b2d. I think it needs more explanation though.

What do the links currently look like in the generated HTML?

If gbot is requesting /cat/subcat/subsubcat/shipping.html, then what does the link's href actually say in the HTML?

born2drv

10:02 am on Feb 16, 2003 (gmt 0)




Currently, the links are like this:
<a href="shipping.html">Shipping</a>

They resolve to, and show on mouse-over as:
[mysite.com...]

So I just changed all the header and footer links to this:
<a href="/shipping.html">Shipping</a>

Hopefully that fixes it.
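
For anyone who wants to check the difference between the two link styles, here is a minimal sketch using Python's standard urllib.parse.urljoin, which follows the usual relative-reference resolution rules. The page URL (including its trailing slash) and the helper name resolve are assumptions made for illustration; they are chosen to match the request in the log above, not taken from the real site.

```python
from urllib.parse import urljoin

# Assumed URL of a rewritten category page. The trailing slash is an
# assumption that matches the request Googlebot made in the log.
PAGE_URL = "http://mysite.com/cat/subcat/subsubcat/"


def resolve(href: str, base: str = PAGE_URL) -> str:
    """Resolve an href the way a browser or crawler would, relative to base."""
    return urljoin(base, href)


# Old link: relative to the current "directory" of the page.
print(resolve("shipping.html"))
# -> http://mysite.com/cat/subcat/subsubcat/shipping.html

# New link: the leading slash makes it relative to the site root.
print(resolve("/shipping.html"))
# -> http://mysite.com/shipping.html
```

The root-relative form still contains no domain name, so it should copy to another domain unchanged.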

andreasfriedrich

12:10 pm on Feb 16, 2003 (gmt 0)




Googlebot is not confused, it's showing the correct behaviour!

Have a look at Path Information - The Base Element - <base href="http://www.domain.com/"> [webmasterworld.com] and Bag-O-Tricks for PHP II - some code snippets that should be helpful for all in creating dynamic sites [webmasterworld.com] for more information on how relative URIs are resolved.
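
As a rough illustration of how a base element changes the outcome, the sketch below extracts links from a page with Python's standard html.parser and resolves them with urllib.parse.urljoin. The class name LinkResolver, the function resolve_links, and the sample page URL are hypothetical; this is a simplified model of what a crawler does, not Googlebot's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkResolver(HTMLParser):
    """Collect <a href> targets, resolved against the page URL or <base href>."""

    def __init__(self, page_url: str):
        super().__init__()
        self.base = page_url      # default base is the page's own URL
        self.base_seen = False    # only the first <base href> is honoured
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        if tag == "base" and not self.base_seen:
            self.base = urljoin(self.base, href)
            self.base_seen = True
        elif tag == "a":
            self.links.append(urljoin(self.base, href))


def resolve_links(html: str, page_url: str) -> list:
    """Return the absolute URLs of all links found in html."""
    parser = LinkResolver(page_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Assumed page URL, matching the shape of the request in the log.
page = "http://mysite.com/cat/subcat/subsubcat/"
link = '<a href="shipping.html">Shipping</a>'

print(resolve_links(link, page))
print(resolve_links('<base href="http://www.domain.com/">' + link, page))
```

Without the base element, the relative link resolves under the category path. With it, the same link resolves against http://www.domain.com/ instead.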

Andreas