Forum Moderators: open
There is a .htaccess file that does all this for me. The problem is, in my headers and footers, I have links to "about_us.html", "shipping.html", etc. I had my CGI programmer make this .htaccess file, so I don't know much about it; I just told him I wanted every CGI-generated page to be made into subcats instead.
Now I've got Googlebot trying to GET pages like this:
1889703: 64.68.82.35 - - [15/Feb/2003:20:46:56 -0500] "GET /cat/subcat/subsubcat/shipping.html HTTP/1.0" 200 62 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
This is bad, because if it's going to do this for every header/footer page under every single URL, I'll end up with a TON of duplicate content and hits. (Not good.)
I don't want to change the links in the header/footer to absolute URLs like mysite.com/shipping.html, but rather leave them as /shipping.html, both to keep file sizes down and to make it easier to copy content to other domain names if/when that is required.
So how do I fix this problem? Do I have to 301 redirect all requests for mysite.com/wrongcat/shipping.html to mysite.com/shipping.html? Will that stop Googlebot from trying to get it in the future? Or do I just ignore this problem, because Google will realize it's all dupe content and stop indexing these files?
Why does Google even go to these incorrect URLs? You can obviously click through to the right ones on the site.
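If you do go the 301 route, one way to do it (just a sketch, assuming Apache with mod_rewrite enabled, and that it doesn't clash with the rules your programmer already put in the .htaccess) would be something like:

```apache
RewriteEngine On
# Any request for shipping.html or about_us.html that has at least one
# directory segment in front of it gets permanently redirected (301) to
# the copy at the site root. Root-level requests are left alone, so
# there's no redirect loop.
RewriteRule ^.+/(shipping|about_us)\.html$ /$1.html [R=301,L]
```

You'd extend the `(shipping|about_us)` alternation to cover whatever other pages are linked from your header/footer.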
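Googlebot resolves links the same way a browser does: if a header link is written as `shipping.html` with no leading slash, it resolves relative to whatever directory the current page appears to be in, not the site root. A quick illustration with Python's standard library (`urllib.parse.urljoin`; mysite.com is just a stand-in for the real domain):

```python
from urllib.parse import urljoin

# Pretend the crawler is currently on one of the rewritten "subcat" pages.
page = "http://mysite.com/cat/subcat/subsubcat/index.html"

# A plain relative link resolves against the current directory...
print(urljoin(page, "shipping.html"))
# http://mysite.com/cat/subcat/subsubcat/shipping.html

# ...while a root-relative link (leading slash) resolves against the site root.
print(urljoin(page, "/shipping.html"))
# http://mysite.com/shipping.html
```

So every deep URL the crawler visits spawns its own copies of the header/footer pages, which is exactly the duplicate-content pattern in the log line above.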
Thanks for your help :)
That's what they resolve to, and what they show on mouse over:
[mysite.com...]
So I just changed all the header and footer links to this:
<a href="/shipping.html">Shipping</a>
Hopefully that fixes it.
Have a look at Path Information - The Base Element - <base href="http://www.domain.com/"> [webmasterworld.com] and Bag-O-Tricks for PHP II - some code snippets that should be helpful for all in creating dynamic sites [webmasterworld.com] for more information on how relative URIs are resolved.
Andreas
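To illustrate the base element approach Andreas mentions: with a `<base>` in the document head, even links without a leading slash resolve against the base URL instead of the current directory (a sketch; www.mysite.com stands in for the real domain):

```html
<head>
  <!-- All relative URIs in this document now resolve against this base -->
  <base href="http://www.mysite.com/">
</head>
<body>
  <!-- Resolves to http://www.mysite.com/shipping.html from any directory -->
  <a href="shipping.html">Shipping</a>
</body>
```

Note that this ties the page to one domain in a way root-relative links like `/shipping.html` don't, which matters if the goal is copying content to other domain names.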