Forum Moderators: open
There is a .htaccess file that does all this for me. The problem is, in my headers and footers, I have links to "about_us.html", "shipping.html", etc. I had my CGI programmer make this .htaccess file, so I don't know much about it; I just told him I wanted every CGI-generated page to be made into subcats instead.
Now I've got Googlebot trying to GET pages like this:
1889703: 64.68.82.35 - - [15/Feb/2003:20:46:56 -0500] "GET /cat/subcat/subsubcat/shipping.html HTTP/1.0" 200 62 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
This is bad, because if it's going to do this for every header/footer page under every single URL, I'll end up with a TON of duplicate content and hits. (Not good.)
I don't want to change the links in the header/footer to absolute URLs like mysite.com/shipping.html, but rather leave them as /shipping.html, both to keep file sizes down and to make it easier to copy content to other domain names if/when that is required.
So how do I fix this problem? Do I have to 301 redirect all requests for mysite.com/wrongcat/shipping.html to mysite.com/shipping.html? Will that stop Googlebot from trying to get it in the future? Or do I just ignore this problem, because Google will realize it's all dupe content and stop indexing these files?
Why does Google even go to these incorrect URLs? You can obviously click through to the right ones on the site.
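If you do go the 301 route, one way to do it (just a sketch, assuming Apache with mod_rewrite enabled, and that it doesn't clash with the rules your programmer already put in the .htaccess) would be something like:

```apache
RewriteEngine On
# Any request for shipping.html or about_us.html that has at least one
# directory segment in front of it gets permanently redirected (301) to
# the copy at the site root. Root-level requests are left alone, so
# there's no redirect loop.
RewriteRule ^.+/(shipping|about_us)\.html$ /$1.html [R=301,L]
```

You'd extend the `(shipping|about_us)` alternation to cover whatever other pages are linked from your header/footer.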
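Googlebot resolves links the same way a browser does: if a header link is written as `shipping.html` with no leading slash, it resolves relative to whatever directory the current page appears to be in, not the site root. A quick illustration with Python's standard library (`urllib.parse.urljoin`; mysite.com is just a stand-in for the real domain):

```python
from urllib.parse import urljoin

# Pretend the crawler is currently on one of the rewritten "subcat" pages.
page = "http://mysite.com/cat/subcat/subsubcat/index.html"

# A plain relative link resolves against the current directory...
print(urljoin(page, "shipping.html"))
# http://mysite.com/cat/subcat/subsubcat/shipping.html

# ...while a root-relative link (leading slash) resolves against the site root.
print(urljoin(page, "/shipping.html"))
# http://mysite.com/shipping.html
```

So every deep URL the crawler visits spawns its own copies of the header/footer pages, which is exactly the duplicate-content pattern in the log line above.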
Thanks for your help :)
That's what they resolve to, and what they show on mouse over:
[mysite.com...]
So I just changed all the header and footer links to this:
<a href="/shipping.html">Shipping</a>
Hopefully that fixes it.
Have a look at Path Information - The Base Element - <base href="http://www.domain.com/"> [webmasterworld.com] and Bag-O-Tricks for PHP II - some code snippets that should be helpful for all in creating dynamic sites [webmasterworld.com] for more information on how relative URIs are resolved.
Andreas
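To illustrate the base element approach Andreas mentions: with a `<base>` in the document head, even links without a leading slash resolve against the base URL instead of the current directory (a sketch; www.mysite.com stands in for the real domain):

```html
<head>
  <!-- All relative URIs in this document now resolve against this base -->
  <base href="http://www.mysite.com/">
</head>
<body>
  <!-- Resolves to http://www.mysite.com/shipping.html from any directory -->
  <a href="shipping.html">Shipping</a>
</body>
```

Note that this ties the page to one domain in a way root-relative links like `/shipping.html` don't, which matters if the goal is copying content to other domain names.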