Forum Moderators: phranque
RewriteEngine on
#
# Redirect requests for index.html in any directory to "/" in the same directory
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1 [R=301,L]
#
I only use the .html extension for pages, yet Google Webmaster Tools appears to be looking for index.htm links. It reports the source as one of my own pages - but there is no such link on that source page. Is there a way to redirect both index.html AND index.htm to the root folder with the rewrite code?
RewriteEngine on
#
# Redirect requests for index.html in any directory to "/" in the same directory
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.example.com/$1 [R=301,L]
#
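Outside of Apache, the combined condition pattern can be sanity-checked against sample request lines. This is just an illustrative sketch in Python regex syntax (Apache's backslash-escaped spaces written as literal spaces); THE_REQUEST holds the raw request line, e.g. "GET /subdir/index.html HTTP/1.1".

```python
import re

# The RewriteCond pattern above, transcribed for Python's re module.
# "html?" makes the trailing "l" optional, so both .htm and .html match.
pattern = re.compile(r'^[A-Z]{3,9} /([^/]+/)*index\.html? HTTP/')

print(bool(pattern.match('GET /index.html HTTP/1.1')))        # True
print(bool(pattern.match('GET /subdir/index.htm HTTP/1.1')))  # True
print(bool(pattern.match('GET /page.html HTTP/1.1')))         # False
```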
Jim
So, Mr. bone-head linker clicks on his bogus link to check it, sees a page, and goes away happy about his new link, because he failed to notice that his address bar changed to "/".
So now, all the search engines have to go through an extra step to credit PR/link-pop from that bogus link, and humans have to wait for the redirect every time they click that link.
If you'd let that bogus link 404, then Mr. bone-head *might* have noticed something wrong.
As a result, I usually hold off on "handling all possible cases" until I see a situation where the above behavior can be balanced against the worth of the link. If it's from some obscure blog or forum, I might ignore it, post a corrective blog comment, or drop the Webmaster an e-mail. On the other hand, if the link was from CNN's home page, you can bet I'd redirect it! :)
There are valid arguments both ways; as with blocking user-agents and "countries," this is one of those decisions that each Webmaster should make individually, in an informed manner.
Jim
For a long time, I had my index.html pages indexed without a redirect. A few months ago, I implemented the redirect code to the root to avoid duplicate content. It appears that Googlebot is "making up" or "fishing for" the .htm extension when the links I have are specifically for just the root.
One thing I don't have on my site is a base href. Is this still being used? If so, what's the format and where do you put it?
I haven't added the base href in the head just yet.
I just ran xml-sitemap generator to create my sitemap, and what used to work really well - just spun in circles. It was trying to spider a structure like this:
examplepage.html/http://www.example.com/reallylongpathnamerepeatingseveraltimes/http://www.example.com/reallylongpathnamerepeatingseveraltimes/http://www.example.com/reallylongpathnamerepeatingseveraltimes/
I use SSIs on my pages that contain root-relative links. Will adding the base href in the head help or hurt this? Also, should the base be
http://www.example.com
or
http://www.example.com/ ?
I would think the first choice because if all other links begin with a slash (/), wouldn't using the domain with the slash (/) create a double slash in the URL?
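For what it's worth, the double-slash worry can be checked with any RFC 3986 resolver; here is a quick sketch using Python's standard library (URLs are the example.com placeholders from this thread). A link beginning with "/" is resolved against the host root, not appended to the base, so neither form of the base produces a double slash:

```python
from urllib.parse import urljoin

# A root-relative link ("/...") replaces the base URL's path entirely,
# so the base's trailing slash makes no difference.
print(urljoin('http://www.example.com', '/page.html'))   # http://www.example.com/page.html
print(urljoin('http://www.example.com/', '/page.html'))  # http://www.example.com/page.html
```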
You should not have to "do anything extra" to use root-relative links on your site, it should just work. The <base href> stuff is not required.
If you do use <base href>, then as posted above, the value is the URL of the page you are adding the <base href> to -- i.e. the <base href> of this page is this page's URL.
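As a concrete sketch (URL is a placeholder), a page living at http://www.example.com/subdir1/page.html would carry its own URL as the base:

```html
<head>
  <title>Example page</title>
  <!-- base href is this page's own canonical URL -->
  <base href="http://www.example.com/subdir1/page.html">
</head>
```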
I've only ever used <base href> once -- on the home page of one site, to help speed up recovery from a bad indexing problem I had accidentally created with a typo... :o
Jim
[edited by: jdMorgan at 11:40 am (utc) on Nov. 4, 2008]
Prior to making any linking changes, I linked to all my index.html pages. A thread in the Google forum said it was best to link to the root of folders (directories), so I made that change and created an .htaccess file to redirect.
Using document relative links within a site folder back to the root of the folder created the following link: href="./"
The simulators (and Google) appear to have a problem with ./
I've had long-standing rankings in Google for several years, and on Sunday I noticed traffic cut in half because of what it now thinks my linking structure is. I started these changes (using ./) about 3 weeks ago. Even though I had a sitemap, when Google tries to follow the links internally it drops the folder name from the URL, so it obviously gets a 404.
The handful of folders where I changed the links to include the full path seem to be spidering fine in the simulators. I won't know if it fixes the sitemap generator until all document-relative links are changed.
The frustrating thing is the site works perfectly in the browsers.
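For what it's worth, "./" is perfectly legal, and browsers resolve it exactly as RFC 3986 specifies - it is the simulators and sitemap tools that are mishandling it. A quick sketch with Python's standard resolver (placeholder URL):

```python
from urllib.parse import urljoin

# A document-relative "./" link resolves to the directory of the
# current page, per RFC 3986 -- which is what browsers do.
base = 'http://www.example.com/subdir1/page.html'
print(urljoin(base, './'))  # http://www.example.com/subdir1/
```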
Canonical home page link: <a href="http://www.example.com/">
Canonical page link: <a href="http://www.example.com/page.html">
Canonical subdirectory index link: <a href="http://www.example.com/subdir1/">
Canonical subdirectory page link: <a href="http://www.example.com/subdir1/page.html">
Server-relative home-page link: <a href="/">
Server-relative page link: <a href="/page.html">
Server-relative subdirectory index page link: <a href="/subdir1/">
Server-relative subdirectory page link: <a href="/subdir1/page.html">
Page-relative link: <a href="page-in-this-directory.html">
Page-relative link: <a href="../page-in-directory-above-this-subdirectory.html">
Page-relative link: <a href="../../page-in-directory-two-levels-above-this-subdirectory.html">
Page-relative link: <a href="subdir/page-in-subdirectory-below-this-directory.html">
Page-relative link: <a href="subdir1/subdir2/page-in-subdirectory-two-levels-below-this-directory.html">
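Assuming a hypothetical page at http://www.example.com/subdir1/subdir2/page.html, the page-relative forms above resolve like this (a sketch using Python's standard RFC 3986 resolver):

```python
from urllib.parse import urljoin

# Page-relative links are resolved against the directory of the
# current page; each "../" steps up one directory level.
base = 'http://www.example.com/subdir1/subdir2/page.html'

print(urljoin(base, 'other.html'))      # http://www.example.com/subdir1/subdir2/other.html
print(urljoin(base, '../up.html'))      # http://www.example.com/subdir1/up.html
print(urljoin(base, '../../top.html'))  # http://www.example.com/top.html
print(urljoin(base, 'sub/down.html'))   # http://www.example.com/subdir1/subdir2/sub/down.html
```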
Also, check to be sure you don't have any mod_rewrite or mod_alias directives which are interfering, and be sure to completely flush your browser cache before testing any new code or newly-uploaded pages.
Jim