
Apache Web Server Forum

    
hiding .html file - duplicate content?
brnm98105




msg:4127999
 5:21 am on May 6, 2010 (gmt 0)

Hi,

I just reorganized my site.

I created index.html files for every page.

Instead of going to http://www.example.com/products.html

It now is

http://www.example.com/products/, which accesses a file called index.html within the products folder.

It all works fine; however, I can still type http://www.example.com/products/index.html and see the same page.

Is this going to create a duplicate content problem? On some sites I've seen that use a similar structure, doing this takes me to their custom 404 page.

Thanks in advance

[edited by: jdMorgan at 3:59 pm (utc) on May 6, 2010]
[edit reason] example.com [/edit]

 

g1smd




msg:4128005
 5:57 am on May 6, 2010 (gmt 0)

There are just two steps to fix this.

Redirect requests for URLs that include the index filename to the URL ending in a trailing slash.

Within your own site, link only to the URL ending in a trailing slash.

# Index Redirect back to bare folder URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
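Here is the same rule as a stand-alone, commented sketch. It assumes the .htaccess file sits in the document root and that mod_rewrite is enabled; the RewriteEngine line is only needed once per file.

```apache
# Turn on mod_rewrite (needed once per .htaccess file)
RewriteEngine On
#
# Index Redirect back to bare folder URL
# THE_REQUEST is the original request line sent by the client, so only URLs in
# which the client itself asked for index.html, index.htm or index.php match.
# When mod_dir internally maps /products/ to /products/index.html, THE_REQUEST
# is unchanged, so that internal lookup does not trigger this rule (no loop).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
# $1 holds the folder path (e.g. "products/"), or nothing for the root index
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
```

As written, a request that carries a query string (e.g. /index.php?id=1) is not redirected, because the condition expects a space immediately after the filename.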

lavazza




msg:4128021
 7:14 am on May 6, 2010 (gmt 0)

I appreciate being able to read the solution, but... there's something I don't quite understand...

There are just two steps to fix this.
What is the problem that needs a fix?

brnm98105




msg:4128115
 1:13 pm on May 6, 2010 (gmt 0)

So I just add this to my .htaccess file?

My current .htaccess looks like this:

ErrorDocument 404 /404page.html
RewriteEngine on
rewritecond %{http_host} ^example.com [nc]
rewriterule ^(.*)$ http://www.example.com/$1 [r=301,nc]

redirect 301 /products.html http://www.example.com/products/

This is what I changed it to:

ErrorDocument 404 /404page.html
RewriteEngine on
rewritecond %{http_host} ^example.com [nc]
rewriterule ^(.*)$ http://www.example.com/$1 [r=301,nc]

# Index Redirect back to bare folder URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

redirect 301 /products.html http://www.example.com/products/

It seems to work. I just want to make sure it's correct.

[edited by: jdMorgan at 4:00 pm (utc) on May 6, 2010]
[edit reason] please use only example.com [/edit]

jdMorgan




msg:4128208
 4:12 pm on May 6, 2010 (gmt 0)

I'd suggest that you re-code your "Redirect" directive as a RewriteRule, so that you are not mixing mod_rewrite and mod_alias directives.

Otherwise, it is up to the server which module executes its directives first. Directives in .htaccess are processed on a per-module basis, and not strictly in order of appearance in the code. Apache .htaccess files are not scripts or sequentially-executed programs, but rather a list of directives. Each Apache module 'scans' the .htaccess file in turn, executing only the directives it understands. The order in which the modules scan the file is determined by the server configuration, which you typically do not control. Therefore, a change made by your host could break mixed mod_alias/mod_rewrite code by altering the execution order.
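As a sketch of that advice, the products.html Redirect from the previous post could be recoded like this. It assumes the rule lives in the document-root .htaccess, where the path that RewriteRule tests has no leading slash.

```apache
# mod_alias version (mixes modules, so avoid it):
#   redirect 301 /products.html http://www.example.com/products/
#
# mod_rewrite equivalent: the pattern is anchored (^...$) and the dot is escaped,
# so it matches exactly "products.html" and nothing else.
RewriteRule ^products\.html$ http://www.example.com/products/ [R=301,L]
```

One difference worth knowing: Redirect is a URL-path prefix match that appends any remaining path to the target, whereas the anchored pattern above matches only that exact path.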

Having re-coded that line, you should next exactly reverse the order of all three rules, so that the most-specific rule is first. This will prevent multiple redirects when a request arrives with more than one "problem". For example, your code will do two redirects in a row if a request arrives for "example.com/index.php".
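Applied to the code in the previous post, the reversed order would look roughly like this: the single-page redirect first, the index redirect second, and the hostname rule last. A request for example.com/index.php is then caught by the index redirect, whose target already uses the www hostname, so it gets one 301 instead of two. This is a sketch using the same example.com paths as above, with the comprehensive hostname rule suggested below.

```apache
ErrorDocument 404 /404page.html
RewriteEngine On
#
# Redirect individual moved pages (most specific rules first)
RewriteRule ^products\.html$ http://www.example.com/products/ [R=301,L]
#
# Index Redirect back to bare folder URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
#
# Redirect non-canonical hostnames to www (least specific rule last).
# The "?" makes a blank Host header (old HTTP/1.0 clients) count as a match,
# so those requests are not redirected over and over.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```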

Note that using [NC] on patterns in RewriteRules or RewriteConds is a waste of CPU time if that pattern contains no alphabetic characters.
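For example, in the two canonicalization lines from the original .htaccess, [NC] does useful work on the RewriteCond, because hostnames are case-insensitive and "example.com" contains letters, but none on the RewriteRule, whose pattern has no letters at all. A sketch of that pair with [NC] only where it matters, the dot escaped, and the [L] flag added:

```apache
# [NC] is useful here: the pattern contains letters, and a Host header
# may arrive as "Example.com", "EXAMPLE.COM", etc.
RewriteCond %{HTTP_HOST} ^example\.com [NC]
# [NC] would be wasted here: "^(.*)$" contains no alphabetic characters
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```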

If you do not use multiple subdomains, a much more comprehensive hostname canonicalization routine is:

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]


I would also recommend paying attention to proper casing -- mod_rewrite does not support "free-style" coding, and using standard casing makes your code faster and easier for experienced reviewers to read. :)

Jim

g1smd




msg:4128297
 6:19 pm on May 6, 2010 (gmt 0)

The problem the "index redirect" fixes is simply one of duplicate content: it stops your index pages from having two URLs that both return "200 OK" for the same content.

Let's see your final code with all the above suggestions in place.

Don't forget to add a # comment before each block of code. You'll thank yourself next year when you want to know what each bit is for.

brnm98105




msg:4128365
 7:49 pm on May 6, 2010 (gmt 0)

Here is my final code. It seems to work. What do you think?

ErrorDocument 404 /404page.html
RewriteEngine on
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Index Redirect back to bare folder URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

redirect 301 /products.html http://www.example.com/products/

g1smd




msg:4128397
 8:12 pm on May 6, 2010 (gmt 0)

You missed at least three suggestions from preceding posts.

Two of those are important. Please check the preceding posts again in detail, especially post#4128208.

The issues are: Redirect should be RewriteRule, the rule order should be reversed so that 1 - 2 - 3 should be 3 - 2 - 1 , and please add # comments to each code block in the style of the single example already shown.

brnm98105




msg:4128456
 9:46 pm on May 6, 2010 (gmt 0)

I'm sorry, I'm totally confused :( Can you give me basic code for a rewrite of a page I moved and renamed? I know how to do redirects but not rewrites. Thanks

So it should be:

ErrorDocument 404 /404page.html
# Redirect pages
redirect 301 /products.html http://www.example.com/products/

# Index Redirect back to bare folder URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

# Rewrite Domain
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

g1smd




msg:4128465
 10:19 pm on May 6, 2010 (gmt 0)

The redirect that currently uses a Redirect directive needs to be recoded using a RewriteRule directive, so that all three of your redirects use RewriteRule directives.

That is, you currently have three redirects. Two of those are coded using RewriteRule. One of those is coded using Redirect. All three need to use RewriteRule.

Your final # comment should read # REDIRECT non-canonical domain to www. These are all redirects. There are no rewrites here.

brnm98105




msg:4128473
 10:46 pm on May 6, 2010 (gmt 0)

g1smd, I really appreciate your help :)

I used redirect 301 /products.html http://www.example.com/products/ as an example.

I have 30 pages that have been moved or deleted. Some are changing levels; for example, /example1.html is moving to /examples/example1/, a whole new area, so a redirect makes sense, doesn't it? Especially with 30 pages. My original structure didn't make as much sense as my new structure. So would I have to do a rewrite for all 30 pages being moved or deleted?
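If some of those 30 old pages follow a regular naming pattern, one RewriteRule with a back-reference can cover the whole group, and pages that were deleted with no replacement can answer "410 Gone" instead of redirecting. All file and folder names below (widget1.html, about-us.html, old-offer.html and their targets) are hypothetical placeholders, not pages from this site.

```apache
# HYPOTHETICAL pattern rule: /widget1.html, /widget2.html, ... move to
# /widgets/widget1/, /widgets/widget2/, ... ($1 holds the captured base name)
RewriteRule ^(widget[0-9]+)\.html$ http://www.example.com/widgets/$1/ [R=301,L]
#
# HYPOTHETICAL one-off move that fits no pattern: one rule per page
RewriteRule ^about-us\.html$ http://www.example.com/company/about/ [R=301,L]
#
# HYPOTHETICAL deleted page with no replacement: "-" means no substitution,
# and the [G] flag returns "410 Gone" ([G] implies [L])
RewriteRule ^old-offer\.html$ - [G]
```

Rules like these would sit in the moved-pages block, ahead of the index redirect and the hostname rule, following the most-specific-first ordering discussed earlier in the thread.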

I've searched and searched for an example of how to do a page rewrite and haven't found an answer anywhere that makes sense to me.

Here is a new example with the change you recommended.

ErrorDocument 404 /404page.html
RewriteEngine on
# Redirect pages
redirect 301 /example.html http://www.example.com/example/blue/
redirect 301 /example.html http://www.example.com/example2/
redirect 301 /example.html http://www.example.com/example3/
redirect 301 /example1.html http://www.example.com/example1/
redirect 301 /example2.html http://www.example.com/example/

# Index Redirect back to bare folder URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

# REDIRECT non-canonical domain to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

jdMorgan




msg:4128492
 11:23 pm on May 6, 2010 (gmt 0)

We need better examples of these Redirects, showing paths that are diverse, accurate, and also meaningful to you.

The main problem I'm having is that the first three Redirects here all seek to redirect the same requested URL to three different destination URLs, so only the first would ever be invoked.

redirect 301 /example.html http://www.example.com/example/blue/
redirect 301 /example.html http://www.example.com/example2/
redirect 301 /example.html http://www.example.com/example3/
redirect 301 /example1.html http://www.example.com/example1/
redirect 301 /example2.html http://www.example.com/example/

You might understand that contributors here are leery of trying to help with poorly-described problems, and do not favor repeating themselves -- attention to detail is required for this reason and because a tiny error might well put you out of business.

As an example, though, the first Redirect directive above would be replaced with a RewriteRule like this:

RewriteRule ^example\.html$ http://www.example.com/example/blue/ [R=301,L]
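The same conversion applied to the two lines in that list that each have their own source URL would look like this, using the placeholder paths from the post. The three /example.html lines cannot be converted until one destination is chosen, since a single URL can only redirect to one place.

```apache
# Was: redirect 301 /example1.html http://www.example.com/example1/
RewriteRule ^example1\.html$ http://www.example.com/example1/ [R=301,L]
#
# Was: redirect 301 /example2.html http://www.example.com/example/
RewriteRule ^example2\.html$ http://www.example.com/example/ [R=301,L]
```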

Jim

jdMorgan




msg:4128506
 11:47 pm on May 6, 2010 (gmt 0)

I also want to comment that in my opinion, you're really going about this the hard way, and in addition creating a long-term maintenance nightmare with basically a bunch of directories containing one 'page' each.

It would likely be far easier in the long term to simply add one rule to rewrite extensionless URL requests to .html files, and another rule to redirect direct client requests for .html URLs back to the corresponding extensionless URLs.
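A minimal sketch of that two-rule approach, under these assumptions: the rules sit in the document-root .htaccess after the index redirect, every page is a plain .html file, %{DOCUMENT_ROOT} points at the real site root, and requests carrying a query string are left alone by the redirect. It is not the code from the linked thread.

```apache
# 1. EXTERNAL redirect: a client request for /path/page.html is sent to /path/page.
#    Testing THE_REQUEST means only URLs the client actually asked for are redirected;
#    without it, the internal rewrite in rule 2 would be redirected again (a loop).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*[^/.]+\.html\ HTTP/
RewriteRule ^(([^/]+/)*[^/.]+)\.html$ http://www.example.com/$1 [R=301,L]
#
# 2. INTERNAL rewrite: an extensionless request for /path/page is served from
#    /path/page.html, but only if that file exists. The browser's URL does not change.
#    ($1 in the RewriteCond is the path captured by the RewriteRule pattern.)
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.html [L]
```

Rule 2's pattern excludes dots in the last path segment, so the rewritten "page.html" does not match it again on the next pass.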

This would make maintaining the site a lot easier over the long term, and is the way that most Webmasters choose to do this. See this thread on implementing extensionless URLs [webmasterworld.com].

Jim

brnm98105




msg:4128519
 12:22 am on May 7, 2010 (gmt 0)

I totally understand, JD. Sometimes explaining the problem is hard. In the future I will try to explain things more clearly.

So here's what I have, and it WORKED. Is this correct and in the right order?

ErrorDocument 404 /404page.html
RewriteEngine on

# Redirect pages
RewriteRule ^example/examples1\.html$ http://www.example.com/examples1/ [R=301,L]

# Index Redirect back to bare folder URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]

# REDIRECT non-canonical domain to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

[edited by: jdMorgan at 1:41 am (utc) on May 7, 2010]
[edit reason] example.com [/edit]

g1smd




msg:4128525
 12:25 am on May 7, 2010 (gmt 0)

I've searched and searched for an example of how to do a page rewrite and haven't found an answer anywhere that makes sense to me.

As I said before, you already have two redirects coded using "RewriteRule" (mod_rewrite) code, so you'd just need to follow those as a guide, and stop using "Redirect" (mod_alias) code.

Again, you have no page rewrites here. You have redirects. You need redirects. However you need them all to use "RewriteRule" code, not "Redirect" code.
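To make the distinction concrete, here are two lines that look alike but behave differently. The paths (oldpage.html, newpage/, newpage.html) are hypothetical.

```apache
# EXTERNAL redirect: the server answers "301 Moved Permanently" and the browser
# requests the new URL, so the address bar changes. Moved pages need this kind.
RewriteRule ^oldpage\.html$ http://www.example.com/newpage/ [R=301,L]
#
# INTERNAL rewrite: no redirect is sent; the server quietly serves /newpage.html
# for a request to /newpage/, and the URL the browser shows does not change.
RewriteRule ^newpage/$ /newpage.html [L]
```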

jdMorgan




msg:4128540
 1:44 am on May 7, 2010 (gmt 0)

From our Apache Forum Library: What's the difference between external and internal redirects? [webmasterworld.com]

Jim


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved