
Apache Web Server Forum

    
Index file, Canonical Page & Loops
Various Codes - Which best to use?
actolearn




msg:4524258
 1:28 am on Dec 2, 2012 (gmt 0)

Hello there ~

Time for me to do some cleanup in my htaccess file. While researching, I found some code that differs from what I've been using, so my questions are below.

# redirect index.html so the URL ends at .com/ (the one I've been using)
RewriteCond %{THE_REQUEST} ^.*/index\.html
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]
.............

# RECENTLY found this one - it does away with the RewriteCond line above. Which one should I be using?
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]
_________________________________________________

# all non www to www - this is the one I use
RewriteCond %{HTTP_HOST} ^example.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
...................

# RECENTLY found this one - which is correct?
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]


ALSO, I've read about infinite loops, etc - but how can I tell if that's going on? Is there a way for me to check if something I've coded in my htaccess file is endlessly looping?

Thx for any help.

 

g1smd




msg:4524259
 2:09 am on Dec 2, 2012 (gmt 0)

Never use .* at the beginning or in the middle of a RegEx pattern.

Make sure you escape all literal periods in patterns.
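To see why the escaping matters, here is a rough check in Python. Its re module behaves like the PCRE library Apache uses for simple patterns like these (an assumption worth keeping in mind for anything fancier). The unescaped period in the original index.html pattern matches any character; the test paths are made up.

```python
import re

# Pattern from the original rule: the "." before "html" is NOT escaped,
# so it matches any single character.
loose = re.compile(r"^(.*)index.html$")
# Same pattern with the literal period escaped.
strict = re.compile(r"^(.*)index\.html$")

# Made-up request paths (htaccess patterns see them without a leading slash).
for path in ["index.html", "shop/index.html", "indexXhtml", "shop/index-html"]:
    print(f"{path:16} loose={bool(loose.match(path))} strict={bool(strict.match(path))}")
```

Only the escaped version limits the rule to real index.html requests.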

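If you remove the RewriteCond in the index redirect you will get an infinite loop. A request for a folder URL such as / is served internally as /index.html by DirectoryIndex, so the rule matches again and redirects back to /. THE_REQUEST holds only the request line the browser actually sent, so testing it lets the rule fire only on genuine index.html requests.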

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
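A toy model of why the THE_REQUEST condition is needed, written in Python rather than Apache config so it can be run anywhere. It assumes (a simplification) that a request for a folder URL ending in / is internally mapped by DirectoryIndex to that folder's index.html before the rule is tested, while THE_REQUEST keeps the browser's original request line. `handle` and `follow` are hypothetical helper names.

```python
import re

# Rule and condition copied from the post above (Python's re accepts "\ " as a literal space).
RULE = re.compile(r"^(([^/]+/)*)index\.html")
COND = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/")


def handle(url_path, use_cond):
    """Return the redirect target for one browser request, or None if the page is served."""
    the_request = f"GET {url_path} HTTP/1.1"  # never changes during internal rewrites
    path = url_path.lstrip("/")  # htaccess patterns see the path without the leading slash
    if path == "" or path.endswith("/"):
        path += "index.html"  # simplified stand-in for the DirectoryIndex internal rewrite
    m = RULE.match(path)
    if m and (not use_cond or COND.match(the_request)):
        return "/" + m.group(1)
    return None


def follow(url_path, use_cond, max_hops=10):
    """Follow redirects like a browser would, giving up after max_hops."""
    hops = [url_path]
    while len(hops) <= max_hops:
        target = handle(hops[-1], use_cond)
        if target is None:
            return hops
        hops.append(target)
    raise RuntimeError("redirect loop: " + " -> ".join(hops[:4]) + " ...")
```

With the condition, `follow("/sub/index.html", use_cond=True)` stops after one redirect at /sub/. Without it, even a plain request for /sub/ keeps redirecting to itself until the hop limit raises the error, which is what the browser reports as a loop.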


You'll find this code, and a longer description, in several hundred other threads in this forum.

Use the Live HTTP Headers extension for Firefox to examine the server headers. An unwanted multiple step redirection chain will be easy to spot in the results. An infinite loop will result in a browser error message and no access to the content.
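If you'd rather script the check than read headers in the browser, here is a rough Python sketch using only the standard library. It sends one HEAD request per hop (assumption: your server answers HEAD the same way as GET for these URLs) and stops on a repeated URL or after max_hops. The function names are made up for this example, and `fetch` can be swapped for a fake so the logic can be tried offline.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_head(url):
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_head, max_hops=10):
    """Follow 3xx responses one at a time and return the list of (status, url) hops.

    Raises RuntimeError if a URL repeats (a loop) or max_hops is exceeded.
    """
    seen = set()
    hops = []
    while True:
        if url in seen:
            raise RuntimeError("redirect loop at " + url)
        if len(hops) >= max_hops:
            raise RuntimeError("too many redirects")
        seen.add(url)
        status, location = fetch(url)
        hops.append((status, url))
        if not (300 <= status < 400 and location):
            return hops
        url = urljoin(url, location)  # Location may be relative
```

Run against a live site, a result with more than one 3xx entry before the final 200 is the unwanted multiple step chain described above.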

SevenCubed




msg:4524262
 2:25 am on Dec 2, 2012 (gmt 0)

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html http://www.example.com/$1 [R=301,L]


That one doesn't work. I've tried it over and over.

It redirects to:
example.com/subfolder instead of example.com/subfolder/

And when that happens it throws a 403 access denied message due to blocking directory browsing.

g1smd




msg:4524269
 2:43 am on Dec 2, 2012 (gmt 0)

There's nothing in those two rules that redirects to a new URL removing the trailing slash.

You must have another rule somewhere else in your site configuration that does that.

Use the Live HTTP Headers extension for Firefox to see what happens when you make a request to the server. I suspect you'll see an unwanted multiple step redirection chain.

SevenCubed




msg:4524270
 3:07 am on Dec 2, 2012 (gmt 0)

Okay thanks. I'll have to keep digging around.

I had a problem with another snippet from here too the other day but eventually figured it out. I came back here to offer a heads up but couldn't find the thread anymore :(

In that one the 301 redirect from an old domain to a new one was adding an extra slash (example.com//subfolder/). I eventually figured out that the posted example didn't take into account how the new domain handled the redirects when they arrived there.

I don't have this problem right now but I like to follow along in this Apache forum to learn. I'll try to figure it out based on what you are saying, just not tonight, too tired.

Ha, I should have known better than to object to your code!

actolearn




msg:4524272
 3:36 am on Dec 2, 2012 (gmt 0)

Thank you, g1smd. I have installed Live HTTP Headers extension for Firefox to examine the server headers as you suggested. And I'll change my code as referenced above and research some more before coming back.

actolearn




msg:4525508
 1:25 am on Dec 6, 2012 (gmt 0)

I've provided some basic info; not sure how much is needed.
Ecommerce built from scratch, no CGI system.

Older product files and category urls have html exts
Newer product urls have php exts
ALL files have PHP info from DB table

Questions: All in correct order?

Slashes all over - are these correct?

(I've only included 1 of each instance since once I get these right, I can fix the rest.)

Installed Live HTTP Headers extension. Not sure I have any infinite loops since I don't know what to look for. Would I get a message "infinite loop"?

Any help is appreciated...

RewriteEngine on
# Use PHP5 as default
AddHandler application/x-httpd-php5 .php .html


# redirect after changed file name
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteRule ^oldfile$ "http\:\/\/www\.example\.com\/newfile\.html" [R=301,L]

# html file to php because I gave an old file wrong ext
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteRule ^file\.html$ "http\:\/\/www\.example\.com\/file\.php" [R=301,L]

# underscore to hyphen
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteRule ^file_name\.html$ "http\:\/\/www\.example\.com\/file-name\.html" [R=301,L]

# hiding index.html in url
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html http://www.example.com/$1 [R=301,L]

# redirect non-www to www, done Dec 2
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# keep last
ErrorDocument 404 /notfound.html

lucy24




msg:4525538
 5:57 am on Dec 6, 2012 (gmt 0)

Not sure I have any infinite loops since I don't know what to look for. Would I get a message "infinite loop"?

You would get an error from your browser, not an Apache error. Exact wording depends on browser, but it would be something like "The browser has detected that {blah, blah} in a way that will never resolve." This type of error message can't come from the server, because each request is an island. Only the browser can tell that it has sent in ten identical requests and been redirected back where it started every time.

RewriteRule ^oldfile$ "http\:\/\/www\.example\.com\/newfile\.html"

Recurring mistake here.
#1 the target of a RewriteRule is literal text. You do not need to escape anything AND you do not use quotation marks.
#2 in mod_rewrite-- and most other mods-- you do not need to escape slashes. Literal slashes / only need to be escaped when the / character itself has syntactic meaning, as in javascript or a very few apache mods; it isn't part of standard RegEx-speak.
I don't know of any module where you would ever need to escape a literal colon.

Did you say you have more than one domain sharing the same htaccess? If not, you don't need all those HTTP_HOST conditions. In fact they are actively harmful, because they prevent redirecting to the canonical domain name in a single step.

Do you have many different filenames that are changing from underscore to hyphen? Will there be lots and lots of underscores in the same name, or just one? You can probably collapse it all into a single conditionless rule:

RewriteRule ^([^_]+)_([^_]+)$ http://www.example.com/$1-$2 [R=301,L]

Even if there can be several in an URL-- but not vast numbers calling for php detour-- you can make a nest of rules, still conditionless:

RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^_]+)$ http://www.example.com/$1-$2-$3-$4 [R=301,L]

and so on for 3, 2 and 1 lowlines. But only do this if there are many, many URLs to change. Otherwise it is faster to list the specific URLs so mod_rewrite doesn't have to stop and evaluate everything it meets.
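A Python sketch of that nest of rules, handy for checking which filenames each pattern catches before putting them live. Python writes Apache's $1 as \1, and the example filenames and www.example.com are placeholders. Because [^_]+ can never contain an underscore, each pattern matches exactly one underscore count, so the order of the nest doesn't change the result.

```python
import re

# One pattern per underscore count (3, 2, 1), mirroring the rule nest above.
NEST = [
    (re.compile(r"^([^_]+)_([^_]+)_([^_]+)_([^_]+)$"), r"\1-\2-\3-\4"),
    (re.compile(r"^([^_]+)_([^_]+)_([^_]+)$"), r"\1-\2-\3"),
    (re.compile(r"^([^_]+)_([^_]+)$"), r"\1-\2"),
]


def hyphen_redirect(path):
    """Return the 301 target for a path containing 1-3 underscores, else None."""
    for pattern, template in NEST:
        m = pattern.match(path)
        if m:
            return "http://www.example.com/" + m.expand(template)
    return None


for p in ["file_red.html", "old_file_name.html", "file-red.html", "a_b_c_d_e.html"]:
    print(p, "->", hyphen_redirect(p))
```

A name with more underscores than the nest covers (the last example) falls through untouched, which is the point where a php script becomes the better tool.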

# html file to php because I gave an old file wrong ext
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteRule ^file\.html$ "http\:\/\/www\.example\.com\/file\.php" [R=301,L]

:: peering into crystal ball ::

Someone ::cough-cough:: is about to give you the sales pitch for extensionless URLs. (Not me. I'm of the "Go back inside and put some clothes on!" school ;))

# keep last
ErrorDocument 404 /notfound.html

On the contrary. This type of mechanical rule belongs at the beginning of your htaccess. Group them all together; I assume this is just a representative example and you've also got-- at the very least-- a 403 as well. You may also want a 410, even if it's the same physical document as your custom 404.

Rosalind




msg:4525563
 3:45 pm on Dec 6, 2012 (gmt 0)

Xenu's Link Sleuth will tell you if you've got a redirection loop going on, and I find it catches things that won't necessarily be shown by the browser.

g1smd




msg:4525584
 6:25 pm on Dec 6, 2012 (gmt 0)

I wrote several long posts here this morning, but the server glitch has thrown them all away. Sorry.

I had hoped to pull the posts from cache and paste them back in, but it's not to be.

actolearn




msg:4525595
 7:04 pm on Dec 6, 2012 (gmt 0)

Just 1 domain and 1 htaccess at root. Bluehost has other error pages available, but I've only made my own 404, which has a couple of links for customers to follow. Beyond that, there's just whatever I've put in my htaccess file for the 301s.

Any problems I've had so far I've been able to fix, and things look fine in Firefox, so I'll try some of these pages in Chrome, IE and a few others to see if they work.

I don't know why I'm having such a hard time with this. I'll keep plugging away ...

actolearn




msg:4525599
 7:12 pm on Dec 6, 2012 (gmt 0)

to g1smd - Well now I really want to know how BAD things are. Maybe when you have time...

g1smd




msg:4525600
 7:12 pm on Dec 6, 2012 (gmt 0)

# html file to php because I gave an old file wrong ext
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteRule ^file\.html$ "http\:\/\/www\.example\.com\/file\.php" [R=301,L]


The quotes and escaping should be deleted from the rule target.

Do you need the RewriteCond? If I request example.com/file.html or www.example.com:80/file.html your rule will fail to redirect.
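A quick Python check of the two host tests used in this thread, run against a few placeholder Host values. The poster's `^www\.example\.com$` condition only passes the exact canonical host, so requests arriving as example.com or with an explicit :80 skip the specific rule. The negated `!^(www\.example\.com)?$` test in the canonical rule catches both, while leaving the canonical host and an empty Host header (old HTTP/1.0 clients) alone.

```python
import re

# Condition used on the poster's individual redirects.
PER_FILE_COND = re.compile(r"^www\.example\.com$")
# Canonical-host condition; the leading "!" in RewriteCond negates the match.
CANONICAL_COND = re.compile(r"^(www\.example\.com)?$")


def per_file_rule_applies(host):
    return PER_FILE_COND.match(host) is not None


def canonical_redirect_applies(host):
    return CANONICAL_COND.match(host) is None  # the "!" negation


for host in ["www.example.com", "example.com", "www.example.com:80", ""]:
    print(f"{host!r:22} per-file={per_file_rule_applies(host)} "
          f"canonical-redirect={canonical_redirect_applies(host)}")
```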

I can't remember all the other points I made.

lucy24




msg:4525712
 11:07 pm on Dec 6, 2012 (gmt 0)

... and double ah ha, as I realize that my (short) reply to your (long, ending in ...69, thank you very much browser history) reply also got eaten.

We didn't get the extensionless-URL sales pitch, surprise, but you did comment on the RewriteCond.

That was why I originally asked about number of domains: to confirm that it's all the same one, so the condition isn't needed. In my also-eaten follow-up I commented that that was what I'd meant by "actively harmful". The individual pages won't get redirected until after they've had the generic www redirect, so they will all get redirected twice.

actolearn




msg:4525724
 11:27 pm on Dec 6, 2012 (gmt 0)

Re Xenu Link Sleuth - Is it the same as W3C Link Checker?

I used W3C Link Checker yesterday. It checked every single page on my site (took 26 mins, really went in-depth). Results as follows:
"all 14 anchors validated
all status 301 = 200 OK
(This is a permanent redirect. The link should be updated.)" Links were already updated.

Any preferences?

Also, even though my code is wrong I can still get 200 ok's? Or is my htaccess code just inefficient but can still return correct pages on site? Or is my htaccess actually wrong?

Going to fix either way - just need to know what degree of worry I should be in...

lucy24




msg:4525739
 1:22 am on Dec 7, 2012 (gmt 0)

Same kind of thing, but you can install and run Xenu locally. Windows only. (Few years back, I had to install w3c locally for some reason that now escapes me. Command line. Runs out of Terminal. Major trauma. Pain. Running it is no problem-- except that I can't get it to ignore more than one link at a time-- but installation was agony.)

The problem is that a link checker can only check the links that it actually finds on your site. It doesn't do like search engines and willfully try variant names for the ### of it, and it doesn't know about all those other sites that insist on getting your name wrong.

So if you've fixed all your internal links, the link checker will come through as pure 200. This is good. But you still need those insurance-redirects for things like index.html and domain name. As well as for any pages that you really have moved, of course.

g1smd




msg:4525753
 2:29 am on Dec 7, 2012 (gmt 0)

Xenu LinkSleuth allows you to feed it a text file list of URLs to check.

That feature is very useful. The list should include valid and non-valid URLs.
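A small Python sketch for building such a list from the canonical URLs you already have (for example, from a sitemap). For each URL it adds the non-www form, an explicit :80 port and, for folder URLs, the index.html form. Every one of those should answer with a single 301 to the canonical URL. It assumes plain http:// URLs on a www host, as in this thread; `variants` is a made-up name, and you would still add a few invented URLs by hand to confirm your 404 page comes back.

```python
from urllib.parse import urlsplit, urlunsplit


def variants(canonical_url):
    """Non-canonical spellings of a canonical http://www. URL that should all 301 to it."""
    parts = urlsplit(canonical_url)
    host = parts.netloc  # e.g. "www.example.com"
    bare = host[len("www."):] if host.startswith("www.") else host
    out = [
        urlunsplit((parts.scheme, bare, parts.path, parts.query, "")),          # non-www
        urlunsplit((parts.scheme, host + ":80", parts.path, parts.query, "")),  # explicit port
    ]
    if parts.path.endswith("/"):
        out.append(urlunsplit((parts.scheme, host, parts.path + "index.html", parts.query, "")))
    return out


# Placeholder URLs; in practice read them from your sitemap.
for url in ["http://www.example.com/", "http://www.example.com/file-red.html"]:
    print(url)
    for v in variants(url):
        print(v)
```

Save the printed lines to a text file and feed that file to Xenu.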

actolearn




msg:4525763
 3:16 am on Dec 7, 2012 (gmt 0)

List of urls readily available from my website sitemap so can copy that real quick.

Non-working and changed URLs should show up as being handled properly or NOT, so I see the sense in that. I'll work on this tomorrow with Xenu's Link Sleuth.

Thanks to all for your patience and help.

Will check back in case anyone has more info to add.

AC

actolearn




msg:4525765
 3:29 am on Dec 7, 2012 (gmt 0)

Just as an aside, I didn't have to install the W3C Link Checker. I just typed in my website URL and it checked everything. In case anyone wants to try it, it was easy enough. I did the in-depth check, not the summary.

lucy24




msg:4525791
 6:03 am on Dec 7, 2012 (gmt 0)

List of urls readily available from my website sitemap so can copy that real quick.

I think g1's point was that you can feed it nonexistent URLs, or ones that you expect to get redirected. Or, to put it differently: I was wrong ;) In Xenu you can check links that aren't really there.

I only had to install w3c locally because I was doing something that couldn't be done online. Checking fragment links in other documents, I think. There was a period when w3c took robots.txt a little too seriously, so even if you temporarily deleted it they'd still say "Sorry, I wasn't allowed in here." Now they check on the fly. (You also have to make sure you're not physically blocking libwww-perl.)

actolearn




msg:4527613
 2:34 am on Dec 13, 2012 (gmt 0)

Hopefully I have my htaccess file in better condition. Please check it out and let me know if I've missed anything. This time I've numbered everything so if you use that number in your comments I can better understand.

I will eventually be doing more and probably a little differently but first wanted to learn the basics and get what I have cleaned up and condensed. Appreciate your help ~

1 domain
all files (incl. htaccess) at root
only have a separate image FOLDER at root

only kept %{HTTP_HOST} in 1 place
deleted escaping slashes and quotes in wrong places
Re-ordered placement so hopefully now in correct order
.......................................


RewriteEngine on
# Use PHP5 as default
AddHandler application/x-httpd-php5 .php .html


# 1 keep here at top
ErrorDocument 404 /notfound.html

# 2 added extension
RewriteRule ^file_one_xp002$ http://www.example.com/file_one_xp002.html [R=301,L]

# 3 redirect to correct url and added ext
RewriteRule ^old_named_file_onf001$ http://www.example.com/new_named_file_onf001.html [R=301,L]

# 4 redirect to php file
RewriteRule ^wrong-ext-file-wxf001\.html$ http://www.example.com/wrong-ext-file-wxf001.php [R=301,L]

# 5 underscore to hyphen
RewriteRule ^file_red\.html$ http://www.example.com/file-red.html [R=301,L]

# 6 underscore to hyphen
RewriteRule ^file_blue\.html$ http://www.example.com/file-blue.html [R=301,L]

# 7 underscore to hyphen
RewriteRule ^file_yellow\.html$ http://www.example.com/file-yellow.html [R=301,L]

# 8 my most IMPORTANT page, gallery page of new items, is it in right place
# underscore to hyphen
RewriteRule ^file_new\.html$ http://www.example.com/file-new.html [R=301,L]

# 9 index.html part of url hidden on home page
# always keep above canonical code
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html http://www.example.com/$1 [R=301,L]

# 10 non-www to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

actolearn




msg:4527615
 2:37 am on Dec 13, 2012 (gmt 0)

...and if I understood correctly, I only needed RewriteCond %{HTTP_HOST} in one place? I've put it at #10.

lucy24




msg:4527649
 5:19 am on Dec 13, 2012 (gmt 0)

Yes, exactly. It should be the very last redirect. With all other redirects, the form of the incoming hostname doesn't matter, because the redirect itself regularizes them.

Rules 5 through 8 could be collapsed into a single rule, as noted about halfway up this thread. But if only four pages are involved, it probably does run faster if you list them by name. That way the server doesn't have to stop and check every single request.

Exception: If two or more filenames really do start the same, you could do something like

RewriteRule ^file_(red|blue|yellow)\.html http://www.example.com/file-$1.html [R=301,L]

And similarly if they end the same, like "red_file, blue_file" etc.
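The same alternation checked in Python, where Apache's $1 is written \1; the filenames are the placeholders from the posts above. Note that file_new.html (rule 8) is not in the group, so it would still need its own rule, or "new" added to the alternation.

```python
import re

# lucy24's alternation pattern; group 1 captures the colour word.
RULE = re.compile(r"^file_(red|blue|yellow)\.html")


def target(path):
    """Return the 301 target for path, or None if the rule does not match."""
    m = RULE.match(path)
    return m.expand(r"http://www.example.com/file-\1.html") if m else None


for p in ["file_red.html", "file_blue.html", "file_new.html", "file-red.html"]:
    print(p, "->", target(p))
```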

g1smd




msg:4527691
 8:17 am on Dec 13, 2012 (gmt 0)

The new htaccess code looks OK. Combining rules 5 to 7 would simplify a little, but isn't so important.

Leaving a blank line after each rule and commenting the code, as you have done here, makes the code a lot easier to understand. You'll thank yourself when you need to add something to the file in 2014.

The index rule has to check THE_REQUEST otherwise you get an infinite loop of redirects.

The non-www/www rule needs to check HTTP_HOST is anything other than the wanted value.

All the other rules are straight "single line of code" redirects and must appear before the index and non-www/www rules.
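A simplified Python model of that ordering, using a few of the rules from the final file (paths and hosts are placeholders, and DirectoryIndex is not modelled). Because every target is already written with http://www.example.com/, a request on the wrong host for a renamed page gets its final URL in one redirect; the canonical-host rule only has to catch what nothing earlier matched. `first_redirect` is a hypothetical helper name.

```python
import re

BASE = "http://www.example.com/"

# Specific one-line redirects (a subset of rules 4 and 5 from the file above).
SPECIFIC = [
    (re.compile(r"^wrong-ext-file-wxf001\.html$"), "wrong-ext-file-wxf001.php"),
    (re.compile(r"^file_red\.html$"), "file-red.html"),
]
INDEX_COND = re.compile(r"^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/")
INDEX_RULE = re.compile(r"^(([^/]+/)*)index\.html")
HOST_COND = re.compile(r"^(www\.example\.com)?$")  # negated with "!" in the htaccess


def first_redirect(host, url_path):
    """Location the rules would send for one request, or None if it is served."""
    path = url_path.lstrip("/")
    the_request = f"GET {url_path} HTTP/1.1"
    for pattern, target in SPECIFIC:     # rules 2-8: specific pages first
        if pattern.match(path):
            return BASE + target
    m = INDEX_RULE.match(path)           # rule 9: index.html
    if m and INDEX_COND.match(the_request):
        return BASE + m.group(1)
    if HOST_COND.match(host) is None:    # rule 10: canonical host, last
        return BASE + path
    return None
```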

actolearn




msg:4527886
 7:52 pm on Dec 13, 2012 (gmt 0)

Rules 5 through 8 could be collapsed into a single rule, as noted about halfway up this thread. But if only four pages are involved, it probably does run faster if you list them by name. That way the server doesn't have to stop and check every single request.


I understand. I'll be doing more work on these (Rules 5-8) and will have more questions so for now wanted to keep separate.

Ok, assuming I understood correctly - my htaccess above looks good to go ~ Thanks lots to all. I'll be back with more in-depth questions.

g1smd




msg:4527920
 9:02 pm on Dec 13, 2012 (gmt 0)

The file looks OK. It is clear of the usual sort of stuff that gets corrected in posts here every few days.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved