
Apache Web Server Forum

    
.htaccess infinite loop for non existent urls
ddriver



 
Msg#: 4594869 posted 7:24 pm on Jul 20, 2013 (gmt 0)

Hi everyone,

I'm working on my .htaccess to obtain an extensionless site. I've searched the forums and implemented some rules that are working, but when I type in a url that doesn't exist (e.g. mysite.com/dsfadfasdf ) it starts to go in an infinite loop.

If anyone would be so kind as to help me get rid of the loop and perhaps redirect to a page of my choosing (where I'll say the page doesn't exist) I'd really appreciate it!

Options +FollowSymLinks
Options +Indexes
RewriteEngine On
RewriteBase /

# Redirect www to non www
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# Home page language
RewriteRule ^(en|ro)/?$ index.php?lang=$1 [L]

# For the profile page
RewriteRule ^(.+)\/accommodation-(.+)\/([0-9]+)\/(.+)$ location.php?id=$3&town=$2&hname=$4&lang=$1 [NC,L]

# Internally rewrite extensionless URL to corresponding .php
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(.*)$ $1.php?%1 [NC,L,QSA]

# Externally redirect (only) direct client requests for .php URLs to extensionless URLs:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.#?\ ]+\.php([#?][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php http://example.com/$1 [R=301,L]


I believe the issue is in the last rule.

 

JD_Toims

WebmasterWorld Senior Member Top Contributors Of The Month



 
Msg#: 4594869 posted 9:09 pm on Jul 20, 2013 (gmt 0)

Hi ddriver,

I'm seeing a few things, so I'll make notes below.

###

# I'm not 100% sure about +indexes... I usually
# turn them off explicitly not on, but if you need them
# for some reason, then "go with it" I just don't
# like exposing the contents of a directory if I don't have
# an index in it for whatever reason.
#
Options +Indexes +FollowSymLinks

# It's "nit picky", and Apache accepts either form, but I write on in lowercase.
#
RewriteEngine on
RewriteBase /


# Externally redirect (only) direct client requests for .php URLs to extensionless URLs:
#
# Since all we really need to know is if the original request has
# .php in it, let's just check for that at the end of the rule.
#
# We should probably only have to use the NS flag on the rule, but
# I don't mind a bit of double coverage since I've had some "oddities"
# with mod_rewrite previously, so I left the condition.
#
# BTW: NS = No subrequest
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.php
RewriteRule \.php$ http://www.example.com/%1 [R=301,L,NS]


# Redirect www to non www
#
# No need to "match and store everything" with (.*)
# on the left side of the rule since we can use
# %{REQUEST_URI} which is already set.
#
# Also an explicit negative match in the condition
# usually works better and we don't need the [NC]
# since any modern browser will correct that before
# making the request.
#
# Canonicalization should come after any other redirects.
#
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]


# Home page language
#
# Internal rewrites should always come after all
# external redirects are complete.
#
# This one I'm a bit confused about since it's matching
# both /en and /en/... That seems like you're creating
# some dup. content and I'd probably pick one and redirect
# the other location to it so only one of /en or /en/ are
# available.
#
RewriteRule ^(en|ro)/?$ /index.php?lang=$1 [L]

# For the profile page
#
# An explicit or negative match is much more efficient
# than .* or .+ so I switched to negative matching patterns
# where I could and also used an explicit pattern at the start.
#
# [^/] = The ^ character in [] = NOT
# So, the above matches anything NOT A /
#
RewriteRule ^(en|ro)/([^/]+/)?accommodation-([^/]+)/([0-9]+)/([^\ ]+)$ /location.php?id=$4&town=$3&hname=$5&lang=$1 [NC,L]

# Internally rewrite extensionless URL to corresponding .php
#
# Only minor adjustments and I'm guessing you'll "get them"
# if you've read the other comments.
#
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^[^.]+$ %{REQUEST_URI}.php [NC,L,QSA]

ddriver



 
Msg#: 4594869 posted 10:16 pm on Jul 20, 2013 (gmt 0)

Hello JD_Toims,

Thanks for the reply.

What I'm seeing is that when using your code, I'm in fact adding www to all my links.

Also, it's causing problems in other pages (not working - Chrome says "this page has a redirect loop"):

For example, the search results page which is "example.com/results?param1=4&param2=something".

If I add php to the file extension though, it works, but that's not how I want it. So, "example.com/results.php?param1=..." is working with your code, but I don't want the php extension.

JD_Toims




 
Msg#: 4594869 posted 10:32 pm on Jul 20, 2013 (gmt 0)

First, to remove the www all you have to do is remove it from what I have:

RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule .? http://example.com%{REQUEST_URI} [R=301,L]

I'll get back to you more in a minute when I've re-read all my code.

ADDED: Also remove the www from the right side of the 1st rule.

JD_Toims




 
Msg#: 4594869 posted 10:36 pm on Jul 20, 2013 (gmt 0)

Second, empty your browser cache and try to access a .php page with only these two rules: (You should be redirected to the extensionless version, if not there's something silly I'm missing, but I don't see it right now.)

RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.php
RewriteRule \.php$ http://example.com/%1 [R=301,L,NS]

RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule .? http://example.com%{REQUEST_URI} [R=301,L]

JD_Toims




 
Msg#: 4594869 posted 10:50 pm on Jul 20, 2013 (gmt 0)

Third, if that works empty your browser cache again and add the other rules back in: (I edited the last one slightly, even though I shouldn't have had to.)

RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.php
RewriteRule \.php$ http://example.com/%1 [R=301,L,NS]

RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule .? http://example.com%{REQUEST_URI} [R=301,L]

RewriteRule ^(en|ro)/?$ /index.php?lang=$1 [L]

RewriteRule ^(en|ro)/([^/]+/)?accommodation-([^/]+)/([0-9]+)/([^\ ]+)$ /location.php?id=$4&town=$3&hname=$5&lang=$1 [NC,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)$ /$1.php [NC,L,QSA]

###

If that "throws an error" there's a "crazy work-around" I've had to use on a couple of boxes, but not all. IDK why I have to use this sometimes, but I do for some reason, so try this:

RewriteEngine on
RewriteCond %{THE_REQUEST} !\.php
RewriteRule \.php$ - [L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.php
RewriteRule ^([^.]+)\.php$ http://example.com/$1 [R=301,L,NS]

RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule .? http://example.com%{REQUEST_URI} [R=301,L]

RewriteRule ^(en|ro)/?$ /index.php?lang=$1 [L]

RewriteRule ^(en|ro)/([^/]+/)?accommodation-([^/]+)/([0-9]+)/([^\ ]+)$ /location.php?id=$4&town=$3&hname=$5&lang=$1 [NC,L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^.]+)$ /$1.php [NC,L,QSA]

lucy24

WebmasterWorld Senior Member lucy24 us a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4594869 posted 1:23 am on Jul 21, 2013 (gmt 0)

Disclaimer: I type slowly and tend to have multiple tabs open, so this post may appear to ignore one or more preceding posts.

empty your browser cache

If you have the option of testing somewhere other than your live site-- for example a test site or MAMP/WAMP --add this element to your config or htaccess:

ExpiresActive On
ExpiresByType text/html "access"

Where I say "text/html", add the MIME types of any other page formats you actually use. This means that if you've got compliant browsers, they will make a fresh request for each page every time. (You will still need to refresh or, worst case, empty the cache if you've tweaked the css. This has bitten me many times ;))

Options +Indexes
Do you really want this for your entire site? You might have something like an image directory that you want users to be able to paw through at will, but normally people don't want to give free access to all users everywhere. In practice, the option only kicks in if a directory has no named index file (index.html, index.php, whatever). But you never know where a human user might decide to snoop.

# Internally rewrite extensionless URL to corresponding .php
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(.*)$ $1.php?%1 [NC,L,QSA]


What are you trying to do here? The last two lines are identical to
RewriteRule ^(.*)$ $1.php [L]
with no second condition and no QSA, since "reappend the query" is mod_rewrite's default behavior anyway. And [NC] isn't needed since you are not matching literal text in the pattern. If you were matching literal text, you still wouldn't use [NC] in a rule creating an internal rewrite, because it creates the option of Duplicate Content. (Exception: [NC] is OK if the rewrite points to a php file that will issue a 301 or 404 on its own behalf when casing is wrong.)

A more serious problem is that the rule doesn't exclude requests that already end in .php. 99 times out of 100 you can express the pattern as
^([^.]+)$
The anchors are now essential, where before they weren't needed. The 100th time, of course, is if your directory names contain literal periods. This is perfectly legal-- see apache's own site for examples ;) --but you can save yourself a ### of a lot of trouble if you stick strictly to alphanumerics, lowlines and hyphens.
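To make the point above concrete, here is a minimal sketch of an internal rewrite that cannot feed on its own output. It assumes directory names contain no literal periods and that each extensionless URL maps to a .php file at the same path. The -f existence test and the error page name /notfound.php are illustrative assumptions, not something posted in this thread.

```apache
# Sketch only: rewrite extensionless URLs to .php without looping.
# Assumes directory names contain no literal periods.
RewriteEngine On

# Hypothetical custom "page not found" document; the file name is an assumption.
ErrorDocument 404 /notfound.php

# Skip real directories.
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite only when the matching .php file really exists, so a request
# such as /dsfadfasdf falls through to the normal 404 handling.
RewriteCond %{REQUEST_FILENAME}.php -f
# [^.]+ refuses anything containing a period, so a path that already
# ends in .php is never rewritten a second time.
RewriteRule ^([^.]+)$ /$1.php [L]
```

With a rule of this shape, a request for a nonexistent name should end in an ordinary 404 rather than a chain of .php.php.php lookups.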

# Externally redirect (only) direct client requests for .php URLs to extensionless URLs:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*[^.#?\ ]+\.php([#?][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php http://example.com/$1 [R=301,L]

This is a little awkward. The pattern
([^/]+/)*[^.]+
means that the last part can theoretically contain slashes. In practice it will never happen, thanks to Regular Expressions being greedy by nature. But, again, if your directory names don't contain literal periods, you can express the whole pattern as simply [^.]+

Main Issue
The rules as given in the OP are in the wrong order. The overall grouping goes like this:

FIRST group rules in order of severity. That means that any access-control rules ([F] flag) come first. Then [G] if any. Then redirects ([R=301] flag). And finally the internal rewrites. And, er, super-finally, rules that don't change anything at all and don't set an [L] flag, such as cookies. These are rare.

THEN within each group, go from most specific to most general. Ordinarily that means your very last redirect is domain-name canonicalization: with or without www. The Condition here should say !^(example\.com)?$ if you're using the without-www form. There are further complications if you've got multiple domains passing through the same htaccess. Do you?

The second-to-last redirect is one the OP doesn't have: the "index.html" or "index.php" redirect.

If you're going extensionless, there is an additional redirect for requests ending in .php. Now rule ordering becomes crucial, because
Requests for /blahblah/index.php have to get redirected to /blahblah/ alone
while
Requests for /blahblah/somename.php have to get redirected to /blahblah/somename

So the extension-redirect has to come after the index redirect but before the domain-name redirect.
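The ordering described above could be sketched as follows for the without-www form. The patterns are illustrative assumptions adapted from rules quoted in this thread; they have not been tested against the OP's site, and they assume directory names contain no literal periods.

```apache
# 1. Index redirect: /blahblah/index.php (or /blahblah/index) -> /blahblah/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index(\.php|\.html)?[?\ ]
RewriteRule ^(([^/]+/)*)index(\.php|\.html)?$ http://example.com/$1 [R=301,L]

# 2. Extension redirect: /blahblah/somename.php -> /blahblah/somename
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.?\ ]+\.php[?\ ]
RewriteRule ^([^.]+)\.php$ http://example.com/$1 [R=301,L]

# 3. Domain-name canonicalization comes last among the redirects.
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule .? http://example.com%{REQUEST_URI} [R=301,L]

# Internal rewrites go below this point.
```

Each redirect already names the canonical host in its target, so a request that is wrong in more than one way is less likely to need a chain of redirects.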

ddriver



 
Msg#: 4594869 posted 3:02 pm on Jul 21, 2013 (gmt 0)

Hi again and thank you for your input.

JD_Toims,
Your "First" post: removing "www" worked, it's now a non-www site like I like it.
Your "Second" post: I tried accessing the site using only those two rules, emptied the cache, and also tried it with different browsers/devices. Accessing a php page such as example.com/login.php will not redirect to the extensionless version, and if I try to access the extensionless version directly, I get a not found 404 error.
Didn't go to the third step yet.

JD_Toims




 
Msg#: 4594869 posted 6:17 pm on Jul 21, 2013 (gmt 0)

Okay, let's take the NS flag off and try those two again with an empty cache. I've been using a slight variation of that first rule for years to remove extensions, so it came out of a working .htaccess file and I'm not quite "getting" why it's not working for you.

BTW, if you haven't noticed: cache emptying, although annoying when you do it 200 times a day, is necessary when editing .htaccess files.

RewriteEngine on
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.php
RewriteRule \.php$ http://example.com/%1 [R=301,L]

RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule .? http://example.com%{REQUEST_URI} [R=301,L]

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4594869 posted 9:39 pm on Jul 21, 2013 (gmt 0)

How many .htaccess files are there on the site?

Are you using the one in the site root? Are there any others in any sub-folders?

JD_Toims




 
Msg#: 4594869 posted 3:29 am on Jul 22, 2013 (gmt 0)

The questions asked by g1smd are great and important, since there are so many different ways an "undesired" or "unexpected" result can occur with mod_rewrite.

And, just so there's no question about what I meant when I said the variation of the first rule I posted is "slight": the rule/condition I'm actually using, which is working to make multiple websites extensionless, is:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.
RewriteRule \. http://www.example.com/%1 [R=301,L]

Basically, I added php after the . (dot) in the rule and condition, then end-anchored the rule when I posted it. So, for .php extensions, I don't see why it's not working as posted.
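One detail worth checking when adapting this condition (an observation about regex syntax, not something established in the thread): a bounded repetition is written with a comma, as in {3,9}. If a hyphen is typed instead, PCRE does not recognise the braces as a quantifier and treats them as literal characters, so the condition looks for that literal text after a single capital letter and can never match a real request line. That would give the symptom reported above: no redirect and no error. A sketch of the condition with the comma form, using example.com as a placeholder host:

```apache
# Sketch: request-method prefix written as a comma-separated range.
# Matches e.g. "GET /login.php HTTP/1.1" and captures "login" as %1.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]+)\.php
RewriteRule \.php$ http://example.com/%1 [R=301,L]
```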

lucy24




 
Msg#: 4594869 posted 7:15 am on Jul 22, 2013 (gmt 0)

The questions asked by g1smd are great

By weird coincidence, I'm concurrently playing a quotations game on another forum, and the most recent entry was...

Oh, never mind.

JD_Toims




 
Msg#: 4594869 posted 7:20 am on Jul 22, 2013 (gmt 0)

and the most recent entry was...

You can't leave a post and all of us readers hangin like that, it's totally mean spirited! LMAO!

ddriver



 
Msg#: 4594869 posted 7:53 am on Jul 22, 2013 (gmt 0)

Hi everyone,

Sorry for my late answers, but I think I'm in a different timezone and usually when so many hours pass before my answering it means I went to sleep.

I'm using the .htaccess in the root, there's one more htaccess but it's in a testing subdirectory so it shouldn't cause any problems.

It's the same story after removing the NS flag... I can access login.php, but no redirect to extensionless. And if I type extensionless, I get a not found 404. (Yes, I did clear the cache 1 million times).

I would also like to mention that (although wrong from what you're telling me) my original rules are indeed working as I want them to. I would just like to get rid of the loop which happens when I access a page like example.com/tralala and there's no such page.

Lucy, at one point you were asking me "what are you trying to do here?"

The rules I have in my htaccess were taken from this forum, from other threads and just slightly adapted to fit my site. I haven't written them myself.

But the idea is that I have a php website.

I want it non-www.

For the home page, there's that rule saying that if I'm typing example.com/en go to example.com?lang=en and if I'm typing example.com/ro go to example.com?lang=ro.

There's the rule that's redirecting .php to extensionless if you type a .php url.

And finally there's the rule that if you type extensionless, it tells apache that what you're asking for is really php.

At least that's what I understand from my htaccess rules. I know it's not as optimized as possible or perhaps it's just plain wrong, but I was happy that I got it working. If I'm not mistaken, I took the php redirection rules from this forum (given as help to another more or less clueless guy by user jdMorgan).

lucy24




 
Msg#: 4594869 posted 8:32 am on Jul 22, 2013 (gmt 0)

This thread has meandered a bit. Did you ever deal with this specific issue?

RewriteRule ^(.*)$ $1.php?%1 [NC,L,QSA]

Here, the rule has to exclude requests that already end in php. Otherwise it goes around in circles and the server ends up looking for
blahblah.php.php.php.php.php.php...

Have you now got your rules in the right order? First redirects, then rewrites.

The "what are you trying to do?" question was in response to the specific bit of code I quoted:
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(.*)$ $1.php?%1 [NC,L,QSA]

As written, the rule seems to capture the query string only to reappend it-- in other words, to do exactly what would have happened automatically if the rule didn't mention the query at all.
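A sketch of the difference, using rules of the same shape as those in this thread. The ref=x parameter is a made-up example, and the second rule has no file-exists check, so it is for illustration only.

```apache
# A "?" in the target replaces the original query string. QSA is what
# merges the two, so /en?ref=x reaches /index.php?lang=en&ref=x.
RewriteRule ^(en|ro)/?$ /index.php?lang=$1 [QSA,L]

# No "?" in the target: the original query string is passed through
# unchanged, so /results?town=chicago reaches /results.php?town=chicago.
RewriteRule ^([^.]+)$ /$1.php [L]
```

In other words, capturing the query in a condition and re-attaching it by hand is only needed when the rule has to change the query, not when it merely has to keep it.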

Do your extensionless URLs ever have query strings?

JD_Toims




 
Msg#: 4594869 posted 8:35 am on Jul 22, 2013 (gmt 0)

but I was happy that I got it working.

If it's not broken, let's not fix it.

The rules/conditions I posted for you should be working (unless someone else sees a problem I'm missing) but what I posted doesn't seem to be working for you, so personally, if I were you I'd leave "working" and "well enough" alone.

I can't see or figure out why the code I posted is not working for you. Maybe someone else can find something I missed, but I've been dealing with mod_rewrite for nearly a decade (it's one of the things I "specialize" in) and I'm not seeing the issue. I have essentially the same rules/conditions installed on multiple sites and they are all fine, yet you say they don't work on yours, so my best advice is: If it's not broken, let's not try to fix it.

ddriver



 
Msg#: 4594869 posted 8:50 am on Jul 22, 2013 (gmt 0)

lucy24, yes, my extensionless urls do have query strings.

Example: my search results page
results.php?town=chicago&filter1=4star&filter2=blabla

extensionless version: results?town=chicago&filter1=4star&filter2=blabla

Also, I haven't changed the order of the rules yet because I'm still not sure that I understand the correct order.

And indeed, I'm missing the index.php/html redirect, because I can both access example.com and example.com/index which is duplicate content.

JD_Toims, don't get me wrong, I really appreciate your efforts in trying to optimize what I have. But if optimization just ends up breaking what's already working there's no point.


JD_Toims




 
Msg#: 4594869 posted 10:12 am on Jul 22, 2013 (gmt 0)

if optimization just ends up breaking what's already working there's no point.

That's exactly what I was saying... If what you have isn't broken, then let's not fix it.

Basically, what I know and am saying in my most recent posts is: "I have most of the rules I posted working on one site or another, so if for some reason they're not working on yours, but what you have is working, then let's not change anything..."

Sorry I couldn't be more help.

lucy24




 
Msg#: 4594869 posted 4:26 pm on Jul 22, 2013 (gmt 0)

Also, I haven't changed the order of the rules yet because I'm still not sure that I understand the correct order.

See post about halfway up this thread, under main issue. Access control, then external redirect, then internal rewrite. Within each, go from specific to general. "External" here doesn't mean some other site, it just means the browser has to make a fresh request.

Clearly you don't want people using "/directory/index" at any time, ever. That's no better than /index.php

But if your extensionless URLs still have visible query strings, I really don't see what you're gaining. Unless you're planning to change your files to .jsp next week, and to .cgi the week after that, and so on. That's one of the two reasons for going extensionless. But for most people the primary one is to make prettier URLs-- and something with a query string is not going to be pretty.

Search results may not be the best example, though, since you wouldn't want those URLs to be indexed anyway. Probably not even the Search page at all. People don't come to a site for its terrific search page, unless the search is the site. The question is whether URLs for other pages will have queries.

ddriver



 
Msg#: 4594869 posted 4:45 pm on Jul 22, 2013 (gmt 0)

Hi lucy24,

I don't know which rules are access control, which rules are external redirects and which rules are internal rewrites. That's why I haven't re-ordered them. I'm not the expert in htaccess, that's why I'm here asking for help.

Query strings: search results page, back end interface.

The site is what it is, it's not the most advanced piece of technology, maybe some day there won't be any query strings left.

For the moment I just want to take care of that loop and duplicate content on home page where / is the same as /index

So what I'm asking for is that if you want to help, please be a bit more specific, and tell me which rule should go where, or better yet just copy and paste them for me in the right order.

lucy24




 
Msg#: 4594869 posted 7:04 pm on Jul 22, 2013 (gmt 0)

I don't know which rules are access control, which rules are external redirects and which rules are internal rewrites.

Yes you do. Honest. You just may not know the terminology.

Access control = who is allowed to get into your site at all. No access = [F] flag. You may not happen to have any of these right now, at least not using mod_rewrite.

External redirect = [R] flag (or [R=301], but let's not muddy the waters with all the other things [R] can be) and/or "target" of rule starts in http://www.example.com/ (this creates a redirect even without the [R] flag).

Internal rewrite = [L] flag alone (or, rarely, no flag at all) and the "target" part of the rule doesn't start with http://www.example.com/

In functional terms:
access control = 403 = [F] = human user only sees page that says "Nuh-uh, nothing for you here, please go away" or whatever your custom 403 page says

redirect = 301/302 = [R] = address bar in human user's browser changes although they haven't done anything

rewrite = end user doesn't know that you've done anything, because it's all happening behind the scenes. Not even the googlebot knows when it's been rewritten.

g1smd




 
Msg#: 4594869 posted 7:10 pm on Jul 22, 2013 (gmt 0)

Access control rules use the [F] or [G] flag and block access to specific user agents, or block using other criteria.

External redirects have the [R=301,L] flag and cause the browser to make a new request for a different URL.

Internal rewrites have only the [L] flag.
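For illustration, one rule of each kind, in the order they would appear in the file. The user-agent string and the page names are hypothetical, chosen only to show the flags.

```apache
# Access control: refuse the request outright with a 403.
RewriteCond %{HTTP_USER_AGENT} BadBot
RewriteRule .? - [F]

# External redirect: the browser is told to request a different URL.
RewriteRule ^old-page$ http://example.com/new-page [R=301,L]

# Internal rewrite: the URL in the address bar stays the same.
RewriteRule ^new-page$ /new-page.php [L]
```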
