Forum Moderators: phranque
Anyway, there are a number of different pages, and I had the activation working fine, but when I asked friends to test it, all I got back was "it's not working".
So after playing around, I realised what is happening: while I was at the site testing, it worked fine. However, if I shut down my browser (yes, I've been through them all this long night) and then click the activation link in the email the site sends (standard stuff), rather than rewriting to
example.com/activate/JHSJHJSJHKJD67J
it gives
example.com/activate/index.php
hence things were not working. If I then keep that window open and click the email link again, it rewrites fine. So this has had me checking header code, httpd.conf, and trying and testing every permutation I can think of to get at least something working.
I figure there is a loop in there somewhere, but I really can't figure it out, and I'm not smart enough to get my head around it even after cruising these forums and many others. So I am just posting the main bits of the .htaccess; if you see code there that looks familiar, it is probably because I picked it up here. Some of it has also been butchered a bit in my playing around.
Anyway, if someone can point me in the right direction here, I would be very, very grateful. I have read through so many docs now that my head is about to burst, and I am really lost at this point.
So here is the guts of it, comments are in brackets:
RewriteEngine on
Options -MultiViews
# If no trailing slash
RewriteCond $1 !/$
# but exists as a directory when slash is added
RewriteCond %{REQUEST_FILENAME}/ -d
# Then add a trailing slash and redirect
RewriteRule (.+) http://www.example.com/$1/ [R=301,L]
(The above was a desperation attempt found on the forum here)
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*) http://www.example.com/$1 [L,R=301]
# Error Docs
ErrorDocument 404 /error_404.php
ErrorDocument 403 /error_403.php
#block bad bots
RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
....
(A bunch of conditions in here)
....
RewriteRule ^(.*)$ [robotstxt.org...] [L]
RewriteCond %{HTTP_USER_AGENT} .*Twiceler.*
RewriteRule ^(.*)$ [cuil.com...] [L]
RewriteCond %{HTTP_USER_AGENT} .*Baiduspider.*
RewriteRule ^(.*)$ [www.baidu.com...] [L]
(These two hammered my site, below is where I think I am screwing up)
RewriteCond %{REQUEST_URI} ^category\.php$
RewriteRule ^category/(.*)/$ category.php?%{QUERY_STRING}&cat=$1 [L]
RewriteRule ^category/(.*)$ category.php?%{QUERY_STRING}&cat=$1 [L]
RewriteCond %{REQUEST_URI} ^news\.php$
RewriteRule ^news/(.*)/$ news.php?%{QUERY_STRING}&id=$1 [L]
RewriteRule ^news/(.*)$ news.php?%{QUERY_STRING}&id=$1 [L]
RewriteCond %{REQUEST_URI} ^activate\.php$
RewriteCond %{REQUEST_URI} !(\.|/$)
RewriteRule ^activate/([0-9A-Z]+)/$ activate.php?%{QUERY_STRING}&code=$1? [L]
RewriteRule ^activate/([0-9A-Z]+)$ activate.php?%{QUERY_STRING}&code=$1? [L]
(for the above some more ad hoc try anything desperate coding)
#signup
#rewriteCond %{REQUEST_URI} ^quick_signup\.php$
#RewriteRule ^usr/register/$ http://www.example.com/quick_signup.php [NC,L]
#RewriteCond %{REQUEST_FILENAME} !index.php
RewriteRule ^register/?$ quick_signup.php [NC,L]
In some ways the hacked code above actually works better, so I am not really understanding why I get errors when I do things "properly". E.g. if I use
RewriteCond %{request_filename}!-d
it gives me a syntax error. And when I used the RewriteCond %{REQUEST_URI} lines in this new code they didn't work at all, hence they're commented out.
So really I am not sure at all what is happening here. I have checked httpd.conf and that is bare bones. The .htaccess runs as an include from there, but just to be safe I also ran it from the directory with AllowOverride All. Even when I remove the index.php rule (2nd code block) I still get this occasional forcing to index.php.
My best reasoning is that there is some loop going on that is causing the intermittent problem: Apache looks for a real directory, can't find it, and then defaults to index.php.
The rewrite log is nuts even at level 3, so I am having trouble following it through and, to be honest, really understanding the log file.
I am confused about where to even go next with this. Should the ordering be the other way round? Force www last? I added the %{ENV:REDIRECT_STATUS} conditions with the idea of closing down loops by only rewriting if the request is not already the result of a redirect.
Any help sorting this out would be very gratefully appreciated. Here is a "formally" better iteration of my code... but as mentioned, if anything it works worse.
PS. The hacked code is pretty much at the point of trying anything... not good, but this is driving me nuts.
###############
RewriteEngine on
# Force www
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*) http://www.example.com/$1 [R=301,L]
# For index.php, without parameters, only in the root:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
# Error Docs
ErrorDocument 404 /error_404.php
ErrorDocument 403 /error_403.php
#block bad bots
RewriteCond %{HTTP_USER_AGENT} ^Alexibot [OR]
RewriteCond %{HTTP_USER_AGENT} ^asterias [OR]
...
(more conditions to exclude scrapers etc in here)
...
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^(.*)$ [robotstxt.org...] [L]
RewriteCond %{HTTP_USER_AGENT} .*Twiceler.*
RewriteRule ^(.*)$ [cuil.com...] [L]
RewriteCond %{HTTP_USER_AGENT} .*Baiduspider.*
RewriteRule ^(.*)$ [www.baidu.com...] [L]
#RewriteCond %{REQUEST_URI} ^image/([0-9]+)/?$
#RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^image/([0-9]+)/?$ image_detail.php?%{QUERY_STRING}&id=$1 [NC,L]
#RewriteCond %{REQUEST_URI} ^category\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^category/([A-Za-z0-9-%]+)/?$ category.php?%{QUERY_STRING}&cat=$1 [NC,L]
#RewriteCond %{REQUEST_URI} ^search\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^search/([A-Za-z0-9-%]+)/?$ search.php?%{QUERY_STRING}&search=$1 [NC,L]
#RewriteCond %{REQUEST_URI} ^download\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^downloads/(.*)/?$ download.php?%{QUERY_STRING}&pass=$1 [NC,L]
#RewriteCond %{REQUEST_URI} ^news\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^news/([0-9]+)/?$ news.php?%{QUERY_STRING}&id=$1 [NC,L]
#RewriteCond %{REQUEST_URI} ^activate\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^activate/([A-Za-z0-9]+)/?$ activate.php?%{QUERY_STRING}&code=$1 [NC,L]
#RewriteCond %{REQUEST_URI} ^contact\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^info/Contact/Office/?$ contact.php [NC,L]
#RewriteCond %{REQUEST_URI} ^feedback\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^info/Contact/Feedback/?$ feedback.php [NC,L]
#RewriteCond %{REQUEST_URI} ^forgot_login\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^login/forgot/?$ forgot_login.php [NC,L]
#RewriteCond %{REQUEST_URI} ^basket\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^usr/basket/?$ basket.php [NC,L]
#RewriteCond %{REQUEST_URI} ^quick_signup\.php$
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^usr/register/?$ quick_signup.php [NC,L]
##############
For the sake of keeping this discussion focused, please reduce the code to one example each of robots control and internal script rewriting; most contributors here will be put off by a big chunk of code, and trying to debug it "all at once" makes your job much more difficult as well. Divide and conquer.
One further comment: By redirecting unwanted robots to other URLs, you are simply making the problem worse by passing the problem along, and using up internet bandwidth in the process. The worst (malicious) robots won't even follow your redirect, as a matter of fact.
For robots.txt-compliant robots, Disallow them in robots.txt. For those which do not fetch and obey robots.txt, simply return a 403-Forbidden response and be done with it -- e.g. "RewriteRule ^ - [F]"
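As an illustrative sketch of what I mean (the user-agent names here are placeholders, not your actual list):

```apache
# For robots.txt-compliant robots (this part goes in robots.txt, not .htaccess):
#   User-agent: SomeBot
#   Disallow: /

# For non-compliant robots, deny access outright rather than redirecting:
RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
RewriteCond %{HTTP_USER_AGENT} Scraper [NC]
RewriteRule ^ - [F]
```

The "-" substitution means "make no change to the URL", and [F] returns a 403-Forbidden response immediately, so no bandwidth is wasted on a redirect the robot may not even follow.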
Jim
The real problem is that external links are being rewritten to index.php and I can't see where that is coming from at all.
As an example if I say to a friend online, check out this
http://www.example.com/x/y
they click the link and it goes to
http://www.example.com/x/index.php
the same goes for bookmarks and email links such as the activation link. However, if I click the link while I am already at the site, then it works fine.
To be sure it wasn't something in the application headers, I disabled them, and still got the same issue. I have checked through httpd.conf and disabled MultiViews and so on, just to be certain.
I even checked the obvious: 1) that there are no other .htaccess files lying around, and 2) that it is really reading the file.
So as far as I can tell, the request is going to [example...]
Apache thinks y is a file and can't find it, so it serves up index.php.
What I don't get is why it does this only on new requests and is fine otherwise. I also can't see where the issue is that causes this.
In short, how do I get Apache to recognise that it is a folder? If I use x/y/z/ then it just dumps to x/y/z/index.php, which has just shifted the problem.
Any ideas on why the rewrite doesn't work on initial requests? My assumption was that some looping is going on, but it still seems strange to me. I would prefer it just to not work at all.
Thanks again for your time.
Scott.
The symptoms indicate that an internal rewrite is being 'exposed' to the client (browser) by a subsequent external redirect.
Other than the first two rules, these all appear to be in the correct order (external redirects first, followed by internal rewrites, with both groups ordered from most-specific pattern/condition to least-specific), so your rule order isn't the cause, as it usually is in the cases we see in this forum.
So my first suggestion would be to disable MultiViews (content negotiation using Apache mod_negotiation) if it is enabled and your site does not need or depend on it. Use "Options -MultiViews", or combine it with your existing Options directive (if any), at the head of this .htaccess file.
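For example, at the very top of the .htaccess file (assuming your site also relies on FollowSymLinks, as many do; adjust to whatever Options you already use):

```apache
# Disable content-negotiation; combine with any Options you already set
Options +FollowSymLinks -MultiViews
```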
There are plenty of other fixes and improvements needed, but none are critical to the current problem. So we can look at those later.
[added]
One more note: If you do use
RewriteCond %{REQUEST_URI} !^some_URL-path
be sure to precede that URL-path with a slash, or it won't work:
RewriteCond %{REQUEST_URI} !^/some_URL-path
Jim
Is the user seeing the URL shown in the URL bar of the browser change to this new URL?
If so, then this is working as a redirect, not as a rewrite.
A rewrite takes an incoming URL request and fetches the content from an internal filepath that is different from the one suggested by the filepath part of the URL request. That new internal filepath is not revealed to the browser; the internal rewrite is silent.
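To illustrate the difference with a generic sketch (not your actual rules):

```apache
# Internal rewrite: the browser still shows /news/42 in the address bar,
# but the content is silently served by news.php
RewriteRule ^news/([0-9]+)$ news.php?id=$1 [L]

# External redirect: the browser is told to request the new URL,
# and the address bar changes to show it
RewriteRule ^old-news/([0-9]+)$ http://www.example.com/news/$1 [R=301,L]
```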
So, we're looking for some mechanism that could cause the DirectoryIndex target to be exposed through a redirect here. Unfortunately, there doesn't seem to be one in the code posted here, so it's either another .htaccess file (as mentioned in the first post), a config file (e.g. httpd.conf), or possibly one of the scripts themselves invoking an external redirect and exposing the filepath as a URL.
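For reference, the normal mod_dir behaviour looks like this; note that the DirectoryIndex filename itself should never appear in the address bar:

```apache
# Typical mod_dir handling (illustration only):
#   GET /x   -> 301 redirect to /x/ (address bar changes, slash added)
#   GET /x/  -> internally serves /x/index.php (address bar unchanged)
DirectoryIndex index.php
```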
I wasn't able to do much with the last example URL provided, since it does not match any of the rules; but perhaps that's a telling clue, if that example URL does actually invoke the problem...
Jim
Options +FollowSymLinks
Options -Indexes
Options -MultiViews
AllowOverride none
Order allow,deny
Allow from all
#AcceptPathInfo On
Include /etc/httpd/conf/.htaccess
So pretty much standard stuff there.
I tried AcceptPathInfo before, both On and Off, but no change either way.
To be more specific, g1smd, yes, you are right, it is a hard redirect: the browser opens with http://www.example.com/x/y (shown in the address bar), then after a slight delay a forced redirect happens, the browser "clicks", and it dumps to
http://www.example.com/x/index.php (in the address bar).
Worse, as I mentioned, it seems to do this only when the browser is first opened, e.g. from an email link, etc.
I have scanned through the rewrite log and was able to find/create the following. (I cat'ed /dev/null over the log file, then opened the browser and exported the log, to try to narrow down the problem areas.)
(Here is a log excerpt with w/x/y replacing my paths; contact/index.php seems to just come from nowhere, to me at least, so I am guessing this is more an Apache issue than a mod_rewrite one?)
[rid#8c5e878/initial] (3) [per-dir /w/x/y/web/] strip per-dir prefix: /w/x/y/web/js/menu_fns.js -> js/menu_fns.js
[rid#8c5e878/initial] (3) [per-dir /w/x/y/web/] applying pattern '^usr/register/?$' to uri 'js/menu_fns.js'
[rid#8c5e878/initial] (1) [per-dir /w/x/y/web/] pass through /w/x/y/web/js/menu_fns.js
[rid#8c4e838/initial] (3) [per-dir /w/x/y/web/] add path info postfix: /w/x/y/web/info -> /home/.sites/images/web/info/Contact/index.php
[rid#8c4e838/initial] (3) [per-dir /w/x/y/web/] strip per-dir prefix: /w/x/y/web/info/Contact/index.php -> info/Contact/index.php
I am not sure where to go next with this; I'm running out of ideas on where to even look for the problem. All I can think to do now is to reload all the Apache modules (I normally keep it bare bones) and have a dig around in Apache.
Thanks again,
Scott
My reasoning being that the internal rewrite is happening, and then Apache does a hard redirect when it can't find what it is looking for? But again, I don't see why that should happen if the rewrite worked in the first place.
Thinking about it, I might build another version on a different tree/port and see how that goes. I am completely out of ideas, and the colleagues and friends I have asked seem as baffled as me, with the standard answer along the lines of "but it shouldn't do that".
I have a bare-bones httpd.conf: no vhosts, SSL, etc. I've been through the app code to check that no redirects are forced there, and so on.
The only things I can think of left to try are to enable all modules, and then to completely disable rewrites and see what happens. If that doesn't get me anywhere, then install/upgrade.
I'm out of ideas on this one.
One more possibility: UseCanonicalName settings, but I think that unlikely. Check it anyway.
As I have stated several times, you'd be well-served to remove most of the blocks of code from your file, and pare the file down to the simplest case that will demonstrate your problem. For example, it's likely that you can completely remove the "bad-bot" access control code, and reduce the internal rewrite rules to just one and still show the same problem. A smaller/simpler test case will simplify the identification of the root cause here.
Going through all the work of upgrading your server only to find out that the problem occurs on any Apache version is hardly worthwhile, and it's quite likely that it will re-occur if you use the same config and .htaccess files. I think your time would be better spent on a 'focused' plan, rather than just changing as many things as possible.
Simplify your test case, don't make it more complex.
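As a sketch of what I mean, a pared-down test file could be as small as this (substitute one of your own internal rewrites for the activate rule):

```apache
RewriteEngine on
Options -MultiViews

# One external redirect: force the www hostname
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# One internal rewrite
RewriteRule ^activate/([A-Za-z0-9]+)/?$ activate.php?code=$1 [L]
```

If the index.php symptom still appears with only this in place, the cause is outside this file; if it disappears, add your removed blocks back one at a time until it returns.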
Jim