
Apache Web Server Forum

This 37 message thread spans 2 pages.
index.php redirect breaking 404 pages.
boasting_j




msg:4520916
 11:47 pm on Nov 20, 2012 (gmt 0)

Howdy,

I have recently set up an index.php redirect through .htaccess. The idea here is to negate the duplicate content issue that crops up when a site has both index.php and / (homepage) getting indexed.

I used the technique listed here.

[askapache.com ]

It works great too. The one issue is, it breaks the 404 pages.

So if a user types in or goes to www.example.com/dafjkadbfda, instead of serving the 404 page, what happens is the URL stays the same, in this case the broken one, and it serves the index.php page.

This in turn is opening another can of worms in that all those broken pages are coming up as duplicate content and meta. So while this is somewhat SEO related, it does have to do with the .htaccess. :) This has been an issue on many sites where I thought the .htaccess redirect worked. Thanks in advance.

 

g1smd




msg:4521276
 9:59 pm on Nov 21, 2012 (gmt 0)

Post at least the redirect and the ErrorDocument code here, using example.com for the hostname.

The problem is that any request that is internally rewritten will be fulfilled by the index.php file. It is up to the PHP script to return valid content if it exists in the database for the requested URL -OR- return the correct HTTP status code and error message if there is no content to deliver.

Apache can send its own 404 response only when it fails to find a file on the hard drive to handle the request. In this case it always finds a file: index.php and so Apache's work is done and there's no error in its part of the operation.

[edited by: g1smd at 10:23 pm (utc) on Nov 21, 2012]

boasting_j




msg:4521283
 10:11 pm on Nov 21, 2012 (gmt 0)

Oops, yes that would have been helpful.

Here is my original .htaccess

Options -Indexes
ErrorDocument 403 /customerrors/403.html
ErrorDocument 401 /customerrors/401.html
ErrorDocument 400 /customerrors/400.html
ErrorDocument 500 /customerrors/500.html
ErrorDocument 404 /customerrors/404.html



And the one that is creating the issues.

RewriteEngine On
RewriteBase /
DirectoryIndex index.php
RewriteCond %{http_host} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://example.com/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Options -Indexes
ErrorDocument 403 /customerrors/403.html
ErrorDocument 401 /customerrors/401.html
ErrorDocument 400 /customerrors/400.html
ErrorDocument 500 /customerrors/500.html
ErrorDocument 404 /customerrors/404.html


Possibly need to re-write this with the 404 pages in mind?

g1smd




msg:4521286
 10:28 pm on Nov 21, 2012 (gmt 0)

Add a blank line after each RewriteRule for human readability.

Get the DirectoryIndex and Options lines up to the top.

Change your non-www/www redirect to this:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


Add an extra RewriteCond condition in the ruleset that internally rewrites.
This will test that REQUEST_URI is NOT ^/customerrors/
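
As a rough sketch of that last suggestion (an illustration, not code posted in the thread), the extra condition would sit alongside the existing file and directory tests in the ruleset that internally rewrites. It assumes the error pages live under /customerrors/, as in the ErrorDocument lines above.

```apache
# Internal rewrite, skipped for requests to the custom error pages
RewriteCond %{REQUEST_URI} !^/customerrors/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```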

boasting_j




msg:4521299
 10:54 pm on Nov 21, 2012 (gmt 0)

Thanks for the reply, it's much appreciated.

Everything is clear, except the last part. I'm not sure how to do this. Can you expand on this?

Add an extra RewriteCond condition in the ruleset that internally rewrites.
This will test that REQUEST_URI is NOT ^/customerrors/

lucy24




msg:4521302
 11:00 pm on Nov 21, 2012 (gmt 0)

Add an extra RewriteCond condition in the ruleset that internally rewrites.
This will test that REQUEST_URI is NOT ^/customerrors/

Isn't it too late then? A custom 404 page can only be (internally) requested if the server has previously returned a 404, and the all-purpose

RewriteRule . /index.php [L]

prevents that from happening in the first place. Besides, the error page does exist, so no rewrite would happen.

You can do one or the other-- use a custom error page OR send everything to index.php-- but it's hard to do both.

And what if someone requests a nonexistent image? Shouldn't the boilerplate at least be constrained to php and/or no extension? (Yes, I realize that isn't specific to the current question. It's built in. But it still seems like overkill.)
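
One possible reading of that suggestion, as a sketch only: restrict the catch-all pattern so that only extensionless paths, or paths ending in .php, are handed to index.php. The pattern below is an assumption about what "php and/or no extension" might look like; it was not posted in the thread.

```apache
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite only paths with no dot at all, or paths ending in .php.
# A missing image such as /images/missing.png contains a dot and does
# not end in .php, so it is left alone and Apache's own 404 handling
# (the ErrorDocument) applies.
RewriteRule ^([^.]+|.+\.php)$ /index.php [L]
```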

boasting_j




msg:4521306
 11:09 pm on Nov 21, 2012 (gmt 0)

Hi lucy,
If I understood your reply correctly (and it's entirely possible I didn't), I don't want to send everything to the index.php page. I want to send requests for the index.php page to / (homepage). This is working, but breaking the 404 page. For some reason there are pages coming up such as

www.example.com/nfsjfsndgg

Instead of serving the 404 page it's serving the index.php page. Even worse, if I use the Google Webmaster Tools fetch request to check the page status, it's returning a 200 OK code instead of a 404. This is a problem. When I revert to the old .htaccess code, the 404s are back in place and working as they should.

g1smd




msg:4521339
 12:42 am on Nov 22, 2012 (gmt 0)

Place a blank line after each RewriteRule so you can see each ruleset (the rule and its associated conditions) more clearly.

Apache can send its own 404 response only when it fails to find a file on the hard drive to handle the request. In this case it always finds a file: index.php (because you have rewritten the request and are now telling Apache to handle the request by serving the index.php file) and so Apache's work is done and there's no error in its part of the operation.

The PHP script should be programmed to return the error 404 HTTP status code and a human readable error message if there is no content to deliver for the current URL request.

Change the order of your redirecting rulesets around. The index redirect ruleset needs to be listed before the non-www/www redirect ruleset.
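
Putting the advice so far together, the ordering might look something like the sketch below. Every directive is taken from the posts above. The one assumption is that the index redirect should target the www hostname, so that it does not chain into the host redirect.

```apache
DirectoryIndex index.php
Options -Indexes
ErrorDocument 404 /customerrors/404.html

RewriteEngine On

# 1. Index redirect first: external requests for /index.php go to the root
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

# 2. Host canonicalisation second
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 3. Internal rewrite last. Once this fires, Apache has found a file,
#    so index.php itself must send the 404 status for unknown URLs.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```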

boasting_j




msg:4523022
 8:50 pm on Nov 27, 2012 (gmt 0)

@g1smd Thanks for your help. I was finally able to get it working. I had to comment out some of the code from the initial workaround and do some rearranging. It looks like it is working 100%. Here is a sample of what worked while keeping the 404s working.


DirectoryIndex index.php

Options -Indexes

RewriteEngine On

RewriteBase /

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

ErrorDocument 404 /customerrors/404.html


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]


RewriteRule ^index\.php$ - [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
## RewriteRule . /index.php [L]

g1smd




msg:4523054
 10:40 pm on Nov 27, 2012 (gmt 0)

Get the ErrorDocument as the very first or very last line, not in the middle of all your RewriteRules.

RewriteBase / is the default and not needed to be stated.

The index redirect ruleset must be listed before the non-www/www redirect ruleset.

If you comment out a RewriteRule, the two conditions before it will apply to the next RewriteRule found in your htaccess file - unless you comment out the conditions too.
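
To illustrate that last point, a small sketch: when a rule is disabled, its conditions are commented out with it, so nothing is left dangling for a later rule to inherit. The old-page and new-page names are hypothetical and are not from the thread.

```apache
# Disabled ruleset: the conditions are commented out together with the rule
# RewriteCond %{REQUEST_FILENAME} !-f
# RewriteCond %{REQUEST_FILENAME} !-d
# RewriteRule . /index.php [L]

# A rule added below is not affected by the conditions above
# (old-page and new-page are placeholder names)
RewriteRule ^old-page$ http://www.example.com/new-page [R=301,L]
```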

Sgt_Kickaxe




msg:4525920
 3:05 pm on Dec 7, 2012 (gmt 0)

3 things...

RewriteBase / is the default and not needed to be stated.

True, unless you have multiple domains in one cheap hosting account and this domain is not set to be the primary domain.

- Should you add [NC] to the end of the rewrite conditions to make them case insensitive?

- Is there a way to avoid the redirect to an error page? e.g. example.com/index.php/fdgidfgireg should 404 but instead it does a 301 to the www version first. Ideas?

g1smd




msg:4525928
 3:34 pm on Dec 7, 2012 (gmt 0)

Maybe add the [NC] flag for RewriteRules that redirect, but NEVER add it for RewriteRules that rewrite.

boasting_j




msg:4525936
 4:11 pm on Dec 7, 2012 (gmt 0)

- Is there a way to avoid the redirect to an error page? e.g. example.com/index.php/fdgidfgireg should 404 but instead it does a 301 to the www version first. Ideas?


Hmm I'd be interested to know as well. Seems like that would be the next step in this setup.

g1smd




msg:4525942
 4:55 pm on Dec 7, 2012 (gmt 0)

There is a way.

Stop the htaccess file from redirecting those requests and only those requests, by using a RewriteCond on the non-www/www canonicalisation ruleset, and then set up the PHP functionality at the beginning of the index.php file, as follows:

- if there is no page name found in the database matching the request, send 404 header and HTML page with error message,
- if the page name matches a valid database entry, but the wrong hostname was requested, send a 301 redirect to the correct URL,
- if the page name matches a valid database entry and the correct hostname was requested, send 200 OK status and the page of content.

There is no way to send the 404 status from the htaccess rules. You have to let the PHP script handle that part of the functionality, and specifically exclude the relevant htaccess rules from dealing with the request and redirecting it.

Sgt_Kickaxe




msg:4526222
 2:16 am on Dec 9, 2012 (gmt 0)

I'm encountering some odd behavior in wordpress when the following is placed above the index.php and www to non-www code. Ideas?

RedirectMatch permanent ^/some-old-url$ http://www.example.com/some-new-url

What happens is that http://example.com/index.php/some-old-url first redirects to http://www.example.com/some-old-url and then goes to http://www.example.com/some-new-url. Since the redirect matched shouldn't it go right to http://www.example.com/some-new-url?

The standard wordpress entry is further down the page and seems to be redirecting first...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]


edit: RedirectMatch is used instead of Redirect due to host limitations

Sgt_Kickaxe




msg:4526235
 5:59 am on Dec 9, 2012 (gmt 0)

To clarify, the following works for visitors, but it creates a double 301 redirect for the RedirectMatch if www is not present or /index.php/ is present in the request URL (on WordPress sites).

# redirected url
RedirectMatch Permanent ^/some-old-url$ http://www.example.com/the-new-url

# remove index.php from urls
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php(/[^\ ]*)?\ HTTP/
RewriteRule ^index\.php(/(.*))?$ http://www.example.com/$1 [R=301,L]

# add www if it's missing
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# wordpress basic
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]


In spending the past hour searching the net I've found that this is a problem other wordpress sites have too and the owners may not even be aware of it. I haven't found a solution yet.

From: /index.php/some-old-url
To: /some-old-url <----- why is this 301 happening with the above?
To: /the-new-url

lucy24




msg:4526248
 7:39 am on Dec 9, 2012 (gmt 0)

Ahem. How many times has g1 told you not to mix mod_alias (Redirect by that name) and mod_rewrite? You didn't listen, did you? Since they are different mods, it cannot possibly make any difference what order they're in. Each mod is handled as a unit, and the order of module execution is out of your control.

Change the RedirectMatch (mod_alias) to a RewriteRule and all should be copacetic.

Sgt_Kickaxe




msg:4526256
 8:59 am on Dec 9, 2012 (gmt 0)


RedirectMatch Permanent ^/some-old-url$ http\:\/\/www\.example\.com\/the\-new\-url

RewriteRule ^/some-old-url$ http\:\/\/www\.example\.com\/the\-new\-url [R=301,L]

Redirect 301 ^/some-old-url$ http\:\/\/www\.example\.com\/the\-new\-url


All give the same result, Lucy. I also asked the host for help and got the standard "looks good, should be working, we tweaked our end a tiny bit for you" reply only to end up with example.com//the-new-url. I called back and got the "oops, fixed that, all looks good" response a 2nd time... and I'm back to the results mentioned above.

My visitors don't care, but Googlebot is not following the double 301, yet. Is there ANYTHING else I can try?

edit: I tried various wordpress plugins, all give the same double redirect when dealing with a non-www/index.php situation at the same time.

g1smd




msg:4526262
 9:16 am on Dec 9, 2012 (gmt 0)

RewriteRule ^/some-old-url$ http\:\/\/www\.example\.com\/the\-new\-url [R=301,L]

The above rule will NEVER run because the leading slash will never match. Additionally, do not escape anything in the target URL. The target URL is literal.

RewriteRule ^some-old-url$ http://www.example.com/the-new-url [R=301,L]

This rule must appear near the beginning of the list of rules. The usual spot will be just after any rules that block specific malicious URL requests, and a long way before any index or non-www/www redirect.

What are the hosts "fixing" exactly? Sounds like they are adding rules to the main site configuration file that either contradict your htaccess rules or have syntax or other errors. If their rule uses Redirect or RedirectMatch then that is a big problem. Once you use RewriteRule for any of your rules you must use it for all of your rules.

Sgt_Kickaxe




msg:4526271
 9:58 am on Dec 9, 2012 (gmt 0)

I think the host temporarily migrated my site to a server running an older version of Apache, to be honest. I had to add the leading slash for my RewriteRule to work. After my second call and their second fix I had to remove it again. I vaguely remember an older version of Apache requiring the slash after ^ ?

This host offers a redirect service in their dashboard, I used it and got the escaped version of the target url right from there, go figure. That dashboard service didn't help resolve the double 301 either.

Is there any way I can see what rules the host is using in their configuration? I've tried exactly what you've posted, and a dozen variations of, and yes I've cleared cache but although the redirections work I'm seeing a double 301 redirect, still. The rewriterule is immediately after the rewriteengine on, which is at the top of the file. I removed most everything else and nothing changed. I'm not running any addons or modifications that impact redirects, no SEO plugins etc.

g1smd




msg:4526273
 10:21 am on Dec 9, 2012 (gmt 0)

Remove everything and test again on a clean browser to see the effect of any hidden rules. The only way to see their code is to get your host to send it to you in an email.

Control panels (CP, Plesk, etc) generate the absolute worst htaccess code there is. The unwanted escaping in the target URL is just one of the common offences against decent code.

The RegEx pattern matches the path part of the URL request, and is presented to the parser after it has been "localised" on a "per folder" basis. This means the leading slash, or the leading folders ending with a slash in the case of htaccess in a sub-folder, has already been stripped by the time mod_rewrite gets to look at it.
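
A sketch of what that localisation means in practice. The two rules below belong to two different .htaccess files, and the /blog/ sub-folder is a hypothetical example, not something mentioned in the thread.

```apache
# In /.htaccess (site root): a request for /some-old-url is presented
# to the pattern as "some-old-url", with no leading slash
RewriteRule ^some-old-url$ http://www.example.com/the-new-url [R=301,L]

# In /blog/.htaccess (hypothetical sub-folder): a request for
# /blog/some-old-url is also presented as "some-old-url"; the leading
# "/blog/" has already been stripped
RewriteRule ^some-old-url$ http://www.example.com/blog/the-new-url [R=301,L]
```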

Sgt_Kickaxe




msg:4526280
 10:40 am on Dec 9, 2012 (gmt 0)

Thanks for all the help, unfortunately if I include only the above in an otherwise blank .htaccess file the server still returns a two step redirect. It first strips out the /index.php/ while retaining the old path and then redirects to the new url. Live HTTP headers say each step is given a separate 301, googlebot entries in my logs show the same.

I'll call the host once more, this isn't an htaccess issue anymore I don't think, at least not in my file. Sorry for the thread hijack!

g1smd




msg:4526333
 5:40 pm on Dec 9, 2012 (gmt 0)

If there's a redirect happening for some requests and none of your site folders (root or sub-folder) have a htaccess file with redirect code of any sort within, then it is either the WordPress configuration, a WordPress add-on or plug-in or a directive in the main httpd.conf (or equivalent) file for the site that is to blame.

Sgt_Kickaxe




msg:4526349
 7:02 pm on Dec 9, 2012 (gmt 0)

It's also not just double 301 redirects as mentioned above, if a page is 404 but the request comes in with /index.php/ the result is a 301 to the non /index.php/ version before a 404.

The only plugin, or folder with an htaccess file in it, was the mobile plugin. Unfortunately removing that didn't help; the host processes each redirect request individually. I'm still trying to solve this.

- It seems that the server is processing conditional rewriterules before unconditional rewriterules regardless of order in htaccess, is there a safe condition to add to a bare rewriterule in order to *trick* this particular behavior?

lucy24




msg:4526404
 10:35 pm on Dec 9, 2012 (gmt 0)

Each module goes through the system from top to bottom. That means that for any given module, material in the config file will always be read and executed before material in .htaccess. Within any given config or htaccess file, RewriteRules are executed in linear order. Not even your host can change this.

Did you fine-tooth-comb your control-panel settings? Mine (not CP by name) doesn't have anything about index files but it does have a with/without www option.

(Aside: Yes, this can theoretically lead to a double redirect. But only if a user requests a renamed page with the wrong form of the domain name, which is not easy to do by accident, since bookmarks and browser history will record the correct name. They'd have to be typing from memory in one of those ### browsers that sticks www. onto the front of anything you type, on a site whose preferred form doesn't use www. In practice I just get the occasional 301-to-403 sequence, which is no skin off my nose.)

Double-check:
When you say "without index.php" you mean something like
www.example.com/directory/
getting 301-redirected (not silently rewritten) to
www.example.com/directory/index.php

?
Not
www.example.com/directory
getting 301-redirected to
www.example.com/directory/
which in turn gets rewritten (not redirected) to
www.example.com/directory/index.php

The second version is a completely different and unrelated situation so I want to be sure.

g1smd




msg:4526430
 12:06 am on Dec 10, 2012 (gmt 0)

No, he means
example.com/index.php/page-name being redirected to www.example.com/page-name

lucy24




msg:4526440
 12:23 am on Dec 10, 2012 (gmt 0)

Urk. Does WordPress have built-in redirecting business that's entirely independent of htaccess? Are CMSes in general designed only to work if you are too afraid of computers to ever touch anything with your own hands, so they'll break if you try to fine-tune them?

Which reminds me that I'd meant to go in search of a moderator to fix the typo in the thread title-- it brings up odd mental pictures-- but I think I've plagued them enough for one day.

Sgt_Kickaxe




msg:4526474
 5:14 am on Dec 10, 2012 (gmt 0)

Ack, beaking...

As g1smd said, it's about example.com/index.php/page-name being redirected to www.example.com/page-name

If the www is missing or the index.php/ is in the requested url, or both, the page redirects as it should, once via 301.

If either the www is missing or the index.php/ is in the requested url, or both, AND the request is to /old-page-name which I redirect to /page-name then the result is a double redirect. The server first redirects to www.example.com/old-page-name (adding www and removing index.php) and then redirects again immediately to www.example.com/page-name, both redirects via 301.

It's not a problem for my users, it's googlebot that's fumbling around with it in my logs and I imagine it's not helping rankings any.

I get the same problem without wordpress on this server. I've tried a lot of things, here is the latest .htaccess try...

Options -Indexes
Options +FollowSymLinks
Options -MultiViews
RewriteEngine on
# RewriteBase is required on this host
RewriteBase /

# permanently redirected page
RewriteRule ^some-old-url$ http://www.example.com/my-new-url [R=301,L]

# redirect requests containing index.php/ to their NON index.php/ version
rewritecond %{THE_REQUEST} ^[A-Z]+\ /index\.php(/[^\ ]*)?\ HTTP/
rewriterule ^index\.php(/(.*))?$ http://www.example.com$1 [R=301,L]

# add www if it is missing
rewritecond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
rewriterule (.*) http://www.example.com/$2 [R=301,L]



... yet example.com/index.php/some-old-url still redirects twice to get to www.example.com/my-new-url when the first rule *should* have caught it all in one step. Is the server ignoring anything after the .php with my first rule like it does when there are parameters?

e.g. www.example.com/some-old-url?para=12345 is redirected by the first rule despite not matching it exactly. I'm not an expert, scratching for ideas here...
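
On the query-string question: a RewriteRule pattern is matched against the URL path only, never the query string, so ^some-old-url$ matches whether or not parameters are appended. To limit a rule to requests without parameters, a condition on %{QUERY_STRING} can be added. A minimal sketch using the same example URLs:

```apache
# Redirect only when the request carries no query string
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^some-old-url$ http://www.example.com/my-new-url [R=301,L]
```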

lucy24




msg:4526522
 7:21 am on Dec 10, 2012 (gmt 0)

# redirect requests containing index.php/ to their NON index.php/ version
rewritecond %{THE_REQUEST} ^[A-Z]+\ /index\.php(/[^\ ]*)?\ HTTP/
rewriterule ^index\.php(/(.*))?$ http://www.example.com$1 [R=301,L]

This will only work on requests in the form
www.example.com/index.php{possibly-more-garbage-here}
not for anything like
www.example.com/directory/index.php{et cetera}.
Is that all you need?

(/(.*))?
Since you're not capturing the post-slash bit separately, you don't need parentheses: a plain
(/.*)?
will do.

(/[^\ ]*)?
The character you're groping for is \S as in
(/\S*)?
Remember that in RegEx, if \a means something, \A almost always means [^\a]. \S technically means "no whitespace of any kind" but in this context it doesn't matter, because you're in strict single-line mode, and other space-type things like tabs or non-breaking spaces don't occur.

In fact all you need in the REQUEST line is
\S*
because the Rule itself has already specified that if anything comes after "index.php" it has to start with a slash.

example.com/index.php/some-old-url still redirects twice

What's the intervening step? Live Headers, or whatever you're using, will say.
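
Applied to the quoted ruleset, the regex simplifications suggested in this post would read roughly as follows. This is a sketch of the suggestion and has not been tested against the poster's server.

```apache
# \S* stands in for (/[^\ ]*)? in the condition, and the nested
# parentheses in the rule pattern are reduced to a single group
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\S*\ HTTP/
RewriteRule ^index\.php(/.*)?$ http://www.example.com$1 [R=301,L]
```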

g1smd




msg:4526528
 7:46 am on Dec 10, 2012 (gmt 0)

msg:4526235 mentioned the path part of the 3 URLs, but didn't clarify the requested hostname for the middle step.
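
Two details in the rules from msg:4526474 may be worth checking. What follows is a sketch under stated assumptions, not a fix confirmed in the thread. First, the pattern ^some-old-url$ cannot match a request whose path starts with index.php/, so such a request would be caught by the index.php ruleset and only reach the page redirect on the second request, which could contribute to the two 301s. Second, the www ruleset captures a single group, so $2 is always empty there. Assuming that reading is right, a variant that handles both forms of the old URL in one hop might look like this:

```apache
RewriteEngine On

# Match the old URL with or without a leading index.php/ (assumed pattern)
RewriteRule ^(index\.php/)?some-old-url$ http://www.example.com/my-new-url [R=301,L]

# Strip index.php from any other request
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php(/[^\ ]*)?\ HTTP/
RewriteRule ^index\.php(/(.*))?$ http://www.example.com$1 [R=301,L]

# Add www if it is missing; the pattern has one group, so $1 is used
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

If a double redirect persists even with these rules alone in the file, that would point back to a server-level or application-level redirect, as suggested earlier in the thread.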

© Webmaster World 1996-2014 all rights reserved