Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Help with correct htaccess redirect
chadhaajay




msg:4608890
 7:18 am on Sep 11, 2013 (gmt 0)

I have the following code in my .htaccess file:

Options +FollowSymLinks All -Indexes
RewriteEngine on

RewriteRule article-([0-9]+)\.html$ question.php?ID=$1 [L]
RewriteRule category-([0-9]+)\.html$ category.php?catID=$1 [L]

It works as expected and it has helped us to change dynamic links such as

oursite.com/question.php?ID=10

to

oursite.com/article-10.html

However, the issue is that the following types of links also work:

oursite.com/index.php/article-10.html
oursite.com/index.php/images/images/article-10.html

What can we do to stop such links from working, so that search engines do not treat them as duplicate pages? We simply want any such link to return a 404 error. Please help!

 

lucy24




msg:4608913
 8:42 am on Sep 11, 2013 (gmt 0)

Options +FollowSymLinks All -Indexes

Aack! Aack! Aaack!

:: runs around in horror for a few minutes ::

:: detour to apache dot org to make sure I'm not imagining things ::

2.2:
Normally, if multiple Options could apply to a directory, then the most specific one is used and others are ignored; the options are not merged. (See how sections are merged.) However if all the options on the Options directive are preceded by a + or - symbol, the options are merged. Any options preceded by a + are added to the options currently in force, and any options preceded by a - are removed from the options currently in force.
Warning

Mixing Options with a + or - with those without is not valid syntax, and is likely to cause unexpected results.


2.4:
Normally, if multiple Options could apply to a directory, then the most specific one is used and others are ignored; the options are not merged. (See how sections are merged.) However if all the options on the Options directive are preceded by a + or - symbol, the options are merged. Any options preceded by a + are added to the options currently in force, and any options preceded by a - are removed from the options currently in force.
Note

Mixing Options with a + or - with those without is not valid syntax, and will be rejected during server startup by the syntax check with an abort.


At best "unexpected results". At worst, server death. (Query: Why is a "Note" more dire than a "Warning"?) What version are you on?

If you don't mean "All", don't say "All". List the ones you do mean. Whether you're on 2.2 or 2.4, FollowSymLinks is included within All.

What can we do to stop such links from working

Your RewriteRules don't have opening anchors. You're lucky nobody has yet asked for
www.example.com/index.php/images/images/ajkgeajwrwke/moregarbage.bzzt/article-10.html

Preferred form:
RewriteRule ^article-([0-9]+)\.html$ /question.php?ID=$1 [L]

Anchor the pattern. Use a leading slash in the target. And if your server is OK with \d instead of [0-9], that will save you three bytes ;)
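
For reference, a minimal sketch of how the points above might look together. It assumes the .htaccess file sits in the document root and that "FollowSymLinks on, Indexes off" is what was actually intended. The second rule simply applies the same preferred form to the category pattern from the original post.

```apache
# Every option carries a + or -, so the line is valid on both 2.2 and 2.4
# and merges with whatever options are already in force.
Options +FollowSymLinks -Indexes
RewriteEngine on

# Anchored at both ends: only "article-10.html" at the top level matches,
# not "index.php/images/images/article-10.html".
RewriteRule ^article-([0-9]+)\.html$ /question.php?ID=$1 [L]
RewriteRule ^category-([0-9]+)\.html$ /category.php?catID=$1 [L]
```

With the opening anchor in place, a request such as index.php/article-10.html no longer matches either rule, so whatever happens to it next is decided elsewhere in the configuration.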

chadhaajay




msg:4608935
 9:38 am on Sep 11, 2013 (gmt 0)

Hi lucy24,

Thanks for suggesting that we add the opening anchor (^) and remove

Options +FollowSymLinks All -Indexes

But now the browser displays the contents of the index.php page when we try to open the URL below:

yoursite.com/index.php/article-11.html

It should throw a 404 error instead of displaying contents of index.php page.

lucy24




msg:4608958
 11:01 am on Sep 11, 2013 (gmt 0)

It should throw a 404 error instead of displaying contents of index.php page.

Actually, it shouldn't. But now we're in AcceptPathInfo [httpd.apache.org] territory, which is a whole nother issue.

AcceptPathInfo off
=
if the request consists of a real filename with other stuff attached to the end, the request gets a 404.

AcceptPathInfo on
=
if the request consists of a real filename with other stuff attached to the end, the extra stuff is ignored and the valid part of the request is served up.

I had to go look this up, because I find the names counterintuitive and I always get it backward.

AcceptPathInfo default (this is, as the name suggests, the default position)
=
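it depends on the handler that serves the request. Script handlers, which typically means whatever serves .php, will generally accept additional garbage; the core handler for static files like .html won't.

"pathinfo" = the extra path that trails the real filename in the URL, not the query if any.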

This is one of a long list of problems that you don't have to think about until you have to think about them. If you are getting real requests in the form
filename.php/more-stuff-here

you can either redirect to
filename.php

or you can slap down an unequivocal [F]. It depends on the circumstances.
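
A rough sketch of the [F] option, assuming the rule lives in the document-root .htaccess and is placed above the other RewriteRules. The pattern is illustrative only: it treats any request whose path continues with a slash after a .php filename as unwanted, which may be too broad if some script on the site relies on extra path information.

```apache
# THE_REQUEST is the raw request line, for example
#   GET /index.php/article-10.html HTTP/1.1
# so it is not affected by earlier internal rewrites.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/[^?\s]*\.php/
# [F] answers with 403 Forbidden. If a 404 is wanted instead,
# [R=404,L] can be used in place of [F].
RewriteRule ^ - [F]
```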

g1smd




msg:4608966
 11:56 am on Sep 11, 2013 (gmt 0)

Rather than requesting
example.com/index.php/article-11.html
you should be requesting
example.com/article-11.html

There is no reason whatsoever for "index.php" to be part of your new URLs.


Back to the original question...

You have implemented the rewrite, so that when a friendly URL is requested, mod_rewrite rewrites the request internally, invokes question.php or category.php, and passes the ID to it as a parameter.

The other part of the job is that you need a RewriteCond/RewriteRule pair that detects when there's an external URL request directly asking for a PHP file with parameters. This rule sends back a redirect response telling the browser to make another request, but this time for the new URL.
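
A hedged sketch of what such a pair might look like for the two scripts in this thread. It assumes the rules go in the document-root .htaccess above the internal rewrites, that the canonical hostname is www.example.com, and that ID or catID is the first parameter in the old query string. Adjust the patterns if the old URLs looked different.

```apache
# Only fire when the client itself asked for the .php URL. THE_REQUEST
# keeps the original request line, so requests that were rewritten
# internally from article-N.html never match and there is no loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/question\.php\?ID=([0-9]+)[&\s]
# %1 is the number captured by the RewriteCond; the trailing "?"
# drops the old query string from the redirect target.
RewriteRule ^question\.php$ http://www.example.com/article-%1.html? [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]+\s/category\.php\?catID=([0-9]+)[&\s]
RewriteRule ^category\.php$ http://www.example.com/category-%1.html? [R=301,L]
```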

lucy24




msg:4609081
 7:07 pm on Sep 11, 2013 (gmt 0)

Rather than requesting
example.com/index.php/article-11.html
you should be requesting
example.com/article-11.html

... and if you make your "index.xtn" redirect without a closing anchor, then the extra-path issue won't arise, because anything even containing "index.xtn" will be redirected. Since it's a redirect rather than a rewrite, this is a perfectly legitimate solution.
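
Something along these lines, as a sketch only. It assumes index.php exists solely in the document root, that the canonical hostname is www.example.com, and that the rule sits above the internal rewrites in the root .htaccess.

```apache
# Check the original request line so that the internal DirectoryIndex
# lookup for "/" does not trigger the redirect again.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php
# No closing anchor: "index.php" and "index.php/anything/at/all"
# are both sent to the root URL.
RewriteRule ^index\.php http://www.example.com/ [R=301,L]
```

Note that this sends the extra-path requests to the home page rather than returning a 404, so it fits only if a redirect is acceptable for those URLs.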

chadhaajay




msg:4609179
 5:54 am on Sep 12, 2013 (gmt 0)

Thanks for all the help! We now have the following code in our htaccess file and it seems to work very well. Please have a look and suggest if anything is wrong or should be improved.

# Turn mod_rewrite on
RewriteEngine on

# Redirect non-www urls to www
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteRule ^article-([0-9]+)\.html$ question.php?ID=$1 [L]
RewriteRule ^category-([0-9]+)\.html$ category.php?catID=$1 [L]

# Added "AcceptPathInfo off" to prevent urls like index.php/article-xx.html
<Files "index.php">
ForceType application/x-httpd-php
AcceptPathInfo off
</Files>

[edited by: phranque at 8:19 am (utc) on Sep 12, 2013]
[edit reason] Please Use Example.com [webmasterworld.com] [/edit]

lucy24




msg:4609199
 8:18 am on Sep 12, 2013 (gmt 0)

RewriteRule ^article-([0-9]+)\.html$ question.php?ID=$1 [L]

Use a leading / slash in the target of any internal rewrite:

/question.php et cetera.

Have you ever used URLs in the form
question.php?ID=blahblah

If yes, you will need another set of rules to redirect old requests using the now-wrong form. This is to avoid duplicate content; it won't affect page output. But if you have never actually had URLs with a query string, and all your RewriteRules are working properly, you may not need to worry about the redirect at all. If nobody suspects that it exists, nobody will ask for it by name :)

g1smd




msg:4609221
 9:40 am on Sep 12, 2013 (gmt 0)

Minor change from
RewriteCond %{HTTP_HOST} !^www\.example\.com
to
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
is useful.

It then redirects all non-canonical hostname requests, not just a small selection.
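
The reason: the original condition has no closing anchor, so a host such as www.example.com:80 or www.example.com. (with a trailing dot) still counts as canonical and is never redirected. The revised pattern requires an exact match, and the optional group lets an empty Host header (a plain HTTP/1.0 request) pass through without a redirect.

Putting the suggestions from this thread together, the file might end up looking roughly like this. It is a sketch rather than a tested configuration. The Files block is carried over unchanged from the post above, and AcceptPathInfo in .htaccess needs AllowOverride FileInfo on the server.

```apache
# Turn mod_rewrite on
RewriteEngine on

# Redirect every non-canonical hostname to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Internal rewrites: anchored pattern, leading slash in the target
RewriteRule ^article-([0-9]+)\.html$ /question.php?ID=$1 [L]
RewriteRule ^category-([0-9]+)\.html$ /category.php?catID=$1 [L]

# Reject trailing path info such as index.php/article-xx.html
<Files "index.php">
ForceType application/x-httpd-php
AcceptPathInfo off
</Files>
```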
