Not able to avoid external redirect

     
4:48 pm on Jun 5, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts: 34
votes: 0


In my .htaccess file, I have this code:


# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

AuthName xyz.com
AuthUserFile /home/localtig/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/localtig/public_html/_vti_pvt/service.grp

Options +FollowSymLinks
RewriteEngine on
RewriteCond $1 !(^index\.php|\.(gif|jpe?g|ico|css)|^robots\.txt)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
#RewriteCond %{REQUEST_URL} !=/favicon.ico
RewriteRule ^(.*)$ http://xyz.com/index.php?q=$1 [L,QSA]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.xyz.com/$1 [R=301,L]


This should internally rewrite a URL like

http://subdomain.xyz.com/sample-page


to

http://www.xyz.com/index.php?q=sample-page


...while retaining the URL in the address bar as

http://subdomain.xyz.com/sample-page


The problem is that this same .htaccess file acts differently on 2 machines.

One of them runs Apache 2.2.13 and the other runs 1.3.37. On the former, the redirect becomes external (i.e. the URL in the address bar changes), while on the latter (1.3.37) it works just fine.

I do not even know (or think) that the version is the problem, since another site hosted on the 2.2.13 server, using the same httpd.conf file and exactly the same directives, works just fine for all rewrites.

Does anyone have a clue what is going wrong? I'll be grateful for any help!
5:53 pm on June 5, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


This is a problem we warn about every single day in this forum.

The solution is very simple.

List the redirect block of code BEFORE the rewrite block of code.

Remove the domain name from the target of the rewrite.

Leave the domain name intact on the target of the redirect.


Also, "REQUEST_URL" should be "REQUEST_URI" I believe.
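
A minimal sketch of those three steps, reusing the rules from the original post. The likely cause of the symptom is that when a RewriteRule target carries a scheme and hostname that do not match the host the request arrived on (here subdomain.xyz.com versus xyz.com), mod_rewrite issues an external redirect instead of rewriting internally. Treat this as an illustration under that assumption, not as tested code for this particular server:

```apache
RewriteEngine on

# Externally redirect direct client requests for index.php.
# The domain name stays on the target of the redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.xyz.com/$1 [R=301,L]

# Internally rewrite all other requests to the script file.
# No domain name on the target, so the rewrite stays internal
# and the address bar keeps the requested URL.
RewriteCond $1 !(^index\.php|\.(gif|jpe?g|ico|css)|^robots\.txt)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]
```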
7:39 pm on June 5, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> Also, "REQUEST_URL" should be "REQUEST_URI" I believe.

Yes, it should be "REQUEST_URI". But that RewriteCond isn't needed at all, because any .ico file type is already excluded by the first RewriteCond.

Jim
6:27 am on June 6, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts: 34
votes: 0


Thank you so much, g1smd. And I was thinking this would be one of *those* problems that will never have a simple solution. I just removed the domain name and it worked. That is why I love webmasterworld.com so much. The best of folks hang out here.

And jdMorgan, thanks for that suggestion. I changed that to REQUEST_URI.

I did not understand this line, however - it would be great if you could give an example:

"List the redirect block of code BEFORE the rewrite block of code."

Thank you so much again.
6:36 am on June 6, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


If you had added comments to your blocks of code, the first comment would have said:

# Internally rewrite incoming URL requests to the script file.


and the second comment would have said:

# Externally redirect URL requests with index.php to remove filepath from URL.


Now that you know what each block of code does, the instruction should be a little more clear.
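
To make the distinction concrete, here is a hedged trace of how the redirect block treats two sample requests. The /folder/ path is made up for illustration. THE_REQUEST holds the request line exactly as the client sent it and is not changed by internal rewrites, so the redirect can only fire when the client itself asked for index.php:

```apache
# Client sends:  GET /folder/index.php HTTP/1.1
#   THE_REQUEST matches the condition below
#   -> 301 redirect to http://www.xyz.com/folder/
#
# Client sends:  GET /sample-page HTTP/1.1
#   the rewrite block maps this internally to /index.php?q=sample-page
#   THE_REQUEST still reads "GET /sample-page HTTP/1.1"
#   -> the condition below fails, so no redirect and no loop
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.xyz.com/$1 [R=301,L]
```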
1:20 pm on June 6, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts: 34
votes: 0


Thanks again, g1smd. Have a good day.
4:40 pm on June 6, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Let's see the final code!
4:56 pm on June 6, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts: 34
votes: 0


Here goes:


# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

AuthName xyz.com
AuthUserFile /home/localtig/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/localtig/public_html/_vti_pvt/service.grp

Options +FollowSymLinks

RewriteEngine on
RewriteCond $1 !(^index\.php|\.(gif|jpe?g|ico|css)|^robots\.txt)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]

#below to redirect www.xyz.com/folder1/folder2/.../foldern/index.php to www.xyz.com/folder1/folder2/../foldern/
#was written to redirect www.xyz.com/index.php to www.xyz.com/ for Google to index only the latter
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ /$1 [R=301,L]


Another small clarification. REQUEST_URI in apache is just the $_SERVER['PHP_SELF'] of PHP, right? Since $_SERVER['REQUEST_URI'] of PHP includes the query string, too?
4:57 pm on June 6, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts: 34
votes: 0


And while you suggested I list the redirect part before the rewrite part, this was working, so I did not dare touch it :). I added the comments, though, so you know what I was trying to do.
3:11 pm on June 7, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


If you do not put the redirect first, then your code will 'expose' the internal /index.php filepath to search engines, and hurt your search engine rankings by creating duplicate content.

Do not ignore this advice. If you are hesitant to re-arrange the rules, then by all means, do so "temporarily" and then test the results thoroughly. But do not ignore the advice; you got it from someone with a lot of experience and knowledge...

Jim
4:13 am on June 8, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts: 34
votes: 0


Thanks, jdMorgan. I made the change suggested by g1smd and you, and luckily everything seems to work fine (so far):


# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

AuthName xyz.com
AuthUserFile /home/xyz/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/xyz/public_html/_vti_pvt/service.grp

Options +FollowSymLinks

#below to redirect www.xyz.com/folder1/folder2/.../foldern/index.php to www.xyz.com/folder1/folder2/../foldern/
#was written to redirect www.xyz.com/index.php to www.xyz.com/ for Google to index only the latter
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ /$1 [R=301,L]

RewriteEngine on
RewriteCond $1 !(^index\.php|\.(gif|jpe?g|ico|css)|^robots\.txt)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]
7:24 am on June 8, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Add comments to the last block of code to describe the rewrite!
1:01 pm on June 9, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts: 34
votes: 0



# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

AuthName xyz.com
AuthUserFile /home/xyz/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/xyz/public_html/_vti_pvt/service.grp

Options +FollowSymLinks

#below to redirect www.xyz.com/folder1/folder2/.../foldern/index.php to www.xyz.com/folder1/folder2/../foldern/
#was written to redirect www.xyz.com/index.php to www.xyz.com/ for Google to index only the latter
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ /$1 [R=301,L]

#redirect *.xyz.com/abc-def to *.xyz.com/index.php?q=abc-def where abc-def is not index.php or an image/css/ico file or robots.txt
RewriteEngine on
RewriteCond $1 !(^index\.php|\.(gif|jpe?g|ico|css)|^robots\.txt)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]


I think that is a little more explanatory. Thank you so much for all the tips!
2:44 pm on June 9, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Still needs a bit of clean-up... Several syntax problems, "non-optimal" coding, and directives out of order, likely to cause failures:

# Don't list FrontPage or .htaccess files in auto-generated directory index pages
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
#
# Access Control
Order deny,allow
#
<Limit GET POST>
Deny from all
Allow from all
</Limit>
#
<LimitExcept GET POST>
Deny from all
</LimitExcept>
#
# Authentication/authorization
AuthName xyz.com
AuthUserFile /home/xyz/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/xyz/public_html/_vti_pvt/service.grp
#
# Set required option to enable mod_rewrite
Options +FollowSymLinks
#
# Enable the rewriting engine
RewriteEngine on
#
# Externally redirect direct client requests for URL-path
# /<any subdirectories>/index.php<optional query and/or fragment> to URL
# www.example.com/<any subdirectories>/<optional query and/or fragment> so
# that Google indexes only the latter
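# Note: %2 is the optional query string captured by the RewriteCond (%1 would be the last subdirectory)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php([?#][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1%2 [R=301,L]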
#
# Internally rewrite requests for URL-path /abc-def to internal filepath /index.php?q=abc-def
# where abc-def does not resolve to a physically-existing file or directory, and excluding
# index.php, image/css/ico files, or robots.txt to avoid unnecessary file-exists checks
RewriteCond $1 !(^index\.php|\.(gif|jpe?g|ico|css)|^robots\.txt)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?q=$1 [QSA,L]
#
# -end-

You should also consider adding a domain canonicalization rule. For example, redirect all requests for hostnames which are not exactly equal to "www.example.com" to hostname "www.example.com". This rule would follow the first rule above, as it is a less specific redirect than the first rule, and all external redirects must generally precede any internal rewrites.
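
A rough sketch of such a canonicalization rule for .htaccess follows. It assumes www.example.com is the only hostname that should serve content. If subdomains are meant to keep their own URLs, as in the original question, the first condition would need to be narrowed:

```apache
# Externally redirect requests for any non-canonical hostname
# to the same URL-path on www.example.com.
# Assumption: no other hostname (including subdomains) should serve content.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# Skip requests that send no Host header (HTTP/1.0 clients), to avoid a redirect loop
RewriteCond %{HTTP_HOST} .
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Placed between the index.php redirect and the internal rewrite, this keeps all external redirects ahead of the rewrite.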

Jim
6:09 am on June 10, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts: 34
votes: 0


Wow, that's totally professional work - sets the benchmark for me from here on :).

I was wondering if I can do away with this chunk of lines at the top:


# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

AuthName xyz.com
AuthUserFile /home/xyz/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/xyz/public_html/_vti_pvt/service.grp

Options +FollowSymLinks


I did not put them there - they seem to have come by default through cPanel, using which I created the account.

Also, another small clarification. REQUEST_URI in apache is just the $_SERVER['PHP_SELF'] of PHP, right (and not the $_SERVER['REQUEST_URI'] of PHP)? Since $_SERVER['REQUEST_URI'] of PHP includes the query string, too?
8:11 am on June 10, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


REQUEST_URI is the path part of the literal "GET /somepath?some-params HTTP/1.1" request sent by the browser: /somepath, without the query string.
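
As an illustration, this is how the mod_rewrite variables would break down for that sample request on the first pass over the rules, followed by the corrected form of the commented-out condition from the original post. On the PHP side, $_SERVER['REQUEST_URI'] does include the query string, while $_SERVER['PHP_SELF'] names the script that runs, so after the internal rewrite it would typically be /index.php rather than the requested path:

```apache
# For the request line "GET /somepath?some-params HTTP/1.1":
#   %{THE_REQUEST}  -> GET /somepath?some-params HTTP/1.1
#   %{REQUEST_URI}  -> /somepath
#   %{QUERY_STRING} -> some-params
#
# Corrected variable name (REQUEST_URI, not REQUEST_URL).
# "!=" is a plain string comparison, not a regex match.
RewriteCond %{REQUEST_URI} !=/favicon.ico
```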

The "IndexIgnore" rule keeps your .htaccess file and other configuration files out of auto-generated directory index pages. It does not by itself block direct requests for those files.

The "deny" rule for PUT and DELETE requests stops hackers messing with your site.

The _vti stuff is for uploading files directly from FrontPage. You can comment out just those two lines with # to see if anything bad happens.

The "Options" line is usually required for correct server operation.
8:20 am on June 10, 2010 (gmt 0)

New User

5+ Year Member

joined:Dec 19, 2008
posts:34
votes: 0


Thanks a lot, g1smd. This thread has cleared up a lot of things for me, and I am grateful you and jdMorgan chose to take time off so generously to share your knowledge.