
Characters and spaces being oddly escaped in urls?

Drupal 7 running in subdirectory.

         

velcrobots

4:12 am on Oct 2, 2011 (gmt 0)

10+ Year Member



I have Drupal 7 running in a subdirectory of my domain (the folder is called "drupal").

[ainonline.com...]

This morning I took the new Drupal site live, and put an htaccess file in the root to redirect to the subfolder. I also have an htaccess file in the Drupal root to hide the "drupal" part of the URL.

Since then, any characters like apostrophes or spaces are being escaped, but in an odd way. You can test this by using the search box in my header with more than one word or with quotes, or by clicking on a link that has a special character, like an apostrophe.

Here is my root htaccess:

Options -Indexes
Options +FollowSymLinks
RewriteEngine on


# stuff to let through (ignore)
RewriteCond %{REQUEST_URI} "/openx/" [OR]
RewriteCond %{REQUEST_URI} "/typo3/" [OR]
RewriteCond %{REQUEST_URI} "/oa/"
RewriteRule (.*) $1 [L]


# Redirect all user to without WWW
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

# Serve Drupal from sub directory in web root
RewriteRule ^$ drupal/index.php [L]
RewriteCond %{DOCUMENT_ROOT}/drupal%{REQUEST_URI} -f
RewriteRule .* drupal/$0 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* drupal/index.php?q=$0 [QSA]


And here is the htaccess in the Drupal root:

#
# Apache/PHP/Drupal settings:
#

# Protect files and directories from prying eyes.
<FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)$|^(\..*|Entries.*|Repository|Root|Tag|Template)$">
Order allow,deny
</FilesMatch>

# Don't show directory listings for URLs which map to a directory.
Options -Indexes

# Follow symbolic links in this directory.
Options +FollowSymLinks

# Make Drupal handle any 404 errors.
ErrorDocument 404 /index.php

# Force simple error message for requests for non-existent favicon.ico.
<Files favicon.ico>
# There is no end quote below, for compatibility with Apache 1.3.
ErrorDocument 404 "The requested file favicon.ico was not found.
</Files>

# Set the default handler.
DirectoryIndex index.php index.html index.htm

# Override PHP settings that cannot be changed at runtime. See
# sites/default/default.settings.php and drupal_initialize_variables() in
# includes/bootstrap.inc for settings that can be changed at runtime.

# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
php_flag magic_quotes_gpc off
php_flag magic_quotes_sybase off
php_flag register_globals off
php_flag session.auto_start off
php_value mbstring.http_input pass
php_value mbstring.http_output pass
php_flag mbstring.encoding_translation off
</IfModule>

# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
# Enable expirations.
ExpiresActive On

# Cache all files for 2 weeks after access (A).
ExpiresDefault A1209600

<FilesMatch \.php$>
# Do not allow PHP scripts to be cached unless they explicitly send cache
# headers themselves. Otherwise all scripts would have to overwrite the
# headers set by mod_expires if they want another caching behavior. This may
# fail if an error occurs early in the bootstrap process, and it may cause
# problems if a non-Drupal PHP file is installed in a subdirectory.
ExpiresActive Off
</FilesMatch>
</IfModule>

# Various rewrite rules.
<IfModule mod_rewrite.c>
RewriteEngine on

# Block access to "hidden" directories whose names begin with a period. This
# includes directories used by version control systems such as Subversion or
# Git to store control files. Files whose names begin with a period, as well
# as the control files used by CVS, are protected by the FilesMatch directive
# above.
#
# NOTE: This only works when mod_rewrite is loaded. Without mod_rewrite, it is
# not possible to block access to entire directories from .htaccess, because
# <DirectoryMatch> is not allowed here.
#
# If you do not have mod_rewrite installed, you should remove these
# directories from your webroot or otherwise protect them from being
# downloaded.
RewriteRule "(^|/)\." - [F]

# If your site can be accessed both with and without the 'www.' prefix, you
# can use one of the following settings to redirect users to your preferred
# URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
#
# To redirect all users to access the site WITH the 'www.' prefix,
# (http://example.com/... will be redirected to http://www.example.com/...)
# uncomment the following:
# RewriteCond %{HTTP_HOST} !^www\. [NC]
# RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
#
# To redirect all users to access the site WITHOUT the 'www.' prefix,
# (http://www.example.com/... will be redirected to http://example.com/...)
# uncomment the following:
# RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

# Modify the RewriteBase if you are using Drupal in a subdirectory or in a
# VirtualDocumentRoot and the rewrite rules are not working properly.
# For example if your site is at http://example.com/drupal uncomment and
# modify the following line:
RewriteBase /drupal
#
# If your site is running in a VirtualDocumentRoot at http://example.com/,
# uncomment the following line:
# RewriteBase /


# Pass all requests not referring directly to files in the filesystem to
# index.php. Clean URLs are handled in drupal_environment_initialize().
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^ index.php [L]

# Rules to correctly serve gzip compressed CSS and JS files.
# Requires both mod_rewrite and mod_headers to be enabled.
<IfModule mod_headers.c>
# Serve gzip compressed CSS files if they exist and the client accepts gzip.
RewriteCond %{HTTP:Accept-encoding} gzip
RewriteCond %{REQUEST_FILENAME}\.gz -s
RewriteRule ^(.*)\.css $1\.css\.gz [QSA]

# Serve gzip compressed JS files if they exist and the client accepts gzip.
RewriteCond %{HTTP:Accept-encoding} gzip
RewriteCond %{REQUEST_FILENAME}\.gz -s
RewriteRule ^(.*)\.js $1\.js\.gz [QSA]

# Serve correct content types, and prevent mod_deflate double gzip.
RewriteRule \.css\.gz$ - [T=text/css,E=no-gzip:1]
RewriteRule \.js\.gz$ - [T=text/javascript,E=no-gzip:1]

<FilesMatch "(\.js\.gz|\.css\.gz)$">
# Serve correct encoding type.
Header append Content-Encoding gzip
# Force proxies to cache gzipped & non-gzipped css/js files separately.
Header append Vary Accept-Encoding
</FilesMatch>
</IfModule>
</IfModule>

[edited by: tedster at 5:34 pm (utc) on Oct 2, 2011]
[edit reason] added line break to prevent side-scroll [/edit]

g1smd

6:44 am on Oct 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The root file is processed first. The root htaccess already internally rewrites incoming URL requests to the PHP script file for content delivery.

By the time the folder htaccess file is processed it's too late to block that request or do a different rewrite.

You need to combine everything into one file and put it in the root.

lucy24

8:27 am on Oct 2, 2011 (gmt 0)




And while he's doing that, can you translate some stuff for me? ;)

RewriteCond %{REQUEST_URI} "/openx/" [OR] 
RewriteCond %{REQUEST_URI} "/typo3/" [OR]
RewriteCond %{REQUEST_URI} "/oa/"
RewriteRule (.*) $1 [L]


Why doesn't this lead to an infinite loop? (For that matter, what's the capturing for? Wouldn't - do the same thing?)

# Redirect all user to without WWW 
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]


? ? ?
:: sob ::
I will never understand this language.

RewriteCond %{DOCUMENT_ROOT}/drupal%{REQUEST_URI} -f 
RewriteRule .* drupal/$0 [L]


Ditto. :(

Now, about those horribly long lines in the first post... (Nothing wrong with them in .htaccess. In the browser it's a different matter.)

g1smd

8:48 am on Oct 2, 2011 (gmt 0)




The (.*) $1 [L] notation confused me the first time I saw it, but amazingly Apache seems to "know" that this substitution simply means "Keep calm and carry on".

jdMorgan explained it here at one point, but I forget the details. It may simply be that the very earliest versions of Apache didn't yet feature the "null" substitution character, as in .* - [L] we see now.

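You might more often see
RewriteCond %{REQUEST_URI} "/openx/" [OR]
RewriteCond %{REQUEST_URI} "/typo3/" [OR]
RewriteCond %{REQUEST_URI} "/oa/"
RewriteRule (.*) $1 [L]
written as

RewriteCond %{REQUEST_URI} ^/(openx|typo3|oa)/
RewriteRule . - [L]


which is the way I would do it. Note that the pattern is anchored at the start only, so that everything inside those folders is let through, not just the bare folder URL.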



RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
The above code says "If the requested hostname begins 'www.' then capture everything after the 'www.' in %1 for later re-use."

This allows that one rule to redirect
www.example.com
to
example.com
AND
www.foobar.com
to
foobar.com
but there's a downside. Requests with appended port number, such as
www.example.com:80
will not have the port number stripped by the redirect. Incorrect casing of hostname will also not be fixed, but that's not such a major issue.
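
As a rough sketch of how the port number issue described above could be avoided: the capture can be made to stop at the first colon, so that only the bare hostname ends up in %1. This is an illustration only, not something taken from the original file, and it still does nothing about hostname casing.

```apache
RewriteEngine On

# Capture the hostname after "www." but stop before any ":port" suffix,
# so a request for www.example.com:80 is redirected to example.com.
RewriteCond %{HTTP_HOST} ^www\.([^:]+) [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]
```

The only change from the rule in the first post is the pattern: `([^:]+)` in place of `(.+)$`.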



RewriteCond %{DOCUMENT_ROOT}/drupal%{REQUEST_URI} -f
This does exactly what it says, namely testing "is there a physical file in the /drupal/ folder matching the requested URL path?"

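A request for /thatfile.ext will be rewritten to /drupal/thatfile.ext when that exact named resource exists as a physical file in the /drupal/ folder. Otherwise the request falls through to the later rules, which rewrite it to fetch content from the script.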


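One note on the OP code though. In Apache mod_rewrite, $0 is defined: it holds the whole text matched by the RewriteRule pattern, so .* with $0 does work. The more commonly seen equivalent is to capture the requested path using ( ) and use $1 in the rule target.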
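
For anyone who prefers the explicit capture described above, here is a minimal sketch of the same rule using ( ) and $1. The folder name "drupal" comes from the first post; treat the rest as an illustration rather than a tested Drupal configuration.

```apache
# If the requested path exists as a real file under /drupal/,
# serve that file. The parentheses capture the path into $1.
RewriteCond %{DOCUMENT_ROOT}/drupal%{REQUEST_URI} -f
RewriteRule ^(.*)$ drupal/$1 [L]
```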

lucy24

10:08 am on Oct 2, 2011 (gmt 0)




RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

Does ^ followed by nothing mean the same as . ?

Heh. Auto-linking goes severely haywire when it's presented with the %{blahblah} formulation. Good thing the old php/bb2 trick still works here.

RewriteCond %{DOCUMENT_ROOT}/drupal%{REQUEST_URI} -f
RewriteRule .* drupal/$0 [L]

This does exactly what it says, namely testing "is there a physical file in the /drupal/ folder matching the requested URL path?"

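A request for /thatfile.ext will be rewritten to /drupal/thatfile.ext when that exact named resource exists as a physical file in the /drupal/ folder. Otherwise the request falls through to the later rules, which rewrite it to fetch content from the script.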

"If it exists, go there, and if not, stay in this htaccess where you will receive further instructions" ?

velcrobots

12:32 pm on Oct 2, 2011 (gmt 0)




The root file is processed first. The root htaccess already internally rewrites incoming URL requests to the PHP script file for content delivery.

By the time the folder htaccess file is processed it's too late to block that request or do a different rewrite.

You need to combine everything into one file and put it in the root.


So you're saying that the htaccess in the drupal folder isn't even being processed? I followed instructions to the letter from drupal.org on this.

Along the same lines, where do redirects go? I have a bunch I need to put in, and I can't get them to work.

I will also never understand this language.

g1smd

1:38 pm on Oct 2, 2011 (gmt 0)




Redirects must always be processed before rewrites (because redirects tell the browser to make a request for a new URL, but a rewrite maps an external URL request to an internal filepath to actually deliver the content).

This means that redirects need to go in the .htaccess file in the root folder and ahead of any rules which perform a rewrite function. Rewrites can go in the root .htaccess file or in a folder .htaccess file, but the rewriting rules in the root .htaccess file need to be "aware" that the folder exists and exclude processing any requests for URLs that map to it - so it's much easier to put all of the rules in the one file in the root and be done with it.
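
One way the root rules could be made "aware" of the folder, purely as a sketch: a pass-through rule placed ahead of the other rewrites in the root .htaccess file. The folder name "drupal" is taken from the first post; the placement and the rule itself are assumptions, not part of the original files.

```apache
RewriteEngine On

# Leave any request that already points inside /drupal/ untouched,
# so that the later root-level rewrites never see it.
RewriteRule ^drupal/ - [L]
```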

If a request is processed as a rewrite in the root .htaccess file and then as a redirect in the folder .htaccess file (actually as a redirect of any type, in later rules in any .htaccess file), the redirect will expose the previously rewritten internal file path back out on to the web as a new URL. That would be a disaster.

Request
example.com/pagename
and expect to be redirected to
www.example.com/pagename
with an internal rewrite fetching the content from
/index.php?page=pagename
inside the server.

If the rules are in the wrong order, the request for
example.com/pagename
is rewritten to the internal filepath at
/index.php?page=pagename
ready to serve the content, but then a redirect is issued and the browser is told to make a new request for
www.example.com/index.php?page=pagename
- exposing the internal filepath back out on to the web as a new URL.

So much for the site using search engine friendly URLs then. Every click within the site sees the user redirected to an unfriendly URL!
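
The right order for the example above might look something like the following sketch. The hostname and the index.php?page= target are the ones used in the example; the patterns themselves are assumptions made for illustration.

```apache
RewriteEngine On

# 1. External redirect first: non-www requests are sent to www.
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 2. Internal rewrite last: map the friendly URL to the script,
#    but only when no real file or directory matches the request.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/.]+)$ /index.php?page=$1 [L]
```

With the redirect listed first, a request for example.com/pagename is redirected while it still carries the friendly path, so the internal script path never appears in the browser.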

The bit that people take a while to fathom out is that RewriteRule can be configured to generate an external redirect, telling the browser to make a new request for a new URL, or it can be configured to process an external URL request and rewrite it to an internal file for content delivery. The syntax for both is only slightly different.
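
To make that difference concrete, here are the two forms side by side as a small sketch. The page names and hostname are made up for illustration.

```apache
RewriteEngine On

# External redirect: a full URL as the target plus the R flag.
# The browser is told to make a new request for the new URL.
RewriteRule ^old-page$ http://www.example.com/new-page [R=301,L]

# Internal rewrite: a server-side path as the target and no R flag.
# The URL shown in the browser does not change.
RewriteRule ^new-page$ /index.php?page=new-page [L]
```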

velcrobots

2:09 pm on Oct 2, 2011 (gmt 0)




First off, thank you all so much for these responses. These rules and .htaccess files in general absolutely terrify me.

So just to confirm:

I have a handful of redirects (mainly RSS feeds from our old site that drive an iPhone app). I'm going to enter them in the root htaccess file, before 'RewriteEngine On' (right after 'Options +FollowSymLinks'), right?

g1smd

3:52 pm on Oct 2, 2011 (gmt 0)




RewriteEngine On goes right at the beginning, before all of the rules.

RewriteRules which block access for certain requests are first.

Redirects using RewriteRule come next.

Rewrites using RewriteRule are last.
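
Put together, the ordering above could be sketched like this. The feed path and target URL are hypothetical stand-ins for the OP's RSS redirects, and the final rewrite is a generic "send everything else to the script" rule rather than the actual Drupal setup.

```apache
Options -Indexes
Options +FollowSymLinks
RewriteEngine On

# 1. Blocking rules first: refuse dot-files and dot-directories.
RewriteRule (^|/)\. - [F]

# 2. External redirects next (hypothetical old feed path and target).
RewriteRule ^old-feed\.xml$ http://example.com/rss.xml [R=301,L]

# 3. Internal rewrites last: anything that is not a real file or
#    directory is handed to the script.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```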

lucy24

7:25 pm on Oct 2, 2011 (gmt 0)




I have a handful of redirects (mainly RSS feeds from our old site that drive an iPhone app). I'm going to enter them in the root htaccess file, before 'RewriteEngine On' (right after 'Options +FollowSymLinks'), right?

The word "redirect" means two different things.

It means the external-redirect function (301 or 302), which can be achieved either by mod_alias or by mod_rewrite. And possibly by some other processes that we need not talk about.

And it means the directives Redirect and RedirectMatch, which belong exclusively to mod_alias. The module name is not really ideal, since its aliasing functions can't be used in .htaccess. But it can't be helped.
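
As a sketch of the two meanings, here is the same 301 written once with the mod_alias directive and once with mod_rewrite. The feed path and target URL are invented for illustration. In a file that already relies on RewriteRule it is probably safer to pick the mod_rewrite form only, since the two modules do their work separately and the order of lines in the file does not control which one acts first.

```apache
# mod_alias: the Redirect directive, matched against the URL-path prefix.
Redirect 301 /old-feed.xml http://example.com/rss.xml

# mod_rewrite: the equivalent external redirect using RewriteRule.
# In practice, use one form or the other for a given URL, not both.
RewriteEngine On
RewriteRule ^old-feed\.xml$ http://example.com/rss.xml [R=301,L]
```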

No matter what function mod_rewrite is performing, it can't move a muscle until RewriteEngine is On. Someone else will explain why this line is necessary. I suspect it's something boring and historical that Apache never got around to getting rid of.