www AND no www - consistent urls

Don't blame it all on modx!

         

fruitwerks

3:10 am on May 25, 2010 (gmt 0)

10+ Year Member



Hello all! I was told to come and ask about my issue here :) Quick background...

Debian Lenny
Apache 2.x
Modx 1.0.3

I am the server admin / full root access

---

I had a request to do the following:

    If a user lands on the site using www. - all links (sans a few special ones / secure areas) will have the www.

    If a user lands on the site without www. - all the links will be domain.foo/page-blah (no www.)


- Now this works almost perfectly with the config below, although some paths are not correct and sub-folder pages only work on the initial page load. So if we click on a link for domain.foo/article/story1.php, the page doesn't load (modx redirects it to the index, as it is set up to do on an error). But then if you look at the link again, it is now domain.foo/article/article/story1.php. This can go on forever (well, I didn't try it), but it doesn't fix itself!

So if I add the base href in the head of the document as required for modx, I have to choose www or no www. I am here because I was told that this issue is not specific to modx. And early on I felt that some apache magic would be required.

If you are thinking I need a ServerAlias or redirect match, that is not what I am after, please read my post again. Even if you don't have an answer, but notice something stupid or outdated in my conf, let me know :) I have not done this for a few years.

Thanks Everyone!

host config

<virtualhost *:80>
DocumentRoot "/var/www/somesite.com/"
<Directory "/var/www/somesite.com/">
allow from all
Options FollowSymLinks Indexes
AllowOverride all
DirectoryIndex index.php
</Directory>
RewriteEngine On
RewriteRule ^/?secure/(.*) https://%{SERVER_NAME}/secure/$1 [R,L]
RewriteRule ^/?signup/(.*) https://%{SERVER_NAME}/signup/$1 [R,L]
ServerName somesite.com
CustomLog /var/log/apache2/somesite.com-access.log combined
ErrorLog /var/log/apache2/somesite.com-error.log
LogLevel warn
</VirtualHost>

<virtualhost *:443>
DocumentRoot "/var/www/somesite.com/"
<Directory "/var/www/somesite.com/">
allow from all
Options FollowSymLinks Indexes
AllowOverride all
DirectoryIndex index.php
</Directory>
RewriteEngine On
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} [A-Z]
RewriteCond %{REQUEST_URI} !^/cartoweb3
RewriteCond %{REQUEST_URI} !^/tmp
RewriteCond %{REQUEST_URI} !^/ms_tmp
RewriteRule (.*) ${lc:$1} [R=301,L]
ServerName somesite.com
CustomLog /var/log/apache2/somesite.com-access.log combined
ErrorLog /var/log/apache2/somesite.com-error.log
LogLevel warn
SSLEngine on
SSLCertificateFile /etc/apache2/ssl/somesite.com.crt
SSLCertificateKeyFile /etc/apache2/ssl/www.somesite.com.key
SSLCertificateChainFile /etc/apache2/ssl/gd_bundle.crt
</VirtualHost>


.htaccess

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA,NC]
FileETag none
AddType text/x-component .htc
php_flag register_globals Off
php_flag zlib.output_compression On
php_value zlib.output_compression_level 9
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
SetOutputFilter DEFLATE
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ \
no-gzip dont-vary
SetEnvIfNoCase Request_URI \
\.(?:exe|t?gz|zip|bz2|sit|rar)$ \
no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary
ErrorDocument 403 /

jdMorgan

4:06 am on May 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is in the requirements: they require that you use/generate only page- or server-relative links in the HTML, you cannot use "base href", *and* you're intentionally creating two Web sites (www and non-www) to compete with each other for inbound links and ranking.

Therefore, the requirements are ill-advised. Pick "www" or "non-www" for your canonical hostname, set that as your ServerName, and link to only that canonical hostname. Redirect all other variants of subdomains, casing, FQDN-format, and appended port numbers to the single correct hostname for your site.
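
As a minimal sketch of such a redirect, assuming www.somesite.com is picked as the canonical hostname (that choice, and plain http on port 80, are assumptions for illustration only), something like this could go in the port-80 config section:

```apache
# Assumes ServerName www.somesite.com, with the other hostnames listed as ServerAlias
RewriteEngine On
# Redirect any non-blank Host header that is not exactly the canonical hostname.
# A blank Host header is left alone so that it cannot cause a redirect loop.
RewriteCond %{HTTP_HOST} !^(www\.somesite\.com)?$
RewriteRule ^/(.*)$ http://www.somesite.com/$1 [R=301,L]
```

Because the comparison is case-sensitive and anchored at both ends, this also catches upper-case variants, a trailing dot, and an appended port number.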

See our Google Library for threads on canonicalization like the well-named thread "Duplicate Content - Get it right or perish."

To avoid the removal of "www" by your present config file rules, you could use %{HTTP_HOST} instead of %{SERVER_NAME}, but only if %{HTTP_HOST} is non-blank (it will be blank for any true HTTP/1.0 requests to your server, since those carry no Host header). However, I post this with the feeling that I might be assisting in an SEO-suicide here...

RewriteCond %{HTTP_HOST} ^(.+)$
RewriteRule ^/((secure|signup)/.*)$ https://%1/$1 [R=302,L]

Several performance tweaks are possible for your code above, but that should wait pending resolution of the 'big problem.'

Really, the problem is in the requirements, and I hope you (or your company) will reconsider this folly.

Best,
Jim

fruitwerks

4:32 pm on May 25, 2010 (gmt 0)

10+ Year Member



Thanks jd! I agree it should be one or the other; I need to convince the client. I did find this interesting article, though, pertaining to what you brought up.

[seomoz.org...]

The site has good rankings - depending on what you type in, we are usually in the top 5! Anyway I am leaning more towards a fix in the modx system itself. We already use a plugin that parses urls, but it does not support exactly what I am after. It could easily be modified, but that is a bit beyond my scope.

I don't think much of the above configs are going to change, so it would be safe to assist in the tweaks you mentioned.

Thanks All!

jdMorgan

8:48 pm on May 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, two more tweaks:

Eliminate unneeded RewriteCond from lowercasing rule by using the RewriteRule pattern:

RewriteCond %{REQUEST_URI} !^/cartoweb3
RewriteCond %{REQUEST_URI} !^/tmp
RewriteCond %{REQUEST_URI} !^/ms_tmp
RewriteRule ^([^A-Z]*[A-Z].*)$ ${lc:$1} [R=301,L]

Note that in config files (which are "compiled" on server restart), it is faster to use separate RewriteConds in mod_rewrite code. This is in contrast to mod_rewrite in .htaccess, where using a "local OR" (regex alternation within a single pattern) to combine RewriteConds is faster.
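
For comparison only, here is a sketch of how the same three exclusions might be folded into one RewriteCond with a local OR if this rule lived in a root-level .htaccess instead. This is an untested illustration; it assumes the "lc" map is still declared in the server config, because RewriteMap is not allowed in .htaccess.

```apache
# .htaccess-style variant: one RewriteCond with alternation instead of three
RewriteCond %{REQUEST_URI} !^/(cartoweb3|tmp|ms_tmp)
# In .htaccess the matched path has no leading slash, so one is added to the target
RewriteRule ^([^A-Z]*[A-Z].*)$ /${lc:$1} [R=301,L]
```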

Don't check file-exists to prevent rewriting index.php to itself, and don't waste resources checking disk for filetypes that the script itself cannot generate:

RewriteCond $1 !^index\.php$
RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js|swf|flv|pdf)$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA,NC]

The exclusion of index.php itself and of the list of filetypes from this rule prevents calling the OS to check the filesystem *at least twice* for each and every request to your server. This one mod can noticeably improve your page-load time.

You may add to that exclusion list if you like. Put the filetypes in order of most- to least-frequently requested for best performance.

Obviously, the site currently "works" without any exclusions, and you will reach a point of diminishing returns. Significant benefit is to be had even if you only include the most-frequently-requested filetypes, and there's really no use trying to add them all.

Jim

fruitwerks

1:48 am on May 26, 2010 (gmt 0)

10+ Year Member



Thanks for the info - although I am a bit confused on exactly what you suggest I change. Is it possible to move everything into the httpd conf? I know htaccess is for those who don't always have admin access. I'm not a pro, but I can do quite a bit. I ask because I see things in htaccess files that you never really see in the main httpd confs.

Thanks!

jdMorgan

1:55 am on May 26, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I simply suggest you change the two rules shown above in the way shown above, the first one in your config file and the second in your .htaccess.

You can move your rules from .htaccess to your config file if you like, but be aware that the syntax does change slightly, as does the "internal operation" of rule processing. This latter is because rules in config context are processed during the URL-to-filepath translation phase of the API, while rules in .htaccess are processed during the Fix-up phase.
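
As a rough sketch of what that syntax change looks like, here is how the index.php rule might be written inside a <VirtualHost> section. It is untested against this site and rests on two assumptions: that the rule pattern needs the leading slash in this context, and that the file checks are built from %{DOCUMENT_ROOT}, since %{REQUEST_FILENAME} has not yet been mapped to a filesystem path at this phase.

```apache
# Config-context (per-server) version of the .htaccess front-controller rule
RewriteEngine On
RewriteCond $1 !^index\.php$
RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js|swf|flv|pdf)$
# Build the filesystem path by hand; it is not yet known in this context
RewriteCond %{DOCUMENT_ROOT}/$1 !-f
RewriteCond %{DOCUMENT_ROOT}/$1 !-d
# URL-paths start with a slash here, and the target is a URL-path as well
RewriteRule ^/(.*)$ /index.php?q=$1 [L,QSA]
```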

Jim

fruitwerks

9:58 pm on May 31, 2010 (gmt 0)

10+ Year Member



ok hope I got this right - can someone verify? I haven't tested extensively, but everything appears to be working


NameVirtualHost *:443

<virtualhost *:80>
DocumentRoot "/var/www/foodomain.com/"
<Directory "/var/www/foodomain.com/">
allow from all
Options FollowSymLinks Indexes
AllowOverride all
DirectoryIndex index.php
</Directory>
ServerAlias www.foodomain.com
RewriteEngine On
RewriteRule ^/?secure/(.*) https://%{HTTP_HOST}/secure/$1 [R,L]
RewriteRule ^/?signup/(.*) https://%{HTTP_HOST}/signup/$1 [R,L]
ServerName foodomain.com
CustomLog /var/log/apache2/foodomain.com-access.log combined
ErrorLog /var/log/apache2/foodomain.com-error.log
LogLevel warn
</VirtualHost>

<virtualhost *:443>
DocumentRoot "/var/www/foodomain.com/"
<Directory "/var/www/foodomain.com/">
allow from all
Options FollowSymLinks Indexes
AllowOverride all
DirectoryIndex index.php
</Directory>
ServerAlias www.foodomain.com
RewriteEngine On
RewriteRule ^([^A-Z]*[A-Z].*)$ ${lc:$1} [R=301,L]
ServerName foodomain.com
CustomLog /var/log/apache2/foodomain.com-access.log combined
ErrorLog /var/log/apache2/foodomain.com-error.log
LogLevel warn
SSLEngine on
SSLCertificateFile /etc/apache2/ssl/foodomain.com.crt
SSLCertificateKeyFile /etc/apache2/ssl/www.foodomain.com.key
SSLCertificateChainFile /etc/apache2/ssl/gd_bundle.crt
</VirtualHost>

<VirtualHost *:80>
ServerName foodomain.net
ServerAlias www.foodomain.net
RedirectMatch 301 (.*) [foodomain.com$1...]
</VirtualHost>


Thanks!

g1smd

7:36 am on Jun 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Be aware that [R,L] generates a 302 redirect.

Specify [R=301,L] if a 301 redirect is required, and [R=302,L] to remind you there is a 302 redirect there.

Try not to use Redirect or RedirectMatch; use only RewriteRule for all of your rules.
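
For example, the two port-80 rules posted above could state the status code explicitly. This sketch assumes a permanent redirect is what is wanted; swap in R=302 if it is meant to be temporary.

```apache
# Same rules as posted, with the redirect status spelled out
RewriteRule ^/?secure/(.*) https://%{HTTP_HOST}/secure/$1 [R=301,L]
RewriteRule ^/?signup/(.*) https://%{HTTP_HOST}/signup/$1 [R=301,L]
```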