Rewrite / Redirect problem

How to exclude some file types from redirecting?

         

KoenG

3:03 pm on Nov 1, 2007 (gmt 0)

10+ Year Member



Hello, I hope someone can help me with this problem.
I have a main domain (www.example.com) running a PHP application. I also created subdomains for the non-default languages of my application (e.g. nl.example.com and it.example.com for Dutch and Italian). In each subdomain I placed the XML sitemaps pointing to that subdomain, plus some error pages.
Now I want to redirect all requests to the main domain, except requests for XML files, text files and image files.
The HTML files are redirected with the language parameter added.
This works fine, but a request for http://nl.example.com/ gives a 403 error.
How can I solve this?
Thanks in advance!

Here is my .htaccess file in nl.example.com:

ErrorDocument 400 /error400.html
ErrorDocument 401 /error401.html
ErrorDocument 403 /error403.html
ErrorDocument 404 /error404.html
ErrorDocument 500 /error500.html
RewriteEngine on
RewriteCond %{REQUEST_URI}!\.[^(xml¦txt¦ico¦gif¦png¦jpg)]$
RewriteRule ^hotel([^.]+).html$ http://www.example.com/hotel$1.html?Lang=nl [L]
Redirect ^/(.*) http://www.example.com/$1!\.[^(html¦xml¦txt¦ico¦gif¦png¦jpg)]$

[edited by: KoenG at 3:05 pm (utc) on Nov. 1, 2007]

[edited by: jdMorgan at 3:23 pm (utc) on Nov. 1, 2007]
[edit reason] example.com [/edit]

jdMorgan

3:20 pm on Nov 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This code is invalid, and it looks like you're trying to write or copy code without reading the documentation [httpd.apache.org]. That can be quite dangerous to the health of your site.

Your first rule should look like this:


RewriteCond %{REQUEST_URI} !\.(xml¦txt¦ico¦gif¦png¦jpg)$
RewriteRule ^hotel([^.]+)\.html$ http://www.example.com/hotel$1.html?Lang=nl [R=301,L]

The Redirect directive is also invalid, but I cannot tell what the intent of it is and so cannot offer any comments.
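
For readers following along, here is an annotated sketch of what changed between the original condition and the corrected one. It uses only the host names and patterns already posted in this thread, with solid pipes written out. The note about the R flag reflects my understanding of mod_rewrite's documented behaviour, so check it against the documentation for your Apache version.

```apache
# Original (invalid) condition, kept here as a comment for comparison:
#   RewriteCond %{REQUEST_URI}!\.[^(xml|txt|ico|gif|png|jpg)]$
# 1. There is no space between the test string and the pattern, so the
#    directive is missing an argument.
# 2. The square brackets form a character class: [^(xml|txt|ico|gif|png|jpg)]
#    matches any ONE character that is not "(", ")", "|" or one of the
#    listed letters. It does not mean "not one of these extensions".

# Corrected condition: a space separates test string and pattern, the
# leading "!" negates the match, and parentheses group the alternatives.
RewriteCond %{REQUEST_URI} !\.(xml|txt|ico|gif|png|jpg)$

# The dot before "html" is escaped so that it matches a literal dot only.
# R=301 asks for a permanent redirect; as far as I know, a substitution
# that points to another host is otherwise sent as a temporary (302) redirect.
RewriteRule ^hotel([^.]+)\.html$ http://www.example.com/hotel$1.html?Lang=nl [R=301,L]
```

As for the Redirect line: mod_alias's Redirect takes a plain URL-path prefix rather than a regular expression, which is one reason the line in the original post cannot work as written.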

Jim

[edited by: jdMorgan at 3:21 pm (utc) on Nov. 1, 2007]

KoenG

8:52 pm on Nov 1, 2007 (gmt 0)

10+ Year Member



Thanks jdMorgan for the help.

The RewriteRule was meant to forward the specific hotelXXXX.html pages to the main site with the language parameter added.
But this does not redirect all hits on the subdomain, and that's where my problem lies: all other hits must be redirected, except for hits on .xml, .txt, etc. files. I tried the Redirect with a regular expression, but that obviously doesn't work.
When I access nl.example.com/sitemap.xml, everything goes right.
When I access nl.example.com/hotel123.html, everything goes as I expected.
When I access nl.example.com/, I get a 403 error message.
So all requests, except those for .html, .xml, .txt, .ico, .gif, .png and .jpg files, must be redirected to the main domain.

jdMorgan

9:25 pm on Nov 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are these subdomains hosted in the same 'account' as the main domain, and if so, how is that set up? Do you have separate subdirectories for each subdomain, and if so, how was the subdomain-to-subdirectory 'mapping' implemented?

Jim

jdMorgan

9:42 pm on Nov 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If I understand what you need, then this is the correct construct. The second rule will not be applied if the first rule is applied, so you can view the second rule as an "else" clause:

# If not specific filetypes, redirect 'hotel' URLs to main domain
RewriteCond %{REQUEST_URI} !\.(xml¦txt¦ico¦gif¦png¦jpg)$
RewriteRule ^hotel([^.]+)\.html$ http://www.example.com/hotel$1.html?Lang=nl [R=301,L]
# Else redirect all others
RewriteCond %{REQUEST_URI} !\.(html¦xml¦txt¦ico¦gif¦png¦jpg)$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Replace all broken pipe "¦" characters above with solid pipe characters before use; posting on this forum modifies the pipe characters.
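
With the solid pipes restored, the two rules above might look like the following sketch. The RewriteEngine line is carried over from the .htaccess file in the first post; nothing else is added beyond what is already in this thread.

```apache
RewriteEngine on

# If not one of the listed file types, redirect 'hotel' URLs to the
# main domain and add the language parameter
RewriteCond %{REQUEST_URI} !\.(xml|txt|ico|gif|png|jpg)$
RewriteRule ^hotel([^.]+)\.html$ http://www.example.com/hotel$1.html?Lang=nl [R=301,L]

# Else redirect every request that does not end in a listed extension.
# "html" is in this list so that other .html files on the subdomain
# (such as the error documents) are not redirected.
RewriteCond %{REQUEST_URI} !\.(html|xml|txt|ico|gif|png|jpg)$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Under these rules, a request for / on the subdomain should go to the main domain, /sitemap.xml should be served from the subdomain itself, and /hotel123.html should be redirected to the main domain with ?Lang=nl appended.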

Jim

[edited by: jdMorgan at 9:46 pm (utc) on Nov. 1, 2007]

KoenG

10:23 pm on Nov 1, 2007 (gmt 0)

10+ Year Member



Hi Jim,

Thanks for the help.
All hits on / work fine now.
But a hit on nl.example.com/sitemap.xml is also forwarded to www.example.com/sitemap.xml.

My specific intention is to have a separate sitemap for every language (that is, for every subdomain). Search engines will visit the hotel pages on the subdomain and be forwarded to the main domain with the correct language parameter, resulting in indexed pages for every language and better SERPs per language.

Everything is hosted at the same provider, and the subdomains are subdirectories of the main domain (handled with cPanel).

Maybe a real-world example makes it clearer (with your latest suggestions in effect):
http://nl.example.com/ goes to the main page of the main domain, which is now correct.
http://nl.example.com/sitemap.xml shows the sitemap of the main domain and not that of the subdomain (not what I intended).
http://nl.example.com/hotel46265.html goes to the Dutch translation of the requested page on the main domain, which is correct.

[edited by: jdMorgan at 10:34 pm (utc) on Nov. 1, 2007]
[edit reason] example.com [/edit]

jdMorgan

10:35 pm on Nov 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Be sure to completely flush your browser cache before testing any new code.
Also be sure you replaced all the "¦" pipe characters as described above.
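
A short sketch of why unconverted pipes would produce exactly the symptom described (sitemap.xml being redirected along with everything else). This is an explanation under the assumption that the broken-bar characters were pasted into the .htaccess file unchanged; the lines are shown as comments because they are an example of what not to deploy.

```apache
# BROKEN as pasted from the forum:
#   RewriteCond %{REQUEST_URI} !\.(html¦xml¦txt¦ico¦gif¦png¦jpg)$
#   RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# "¦" is not a regex operator, so the group contains no alternation.
# The pattern only matches a URL that literally ends in
# ".html¦xml¦txt¦ico¦gif¦png¦jpg", which no real request does.
# The negated condition is therefore always true, and every request,
# including /sitemap.xml, falls through to the redirect rule.
```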

Jim

KoenG

10:42 pm on Nov 1, 2007 (gmt 0)

10+ Year Member



Silly me... I overlooked the pipes...
Everything works now!
Thanks a lot!

g1smd

2:12 am on Nov 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am not sure of the validity of supplying a sitemap that simply lists a bunch of URLs in a sub-domain, all of which simply redirect somewhere else when accessed.

What is this trying to achieve? I think that it is a level of complexity that is entirely unnecessary.

KoenG

8:33 am on Nov 12, 2007 (gmt 0)

10+ Year Member



I have a multilingual site. When a visitor arrives, I first do a browser language check and an internal language parameter check. If no internal language parameter is given, I use the browser language (default English) as the language to show results in.

This system works fine, except that GoogleBot has no browser language and thus only 'sees' the English pages when crawling my site.
It is my experience that SERPs for local languages are better when requested from a localized Google site (e.g. google.nl in Dutch).

And that's why I created this 'complicated' system: I serve Google sitemaps on a (localized) subdomain. When GoogleBot visits pages on the subdomain, they redirect to the main application domain with the language parameter included, resulting in localized pages in the index.

This system has been running successfully for some weeks now. So far I have not been penalized for duplicate content, and the SERPs are indeed better for languages other than English.