Problems with old to new URLs

         

speedyone

7:48 pm on Oct 6, 2009 (gmt 0)

10+ Year Member



Hello,

I have been having some issues with a rewrite for a few days. The issue is that we have sites all over the world. The dev team built the site one way and then discovered that it was not working very well for the Google crawl. They replaced it with a whole new site, and now I have to figure out how to redirect the old URLs to the new ones.

This is what we had before:

[thesite.com...]

The /us/ was being added to the URL by the app (as a string).

For the US it will change to /en-us/. This is an easy fix. You just add the following and bam, done.

RewriteRule ^/us/?(.*) /$1 [R=301,L]

However, this is where I have the problem. We also have all these domains:

[thesite.com...] = /de-de/
[thesite.com...] = /en-us/
[thesite.com...] = /fr-fr/
[thesite.com...] = /es-es/
[thesite.com...] = /ru-ru/

So, as you can tell, I would end up in a loop. It will remove the second and then add -**/ at the end.

I have tried to put in an exclusion, but it is not catching it. I am at an impasse. Any ideas on how to do this?

TheMadScientist

10:58 pm on Oct 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



1.) It does not look like what you have will work as is, so I suggest the following.

2.) It is always advised to use the full URL when performing an external redirect.

3.) In the httpd.conf file you will need the leading / on the left side of the rule, but in the .htaccess you will not.

# .htaccess ruleset:
RewriteRule ^us/(.*) http://www.example.com/en-us/$1 [R=301,L]
RewriteRule ^([a-z]{2})/(.*) http://www.example.com/$1-$1/$2 [R=301,L]

# httpd.conf ruleset:
RewriteRule ^us/(.*) http://www.example.com/en-us/$1 [R=301,L]
RewriteRule ^([a-z]{2})/(.*) http://www.example.com/$1-$1/$2 [R=301,L]

g1smd

10:59 pm on Oct 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you have a 301 redirect you should include the protocol and domain name in the target of the redirect.

Look at the documentation for RewriteCond. You can then test the URL request and only run the following rule for specific requests.
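
As a rough sketch of that idea (not code from the original question), a condition can exclude requests that already carry the new prefix. The /de/ and /de-de/ paths and www.example.com are placeholders, and the leading slash in the pattern assumes the rule lives in httpd.conf rather than .htaccess:

```apache
RewriteEngine On
# Run the rule only when the request does NOT already start with /de-de
RewriteCond %{REQUEST_URI} !^/de-de [NC]
# Old-style /de/... requests are redirected; the rest of the path is kept in $1
RewriteRule ^/de/?(.*) http://www.example.com/de-de/$1 [R=301,L]
```

Because the slash after the language code is optional in this pattern, it would also catch unrelated paths such as /design, so the pattern may need tightening for a real site.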

TheMadScientist

12:04 am on Oct 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Uh, copy and paste is my nemesis sometimes. The following is what I meant previously:

# httpd.conf ruleset:
RewriteRule ^/us/(.*) http://www.example.com/en-us/$1 [R=301,L]
RewriteRule ^/([a-z]{2})/(.*) http://www.example.com/$1-$1/$2 [R=301,L]

TheMadScientist

12:40 am on Oct 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looking at this one more time, mainly because I'm bored, I noticed something in the set of URLs that I didn't notice previously...

http://www.example.com/en/ = /en-us/

RewriteRule ^(us|en)/(.*) http://www.example.com/en-us/$2 [R=301,L]

The | means OR, so this one rule handles both /us/ and /en/.

I'm also not sure why a condition is suggested. The redirects can be accomplished in two rules, and a rule's pattern must match before any condition is tested, so it seems to me that adding a condition would just add processing and decrease efficiency. But maybe I'm missing something, and someone could provide a more efficient example than mine.
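
To illustrate the point about evaluation order, here is the two-rule .htaccess version again with comments. It is only a sketch, with www.example.com as a placeholder and a solid bar used for OR. mod_rewrite checks a RewriteRule pattern first and only then evaluates any RewriteCond lines attached to it, and here the pattern alone already rejects the new-style URLs:

```apache
RewriteEngine On
# us/... and en/... both go to /en-us/...
RewriteRule ^(us|en)/(.*) http://www.example.com/en-us/$2 [R=301,L]
# Any other two-letter code followed directly by a slash: de/ -> de-de/
# A request for de-de/... cannot match, because the third character
# must be "/" and not "-", so there is no loop and no RewriteCond is needed.
RewriteRule ^([a-z]{2})/(.*) http://www.example.com/$1-$1/$2 [R=301,L]
```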

speedyone

2:12 am on Oct 7, 2009 (gmt 0)

10+ Year Member



Thanks guys,

Between what you said and some other things I read later in the day, I was able to get this all working how it should. Thanks for the help.

g1smd

4:08 pm on Oct 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Post the final code for a check up!

There are sometimes subtle changes that can be made to avoid various 'well known' problems.

speedyone

10:03 pm on Oct 12, 2009 (gmt 0)

10+ Year Member



Sorry, I have been really busy and had to move on to other projects. I just saw this, and this is what I found to work:

RewriteCond %{REQUEST_URI} !^/en-gb [NC]
RewriteCond %{REQUEST_URI} !^/en-us [NC]
RewriteRule ^/en/?(.*) http://www.example.com/en-gb/ [R=301,L]

RewriteCond %{REQUEST_URI} !^/de-de [NC]
RewriteRule ^/de/?(.*) http://www.example.com/de-de/ [R=301,L]

RewriteCond %{REQUEST_URI} !^/fr-fr [NC]
RewriteRule ^/fr/?(.*) http://www.example.com/fr-fr/ [R=301,L]

RewriteCond %{REQUEST_URI} !^/es-es [NC]
RewriteRule ^/es/?(.*) http://www.example.com/es-es/ [R=301,L]

RewriteCond %{REQUEST_URI} !^/ru-ru [NC]
RewriteRule ^/ru/?(.*) http://www.example.com/ru-ru/ [R=301,L]

g1smd

11:13 pm on Oct 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A possible simplification just based on the 'logic' of your code:

RewriteCond %{REQUEST_URI} !^/en-gb [NC]
RewriteCond %{REQUEST_URI} !^/en-us [NC]
RewriteRule ^/en/? http://www.example.com/en-gb/ [R=301,L]
#
RewriteCond %{REQUEST_URI} !^/$1-$1 [NC]
RewriteCond %{REQUEST_URI} !^/en- [NC]
RewriteRule ^/([a-z]{2})/? http://www.example.com/$1-$1/ [R=301,L]

OR

RewriteCond %{REQUEST_URI} !^/en-gb [NC]
RewriteCond %{REQUEST_URI} !^/en-us [NC]
RewriteRule ^/en/? http://www.example.com/en-gb/ [R=301,L]
#
RewriteCond %{REQUEST_URI} !^/$1-$1 [NC]
RewriteCond %{REQUEST_URI} !^/en- [NC]
RewriteRule ^/(de|fr|es|ru)/? http://www.example.com/$1-$1/ [R=301,L]

There's also no need for the (.*) backreference as you don't re-use that data anywhere.

speedyone

6:52 am on Oct 13, 2009 (gmt 0)

10+ Year Member



Oh yeah.. didn't even think about that. I will apply them in the am when I am back at work. I am all about making things easy.

jdMorgan

12:50 pm on Oct 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> RewriteCond %{REQUEST_URI} !^/$1-$1 [NC]

Unfortunately, this isn't likely to work because you can't use variables on the right side of a RewriteCond. It's one of those things that I really wish *did* work, as there is no way to do variable-to-variable compares inside mod_rewrite except to use atomic back-references (if supported by your regex library) and take advantage of commutativity (e.g. if A+A = A+B, then A=B). In this case, doing that would be more complex than the brute-force string compare, so brute-force is both simpler and more portable from server to server (and/or from version to version).
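
For the curious, the back-reference trick looks roughly like the following. This is only a sketch and assumes a PCRE-based server (Apache 2.x) with the rules in httpd.conf; the <> separator is an arbitrary string that is unlikely to appear in a real path, and www.example.com is a placeholder. The RewriteRule back-reference $1 is allowed in the test string on the left side of a RewriteCond, and the regex on the right then uses its own internal \1 to require the same two letters again:

```apache
RewriteEngine On
# $1 is filled in by the RewriteRule pattern below before the condition runs.
# For /de-de/ the test string is "de<>/de-de/", which matches, so the negated
# condition fails and nothing is redirected. For /de/ it is "de<>/de/",
# which does not match, so the redirect goes ahead.
RewriteCond $1<>%{REQUEST_URI} !^([a-z]{2})<>/\1-\1 [NC]
RewriteRule ^/(de|fr|es|ru)/? http://www.example.com/$1-$1/ [R=301,L]
```

The en special case is left out here for brevity.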

It seems to me that the rules could still be optimized in a different way to eliminate most or all RewriteConds with a better RewriteRule pattern, but the 'requirements' for the rules have apparently changed since the first post, and I can't tell what they really are -- for example, the path following the language-codes has apparently now been dropped, whereas it was preserved by the earlier code.
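
One possible shape for such a pattern-only version is sketched below. It assumes httpd.conf context, assumes the rest of the path is meant to be dropped as in the most recently posted rules, and uses www.example.com as a placeholder. Requiring the language code to be followed by either a slash or the end of the URL means /en-gb/ and /de-de/ never match, so the conditions become unnecessary:

```apache
RewriteEngine On
# /en, /en/ and /en/anything -> /en-gb/
# /en-gb/... and /en-us/... do not match, because "-" follows the code
RewriteRule ^/en(/.*)?$ http://www.example.com/en-gb/ [R=301,L]
# Same idea for the other languages: /de/... -> /de-de/, /fr/... -> /fr-fr/
RewriteRule ^/(de|fr|es|ru)(/.*)?$ http://www.example.com/$1-$1/ [R=301,L]
```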

Jim

g1smd

12:58 pm on Oct 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Jim!

speedyone

1:17 pm on Oct 13, 2009 (gmt 0)

10+ Year Member



Jim,

You are right. The requirement has changed. With the full rewrite of the site, they decided they didn't want to retain about 50% of it, and they wanted most old URLs to just end up at the main (index) page of the site. So I dropped the part that retained the rest of the URL that was entered. I want people to get the site, not a 404. So far this has been working great for us.

jdMorgan

2:53 pm on Oct 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That approach has some potentially-serious search ranking repercussions. I suggest that you use a 404 page in conformance to the HTTP protocol if you wish to happily co-exist with search engines. Make it pretty, make it helpful, etc., but you should not redirect missing page requests directly to your home page if you care about search ranking. See "duplicate content" and "infinite URL-space" threads in the search engine forums here (esp. the Google forum) for more info.

A 404 error page should briefly (and somewhat apologetically) acknowledge and describe the problem, and then offer text links to your home page, site map, major category pages, and site search facility as applicable. Keep the list of links short to avoid confusion -- no more than seven.
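
In Apache terms, a custom 404 page along those lines can be wired up with ErrorDocument. A minimal sketch, where /errors/not-found.html is a hypothetical path to your own page:

```apache
# A local path keeps the 404 status code while serving the friendly page.
# Avoid a full http://... URL here: Apache would then send a redirect to the
# client instead, and the original 404 status would be lost.
ErrorDocument 404 /errors/not-found.html
```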

A long-delay on-page meta-refresh from the 404 error page to your home page is acceptable. Do not try to make it fast, as it will be handled as a 302 redirect if you do -- with disastrous results in the SERPs. Instead, follow the suggestion above, and allow plenty of time for the user to read the information presented and to make an informed decision (eight to 15 seconds minimum). If you do use a meta-refresh, the page should say so ('warn' the user that he/she'll be redirected after xx seconds).

Jim