Forum Moderators: phranque

After 6 months of trying I give up and ask for support

         

Volcomstar

5:03 am on Jan 28, 2012 (gmt 0)

10+ Year Member



First of all, oh my god! I've been trying for 6 months to find a way to make this simple rewrite. Yeah, probably 99% of the requests in this forum are described as simple by their authors, but trust me, it's simple!

Hi, I want to do 2 things.

---------------------
ONE
---------------------
Remove the trailing slash and at the same time consolidate non-www to www. At the moment I use these two rules.

# www consolidation
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301]

# remove trailing slash
RewriteCond %{HTTP_HOST} ^(www.)?example\.com$ [NC]
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]


It works with the exception of the URL below. It's not rewritten properly.

example.com/en/hello == to ==> www.example.com/en/hello

It becomes http://www.example.com/http://www.example.com/en/hello//hello

---------------------
TWO
---------------------
I have a multi-language website, as follows:

  • example.com/en
  • example.com/de
  • example.com/fr

    /en, /de and /fr are of course all "fake-directories". I have fully working rules in my htaccess to support this system.

    example.com/en/cats
    example.com/de/cats
    example.com/fr/cats
    example.com/en/dogs
    example.com/de/dogs
    example.com/fr/dogs

    And so on... the only problem is WITH THE HOME PAGE! The home page should be example.com/en but I'm forced to use example.com/en/index because I really don't understand how to force a "fake-directory" to open a file without showing it in the URL.

    RewriteRule ^/en$ /en/index [L]

    Why does it give me a 404 error? :(

    lucy24

    6:17 am on Jan 28, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    ... and that's why [R=301] must always be accompanied by [L]. It may seem intuitively obvious that when you redirect, the new URL gets kicked outside to start all over again. But in fact it doesn't; it carries on through mod_rewrite, possibly being exposed to further rewrites or redirects.
    # www consolidation
    RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301]

    # remove trailing slash
    RewriteCond %{HTTP_HOST} ^(www.)?example\.com$ [NC]
    RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]

    OK, stop right there. You rewrote to with-www in the first rule, so why are you suddenly including (www.)? in the second rule? There should no longer be any without-www forms. (Incidentally the literal period should be escaped-- but if there is anything other than a period in that location, it would never reach your domain anyway.)

    In fact the rewrite should cover all possibilities:

    %{HTTP_HOST} !(^www\.example\.com$)?

    meaning "exactly www.example.com or exactly nothing" (the "or nothing" is something involving http 1.0 that you can look up if you want to) to make sure you also intercept weird requests containing a port number. And this should be your very last redirect, to pick up only those requests that have not already been redirected for other reasons.

    It works with the exception of the URL below. It's not rewritten properly.

    example.com/en/hello == to ==> www.example.com/en/hello

    It becomes http://www.example.com/http://www.example.com/en/hello//hello

    That isn't the only exception. It's just the only one you happened to find while testing. By rule #1, example.com/en/hello becomes www.example.com/en/hello. But it doesn't leave your htaccess yet; it continues through all your other Rules. Rule#2 doesn't apply, since there is no trailing slash. But anything could be lurking in the rest of the Rules.

    RewriteRule ^/en$ /en/index [L]

    Why does it give me a 404 error?

    Whew. That's the easy one. Because there is no such page as www.example.com//en --and that's what your Pattern looks for. Leave off the leading slash.

    But why are you rewriting to a filename ending in /index without extension ? That's not a "real" page, so you're going to have to rewrite and/or redirect all over again in order to end up at something that can serve content.

    Incidentally, the anchors in ^(.*)$ are always redundant; a bare (.*) matches exactly the same thing. By default, Regular Expressions start as soon as they can and continue as long as they can. The opening and/or closing anchors are only necessary when you are matching some specific text in the beginning or ending location.

    g1smd

    7:53 am on Jan 28, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    %{HTTP_HOST} !(^www\.example\.com$)?

    should be

    %{HTTP_HOST} !^(www\.example\.com)?$

    The "remove slash" redirect must be listed first; otherwise a non-www request with a trailing slash will generate an unwanted multi-step redirection chain.
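
    Putting the corrections so far together, a minimal sketch of the two redirects in .htaccess might look like the block below. It is only an illustration: the !-d condition is an assumption that does not appear in the thread, added so that the slash-removal rule leaves real directories alone (mod_dir would normally add the slash back and cause a loop).

    ```apache
    RewriteEngine On

    # 1) Remove the trailing slash. Listed first, and pointing straight at the
    #    www hostname, so a non-www request with a slash is fixed in one redirect.
    #    ASSUMPTION: the !-d test (skip real directories) is not from the thread.
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]

    # 2) Canonical hostname, as the very last redirect: anything that is not
    #    exactly www.example.com (or an empty Host header) is redirected.
    RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
    RewriteRule (.*) http://www.example.com/$1 [R=301,L]
    ```

    Both rules carry [L] next to [R=301], so a redirected request is not exposed to any later rule in the same pass.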

    lucy24

    8:28 am on Jan 28, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Thanks. It didn't look right when I typed it but I couldn't figure out what was in the wrong place.

    Volcomstar

    2:25 pm on Jan 28, 2012 (gmt 0)

    10+ Year Member



    Thank you for your clear explanation. Basically I was rewriting an already rewritten URL, so I changed the rules and now it's working. I also used the L flag properly. Now my rules are better and shorter.

    About problem no. 2, I still don't get it. The home page should be example.com/index.php but I rewrote it as follows:

    1) example.com/index.php == to ==> example.com/index
    2) example.com/index == to ==> example.com/en/index
    3) example.com/en/index == to ==> example.com/en

    1 and 2 are fine. I fail at 3 because of a 404. I think I've understood what you said. I should use a more specific rule instead of ^(.*)$ but how?

    lucy24

    11:45 pm on Jan 28, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    There are two different things. The URL that the human user sees, and the path to where your page really lives. They might be the same (if you have a static site with no rewrites) or they might be entirely different.

    The "pattern" side of RewriteRule sees only the path. Not the domain (HOST). Not the query. If your page is called

    www.example.com/directory/pagename.php
    or
    example.com/directory/pagename.php
    or
    www.example.com:8080/directory/pagename.php
    or
    www.example.com/directory/pagename.php?q=foo&x=bar&y=nono
    and so on

    RewriteRule sees

    directory/pagename\.php
    or, with anchors,
    ^directory/pagename\.php$

    Now, the other way around. If your RewriteRule says

    directory/pagename\.php

    (without anchors) it will work for

    www.example.com/directory/pagename.php
    and
    www.example.com/directory/pagename.php2
    and
    www.example.com/directory2/directory/pagename.php
    and
    www.example.com/directory2/directory3/directory/pagename.php
    and
    www.example.com/directory123/directory/pagename.php?q=foo&x=bar&y=nono
    and so on.

    Now watch! If your RewriteRule says

    /directory/

    it will NOT match for

    www.example.com/directory/

    because the first / slash after the domain name (HOST) is not seen by the RewriteRule. A rule that says

    /directory/

    will only work for

    www.example.com/directory2{/optional more things here}/directory/
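
    The same points can be sketched as .htaccess rules. The target /hypothetical-target.php is a made-up name used only for illustration, and the three rules are alternatives to compare, not a set meant to be used together.

    ```apache
    # Per-directory (.htaccess) context: the pattern is tested against
    # "directory/pagename.php" -- no hostname, no leading slash, no query string.

    # Unanchored: also matches directory2/directory/pagename.php
    # and directory/pagename.php2
    RewriteRule directory/pagename\.php /hypothetical-target.php [L]

    # Anchored: matches only a request for /directory/pagename.php
    RewriteRule ^directory/pagename\.php$ /hypothetical-target.php [L]

    # Leading slash: never matches /directory/ itself, only deeper paths
    # such as /directory2/directory/
    RewriteRule /directory/ /hypothetical-target.php [L]
    ```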
    ______

    But the real problem is that you are using the word "rewrite" in a way that sounds as if you really mean "redirect". Or you are doing it backward.

    I assume you want to do this:

    User types or clicks
    www.example.com
    or
    www.example.com/
    (the slash after the domain name does not matter if there is nothing after it).

    You give them the content of real page
    www.example.com/index.php
    OR real page
    www.example.com/en/index.php

    Is English your "default" language? If so, you first need to REDIRECT from

    www.example.com/en(/(index\.php)?)?$
    to
    www.example.com/

    and then REWRITE to the page that has the real content.
    ____

    When you are working with mod_rewrite it is often helpful to say in plain English what you want to do. What do you want the user to see? Where does your page really live? After you have figured out what you want to do, then you can make the Rules.
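
    As a rough sketch of that redirect-then-rewrite pair for the home page, assuming the real file is /en/index.php: the THE_REQUEST condition is an addition that is not from the thread. It checks the original request line sent by the client, so the internally rewritten request does not trigger the redirect again and loop.

    ```apache
    RewriteEngine On

    # REDIRECT: a client that asks for /en, /en/ or /en/index.php is sent to /
    # ASSUMPTION: the THE_REQUEST guard is added here to avoid a redirect loop.
    RewriteCond %{THE_REQUEST} ^[A-Z]+\s/en(/(index\.php)?)?[?\s]
    RewriteRule ^en(/(index\.php)?)?$ http://www.example.com/ [R=301,L]

    # REWRITE: a request for the bare root is served from the real file,
    # while the address bar keeps showing www.example.com/
    RewriteRule ^$ /en/index.php [L]
    ```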

    g1smd

    1:05 am on Jan 29, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    The plain English account also should avoid words like "goes to" as this is ambiguous/meaningless.

    Describe the problem using the requested URL with hostname and the target URL with hostname when talking about an external redirect.

    Describe the problem using the requested URL with hostname and the internal folder path/filename (without hostname) when talking about an internal rewrite.

    Volcomstar

    1:51 am on Jan 29, 2012 (gmt 0)

    10+ Year Member



    Thank you again for your explanation. Yes, I forgot to mention that first of all I use a redirect so that:

    example.com == redirects to ==> example.com/en/index

    I use EN as the default language. Now the only problem is the *damned* :D "index". I must remove it and make the page accessible through example.com/en because of search engines.

    In Google Webmaster Tools I have:

  • example.com/en/index
  • example.com/fr/index
  • example.com/de/index

    And it's a total mess because for Google and all other search engines the root of my website is example.com/en/index. They stop crawling my website because all other pages are on the same level. I mean...

    example.com/en/index
    example.com/en/cats
    example.com/en/dogs
    example.com/en/silvioberlusconi

    For Google it should be:

    example.com/en/index
    example.com/en/index/cats
    example.com/en/index/dogs
    example.com/en/index/silvioberlusconi

    That's why I really need to remove the index :( I hope that it's clear

    lucy24

    9:16 am on Jan 29, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    I don't think the "root" is the problem. I think the problem is that your redirects and rewrites are all tangled up, so Google isn't following your links.

    That naked /index should never be there at all. You need to do two things. STOP redirecting to it, and make sure that nothing on your site links to anything called "index". (Unless you have a directory called /index/ but that would be a completely different issue.)

    Search-engine robots and humans have one thing in common. They can't see through rewrites. Google doesn't "know" that

    example.com
    and
    example.com/cats

    are "really"

    example.com/en/index.php
    and
    example.com/en/cats.php

    unless you have linked and/or redirected that way.

    If English is your site's default language, neither humans nor Google should ever see /en/ at all. It's different if you start with a splash screen where users have to pick a language before they can go anywhere else in the site. Then you might have matching "branches" for all languages, with no default. But it doesn't sound as if you have that kind of design.

    So...

    #1 Decide what you want people to see in their address bar at every point.
    #2 Fix all your internal links so they match these URLs-- even if you know that it isn't the "real" location of the page.
    #3 Use mod_rewrite to redirect anyone who comes to a page using the wrong name-- again, even if what they asked for is the "real" address of the page.
    #4 Use mod_rewrite to rewrite all those visible names to the "real" location of the material.
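
    Steps #3 and #4 could be sketched like this for ordinary pages, assuming (as in the example above) that the visible URL is example.com/cats and the file on the server is /en/cats.php. The THE_REQUEST guard and the -f file check are additions for illustration and are not from the thread.

    ```apache
    RewriteEngine On

    # Step 3: anyone who asks for the internal name directly is redirected
    # to the visible name. THE_REQUEST is the client's original request line,
    # so requests rewritten by step 4 are not redirected again.
    RewriteCond %{THE_REQUEST} ^[A-Z]+\s/en/[^/.\s]+\.php[?\s]
    # The home page needs its own rule, so index is left out here.
    RewriteCond $1 !^index$
    RewriteRule ^en/([^/.]+)\.php$ http://www.example.com/$1 [R=301,L]

    # Step 4: the visible name is rewritten to the file that holds the
    # content, but only when that file exists.
    RewriteCond %{DOCUMENT_ROOT}/en/$1.php -f
    RewriteRule ^([^/.]+)$ /en/$1.php [L]
    ```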

    g1smd

    9:40 am on Jan 29, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    By '"real" location' in the above, read that as meaning the actual internal server filepath, not what might be shown in the browser's address bar.

    I find the term "real location" to be ambiguous. There are two "real locations". One of those locations is defined by the URL used to access the information. This is the location "on the web". The other is the internal filepath "inside the server", which may or may not be directly exposed as a URL back out on to the web.

    londrum

    10:29 am on Jan 29, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    maybe you could try a php solution? you just need to add a little snippet of code to the top of all your pages

    this is what i use to redirect
    www.example.com/index.html
    to
    www.example.com/

    it should be pretty easy to rewrite the function to remove any mention of index/ from the URL

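    function redirect_index_url() {
        // Match a request ending in index.html or index.php; capture what comes before it
        if ( preg_match('#(.*)index\.(html|php)$#', $_SERVER['REQUEST_URI'], $captures) ) {
            header('HTTP/1.1 301 Moved Permanently');
            header('Location: '.$captures[1]);
            exit; // stop here so the rest of the page is not sent after the redirect
        }
    }

    redirect_index_url();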

    lucy24

    12:15 pm on Jan 29, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    There are two "real locations".

    Three, at least. The third is the one that goes server/stuff/morestuff/userspace/blahblah/otherstuff/domain/ It shows up sometimes in your error logs. Or, worst case, in your browser's address bar if you said something to make mod_rewrite mad :(

    And that's not even getting into the real, real location which is something like "this splodge of electromagnetic charges on a chunk of silicon". Ugh.

    In any case, a name ending in "/index" is neither fish nor fowl. It's got to go.

    Volcomstar

    5:57 pm on Jan 30, 2012 (gmt 0)

    10+ Year Member



    Ouch, no luck. I tried dozens of rewrite rules with no success. Let's start from the basics:

    We know that example.com/en/index is fully working. Now we want to hide the "index" and make it accessible from example.com/en. What's the rule?

    RewriteRule ^en$ /en/index [L]


    The result is a 404. Please note that I also tried with the extension and with all related rules removed:

    RewriteRule ^en$ /en/index.php [L]


    And also with:

    RewriteRule ^en$ /index.php [L]


    Same result: 404. At this point I don't get it. Why is it happening and how can I rewrite this URL?

    g1smd

    6:06 pm on Jan 30, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    The left hand side of the rule should match the path part of the requested URL - what the user typed into the browser address bar, or what is in the href part of the link the user clicked on.

    The right hand side of the rule should state the physical path and filename that actually contains the content. This location will be stated without any hostname; it is an internal location.

    Volcomstar

    6:50 pm on Jan 30, 2012 (gmt 0)

    10+ Year Member



    Uhm, yes, now it's clearer. So it should be:

    RewriteRule ^/en$ /index.php [L]


    But the result is still a 404. At this point I don't know what to say. I'm trying with an empty htaccess. There's only the above rule. Is there any workaround or mod_rewrite checker?

    g1smd

    6:52 pm on Jan 30, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Did you add
    RewriteEngine On
    before it?

    Does requesting
    example.com/index.php
    show the content?

    Your code above would probably work if it were located in the httpd.conf file but it will not work in htaccess without a small but very important change. You need to remove the leading slash from your RegEx pattern. Path information is "localised" on a "per-directory" basis before being presented to mod_rewrite.
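
    A minimal sketch of the difference, using the rule from the post above. It assumes the .htaccess file sits in the document root and that /index.php exists there.

    ```apache
    # In .htaccess (per-directory context): the leading slash has already
    # been stripped from the path, so the pattern must not include it.
    RewriteEngine On
    RewriteRule ^en$ /index.php [L]

    # In httpd.conf or a <VirtualHost> block (server context) the path still
    # starts with a slash, so the same rule would be written as:
    # RewriteRule ^/en$ /index.php [L]
    ```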

    lucy24

    11:09 pm on Jan 30, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    For testing purposes, change the plain [L] to [R=301,L]. When you get the 404, what does your browser's address bar say? The object here is to find out if the rewrite is happening; a quick-and-easy way to test is to make it into a redirect.

    g1smd

    11:12 pm on Jan 30, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    That is... temporarily make it into a redirect for testing purposes.
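
    For example, the test version of the rule might look like the sketch below. Note one substitution: it uses R=302 instead of the R=301 suggested above, on the assumption that a browser may cache a 301 and keep redirecting after the rule has been changed back.

    ```apache
    RewriteEngine On

    # TEMPORARY, for testing only: if the address bar changes to /index.php,
    # the pattern matched and the rule fired. If it stays on /en, the rule
    # never ran and the 404 comes from somewhere else.
    RewriteRule ^en$ /index.php [R=302,L]

    # Once confirmed, go back to the internal rewrite:
    # RewriteRule ^en$ /index.php [L]
    ```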

    Volcomstar

    5:54 pm on Jan 31, 2012 (gmt 0)

    10+ Year Member



    Finally! It works!

    On my website there's cache control and I also cache all pages compressed. Today I noticed that my cache directory was full of thousands of unknown files. For sure there's a problem with my cache script. I removed all the files and disabled the script and now it works.

    I want to say a big THANK YOU to you all. You have been really helpful with your detailed explanations. Great great great! :D

    g1smd

    9:29 pm on Jan 31, 2012 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Make sure you test URLs that should work as well as URLs that should not work. Make sure that all requests result in the correct response.

    Volcomstar

    9:42 pm on Jan 31, 2012 (gmt 0)

    10+ Year Member



    Yep :) tonight I'm going to add some 301s
    This forum is cool xD