How to use mod rewrite

I read the apache help section... still confused

         

TimGlad

5:16 pm on Dec 11, 2010 (gmt 0)

10+ Year Member



Simple question I think!

My Website runs on AJAX client-side and PHP/MySQL server-side.
So once you go to the index.php page you technically never go anywhere else on the website.

So what RewriteRule would get me cool urls like:

example.com/section/page/1

To always go through index.php and show the above but be redirected to:

example.com/index.php?section=whatever&page=whatever&,,,etc.

But if user used example.com it would still go to:

example.com/index.php just no query string or even script name?

g1smd

6:31 pm on Dec 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is important to understand what a rewrite actually does.

It takes an incoming URL request and then fetches the content from a different path inside the server.

You decide what URLs you want by including those in the links on your pages. You can make the URL format whatever you want it to be. It is links that "define" URLs.

The pattern in the RewriteRule then needs to match that incoming URL request and extract the elements to be used in the internal path part of the internal request.

So a pattern and rule like
RewriteRule ^([0-9]+)/([0-9]+)$ /index.php?section=$1&page=$2 [L]

would match a URL request like
example.com/2345/123244
and fetch the content from
/index.php?section=2345&page=123244
inside the server.

That example is for page URLs consisting of a number of digits, followed by a slash, followed by a number of digits. You can make the format whatever you want, and you adjust the RegEx pattern to suit.
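To see what the pattern captures, the match can be imitated in PHP, whose preg functions use the same PCRE regex flavour as mod_rewrite. This is only an illustrative sketch: `rewriteNumericUrl` is a hypothetical helper, not part of Apache or PHP, and it assumes the rule lives in a .htaccess file, where the leading slash is removed before the pattern is tested.

```php
<?php
// Hypothetical helper that mimics the RewriteRule above, for illustration only.
// The real matching happens inside Apache; no PHP is involved at that stage.
function rewriteNumericUrl(string $path): ?string
{
    // In per-directory (.htaccess) context the leading slash is stripped
    // before the pattern is tested, hence no "/" right after "^".
    if (preg_match('#^([0-9]+)/([0-9]+)$#', $path, $m) !== 1) {
        return null; // no match: the rule is skipped
    }
    // $m[1] and $m[2] play the role of $1 and $2 in the substitution.
    return "/index.php?section={$m[1]}&page={$m[2]}";
}

var_dump(rewriteNumericUrl('2345/123244')); // the internal path with both numbers filled in
var_dump(rewriteNumericUrl('2345/abc'));    // NULL, because "abc" is not digits
```

Changing the URL format means changing only the pattern and the back-references, which is the adjustment described above.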

In order for the whole thing to work, you also need to have the index.php script return a page of content when page and section numbers are missing, i.e. the root URL request. It also needs to return a 404 header when a non-valid combination of section and page numbers is requested.
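The three outcomes described here (home page for the root request, content for a valid pair, 404 for anything else) could be sketched roughly as follows. The function name `resolveRequest`, the view names, and the array standing in for the database are all assumptions made for this example. `http_response_code()` needs PHP 5.4 or later; on older versions `header('HTTP/1.1 404 Not Found')` does the same job.

```php
<?php
// Hypothetical decision function; names and the array standing in for the
// database are assumptions made for this sketch.
function resolveRequest(array $query, array $validPages): array
{
    $section = $query['section'] ?? null;
    $page    = $query['page'] ?? null;

    // Root URL request: no parameters at all, so serve the home page.
    if ($section === null && $page === null) {
        return [200, 'home'];
    }

    // Both parameters present and known: serve that page.
    if ($section !== null && $page !== null
        && isset($validPages["$section/$page"])) {
        return [200, $validPages["$section/$page"]];
    }

    // Anything else is a non-valid combination: answer 404, do not redirect.
    return [404, 'not-found'];
}

$validPages = ['2345/123244' => 'article'];
[$status, $view] = resolveRequest($_GET, $validPages);
http_response_code($status); // send the real status before any output
```

Keeping the decision in a function that returns the status makes it easy to check each case without a web server.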

TimGlad

8:28 pm on Dec 11, 2010 (gmt 0)

10+ Year Member



So if I get what you're saying:

RewriteRule ^Blog/WebHosting/([0-9]+)$ /index.php?section=$1&title=$2&page=$3 [L]

Would send: example.com/Blog/WebHosting/1

(To or as) on the server: index.php?section=$1&title=$2&page=$3

but the end-user would only see example.com/Blog/WebHosting/1 in their browser address. Or am I missing some regexp for string literals? Could I use ^\w+/\w+/([0-9]+)$ which in regexp means word/word/integers if I am not mistaken?

And I could:

<?php
$section = $_REQUEST['section'];
$title = $_REQUEST['title'];
$page = $_REQUEST['page'];
?>

as normal?

And hyperlinks could be written?:

<a href="example.com/Blog/WebHosting/1">WebHosting Page 1</a>

TimGlad

9:47 pm on Dec 11, 2010 (gmt 0)

10+ Year Member



Well, to answer my own reply :)

RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)/([0-9]+)$ /index.php?section=$1&title=$2&page=$3 [L]

Actually worked!


As far as 404 errors go, I'm assuming you mean if some user tries to play around with the URL. I already have a custom 404 page with a link back to the index page, and if what they request does match the pattern but not a page in the database, I can redirect with PHP to the 404 page. Is that what you meant?

Thanks g1smd

g1smd

6:53 pm on Dec 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, if a user requests a non-valid URL that is not mapped to the script, your normal 404 handling will cope with it.

If the user requests a non-valid URL that is mapped to the index.php script, the script needs to notice that and directly send the correct 404 HTTP header itself.

TimGlad

6:49 pm on Dec 13, 2010 (gmt 0)

10+ Year Member



If pattern does match but page not in the DB the script sends them to a default page. On a static Website this would be the "home" page!

jdMorgan

1:40 am on Dec 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



... and that would be "SEO suicide." Return proper "HTTP signalling" -- a 404 in this case, or kiss your search rankings goodbye... Returning any 200-OK response when a 404 is called for essentially makes the URL-space on your server infinite. And search engines do not respond well to that at all. They test for this flaw, and will arbitrarily limit the number of URLs they are willing to crawl on your site if they find that your server returns incorrect HTTP response codes.

Your script must return a proper 404 error response, and not redirect to anywhere.

Jim

TimGlad

12:03 pm on Dec 16, 2010 (gmt 0)

10+ Year Member



> and that would be "SEO suicide." Return proper "HTTP signalling" -- a 404 in this case, or kiss your search rankings goodbye... Returning any 200-OK response when a 404 is called for essentially makes the URL-space on your server infinite. And search engines do not respond well to that at all. They test for this flaw, and will arbitrarily limit the number of URLs they are willing to crawl on your site if they find that your server returns incorrect HTTP response codes.
>
> Your script must return a proper 404 error response, and not redirect to anywhere.
>
> Jim


I don't think I get what you mean. If the URL example.com/main/title/1/somethingelse goes to a custom 404 page because that page doesn't exist on the server, that is exactly what should happen!

If the site has changed, and an old bookmark that a human visitor might have kept matches the rewrite pattern but that content no longer exists, it is proper to send them to a starting point in the website!
Since robots (good ones) follow the sitemap.xml or the links in the dynamically created pages, they will always find the proper content to index.

I don't understand what you mean by...

> Your script must return a proper 404 error response, and not redirect to anywhere.


Exactly what is a "Proper 404 Response"? The crappy default 404 error that Apache sends back to the browser? It doesn't really explain why they got it, or offer a way back except via the browser's back button!

g1smd

12:59 pm on Dec 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is all in this wording:

> If pattern does match but page not in the DB the script sends them to a default page.

How EXACTLY does it "send" a user to the default page?

> On a static Website this would be the "home" page!

Using the home page content when there is an error is confusing to visitors and search engines alike. It should be avoided.

jdMorgan

4:49 pm on Dec 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Exactly what is a "Proper 404 Response"?

Your script, upon detecting parameters which are missing or incorrect and deciding that it cannot serve a 'real page of content' in response to that request, must output an HTTP status response header of "404-Not Found" along with any response-body (i.e. page of content) that you wish the user to see. The status header is received and interpreted by HTTP clients, regardless of the HTML content sent along with it.

This status response must be directly-output by your script, or by another script included by your main script, or by "echoing" content from a static file. What must be avoided at all costs is the invocation of any kind of URL redirect to an error page, as the result of this would be first a 301 or 302 redirect status response, followed by a 200-OK response as the redirected-to page is then requested and served.
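A minimal sketch of that distinction, assuming PHP 5.4 or later for `http_response_code()`: the status and the friendly page go out together in one response, and no `Location` header is ever sent. `sendNotFound` is a hypothetical helper name, and reading the body from error.htm is just one way to supply the content.

```php
<?php
// Hypothetical helper for this sketch: emits the 404 status and the
// human-readable page in the same response, with no Location header.
function sendNotFound(string $bodyHtml): void
{
    http_response_code(404); // status becomes "404 Not Found"
    header('Content-Type: text/html; charset=utf-8');
    echo $bodyHtml;          // friendly page, same response
}

// What to avoid: by default this makes PHP answer with a 302, after which
// the client requests /error.htm and receives a 200.
// header('Location: /error.htm');

// Possible usage inside index.php once the lookup has failed:
// sendNotFound(file_get_contents(__DIR__ . '/error.htm'));
// exit;
```

The visitor sees the same friendly page either way; only the status code that clients and crawlers receive differs.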

With that, the client sees that *any* incorrect extensionless URL requested from your site returns a 30x redirect and then a 200-OK, so therefore, the URL-space of your site will be seen as 'infinite' -- Search engines will arbitrarily limit the depth and frequency of their crawls, and your rankings may suffer. Plus, smart competitors who notice this vulnerability may exploit it to further sink your ship.

So the bottom line is that your content-handler, whether it is your script or Apache's own handler, must correctly handle all contingencies once it is invoked. And it must do so in strict compliance with the HTTP protocol specification -- Response codes have specific meanings, and are interpreted by clients in specific ways. So, when the client is a search engine, extra care is needed.

Jim

TimGlad

1:32 am on Dec 20, 2010 (gmt 0)

10+ Year Member



Ah! I think I haven't been perfectly clear.

My rule:
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)/([0-9]+)$ /index.php?section=$1&title=$2&page=$3 [L]

Works fine. My script uses PHP's global $_REQUEST and $_SERVER arrays to retrieve and parse the URL.

The first thing my script does is use PHP's global $_SERVER array to check whether the URL has a query string.
$req_uri = $_SERVER['REQUEST_URI'];
If it doesn't, in other words if someone navigated to the index.php page, the variable $req_uri would be "/", so the script creates the "home" page, which in the case of my Website could also be reached with the URL
example.com/main/WebHosting/1
where "main" is parsed as the "section" and "Webhosting" is parsed as the "title" and "1" is parsed as the page number.

When it comes to the section in the script to output the content section of the script it goes to the database and retrieves the data in the main table titled Webhosting with the page number of 1 and outputs it in the correct place.

Since the CMS has set the WebHosting page to be the "default" page, the script serves up this content if the user navigates to the index page or through the /main/WebHosting/1 query!

What I meant by "if url matches pattern" but page doesn't exist in the database is simply this: if someone typed in example.com/main/web/1, the script will parse that, check the DB, find that "web" is not in the database as a title, and create the default page because the pattern did match but the content wasn't in the DB.

Thus if the Website changes but a human has created a bookmark based upon a URL that no longer exists, he doesn't get a 404 error just because his bookmark is out of date.

However, if someone types in a query that doesn't match the pattern i.e. example.com/main/WebHosting/somethingelse/1 the server redirects him to error.htm which has a link in it back to the index.php page.

By the way, I checked and the server does send the proper 404 error message back to the client before it outputs error.htm. The access log doesn't list error.htm as output via a "200" code or any "301" or "302"; it is handled by this line in the server config file:
ErrorDocument 404 /error.htm


That's why I got confused by the comment that this was "SEO Suicide". Google actually recommends that you have a custom error page with a link back to the index page, so that users aren't confronted with the default 404 error message that the server outputs without a link back to the website!

My script doesn't have to output a 404 header because the server does it if a user attempts to navigate by rewriting the query section of the URL!
Now if it is true that robots might try to test a Website by putting in a query that doesn't exist, then they will get the 404 error page just like anyone else and can then navigate back through the proper link provided! Although I have reviewed my access logs a lot to design the analytic portion of my CMS, I have never seen a robot (good or bad) create a false URL to test my Website!

Of course, up to now I haven't used the cool rewrite rules to create user-friendly URLs.

g1smd

9:11 am on Dec 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> However, if someone types in a query that doesn't match the pattern i.e. example.com/main/WebHosting/somethingelse/1 the server redirects him to error.htm which has a link in it back to the index.php page.

The use of the word "redirect" here is problematical. The server action must NOT be a redirect. A redirect tells the browser to request a new URL in a new HTTP transaction, and does so by sending a 301, 302 or 307 code to the browser.

> When it comes to the section in the script to output the content section of the script it goes to the database and retrieves the data in the main table titled Webhosting with the page number of 1 and outputs it in the correct place.

If there is no matching entry in the database, the script should immediately send the correct 404 HTTP headers. After that, you can send whatever human-readable content you like. The key is that the PHP script generates this error for non-valid requests that get as far as asking for content from the database, but which cannot be served.