Forum Moderators: phranque

Modrewrite and 301 redirects for dynamic pages

Using mod rewrite and 301 redirects for dynamic pages

         

Mulith

2:08 pm on Apr 4, 2010 (gmt 0)

10+ Year Member



Hi all,

Seen a few similar threads but can't quite find one that matches my situation.

We currently use mod_rewrite as follows:
RewriteCond %{REQUEST_URI} !Search_Results.php
RewriteRule ^([0-9]+)/([0-9]+)/([0-9]+)/([0-9]+)/([0-9]+)/([0-9]+)/?$ Search_Results_Mod.php?county_id=$1&price=$2&sector=$3&profit=$4&sub_sector=$5&revenue=$6 [L]

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Browse/([^.]+)/Counties/([^.]+)/([^.]+)\.php$ Search_Results_Mod.php?pageno=$1&county_id=$2 [L]

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Browse/([^.]+)/Regions/([^.]+)/([^.]+)\.php$ Search_Results_Mod.php?pageno=$1&regions=$2 [L]

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Browse/([^.]+)/Sectors/([^.]+)/([^.]+)\.php$ Search_Results_Mod.php?pageno=$1&sector=$2 [L]

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Browse/([^.]+)/Sub_Sectors/([^.]+)/([^.]+)\.php$ Search_Results_Mod.php?pageno=$1&sub_sector=$2 [L]

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Browse/([^.]+)/Categories/([^.]+)/([^.]+)\.php$ Search_Results_Mod_Cat.php?pageno=$1&Choice0Value=$2&count=1 [L]

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Browse/([^.]+)/County/([^.]+)/Category/([^.]+)/([^.]+)\.php$ Search_Results_Mod.php?pageno=$1&county_id=$2&Choice0Value=$3&count=1 [L]


As you've probably already figured out this creates static pages from a dynamic page, so for example:
/Browse/1/Categories/5-4-2/Bed_and_Breakfasts_for_Sale.php is constructed using elements from our database: "/1/" refers to the first page of results, and "/5-4-2/" stands for sector 5, sub-sector 4 within that sector, and category 2 within that sub-sector. For reference, our fifth sector is Leisure and the fourth sub-sector within Leisure is "Hotels and Holiday Accommodation".

You'll also see that another rewrite rule exists above that adds in "/County/*/" when referring to a geographical location in the UK.

Now that you have an idea of how our site works here is what we are hoping to achieve:

We want to simplify the URL to something like "/buy/Bed_and_Breakfasts_For_Sale.php"

When displaying listings in a certain location, like England, the url will look like this:
"/buy/Bed_and_Breakfasts_For_Sale_in_England.php"

So basically we add the location on the end.

Yes, we could change the rewrite rules and wait to have the pages reindexed, but quite frankly we can't afford to wait another 3 months for them all to be reindexed.

Is there a 301 redirect rule that we can use that would catch all of our old dynamically created URLs and redirect them to the new URL format?

Many thanks in advance.

jdMorgan

3:19 pm on Apr 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> As you've probably already figured out this creates static pages from a dynamic page, so for example:

To clarify further discussion, no it most certainly does not do that. This code detects client requests for 'static-looking' URLs, and rewrites those requests to a script filepath inside the server so that content can be generated and served for those URLs.

Any other interpretation of this function is likely to cause confusion and trouble. It is critical to understand the URL-to-filepath translation (as used here) and URL-to-URL redirect functions of mod_rewrite before embarking on a complex project such as you describe here. Confusion between the meanings and significances of URLs and filepaths --which are not at all the same things-- usually leads to difficulty and (often-costly) errors.

URLs are defined and created by their publication on HTML Web pages, and exist as soon as they are published. URLs have no meaning inside a Web server.

Files and filepaths are created and defined inside a server, and have no utility out on the Web.

The most basic function of a Web server is to translate or "associate" client requested URLs to internal server filepaths, and this association does not exist without the server.

URLs and filepaths are two entirely-different and separate "content-location" methods, with different scopes and meanings.

A redirect creates an HTTP response for a given URL-request that terminates the current HTTP transaction by sending a response to the client that says, "The content you asked for has moved. Please ask for it again at this new URL." The client *may* then re-request the desired content using the new URL provided in the server's prior redirect response.

In contrast, an internal rewrite simply accepts a client-requested URL, and modifies the server's default URL-to-filepath mapping so that the requested content is served from a different filepath than would normally be used. This function occurs entirely within the context of the current HTTP transaction, and --if properly implemented-- the client is completely unaware that it is occurring.

Hopefully, that will clarify things a bit and make mod_rewrite's functions easier to understand.

The rules you posted are "non-optimal" and you'll likely see a performance improvement by correcting the patterns so that the "negative-match look-ahead" sub-patterns are correct. Also, you've got un-anchored patterns and missing character-escaping in the RewriteConds. However, the "RewriteCond %{REQUEST_URI} !Search_Results_Mod.php" appears to be utterly un-necessary, since no rewriterule pattern matches "Search_Results_Mod.php" and therefore, the RewriteConds won't even be processed for requests for that URL-path, so this latter problem is moot.

So for your second rule above, for example, I'd suggest:

RewriteRule ^Browse/([^/]+)/Counties/([^/]+)/([^.]+)\.php$ Search_Results_Mod.php?pageno=$1&county_id=$2 [L]

and the RewriteCond is not needed at all.

Since you are planning to re-do your URLs at this point, let me suggest that now would be an excellent time to 'go extensionless' with your URLs, as that trailing ".php" bit on every URL only costs you bandwidth and 'exposes' the site's internal technology to clients, neither of which are beneficial. In short, the .php on the end of URL-paths accomplishes nothing and can be eliminated.

To your main question: A more complex aspect of the project at hand is that by removing the "/1/Categories/5-4-2/" URL-path-part, you essentially remove the capability of your script to look up anything in your database -- It can only work with the information provided in the client-requested URL, as passed from your mod_rewrite rules in the name/value pairs in the substitution filepaths.

Therefore, in addition to the suggestions above, you will also need to modify your script so that it no longer uses the /1 and /5-4-2 URL-path-parts as database lookup keys, but instead does the lookup based on the "page name" itself. This rather changes the focus of your question, so you may wish to consider all of the above, and then re-post additional questions.

Jim

Mulith

5:38 pm on Apr 4, 2010 (gmt 0)

10+ Year Member



Thank you for the clarification.

My knowledge of mod_rewrite is pretty poor, but I'm learning as I go. Removing the extension is definitely needed, I agree.

With regards to removing the "Browse/1/Categories/5-4-2/" part: I know it's possible to do it without all the numbers and use strings instead. My competitor uses /uk/search/Guest-Houses-and-Bed-and-Breakfast-Businesses-for-sale.

I think they use the text after the last "/" and before "-Businesses-for-sale" to search for a matching category (after substituting spaces for the dashes) in the database and then display listings from that category. Anything after "-Businesses-for-sale-" is interpreted as the location, with UK being the default. Finally, if the page number is greater than 1, a dash and the page number are appended, like /uk/search/Guest-Houses-and-Bed-and-Breakfast-Businesses-for-sale-2

We currently get the page number using this function:
function GetPageNo()
{
    // Remove the leading "/Browse/" so the page number is the first path segment.
    $location = str_replace("/Browse/", "", $_SERVER['REQUEST_URI']);

    // The page number runs up to the next slash.
    $pos = strpos($location, "/");
    if ($pos === false) {
        return ""; // no further slash, so there is no page number segment
    }

    return substr($location, 0, $pos);
}


The other variables are extracted in a similar way. We would be aiming to get the name of the category from the URL itself and display listings that match.
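
For illustration only, the slug-parsing scheme described above could be sketched in PHP roughly as follows. The function name parseListingSlug, the handling of the "-Businesses-for-sale" marker, and the "UK" default are assumptions taken from that description; this is a guess at the approach, not the competitor's or anyone else's actual code.

```php
<?php
// Hypothetical helper: split a slug such as
// "Guest-Houses-and-Bed-and-Breakfast-Businesses-for-sale-2"
// into a category name, a location, and a page number.
function parseListingSlug(string $path): ?array
{
    // Keep only the text after the last slash.
    $slashPos = strrpos($path, '/');
    $slug = $slashPos === false ? $path : substr($path, $slashPos + 1);

    // category, marker, optional "-Location", optional "-PageNumber".
    // The look-ahead stops a bare page number being read as a location.
    $pattern = '/^(.+?)-Businesses-for-sale(?:-((?!\d+$).+?))?(?:-(\d+))?$/i';
    if (!preg_match($pattern, $slug, $m)) {
        return null; // not a listing URL in this format
    }

    $location = ($m[2] ?? '') !== '' ? str_replace('-', ' ', $m[2]) : 'UK';
    $page = ($m[3] ?? '') !== '' ? (int) $m[3] : 1;

    return [
        'category' => str_replace('-', ' ', $m[1]), // dashes back to spaces
        'location' => $location,
        'page'     => $page,
    ];
}
```

The returned category string would then be used as the database lookup key in place of the numeric ids.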

Surely we wouldn't need to do an individual 301 redirect for each of the pages indexed?

g1smd

7:33 pm on Apr 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you change the site to use new URLs, you must redirect requests for old URLs to the new URLs, to preserve the traffic from incoming links and to force search engines to update their database.

You'll also need to redirect any direct access attempts requesting the index.php script using URL requests with parameters to instead use the correct URL for that content.

These redirects will likely need a database lookup handled by the script because .htaccess has no way to associate parameters with names.

jdMorgan

1:48 am on Apr 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Surely we wouldn't need to do an individual 301 redirect for each of the pages indexed?

No. Given your example, all requested URL-paths beginning with "Browse/" could easily be rewritten to your script. Look into the regular expressions pattern-matching used by mod_rewrite -- learning its capabilities will likely help you to 'architect' your new URL-plan.

Also, be aware that URL-paths on the Web and Apache are case-sensitive -- domains are not, but URL-paths are. For this reason, you may wish to make all of the non-page-title parts of your URLs (e.g. the "Browse/Categories/Regions/Counties/Sectors" parts) all-lowercase (you could also make them all-uppercase, but that would look really ugly). Doing so allows a simple conversion of mixed-case to all-lowercase when a URL is mis-typed, whereas it would require a database lookup to translate from a mis-cased, mixed-case URL to a correctly-cased mixed-case URL.

One reason that this is important is that for any given 'page' of content on your site, one and only one URL should be able to access it directly: All other variations of protocol, hostname, port number, and URL-path should be 301-redirected to that single canonical URL. If this is not done, the multiple URLs will essentially 'compete' with each other for ranking and links -- which is the same thing as voluntarily encouraging the same number of 'real' competitors on the Web...
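
As a rough sketch of the lowercase idea, assuming the script receives the requested URL-path as a string: the function names below are hypothetical, and the choice to leave the last segment (the page title) untouched is an assumption based on the paragraph above.

```php
<?php
// Lowercase every path segment except the last one (the page title),
// which may legitimately be mixed-case.
function lowercaseFixedSegments(string $path): string
{
    $segments = explode('/', $path);
    $title = array_pop($segments);
    $segments = array_map('strtolower', $segments);
    $segments[] = $title;
    return implode('/', $segments);
}

// Return the canonical path to 301-redirect to, or null when the
// requested path is already canonical and no redirect is needed.
function canonicalRedirectTarget(string $requestedPath): ?string
{
    $canonical = lowercaseFixedSegments($requestedPath);
    return $canonical === $requestedPath ? null : $canonical;
}
```

Correcting the casing of the page title itself would still need the database lookup mentioned above.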

Jim

Mulith

10:45 am on Apr 6, 2010 (gmt 0)

10+ Year Member



Ok, read up a bit more on mod_rewrite and I think I understand a bit better now.

I can get my site to output the new paths like:

/buy/Pubs-For-Sale
or /buy/Pubs-For-Sale-in-England

If there wasn't a rewrite rule in place at the moment I'd try something like:



RewriteRule ^Buy/([^.]+)-For-Sale([^.]+)([^.]+)$ Search_Results_Mod.php?title=$1&location=$2&pageno=$3&count=1 [L,R=301]



But I'm not sure how to do it based on the already rewritten rule. The variables that need to be extracted from the current path are the page number and the category name:

/Browse/1/Categories/5-2-6/Pubs_for_Sale.php

The "1" is the page number and ideally I'd like to only display it if greater than 1. The "Pubs" string will match the name of a category in our database. The path uses UK as the default location but I'd also like to be able to use locations like:

/buy/Pubs-For-Sale-in-London

There are only about 10 to 15 pages that have external links to them or have PR. Maybe it would be easier to put a static 301 redirect in place for each of them and then wait for Google to index the other 13000 pages again.

jdMorgan

11:51 am on Apr 6, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Again, internally rewrite (do not redirect) all of the old "/Browse/1" URLs to your script, and internally rewrite all new "/buy/Pubs-For-Sale" URLs to your script as well. The script has access to your database and can look up the new "text-based" URL when given an old "id-number-based" URL. The script can then generate a 301 redirect from the old URL to the new one. This function is not strictly necessary, but will speed up re-indexing of the site, preserve legacy page-rank and link-pop, and allow old links and bookmarks to continue working.

If the requested URL is already in the new "text-based" format, then the script just serves the appropriate page content.

To preserve your sanity, keep in mind that you've got two sets of URLs -- new and old, and only one file -- your script.

Jim

Mulith

12:13 pm on May 11, 2010 (gmt 0)

10+ Year Member



Hi Jim,

I've managed to do the development work on my site so that it recognises the new URL structure and retrieves the correct information for the user. I have implemented the following rewrite rule to handle the new urls:

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Buy/([^.]+)\-For\-Sale\-in\-([^.]+)\-([0-9]+)$ Search_Results.php?bus_type=$1&location_page=$2&pageno=$3 [L]

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Buy/([^.]+)\-For\-Sale\-in\-([^.]+)$ Search_Results.php?bus_type=$1&location_page=$2 [L]


I would like to rewrite requests for the old urls to the new ones and I thought of something like this:

RewriteRule ^Browse/1/Sectors/([^.]+)/([^.]+)_Businesses_For_Sale\.php$ Buy/$2\-Businesses\-For\-Sale\-in\-the\-United-Kingdom

RewriteRule ^Browse/([^.1]+)/Sectors/([^.]+)/([^.]+)_Businesses_For_Sale\.php$ Buy/$2\-Businesses\-For\-Sale\-in\-the\-United\-Kingdom\-$1


The problem is that the old URLs use underscores and the new ones use dashes. So for example, a request for the page /Browse/1/Sectors/11/Tech_and_Media_Businesses_For_Sale.php would make the $3 variable equal to "Tech_and_Media" instead of "Tech-and-Media".

Any ideas on how to integrate this?

jdMorgan

1:37 pm on May 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As I've posted several times, I suggest that you rewrite all requests to your script (or to *a* script), and let that script do the heavy lifting for URL-redirection. Apache mod_rewrite is not a scripting language, so support for database lookups, case-conversion, and character-conversion (e.g. underscores to hyphens) is poor to non-existent (poor in config files, mostly non-existent in .htaccess). A function that requires recursion (completely restarting all processing from the top) and is horribly slow in mod_rewrite can be done simply and quickly with a single 'preg_replace' in PHP.
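
A minimal sketch of that kind of conversion, assuming the old "/Browse/<page>/Sectors/<id>/<Title>.php" layout and the "-in-the-United-Kingdom" target format from the rules posted above. The function name oldSectorPathToNew is hypothetical, and a real script would probably validate the result against the database before redirecting.

```php
<?php
// Hypothetical helper: map an old "/Browse/<page>/Sectors/<id>/<Title>.php"
// path to the new "/Buy/<Title>-in-the-United-Kingdom[-<page>]" form.
function oldSectorPathToNew(string $oldPath): ?string
{
    $pattern = '#^/?Browse/(\d+)/Sectors/[^/]+/([^/.]+)\.php$#';
    if (!preg_match($pattern, $oldPath, $m)) {
        return null; // not an old-style sector URL
    }

    $page = (int) $m[1];

    // A single preg_replace turns every run of underscores into one hyphen.
    $title = preg_replace('/_+/', '-', $m[2]);

    $newPath = '/Buy/' . $title . '-in-the-United-Kingdom';
    if ($page > 1) {
        $newPath .= '-' . $page; // page 1 carries no suffix
    }
    return $newPath;
}
```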

While I'm a big fan of mod_rewrite, it is simply not the right tool for your job here. It is a meat-cleaver when what is needed is a scalpel.

Unrelated note: It is not necessary to escape any characters in your substitution paths except for literal "$" and "%" characters.

Jim

Mulith

11:53 am on May 13, 2010 (gmt 0)

10+ Year Member



So do you mean something like this:

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule .* rewrite.php [L]

If this is what you mean, then I've created a rewrite script but I'm not sure exactly how it would work.

How would I pass the output of the script back to the rewrite engine?

g1smd

12:48 pm on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not quite. Don't rewrite "everything" to the script.

Only rewrite URL requests that the script is going to handle.

Internally rewrite (do not redirect) all of the old "/Browse/1" URLs to your script, and internally rewrite all new "/buy/Pubs-For-Sale" URLs to your script as well. The script has access to your database and can look up the new "text-based" URL when given an old "id-number-based" URL. The script can then generate a 301 redirect from the old URL to the new one.

jdMorgan

1:44 pm on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, rewrite all "page" URL requests to the script. There is no need to rewrite requests for images, css, javascript, multimedia, or document-format files (exclude those from being rewritten by using a negative-match RewriteCond *before* doing any file- or directory-exists checking).

In the script:

Validate the requested URL by checking the database for an *exact* match on either the old or new format.

If the requested URL is a valid new URL {
. serve the requested content.
. exit
}

If the requested URL is a valid old URL {
. generate a 301-Moved Permanently redirect to the new URL listed in the database.
. exit
}

Else if the requested URL is invalid {
. do a search of the database for URLs that 'resemble' it -- such things as casing errors, hyphens interchanged with underscores or spaces, extra punctuation at the end (from malformed forum posts, for example), and truncation should be ignored (in this order or similar), looking for a 'best match.'

. If a best match is found {
... generate a 301-Moved Permanently redirect to the correct URL.
. }
. Else if no match found {
... generate a 404-Not Found error response.
. }
. exit
}

Note that the leading periods on the lines above are just used to force proper indentation on this forum.

The 'best match' searching mentioned above is a good example of the improved functions that using a script makes possible. Because mod_rewrite itself has no database access, and because it has no way to do a "best match" search of any kind, it cannot offer these opportunities to 'recover' requests for URLs with errors in them, and thereby make a sale instead of serving a 404-Not Found.
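
The outline above might translate into PHP along these lines. This is only a sketch: the two arrays stand in for database tables and contain made-up rows, the function names are hypothetical, and the 'best match' step here covers only casing, underscore/space/hyphen swaps, and trailing punctuation (not truncation).

```php
<?php
// Stand-ins for database tables (hypothetical data).
$validNewUrls = ['/Buy/Pubs-For-Sale-in-London'];
$oldToNew = [
    '/Browse/1/Categories/5-2-6/Pubs_for_Sale.php' => '/Buy/Pubs-For-Sale-in-the-United-Kingdom',
];

// Reduce a path to a form in which common errors no longer matter.
function normalizeForMatching(string $path): string
{
    $path = strtolower($path);                          // casing errors
    $path = str_replace(['_', ' ', '%20'], '-', $path); // underscores or spaces for hyphens
    return rtrim($path, '.,;:!?)"\'');                  // stray trailing punctuation
}

// Decide what to send back: [status code, URL-path or null].
function decideResponse(string $path, array $validNewUrls, array $oldToNew): array
{
    if (in_array($path, $validNewUrls, true)) {
        return [200, $path];             // valid new URL: serve the content
    }
    if (isset($oldToNew[$path])) {
        return [301, $oldToNew[$path]];  // valid old URL: redirect to the new one
    }
    $wanted = normalizeForMatching($path);
    foreach ($validNewUrls as $candidate) {
        if (normalizeForMatching($candidate) === $wanted) {
            return [301, $candidate];    // best match found: redirect to the correct URL
        }
    }
    return [404, null];                  // nothing resembles the request
}
```

The calling script would then either render the page, send the 301 with a Location header, or send the 404, depending on the first element of the result.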

Jim

Mulith

7:25 am on May 14, 2010 (gmt 0)

10+ Year Member



Ok, so would I need to create two rules like this:

RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^/Browse/([^.]+)\.php$ rewrite.php?request=$1 [L]

RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^/Buy/([^.]+)$ rewrite.php?request=$1 [L]

Mulith

7:34 am on May 14, 2010 (gmt 0)

10+ Year Member



or something like this?


RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^/Browse/*$ rewrite.php [L]

RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^/Buy/*$ rewrite.php [L]

g1smd

7:40 am on May 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The -d and -f "exists" checks are very slow as they have to read the disk.

You missed this additional instruction above.

There is no need to rewrite requests for images, css, javascript, multimedia, or document-format files (exclude those from being rewritten by using a negative-match RewriteCond *before* doing any file- or directory-exists checking).

g1smd

8:43 am on May 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If there are no "real" files (other than "rewrite.php") in the Buy and Browse folders, you can probably get away with:

RewriteRule ^B(uy|rowse)/([^.]+)$ rewrite.php?type=B$1&request=$2 [L]


otherwise do the -f and -d checks, but add the additional RewriteCond as noted above.

Mulith

9:27 am on May 14, 2010 (gmt 0)

10+ Year Member



Sorry, I did miss that. No, there are no "real" files in those folders. So I would be looking at something like this:

RewriteCond %{REQUEST_URI} !.*\.(css|jpg|gif|zip|js)
RewriteRule ^B(uy|rowse)/([^.]+)$ rewrite.php?type=B$1&request=$2 [L]

Mulith

9:39 am on May 14, 2010 (gmt 0)

10+ Year Member



Hmmm, very strange.

Getting a 404 Page not found when I use that.

Any suggestions?

Mulith

11:03 am on May 14, 2010 (gmt 0)

10+ Year Member



Whoohoo managed to get it working!

I had already written a script to handle the new "Buy/" URLs, which formed part of my search results page.

I changed my .htaccess a bit to:

Options +FollowSymLinks
RewriteEngine On
RewriteBase /


RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Buy/([^.]+)\-For\-Sale\-in\-([^.]+)\-([0-9]+)$ Search_Results.php?bus_type=$1&location_page=$2&pageno=$3 [L]

RewriteCond %{REQUEST_URI} !Search_Results_Mod.php
RewriteRule ^Buy/([^.]+)\-For\-Sale\-in\-([^.]+)$ Search_Results.php?bus_type=$1&location_page=$2 [L]

RewriteCond %{REQUEST_URI} !.*\.(css|jpg|gif|zip|js)
RewriteRule ^Browse/([^.]+) rewrite.php?request=$1 [L]


I only used the rewrite.php to handle the old urls and redirected them from within the script using:

header("HTTP/1.1 301 Moved Permanently");
header("Location: ". $url);

One of the first two rules would then evaluate and my Search_Results.php script would serve the correct info.

I do realise that I may have missed something or overlooked something important so any input would be handy.

Thank you

p.s Do you have a "Donate" or "Buy me a beer" section?

jdMorgan

1:46 pm on May 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are some minor pattern-anchoring (see the regular-expressions tutorial cited in our Charter), pattern-value, and character-escaping errors.

Also, you need not "exclude" a path from a rule unless it matches the path to which that rule redirects or rewrites. I left the second rule's exclusion in place, because it was not clear whether "Search_Results_Mod\.php" and "Search_Results\.php" are the same thing, but possibly typed incorrectly here.

If not, then second rule's exclusion isn't required either. And if so, then the exclusion must match the substitution path as shown (compare to your code posted above).

Options +FollowSymLinks
RewriteEngine On
RewriteBase /
#
RewriteRule ^Buy/([^-]+)-For-Sale-in-([^-]+)-([0-9]+)$ Search_Results.php?bus_type=$1&location_page=$2&pageno=$3 [L]
#
RewriteCond %{REQUEST_URI} !^/Search_Results\.php$
RewriteRule ^Buy/([^-]+)-For-Sale-in-(.+)$ Search_Results.php?bus_type=$1&location_page=$2 [L]
#
RewriteCond %{REQUEST_URI} !\.(css|jpg|gif|zip|js)$
RewriteRule ^Browse/(.+)$ rewrite.php?request=$1 [L]

You can join WebmasterWorld as a supporter [webmasterworld.com], or you can buy a pint for those you meet at the next PubCon [pubcon.com].

Jim

Mulith

4:42 pm on Jun 24, 2010 (gmt 0)

10+ Year Member



I've used a header redirect in my rewrite script like the example below, is that the right way to do it?

header("HTTP/1.1 301 Moved Permanently");
header("Location: ". $url);

g1smd

7:10 am on Jun 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, as long as $url contains all of protocol, domain and path.
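
One way to make sure of that, sketched under the assumption that the script knows its own canonical hostname: the helper name absoluteUrl and the host www.example.com are placeholders, not values from this thread.

```php
<?php
// Hypothetical helper: build a full URL (protocol, domain, and path)
// suitable for use in a Location header.
function absoluteUrl(string $path, string $host, bool $https = false): string
{
    $scheme = $https ? 'https' : 'http';
    // ltrim avoids a double slash when $path already starts with "/".
    return $scheme . '://' . $host . '/' . ltrim($path, '/');
}

// Usage inside the redirect script (placeholder host):
// header('Location: ' . absoluteUrl($newPath, 'www.example.com'), true, 301);
// exit;
```

Passing 301 as the third argument to header() sets the status code together with the Location header, so the separate "HTTP/1.1 301 Moved Permanently" line is optional in that form.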