rewrite rules and spiders

converting large static html site to dynamic php site

         

Quarfelburg

7:01 pm on May 31, 2005 (gmt 0)

10+ Year Member



We are converting our static web site of several hundred pages into a dynamic site using php. One of our requirements is to make the whole new php site structure transparent to the outside world (spiders and users). In other words, only the old URLs (those currently indexed by search engines) are visible to them. For example:

old link "www.oursite.com/product1.html" will be mapped to new php url "www.oursite.com/product.php?id=123456".

All the mapping is done by using apache's rewrite module, by adding rewrite rules like:

RewriteRule ^product1\.html$ /product.php?id=123456

We have done this for every page that is currently indexed on the major engines. Also when we add a new page to the site we automatically create a new rewrite rule.

However, inside every new php page there are many links to other pages on the same site.

i.e. inside page /product.php?id=123456 there is a link to another product /product.php?id=654321, which maps to the old page /product2.html.

My question is:
If we use the structure as it is, with links pointing to URLs such as /product.php?id=1234567, will the search engines see this URL or the static URL it is mapped to in the .htaccess (product3.html)?

It's really important that this transformation be completely transparent to the search engines, as our rankings are doing extremely well and I don't want to jeopardize that.

Thanks in advance,
Quarfelburg

jdMorgan

7:50 pm on May 31, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> with links pointing to URLs such as /product.php?id=1234567, will the search engines see this URL or the static URL it is mapped to in the .htaccess (product3.html)?

They will see what users see -- the dynamic URLs.

mod_rewrite changes URLs received from clients --browsers and robots-- and converts them to URL-paths used inside your server. It is a front-end process, not an output filter.

I suggest you stop and re-assess your plan. Here's a recommended procedure:

Use preg_replace and database calls on your php pages to convert /product.php?id=1234567 to a link like http://www.example.com/products/p1234567.html or, with an extra database call to get a "short" description, to http://www.example.com/pn-1234567/green-fuzzy-widget or http://www.example.com/pd-/green-fuzzy-widget/1234567 (for a small added SEO boost).

The key is to have a unique path for mod_rewrite to detect so it can invoke the rewrite, and to keep the URL as short as possible, while possibly adding a few keywords and keeping future expansion of the product line in mind.
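As a rough illustration of the link conversion described above, here is a minimal sketch. It is written in Python purely so it can be run standalone; on the real site the same idea would be done in php with preg_replace. The slug table, the helper names, and the choice of the "/pn-<id>/<slug>" form are assumptions for the example, not part of the original site.

```python
import re

# Hypothetical stand-in for a database lookup of short product descriptions.
PRODUCT_SLUGS = {
    "1234567": "green-fuzzy-widget",
}

# Matches internal links of the form /product.php?id=<digits>.
DYNAMIC_LINK = re.compile(r"/product\.php\?id=(\d+)")


def static_url(product_id):
    """Build the public URL for a product id, adding keywords when a slug is known."""
    slug = PRODUCT_SLUGS.get(product_id)
    if slug is None:
        # No description available: fall back to the plain numeric form.
        return f"/products/p{product_id}.html"
    return f"/pn-{product_id}/{slug}"


def rewrite_links(html):
    """Replace every dynamic product link in the outgoing HTML with its public URL."""
    return DYNAMIC_LINK.sub(lambda m: static_url(m.group(1)), html)


page = '<a href="/product.php?id=1234567">Widget</a> <a href="/product.php?id=42">Other</a>'
print(rewrite_links(page))
```

The point is that the page itself emits the public URLs, so neither users nor spiders ever see /product.php?id=... in a link; mod_rewrite only has to recognize the public form when a request comes back in.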

301-redirect all of your old "product1.html" pages to the new equivalent URLs (as you have done), and switch to a more maintainable database-driven scheme for all future 'pages'.

In other words, do all the lookup work using the existing database. In this way you can have a single RewriteRule in httpd.conf or .htaccess instead of using hundreds of rules or having to use a RewriteMap with a second copy of data you already have in your database. Let php do all the heavy lifting for you; it is more powerful and flexible, has database access, and you are already using it, so you're not adding another big chunk of complexity.
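To make the "let the database do the mapping" idea concrete, here is a small sketch of the lookup side. It uses Python with an in-memory SQLite database only so that it runs on its own; the table name url_map, its columns, and the function name are hypothetical, and the real site would query its existing database from php instead.

```python
import sqlite3

# In-memory database standing in for the site's existing product database.
# The table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_map (old_url TEXT PRIMARY KEY, product_id INTEGER)")
conn.executemany(
    "INSERT INTO url_map VALUES (?, ?)",
    [("product1.html", 123456), ("product2.html", 654321)],
)


def lookup_product_id(old_url):
    """Return the product id for an old-style URL, or None if it is unknown."""
    row = conn.execute(
        "SELECT product_id FROM url_map WHERE old_url = ?", (old_url,)
    ).fetchone()
    return row[0] if row else None


print(lookup_product_id("product1.html"))
print(lookup_product_id("missing.html"))
```

With a lookup like this, adding a product means adding one database row; no new rewrite rule is needed.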

Jim

Quarfelburg

7:39 am on Jun 2, 2005 (gmt 0)

10+ Year Member



Thanks a lot for your reply, it's really helpful.

Actually, we just want to keep the old-format URLs, i.e., to make the transition from static to dynamic transparent to users and search engines. So may I ask how I can use php to do the mapping?

To avoid creating hundreds of rewrite rules in .htaccess, the only way I can think of so far is to direct the old-format page requests (say, product1.html) to a single php file that acts as an error-handling page; this page would then redirect the requests to their dynamic format (product.php?id=123456). But then all the requests will be logged as errors (File does not exist) in the log file. Also, I am not sure whether the error affects the search engines. Could you give me a hint on how I can do it in a better way?

Thanks again.

Quarfelburg

4:03 pm on Jun 2, 2005 (gmt 0)

10+ Year Member



> In other words, do all the lookup work using the existing database. In this way you can have a single RewriteRule in httpd.conf or .htaccess instead of using hundreds of rules or having to use a RewriteMap with a second copy of data you already have in your database. Let php do all the heavy lifting for you;

So you mean do all the mapping in a php script? I already keep the mapping records in the database, so this sounds reasonable for me to do. What does the rewrite rule look like? Does it direct all page requests to the php script?

jdMorgan

5:12 pm on Jun 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Rewrite all old-style page requests to a script that looks up the correct new-style php page parameters and serves the correct page. This is *not* any kind of error handler, so let's avoid that confusion. It's just a piece of code at the top of your regular page-handling script that looks at the requested URL. If the URL is the old-style URL, then it can use the database or a fixed internal table to look up the correct script parameters for that request.

Your .htaccess might be something like:

RewriteRule ^(old_style_url1)$ /script.php?style=old&old_url=$1 [L]
RewriteRule ^(old_style_url2)$ /script.php?style=old&old_url=$1 [L]
RewriteRule ^(old_style_url3)$ /script.php?style=old&old_url=$1 [L]

In the script, check for "style=old". If defined and valid, then look up the normal script parameters using the URL passed in old_url. If "style" is undefined or not equal to "old", then your script should behave exactly as it does today.
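The "style=old" check described above might look roughly like this. The sketch is in Python so it can be run standalone; a php version would read the same values from $_GET. The fixed table, the function name, and the use of None to signal "not found" are assumptions made for the example.

```python
from urllib.parse import parse_qs

# Hypothetical fixed internal table mapping old-style URLs to product ids.
OLD_URL_TABLE = {
    "product1.html": "123456",
    "product2.html": "654321",
}


def resolve_product_id(query_string):
    """Return the product id to serve, or None if the request cannot be resolved."""
    params = parse_qs(query_string)
    style = params.get("style", [None])[0]
    if style == "old":
        old_url = params.get("old_url", [None])[0]
        # Unknown old URLs resolve to None so the caller can send a 404.
        return OLD_URL_TABLE.get(old_url)
    # "style" missing or not "old": behave as the script does today.
    return params.get("id", [None])[0]


print(resolve_product_id("style=old&old_url=product1.html"))
print(resolve_product_id("id=654321"))
print(resolve_product_id("style=old&old_url=nope.html"))
```

Because the request reaches the script through an internal rewrite rather than an error handler, nothing is logged as "File does not exist", and the script is free to return a real 404 for old URLs it does not recognize.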

This is just an example. There are hundreds of acceptable ways to do this, and I'm no php expert. You might want to ask about more refined methods in one of the scripting forums.

Jim