Redirecting old urls via php script

   
11:17 pm on Oct 10, 2012 (gmt 0)

10+ Year Member



Hello all. I'm attempting to redirect thousands of URLs from an old custom CMS to a new Drupal-powered site. The new URLs won't match the old ones, since Drupal auto-assigns page ids. I do have access to the old page ids in my database.

I found this approach on webmasterworld.com. Summary: rewrite all "old" requests to a script, then have the script look up the new URL in a database and force the redirect.

I could use some feedback on my rewrite rules as well as the redirect script.

Old URLs: http://www.example.com/dir1/dir2/view.pl?id=12345
New URLs: http://www.example.com/node/$randomid


// redirect old traffic to redirect script
RewriteCond %{QUERY_STRING} ^id=([0-9]*)$
RewriteRule ^dir1/dir2/view\.pl$ /redir.php? [L]



This is the contents of redir.php:

<?php

// get id from the querystring
$legacy_id = htmlspecialchars($_GET["id"]);

// db Connection
$dbhost = 'xxx';
$dbuser = 'xxx';
$dbpass = 'xxx';
$dbname = 'xxx';

$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ('Error connecting to mysql');
mysql_select_db($dbname);

//Query
$query = ("SELECT ua.dst
FROM content_type_artwork cta
JOIN url_alias ua
ON ua.src = CONCAT('node/',cta.nid)
WHERE cta.field_legacy_art_id_value = $legacy_id");

$result = mysql_query($query);

while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
    $new_path = $row['dst'];
}

// close connection
mysql_close($conn);

// Permanent redirection
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.com/$new_path");

exit();

?>



Another related question - In the above, I am doing a 301 redirect to /node/$id. Drupal has internal path aliasing for friendly URLs. So, /node/$id will get transformed to something like /path1/page_title. Would it be better for me to redirect to the alias, rather than /node/$id, or does it not matter?

Thanks.
2:57 am on Oct 11, 2012 (gmt 0)

WebmasterWorld Administrator phranque



it looks like you are basically doing it right.


// redirect old traffic to redirect script


that comment should read "internally rewrite", not "redirect".

you should redirect to the alias urls and you should also use those alias urls in the site navigation and internal linking.
any requests for the /node/$id urls should be externally (301) redirected to the equivalent alias urls.
your CMS should do this for you.
the requests for the alias urls should get internally rewritten to your drupal script.
6:07 am on Oct 11, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24



RewriteRule ^dir1/dir2/view\.pl$ /redir.php? [L]
<snip>
// get id from the querystring

How can it? You just got rid of the query string.
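A minimal sketch of the same rule with the query string left intact, assuming everything else stays as posted. A substitution that ends in a bare "?" discards the original query string; without it, mod_rewrite passes id=12345 through to redir.php unchanged, so $_GET["id"] is populated.

```apache
# Internally rewrite old URLs to the lookup script.
# No trailing "?" on the target, so the original query string is kept.
RewriteCond %{QUERY_STRING} ^id=([0-9]*)$
RewriteRule ^dir1/dir2/view\.pl$ /redir.php [L]

# Alternative: pass the captured value explicitly.
# %1 is the first group matched by the RewriteCond above.
# RewriteRule ^dir1/dir2/view\.pl$ /redir.php?id=%1 [L]
```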
6:29 am on Oct 11, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd



The basic idea is sound, but do actually pass the id value to the script.

The special rewrite needs to appear high in the htaccess file, directly after rules that block access and before rules that redirect.

You will need to add an exclusion (RewriteCond) to your non-www/www redirecting rule to NOT redirect requests for the old URLs, otherwise you will expose the redir.php script path itself as a new URL back out on to the web when an old non-www URL is requested.
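As a rough sketch of that exclusion: the original poster's canonical-host rule was not shown, so the www rule below is an assumed, typical one. The added condition tests THE_REQUEST, which holds the request line as the client sent it and is not altered by the internal rewrite to /redir.php.

```apache
# Assumed non-www to www redirect, skipping requests for the legacy URLs
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteCond %{THE_REQUEST} !^[A-Z]+\s/dir1/dir2/view\.pl\?
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

With this in place, a non-www request for an old URL is still handed to the script, which then issues a single redirect to the www address of the new page.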

If the old URL passed to the script is not valid, currently the script will return 200 OK status and a blank page. This is a disaster. The script MUST return a 404 status and you would be wise to "include" the content of your 404 error page here.
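One way the 301-or-404 decision might be structured is sketched below. The function names, the callable lookup, and the 404.php include are all hypothetical, and the lookup uses PDO with a prepared statement in place of the mysql_* calls in the original script (those functions were removed in PHP 7). It assumes PHP 7.1 or later and is a sketch under those assumptions, not a drop-in replacement.

```php
<?php
// Hypothetical helper: decide the response for a legacy id.
// $lookup is any callable that maps an id string to a path, or null if none.
function legacy_redirect_response($rawId, callable $lookup): array
{
    // Only one or more digits counts as a valid id; everything else is a 404.
    if (!is_string($rawId) || preg_match('/\A[0-9]+\z/', $rawId) !== 1) {
        return ['status' => 404, 'location' => null];
    }
    $path = $lookup($rawId);
    if ($path === null || $path === '') {
        return ['status' => 404, 'location' => null];
    }
    return [
        'status'   => 301,
        'location' => 'http://www.example.com/' . ltrim($path, '/'),
    ];
}

// Hypothetical lookup built on PDO, reusing the query from the original post.
// Returns the alias from the first matching row, or null when nothing matches.
function make_pdo_lookup(PDO $pdo): callable
{
    return function (string $id) use ($pdo): ?string {
        $stmt = $pdo->prepare(
            "SELECT ua.dst
               FROM content_type_artwork cta
               JOIN url_alias ua ON ua.src = CONCAT('node/', cta.nid)
              WHERE cta.field_legacy_art_id_value = ?"
        );
        $stmt->execute([$id]);
        $dst = $stmt->fetchColumn();
        return $dst === false ? null : (string) $dst;
    };
}

// In redir.php this might be wired up roughly as follows:
//   $r = legacy_redirect_response($_GET['id'] ?? null, make_pdo_lookup($pdo));
//   if ($r['status'] === 301) {
//       header('Location: ' . $r['location'], true, 301);
//   } else {
//       http_response_code(404);
//       include '404.php'; // placeholder name for the site's error page
//   }
//   exit;
```

Keeping the decision in a function that returns a status and a location, rather than calling header() directly, makes the not-found path easy to check without a web server or database.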

[0-9]* will allow a "blank" id to be passed to the PHP script. You should use [0-9]+ here.
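The difference between the two quantifiers can be checked quickly in PHP, since preg_match uses the same PCRE syntax as mod_rewrite. The helper name query_matches is made up for this illustration.

```php
<?php
// Returns true when the query string matches the given pattern.
function query_matches(string $pattern, string $queryString): bool
{
    return preg_match($pattern, $queryString) === 1;
}

$loose  = '/^id=([0-9]*)$/'; // zero or more digits
$strict = '/^id=([0-9]+)$/'; // one or more digits

var_dump(query_matches($loose, 'id='));       // bool(true): blank id accepted
var_dump(query_matches($strict, 'id='));      // bool(false): blank id rejected
var_dump(query_matches($strict, 'id=12345')); // bool(true)
```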

All of these changes are vital to overall success.

If you're interested in seeing how much traffic is being redirected, "include" the custom logger PHP code I posted a few months ago.
8:49 am on Oct 11, 2012 (gmt 0)

WebmasterWorld Administrator phranque



this one?

logger php code:
http://www.webmasterworld.com/google/4484532.htm#msg4484597

[edited by: phranque at 9:56 am (utc) on Oct 11, 2012]

9:50 am on Oct 11, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd



That's the one.

In that script "$statusCode" is set to "301", "404", or "410", or anything else that's valid, as each of those types is logged in separate files.

Additionally, for 301 logging, "$pageType" was set to "category", "product" or "review" when the "$statusCode" was "301" as I was logging the redirects for each of those different page types to separate log files. The calling PHP script also set "$newLocation" so that the logging would show where the user was redirected to.

The "$statusCode" and "$pageType" variables are set by the calling PHP script just before the logging "include". The log file name includes those elements as well as the year and week number for weekly log rotation.

The logger script also detects whether the request is for www., test., or dev. and logs each separately. Adjust the internal file paths to suit your server.
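For illustration only, the naming scheme described above might be built along these lines. This is not the linked logger script; the function name, the file name layout, and the "other" and "none" fallbacks are assumptions, and it needs PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical sketch: build a weekly-rotated log file name from the host,
// status code, and page type.
function redirect_log_filename(string $host, int $statusCode, string $pageType, int $timestamp): string
{
    // Use the first host label when it is www, test, or dev; otherwise "other".
    $label = strtolower(explode('.', $host)[0]);
    $site  = in_array($label, ['www', 'test', 'dev'], true) ? $label : 'other';

    // Page type only applies to 301s; strip anything unsafe for a file name.
    $type = $statusCode === 301
        ? preg_replace('/[^a-z0-9_]/i', '', $pageType)
        : 'none';

    // ISO-8601 year ("o") and week number ("W"), in UTC.
    return sprintf(
        '%s-%d-%s-%s-W%s.log',
        $site,
        $statusCode,
        $type,
        gmdate('o', $timestamp),
        gmdate('W', $timestamp)
    );
}

// Example: a product redirect logged on 11 Oct 2012.
echo redirect_log_filename('www.example.com', 301, 'product', gmmktime(12, 0, 0, 10, 11, 2012)), "\n";
// www-301-product-2012-W41.log
```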
9:33 pm on Oct 11, 2012 (gmt 0)

10+ Year Member



How can it? You just got rid of the query string.

Hah, good point.

Thanks, everyone, for the advice and for pointing out potential pitfalls. I appreciate it. I'm sure I'll be posting again as I get close to rolling this out.
 
