Apache Web Server Forum

    
Redirecting old urls via php script
mcomet




msg:4506677
 11:17 pm on Oct 10, 2012 (gmt 0)

Hello all. I'm attempting to redirect thousands of URLs from an old custom CMS to a new Drupal-powered site. The new URLs won't match the old ones, since Drupal auto-assigns page ids. I do have access to the old page ids in my database.

I found a thread on webmasterworld.com that describes this approach (summary: rewrite all "old" requests to a script, have the script look up the new URL in a database, and force the redirect).

I could use some feedback on my rewrite rules as well as the redirect script.

Old URLs: http://www.example.com/dir1/dir2/view.pl?id=12345
New URLs: http://www.example.com/node/$randomid


// redirect old traffic to redirect script
RewriteCond %{QUERY_STRING} ^id=([0-9]*)$
RewriteRule ^dir1/dir2/view\.pl$ /redir.php? [L]



This is the contents of redir.php:

<?php

// get id from the querystring
$legacy_id = htmlspecialchars($_GET["id"]);

// db Connection
$dbhost = 'xxx';
$dbuser = 'xxx';
$dbpass = 'xxx';
$dbname = 'xxx';

$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die ('Error connecting to mysql');
mysql_select_db($dbname);

//Query
$query = ("SELECT ua.dst
FROM content_type_artwork cta
JOIN url_alias ua
ON ua.src = CONCAT('node/',cta.nid)
WHERE cta.field_legacy_art_id_value = $legacy_id");

$result = mysql_query($query);

while($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
$new_path = $row['dst'];
}

// close connection
mysql_close($conn);

// Permanent redirection
header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.example.com/$new_path");

exit();

?>
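The mysql_* functions in the script above were deprecated and later removed from PHP, and htmlspecialchars() does not make a value safe to put inside SQL: the id is still interpolated straight into the query. Below is a minimal sketch of the same lookup with PDO and bound parameters, split into two queries so the SQL stays portable. Table and column names come from the query above. The function name, the "ORDER BY pid DESC" choice of alias (assuming a Drupal 6 style url_alias table with a pid column), and the node/$nid fallback are assumptions.

```php
<?php
// Sketch: legacy id -> new Drupal path, using bound parameters instead of
// string interpolation. lookup_new_path() is a hypothetical helper name;
// the pid ordering and the node/$nid fallback are assumptions.

function lookup_new_path(PDO $pdo, int $legacyId): ?string
{
    // Step 1: find the node that carries this legacy id.
    $stmt = $pdo->prepare(
        'SELECT nid FROM content_type_artwork
         WHERE field_legacy_art_id_value = :legacy_id
         LIMIT 1'
    );
    $stmt->execute([':legacy_id' => $legacyId]);
    $nid = $stmt->fetchColumn();
    if ($nid === false) {
        return null; // unknown legacy id: the caller should send a 404
    }

    // Step 2: find the friendly alias for node/$nid (assumed: newest alias wins).
    $source = 'node/' . (int) $nid;
    $stmt = $pdo->prepare(
        'SELECT dst FROM url_alias WHERE src = :src ORDER BY pid DESC LIMIT 1'
    );
    $stmt->execute([':src' => $source]);
    $alias = $stmt->fetchColumn();

    // Fall back to the system path if the node has no alias.
    return $alias !== false ? (string) $alias : $source;
}

// Example connection (hypothetical DSN and credentials):
// $pdo = new PDO('mysql:host=localhost;dbname=drupal;charset=utf8', 'user', 'pass',
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
```

A null return means the id matched nothing, which should become a 404 rather than a redirect.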



Another related question: in the above, I am doing a 301 redirect to /node/$id. Drupal has internal path aliasing for friendly URLs, so /node/$id will get transformed to something like /path1/page_title. Would it be better for me to redirect to the alias rather than to /node/$id, or does it not matter?

Thanks.

 

phranque




msg:4506714
 2:57 am on Oct 11, 2012 (gmt 0)

it looks like you are basically doing it right.


// redirect old traffic to redirect script


that comment should read "internally rewrite", not "redirect".

you should redirect to the alias urls and you should also use those alias urls in the site navigation and internal linking.
any requests for the /node/$id urls should be externally (301) redirected to the equivalent alias urls.
your CMS should do this for you.
the requests for the alias urls should get internally rewritten to your drupal script.
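Drupal (or a contributed module) should handle the /node/$id to alias redirect itself. Purely to illustrate the rule, here is a sketch where a hypothetical $aliasBySource array stands in for the url_alias table and alias_redirect_target() is a made-up helper name:

```php
<?php
// Sketch of the rule: a request for a system path node/<id> that has an
// alias gets a 301 to the alias. $aliasBySource is a hypothetical stand-in
// for the url_alias table.

function alias_redirect_target(string $requestPath, array $aliasBySource): ?string
{
    $path = ltrim($requestPath, '/');
    // Only node/<digits> system paths are candidates for this redirect.
    if (preg_match('#^node/[0-9]+$#', $path) !== 1) {
        return null;
    }
    // No alias: serve the node at its system path, no redirect.
    return $aliasBySource[$path] ?? null;
}

$aliases = ['node/7' => 'art/new-title'];
$target = alias_redirect_target('/node/7', $aliases);
// In a front controller, a non-null $target would lead to:
// header('Location: http://www.example.com/' . $target, true, 301); exit;
```

Requests that already use the alias fall through (null) and are internally rewritten to Drupal as usual.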

lucy24




msg:4506757
 6:07 am on Oct 11, 2012 (gmt 0)

RewriteRule ^dir1/dir2/view\.pl$ /redir.php? [L]
<snip>
// get id from the querystring

How can it? You just got rid of the query string.
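When the substitution contains no "?", mod_rewrite passes the original query string through unchanged; ending the substitution with a bare "?" discards it, which is why redir.php sees no id. Either drop the trailing "?" or pass the value explicitly, e.g. /redir.php?id=%1, where %1 refers to the ([0-9]*) group captured by the RewriteCond. A minimal sketch of the PHP side, reading and validating the id; legacy_id_from_query() is a hypothetical helper name:

```php
<?php
// Sketch: recover the legacy id from the query string redir.php receives.
// With "/redir.php?" that query string is empty, so this returns null.

function legacy_id_from_query(string $queryString): ?int
{
    parse_str($queryString, $params);
    $id = $params['id'] ?? null;
    // Accept only a non-empty run of digits; anything else counts as "no id".
    if (!is_string($id) || !ctype_digit($id)) {
        return null;
    }
    return (int) $id;
}

// In redir.php: $legacyId = legacy_id_from_query($_SERVER['QUERY_STRING'] ?? '');
```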

g1smd




msg:4506770
 6:29 am on Oct 11, 2012 (gmt 0)

The basic idea is sound, but do actually pass the id value to the script.

The special rewrite needs to appear high in the htaccess file, directly after rules that block access and before rules that redirect.

You will need to add an exclusion (RewriteCond) to your non-www/www redirecting rule so that it does NOT redirect requests for the old URLs. Otherwise, when an old non-www URL is requested, the redir.php script path itself will be exposed back out on to the web as a new URL.

If the old URL passed to the script is not valid, currently the script will return 200 OK status and a blank page. This is a disaster. The script MUST return a 404 status and you would be wise to "include" the content of your 404 error page here.
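A minimal sketch of that status logic: 301 when the lookup produced a path, a real 404 otherwise. CANONICAL_BASE, both function names and the error-page path are hypothetical placeholders. PHP's header() takes the response code as its third argument, so the Location and the 301 are set together.

```php
<?php
// Sketch: 301 to the new URL when the lookup succeeded, otherwise a real
// 404 instead of a blank 200 page. Names and paths are placeholders.

const CANONICAL_BASE = 'http://www.example.com/';

/**
 * Decide the status code and Location for a lookup result.
 *
 * @return array{0: int, 1: ?string}
 */
function redirect_decision(?string $newPath): array
{
    if ($newPath === null || $newPath === '') {
        return [404, null]; // unknown or missing id
    }
    // Always use the canonical www host, whatever host the old URL used.
    return [301, CANONICAL_BASE . ltrim($newPath, '/')];
}

function send_redirect_or_404(?string $newPath, string $errorPage = __DIR__ . '/404.html'): void
{
    [$status, $location] = redirect_decision($newPath);
    if ($status === 301) {
        // The third argument sets the response code along with the header.
        header('Location: ' . $location, true, 301);
        return;
    }
    http_response_code(404);
    // A static error page is assumed here; use include for a PHP 404 page.
    if (is_readable($errorPage)) {
        readfile($errorPage);
    }
}

// In redir.php: send_redirect_or_404($newPath); exit;
```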

[0-9]* will allow a "blank" id to be passed to the PHP script. You should use [0-9]+ here.
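Apache 2.x mod_rewrite uses PCRE, so PHP's preg_match() can show the difference between the two quantifiers. A small sketch against a few made-up query strings:

```php
<?php
// Sketch: compare the original and suggested RewriteCond patterns.

$starPattern = '/^id=([0-9]*)$/'; // original: zero or more digits
$plusPattern = '/^id=([0-9]+)$/'; // suggested: one or more digits

foreach (['id=12345', 'id=', 'id=12a'] as $query) {
    printf(
        "%-9s  [0-9]*: %s  [0-9]+: %s\n",
        $query,
        preg_match($starPattern, $query) ? 'match' : 'no match',
        preg_match($plusPattern, $query) ? 'match' : 'no match'
    );
}
```

Only the empty "id=" separates the two: the * version lets it through to the script, the + version does not.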

All of these changes are vital to overall success.

If you're interested in seeing how much traffic is being redirected, "include" the custom logger PHP code I posted a few months ago.

phranque




msg:4506820
 8:49 am on Oct 11, 2012 (gmt 0)

this one?

logger php code:
http://www.webmasterworld.com/google/4484532.htm#msg4484597

[edited by: phranque at 9:56 am (utc) on Oct 11, 2012]

g1smd




msg:4506847
 9:50 am on Oct 11, 2012 (gmt 0)

That's the one.

In that script, "$statusCode" is set to "301", "404", "410", or any other valid status code, and each of those types is logged to a separate file.

Additionally, for 301 logging, "$pageType" was set to "category", "product" or "review" when the "$statusCode" was "301" as I was logging the redirects for each of those different page types to separate log files. The calling PHP script also set "$newLocation" so that the logging would show where the user was redirected to.

The "$statusCode" and "$pageType" variables are set by the calling PHP script just before the logging "include". The log file name includes those elements as well as the year and week number for weekly log rotation.

The logger script also detects whether the request is for www., test., or dev. and logs each separately. Adjust the internal file paths to suit your server.
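The original logger isn't reproduced here. As a rough, hypothetical sketch of the naming scheme described (host prefix, status code, page type for 301s, ISO year and week for weekly rotation), with the directory, file-name layout and tab-separated line format all assumed:

```php
<?php
// Sketch of per-host, per-status, weekly-rotated log files.
// File-name layout and line format are assumptions, not the original script.

function host_prefix(string $host): string
{
    $first = strtolower(explode('.', $host)[0]);
    return in_array($first, ['www', 'test', 'dev'], true) ? $first : 'other';
}

function log_file_name(string $host, string $statusCode, ?string $pageType, int $time): string
{
    $parts = [host_prefix($host), $statusCode];
    if ($statusCode === '301' && $pageType !== null && $pageType !== '') {
        $parts[] = $pageType; // e.g. category, product, review
    }
    // ISO-8601 week-numbering year ('o') pairs with the ISO week ('W').
    $parts[] = gmdate('o', $time) . '-W' . gmdate('W', $time);
    return implode('-', $parts) . '.log';
}

function log_request(string $dir, string $host, string $statusCode, ?string $pageType,
                     string $requestUri, ?string $newLocation, int $time): void
{
    $line = gmdate('Y-m-d H:i:s', $time) . "\t" . $requestUri;
    if ($newLocation !== null) {
        $line .= "\t" . $newLocation; // where the visitor was redirected to
    }
    $file = rtrim($dir, '/') . '/' . log_file_name($host, $statusCode, $pageType, $time);
    file_put_contents($file, $line . "\n", FILE_APPEND | LOCK_EX);
}
```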

mcomet




msg:4507090
 9:33 pm on Oct 11, 2012 (gmt 0)

How can it? You just got rid of the query string.

Hah, good point.

Thanks everyone for the advice and for pointing out potential pitfalls, I appreciate it. I'm sure I'll be posting again as I get close to rolling this out.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved