

Generating search engine friendly URLs

With HTML::Mason and dhandlers


bennymack

2:15 pm on Jun 15, 2005 (gmt 0)




Hello, this isn't so much a request for help as a request for comments. I've been reading about search-engine-friendly URLs lately, so I decided to implement them on a site I am developing.

Without going into too much detail, suffice it to say that the site will be almost 100% database driven. Therefore there will be many GET variables in the URLs, including the dreaded ?id=1234 that I hear Google dislikes.

What I wanted to do was translate a URL such as /items/widgets/ to items.html?id=5, where 5 is the database id of the widgets item. This is slightly different from other approaches I've seen, which use /items/5.html: this approach actually uses the name of the item as the filename in the URL. To me, this is an exciting prospect as far as getting indexed by a search engine. Read on if you agree.
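For the links themselves, the site also has to turn an item name into a URL path. A minimal sketch, assuming names are stored as byte strings (already UTF-8 encoded if non-ASCII) and may contain spaces or punctuation that have to be percent-encoded; `item_url` is a hypothetical helper, not part of Mason:

```perl
use strict;
use warnings;

# Hypothetical helper: build a friendly URL such as /items/widgets/
# from an item name stored in the database.
sub item_url {
    my ($name) = @_;
    # Percent-encode every byte outside the RFC 3986 "unreserved" set,
    # so spaces and slashes in a name can't break the path.
    (my $enc = $name) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return "/items/$enc/";
}
```

For example, a row named `blue widgets` would be linked as `/items/blue%20widgets/`.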

If you're reading this post, then hopefully you have a server that's configured with mod_perl and HTML::Mason. If not, do yourself a favor and set one up!

If you're not familiar with the concept of a dhandler in Mason, please visit
[masonbook.com...]
and
[masonhq.com...]

You'll want to configure Mason with decline_dirs off. This can be done in your handler.pl by specifying "decline_dirs => 0" or in httpd.conf "PerlSetVar MasonDeclineDirs 0".

Next, make sure you have an "/items" folder (or whatever you want to call it). Of course, all folders under "/items" are virtual and completely arbitrary. Place this directive into your httpd.conf:
<Location /items>
SetHandler perl-script
PerlHandler HTML::Mason::ApacheHandler
DefaultType text/html
</Location>

Replace "HTML::Mason::ApacheHandler" with whatever package name you might have used in your handler.pl.

At this point, you'll want to place a dhandler into the /items folder. The dhandler will catch the request to the non-existent directory and hand it off to whatever component you would like.

dhandler:
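<%perl>
use DBI;
my $uri = $r->uri;                    # e.g. "/items/widgets/"
$uri =~ s{^/+|/+$}{}g;                # drop leading/trailing slashes
my @uri = split m{/}, $uri;           # ("items", "widgets")
my $dbh = ... open database
my $query = "SELECT id FROM items WHERE name=?";
my $sth = $dbh->prepare( $query );
$sth->execute( $uri[1] ) or die $dbh->errstr;   # name is bound, not interpolated
my $rec = $sth->fetchrow_hashref;     # undef if no row matched
# Pass an empty id when nothing matched; items.html has to cope with that
$m->comp('items.html', id => ( $rec ? $rec->{id} : '' ));
$m->abort;
</%perl>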

Voila! This is off the top of my head, so there may be errors. It's just to show the concept of using a dhandler to translate a nonexistent URL into an actual page. Now you should be able to use /items/widgets instead of items.html?id=5. Of course, items.html should be smart enough to handle id="" in case no item was found.
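Because a dhandler only runs inside Apache, here is a standalone sketch of just its path-handling and lookup steps, so the slash stripping and the split can be checked outside Mason. The `%items` hash is a hypothetical stand-in for the items table, and the sub names are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "items" table: name => id
my %items = ( widgets => 5, gadgets => 7 );

# Same steps as the dhandler: "/items/widgets/" -> "widgets"
sub item_name_from_uri {
    my ($uri) = @_;
    $uri =~ s{^/+|/+$}{}g;        # drop leading/trailing slashes
    my @uri = split m{/}, $uri;   # ("items", "widgets")
    return $uri[1];               # undef for a bare "/items/"
}

# Look up the id, returning '' when nothing matches (what items.html receives)
sub item_id_for_uri {
    my ($uri) = @_;
    my $name = item_name_from_uri($uri);
    return '' unless defined $name && exists $items{$name};
    return $items{$name};
}
```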

That's all for now. I hope I didn't leave anything out.

bennymack

1:11 pm on Jun 16, 2005 (gmt 0)




OK, I was hoping to generate a little more discussion than that :] I guess I'd better rephrase it in the form of a question.

Perhaps I need to use an example a little closer to home. What I would like to know is whether designing a site that uses keyword-laden URLs will benefit the site's SE ranking.

Take WebmasterWorld, for example. Not that it needs any help :] but would it benefit from having URLs like:
[webmasterworld.com...]

Instead of:
[webmasterworld.com...]

While this is just one simple example, what about doing a whole site like this?

[foo.com...] etc...

These SE things can be counter-intuitive sometimes, so might this push the keyword density too high and therefore drop a site's ranking?

SeanW

5:27 pm on Jun 17, 2005 (gmt 0)




I've never done it with Mason, but I've done it with TT2, and that's similar to how I did it.

However, I'd do more validation (untainting) of @uri before passing it directly to your SQL backend.
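One way to act on that suggestion, sketched under the assumption that item names only ever contain letters, digits, underscores, hyphens and spaces (adjust the pattern to your own data): accept the path segment only if it matches a whitelist, and use the captured copy. Capturing through a regex is also how Perl's taint mode (`-T`) expects data to be untainted. `clean_item_name` is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical whitelist check for the name taken from @uri.
# Returns the validated name, or undef if it contains anything unexpected.
sub clean_item_name {
    my ($name) = @_;
    return unless defined $name;
    # The capture group yields a fresh copy; under -T that copy is untainted
    return $1 if $name =~ /\A([A-Za-z0-9_ -]{1,64})\z/;
    return;
}
```

In the dhandler this would sit between the split and the `execute`, with a failed check treated the same as "no item found".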

Sean

bennymack

6:03 pm on Jun 17, 2005 (gmt 0)




SeanW thanks for the reply.

AFAIK, when you use Perl DBI with a bound placeholder (the ? in the query), SQL injection isn't possible, if that's what you are referring to. Someone could type in single quotes and semicolons all day. The way it works, a name that doesn't match anything just yields an empty id, so items.html dumps a page that shows everything. Sort of like a site map.
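To make the placeholder point concrete, here is a small sketch (no database needed) contrasting a query built by interpolating a hypothetical hostile name straight into the SQL text with the placeholder form the dhandler uses. With `?`, the SQL text stays fixed and the DBI driver binds or quotes the value itself, so quotes in the name can't change the statement:

```perl
use strict;
use warnings;

# Hypothetical attacker-supplied path segment
my $name = q{x' OR '1'='1};

# Unsafe: the value becomes part of the SQL text itself
my $unsafe_sql = "SELECT id FROM items WHERE name='$name'";

# Safe: the SQL text never contains the value; it is supplied later via
#   $sth->execute($name)
my $safe_sql = "SELECT id FROM items WHERE name=?";
```

Printing `$unsafe_sql` shows the WHERE clause turned into `name='x' OR '1'='1'`, which matches every row.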

I tried breaking it but was unable to. If you have some examples of what could be harmful in @uri, let me know. I have a server I can test it on. Thanks again!

P.S. Since you've done this as well, I'm assuming it's a good idea? How were the results as far as PR?