|Clean URL help|
How can I construct clean URLs without a unique ID?
This is my first post in this forum, and I hope my question has not been asked before. I googled a lot about the issue with no luck.
Most websites nowadays use database-driven, dynamic pages. That is, a site like http://example.com would contain a page for displaying articles, showarticle.php, and to determine which article to pull out of the database, one would use a query string (http://example.com/showarticle.php?articleid=523). I recently read about clean URLs and how you can convert the previous URL to, for example, http://example.com/articles/523.html using mod_rewrite and .htaccess. Notice that we didn't lose the article ID (523); it's still in the URL, only its position has changed.
My question now is: a lot of websites use this technique, but I can't find any unique ID in the URL. For example, in a URL like http://example.com/articles/how-to-be-an-effective-leader/ I don't see any reference to the page ID, and I don't think that they use the page title as the primary key, as this would be against optimization rules. I learned that this technique is used by WordPress. Do you have any idea how they do this?
[edited by: jdMorgan at 2:36 pm (utc) on May 25, 2010]
[edit reason] example.com [/edit]
> i don't think that they use the page title to be the primary key
Yes they do.
If that is all that is in the URL, then that is what they use for the key. Since all of the unique-location information received from the client is in the URL, the server *must* use the URL. There is nothing else that it could use if the site is to be designed in a way that allows search engines to index it.
What I mean by that is that the "page id" could be passed to the server using a cookie or a custom HTTP request header generated by a client-side script, but that would make the site un-indexable to search engines, and make bookmarking and linking to specific pages quite impossible.
In your example above, server-side code 'examines' the requested URL-path and sees "articles" in the URL. It then passes control to WordPress, which is evidently using a Search-Engine-Friendly plug-in. Part of this plug-in allows for retrieval of the article by its title, "how-to-be-an-effective-leader". The database is searched for that text, and the "article ID" can be retrieved (if needed) for use in internal processing of the request.
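As a rough illustration of the lookup described above, here is a minimal Python sketch. It is not WordPress code; the `articles` table, its `slug` column, and the `find_article` function are all hypothetical names, and an in-memory SQLite database stands in for the real one. The point is only that a unique text column can act as the lookup key while a numeric ID remains the primary key.

```python
import sqlite3
from urllib.parse import urlparse

# Hypothetical schema: id stays the primary key, slug is a UNIQUE text column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, slug TEXT UNIQUE NOT NULL, title TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO articles (id, slug, title) VALUES (?, ?, ?)",
    (523, "how-to-be-an-effective-leader", "How to Be an Effective Leader"),
)

def find_article(url):
    """Return (id, title) for a URL like /articles/<slug>/, or None if no match."""
    # Split the path into non-empty segments so a trailing slash is harmless.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 2 or parts[0] != "articles":
        return None
    return conn.execute(
        "SELECT id, title FROM articles WHERE slug = ?", (parts[1],)
    ).fetchone()
```

Calling `find_article("http://example.com/articles/how-to-be-an-effective-leader/")` looks the row up by its slug and hands back the numeric ID for any further internal processing.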
The biggest problem one sees in projects like this is in trying to redirect old URLs with id numbers to new ones with "titles". This cannot be done in .htaccess except on a page-by-page basis -- one rule for each old page URL. This is impractical for most sites unless they are very small. So instead, a RewriteRule can be used to invoke a script to look up the new "title-based" URL using the old "id-number" URL, and generate a 301 redirect to the new URL. Using a script which can access the "article" or "product" database is really the only practical solution on sites with more than a handful of URLs.
This of course requires that the database records contain a field (or fields) for the old URL(s) for each current page.
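The redirect script mentioned above might look something like the following Python sketch. The `NEW_PATH_BY_OLD_ID` dictionary is a hypothetical stand-in for the database field that records each page's old ID, the `articleid` parameter name is assumed from the question's example, and `redirect_old_url` is an illustrative name. The function only decides the status code and target; sending the actual response is left to whatever server framework is in use.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for the database: old numeric id -> new title-based path.
NEW_PATH_BY_OLD_ID = {523: "/articles/how-to-be-an-effective-leader"}

def redirect_old_url(url):
    """Return (status, location) for an old-style query-string URL.

    301 plus the new path when the old id is known, otherwise 404.
    """
    query = parse_qs(urlparse(url).query)
    raw_id = query.get("articleid", [""])[0]
    # Accept only plain ASCII digits before converting to an integer.
    if not re.fullmatch(r"[0-9]+", raw_id):
        return 404, None
    new_path = NEW_PATH_BY_OLD_ID.get(int(raw_id))
    if new_path is None:
        return 404, None
    return 301, new_path
```

One lookup function like this covers every old URL, which is what makes it more practical than one hand-written rule per page.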
Thanks for the thorough explanation. I dug deeper into this subject, so for the record, in case anyone is searching: the URL-friendly text that WordPress uses for this is called the "post slug".
Thanks again, and sorry for not writing my URLs as example.com.
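For anyone curious how a slug can be derived from a title, here is a small Python sketch. It is an assumption-laden illustration, not the algorithm WordPress actually uses; `make_slug` is a hypothetical name, and the rules (lowercase, strip accents, collapse everything else to hyphens) are just one common choice.

```python
import re
import unicodedata

def make_slug(title):
    """Turn a title into a lowercase, hyphen-separated URL slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    # Remove hyphens left over at either end.
    return slug.strip("-")
```

Two different titles can produce the same slug under rules like these, so a real site would still need to enforce uniqueness when the slug is stored.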
If you can leave the article ID in the URL, like
http://example.com/a2345-how-to-be-an-effective-leader
and you check that the text is "exactly right" for that article ID number, redirecting to the correct URL if any part of the text is wrong or missing, then that opens the door to doing a number of other useful things. For example, you can post
http://example.com/a2345
to Twitter as your own short URL.
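The check-and-redirect logic just described could be sketched in Python roughly as follows. The `SLUG_BY_ID` dictionary is a hypothetical stand-in for the article database, `resolve` is an illustrative name, and the `/a<id>-<text>` URL pattern is taken from the example above.

```python
import re

# Hypothetical data: article id -> the one correct text for that id.
SLUG_BY_ID = {2345: "how-to-be-an-effective-leader"}

def resolve(path):
    """Return (status, location) for paths like /a2345-some-text or /a2345."""
    match = re.fullmatch(r"/a([0-9]+)(?:-(.*))?", path)
    if match is None:
        return 404, None
    article_id = int(match.group(1))
    slug = SLUG_BY_ID.get(article_id)
    if slug is None:
        return 404, None
    canonical = f"/a{article_id}-{slug}"
    # Serve the page only at its exact canonical URL.
    if path == canonical:
        return 200, canonical
    # Wrong, partial, or missing text: send the client to the correct URL.
    return 301, canonical
```

Because only the ID is used for the lookup, the short form `/a2345` and any mistyped variant both end up redirected to the single correct URL.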
As an aside, URLs for 'pages' should not have a trailing slash.