

Any advice for this "mod_rewrite" newbie?

A newbie setting up a lyrics site...

         

Skeleton

10:16 pm on Feb 27, 2005 (gmt 0)

10+ Year Member



Hi there,

I am trying to set up a PHP/MySQL-based lyrics site. At first (before mod_rewrite) I was just thinking of making a '/show_lyric.php?id=12345'-type PHP template page that retrieves lyrics from the MySQL database. But I heard that search engines don't like these kinds of URLs and don't cache them very well. After some investigation I came up with a solution: pages like '/artist_name/album_name/name_of_the_song.htm'. When constructing the anchors for the actual pages I can generate these fake URLs, but on the mod_rewrite side I am a total newbie to this module (actually to the Apache server :( ).
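A minimal sketch of how those fake URLs could be generated when building anchors. It is written in Python purely for illustration (the site itself is PHP); the function name `lyrics_path`, the `/lyrics/` prefix, the `.htm` suffix and the spaces-to-underscores convention are all assumptions. Anything else that is not URL-safe is percent-encoded with `urllib.parse.quote`.

```python
from urllib.parse import quote


def lyrics_path(artist: str, album: str, song: str) -> str:
    """Build a hypothetical '/lyrics/<artist>/<album>/<song>.htm' path."""

    def segment(name: str) -> str:
        # Assumed convention: spaces become underscores. quote() with safe=""
        # percent-encodes everything else that is not URL-safe, including "/".
        return quote(name.replace(" ", "_"), safe="")

    return "/lyrics/" + "/".join(segment(p) for p in (artist, album, song)) + ".htm"
```

For example, `lyrics_path("U2", "The Joshua Tree", "In the Name of Love")` gives `/lyrics/U2/The_Joshua_Tree/In_the_Name_of_Love.htm`. A slash inside a name comes out as `%2F`, which Apache refuses in the path by default (see its `AllowEncodedSlashes` directive), so names containing slashes still need attention in the database itself.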

I have just prepared my huge lyrics MySQL database and made the graphic design. Now I must do some real work by solving the problem described above. Thanks for any suggestions (even a single word about this).

jdMorgan

10:53 pm on Feb 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The mod_rewrite part is easy; setting up the database is the most work. I strongly suggest designing your database layout so that records can be retrieved using the 'search-friendly' URL format. If you can do lookups only by index number, then you'll have to use either mod_rewrite or a script to translate "artist/album/song" to "id=1234". That basically means a huge, separate database that you will have to maintain in parallel with the main one: error-prone and a big time-waster.

There's nothing wrong with using database record numbers to ease database maintenance if you need them, but set up your database so that you can also pull records using the /artist/album/song format. If you do that, you will need only one or a very few simple rewrite rules, and you will have to maintain only one database.
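A minimal sketch of that layout, using Python's built-in `sqlite3` module as a stand-in for MySQL; the table and column names are hypothetical. The record number stays as the primary key for maintenance, while a UNIQUE constraint on (artist, album, song) gives the database an index it can use to fetch a row directly from the three URL parts (in MySQL the rough equivalent would be a composite UNIQUE key).

```python
import sqlite3

# In-memory stand-in for the real MySQL database; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE lyrics (
        id     INTEGER PRIMARY KEY,      -- record number, kept for maintenance
        artist TEXT NOT NULL,
        album  TEXT NOT NULL,
        song   TEXT NOT NULL,
        body   TEXT NOT NULL,
        UNIQUE (artist, album, song)     -- indexed lookup by the URL parts
    )
    """
)
conn.execute(
    "INSERT INTO lyrics (artist, album, song, body) VALUES (?, ?, ?, ?)",
    ("U2", "The Joshua Tree", "In the Name of Love", "(lyrics text here)"),
)


def find_lyrics(artist: str, album: str, song: str):
    """Return the lyrics text for one song, or None if there is no such record."""
    row = conn.execute(
        "SELECT body FROM lyrics WHERE artist = ? AND album = ? AND song = ?",
        (artist, album, song),
    ).fetchone()
    return row[0] if row else None
```

The `?` placeholders keep the URL-derived values out of the SQL text, which also protects against SQL injection from hand-edited URLs.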

For example, the following rewrite will translate a URL in the form "example.com/lyrics/U2/The_Joshua_Tree/In_the_Name_of_Love" to "example.com/find_lyrics.php?artist=U2&album=The_Joshua_Tree&song=In_the_Name_of_Love":


RewriteRule ^lyric/([^/]+)/([^/]+)/([.+])$ /find_lyrics.php?artist+$1&album=$2&song=$3 [L]

Then the script can pass those variables to the database manager to look up the lyrics, and the whole thing is easier to work on than trying to use "?lyric=1234".
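On the receiving end, the script only has to read the three query-string parameters and hand them to the database. Below is a rough Python equivalent of that step (in PHP the values would simply arrive in `$_GET`). The helper name `lyric_params` and the underscores-back-to-spaces step are assumptions that mirror the example URL.

```python
from urllib.parse import parse_qs, urlsplit


def lyric_params(url: str) -> dict:
    """Extract artist/album/song from a rewritten find_lyrics.php URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; take the first value and turn the
    # (assumed) underscores back into spaces before the database lookup.
    return {key: values[0].replace("_", " ") for key, values in query.items()}
```

The underscore convention is a trade-off: a name that genuinely contains an underscore would be altered by this step, so whatever convention builds the links must be reversible for the names actually in the database.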

The critical thing is that you don't want to have to re-invent the wheel. You want to discuss songs with your friends using artist/album/song. You want to link on your pages with text relevant to artist/album/song. It would not be natural for you to discuss "database entry 1234" or "database entry 1234565678999" with your friends. These would also make lousy links if you want to get your pages indexed in search engines.

At the same time, mod_rewrite and scripts are only good for re-formatting data. They are not good at doing "lookups": in this case, taking "/lyrics/U2/The_Joshua_Tree/In_the_Name_of_Love" and 250,000 other titles and translating them to "id=1234" and other record numbers. That's a database job.

So that's the number one thing to do to make your life easier - make the database do the work. Same goes for any dynamic site database, whether it's the product catalog for "Wide World of Widgets" or for a lyrics site. Let the database do the work, not you.

Jim

Skeleton

12:02 am on Feb 28, 2005 (gmt 0)

10+ Year Member



Thanks for your long, explanatory reply; it made the topic much clearer to me. I also think that most of the job will be done by the database manager (I will most probably go to the T-SQL forum after Apache, because I am also not very good at SQL statements :) ).

Then I tried the RewriteRule you gave, but couldn't get it to work properly:

RewriteRule ^lyric/([^/]+)/([^/]+)/([.+])$ /find_lyrics.php?artist+$1&album=$2&song=$3 [L]
(I changed "artist+$1" to "artist=$1"; I think the "+" was a typo.)

But I put together a new one which works perfectly:

RewriteRule ^lyrics/([^/.]+)/([^/.]+)/([^/.]+)\.(htm|html)$ /find_lyrics.php?artist=$1&album=$2&song=$3 [L]
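A quick way to check a rule like this before uploading it is to replay some request paths through the same regular expression. What follows is a rough Python simulation (Python's `re` behaves like Apache's PCRE for these constructs; the function name `rewrite` is hypothetical). In an .htaccess file, mod_rewrite strips the leading slash before matching, so the test paths start with `lyrics/`.

```python
import re

# Skeleton's working rule, with the alternation written as (htm|html).
RULE = re.compile(r"^lyrics/([^/.]+)/([^/.]+)/([^/.]+)\.(htm|html)$")


def rewrite(path: str):
    """Return the substitution URL the rule would produce, or None if it doesn't match."""
    m = RULE.match(path)
    if m is None:
        return None
    # $1, $2, $3 in the rule correspond to groups 1-3; group 4 is the extension.
    return "/find_lyrics.php?artist={}&album={}&song={}".format(*m.group(1, 2, 3))
```

`rewrite("lyrics/U2/The_Joshua_Tree/In_the_Name_of_Love.htm")` and the `.html` variant both give the same target, while `rewrite("lyrics/R.E.M./Out_of_Time/Losing_My_Religion.htm")` returns None, because `[^/.]+` refuses dots inside a segment.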

I want to ask you: is this rule good enough to use? Is there any mistake in it, or anything that could cause errors in the future?

By the way, what is the difference between these:
([^/.]+) and ([^/]+) and ([.+])

Thanks.

jdMorgan

12:32 am on Feb 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check out the references in our charter [webmasterworld.com] for info on regular expressions. Basically, "[^/]+" means, "match one or more characters not equal to a slash." In this way, it grabs all the characters between the slashes, and it knows exactly when to stop grabbing characters -- on the first slash that it finds. As such, it is much less ambiguous and faster than "match one or more of any character," as in ".+".

If you use a subpattern like ".+" or ".*" at the beginning or in the middle of a complex pattern, then it will initially grab (match) all the characters in the string. The regex parser will then realize that it cannot match the remaining subpatterns without "backing up" through the previously-matched string. After many iterations, it will finally figure out that the slashes denote the boundaries between subpatterns. It *will* work, but it's much faster if you simply *tell it* where those boundaries are. If you do, pattern-matching can proceed in one pass from left to right in the string.
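To see the three subpatterns from the question side by side, here is a small Python check (Python's `re` treats these constructs the same way PCRE does; the pattern names and sample paths are made up for illustration). Note that `([.+])` is not "one or more of anything": inside square brackets `.` and `+` lose their special meaning, so it is a character class matching exactly one character, a literal dot or a literal plus sign. That alone is enough to stop the first rule in this thread from matching a normal song title.

```python
import re

PATTERNS = {
    # [^/]+  : one or more characters that are not "/", so each group stops at the next slash
    "not_slash": r"^lyrics/([^/]+)/([^/]+)/([^/]+)$",
    # [^/.]+ : the same, but a dot also ends (and so breaks) the match
    "not_slash_or_dot": r"^lyrics/([^/.]+)/([^/.]+)/([^/.]+)$",
    # [.+]   : exactly ONE character, either a literal "." or a literal "+"
    "dot_or_plus": r"^lyrics/([^/]+)/([^/]+)/([.+])$",
    # .+     : one or more of anything, slashes included (greedy, backtracks)
    "any": r"^lyrics/(.+)/(.+)/(.+)$",
}


def groups(name: str, path: str):
    """Return the captured groups for a named pattern, or None if it doesn't match."""
    m = re.match(PATTERNS[name], path)
    return m.groups() if m else None
```

For `lyrics/U2/The_Joshua_Tree/In_the_Name_of_Love`, both `not_slash` and `any` capture the three parts, `dot_or_plus` returns None, and `not_slash_or_dot` fails on a name like `R.E.M.`. The ambiguity of `.+` shows up with an unexpected extra slash: for `lyrics/AC/DC/Back_in_Black/Hells_Bells`, `not_slash` simply does not match, while `any` backtracks and quietly captures `AC/DC` as the first group.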

<soapbox>As you may notice from my other posts in this forum, I have declared war against the unnecessary use of ".*", because it is the most ambiguous, least efficient possible pattern, and should be used only where actually required. Its sole virtue is that it is easy to learn and use, but this comes at the cost of very bad regex processing performance when used in complex patterns. I therefore promote the use of negated character classes like [^/]+ to increase efficiency.</soapbox>

Your new pattern looks OK -- I see no need to include the "." in "[^/.]". Just be aware that there is no rule that says you must name your virtual pages with an ".html" extension -- or any other extension. As a matter of fact, the W3C is promoting the use of URLs with no filetype to allow for seamless upgrades in technology, for example from html to php. Certainly, if you are going to include file extensions in your new plan, you should pick one and only one -- .htm or .html -- and stick with it.
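If you decide to drop the extension, the rule simply loses its suffix, for example `RewriteRule ^lyrics/([^/]+)/([^/]+)/([^/]+)$ /find_lyrics.php?artist=$1&album=$2&song=$3 [L]` (an untested sketch, not a rule from this thread). The Python check below replays that hypothetical pattern against a few paths; `to_script` is a made-up helper name.

```python
import re

# Hypothetical extensionless version of the rule: no ".htm"/".html" suffix.
EXTENSIONLESS = re.compile(r"^lyrics/([^/]+)/([^/]+)/([^/]+)$")


def to_script(path: str):
    """Map an extensionless lyrics path to the script URL, or None if it doesn't match."""
    m = EXTENSIONLESS.match(path)
    if m is None:
        return None
    return "/find_lyrics.php?artist=%s&album=%s&song=%s" % m.groups()
```

One side effect worth knowing: because `[^/]+` accepts dots, an old-style request ending in `In_the_Name_of_Love.htm` would also match, with `.htm` ending up in the song name, so the script (or a separate rule) would need to handle that.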

Jim

Skeleton

2:32 am on Mar 20, 2005 (gmt 0)

10+ Year Member



Hi again after a long while :)

My project is nearly finished, but I have recently noticed that some of the artist names in my database contain dots (.). For those names the rewrite produces the wrong parameters for the .php URL. I am using this sort of rule:

RewriteRule ^([^/.]+)\.htm$ /artist.php?artist=$1 [L]

What can I do to avoid this error? Thanks in advance.

Caterham

6:09 pm on Mar 20, 2005 (gmt 0)

10+ Year Member



Just remove the '.' from the character class ([^...] means "none of these characters"):

RewriteRule ^([^/]+)\.htm$ /artist.php?artist=$1 [L]
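A small Python comparison of the two patterns (Python's `re` standing in for Apache's PCRE; `artist_param` is a hypothetical helper). With the dot excluded, a name containing a dot cannot match at all; with only the slash excluded, `[^/]+` first grabs the whole string and then backs off just far enough for `\.htm$` to match the real extension.

```python
import re

OLD = re.compile(r"^([^/.]+)\.htm$")  # dots not allowed anywhere in the name
NEW = re.compile(r"^([^/]+)\.htm$")   # only slashes excluded


def artist_param(pattern, path: str):
    """Return what $1 would be for this request path, or None if the rule doesn't match."""
    m = pattern.match(path)
    return m.group(1) if m else None
```

For the request `DJ.rewrite.htm`, `OLD` gives None while `NEW` gives `DJ.rewrite`; a name with a slash, such as `AC/DC.htm`, still matches neither.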

jdMorgan

8:06 pm on Mar 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That makes two of us who didn't like the "." in the character class. :)

Jim

Skeleton

9:24 pm on Mar 20, 2005 (gmt 0)

10+ Year Member



When I tried removing that '.', it stopped working :(

Caterham

9:44 pm on Mar 20, 2005 (gmt 0)

10+ Year Member



What does your complete .htaccess look like? Do you have a conflict with other rules?

> stops working
This can have different meanings. What happens? A 404 Not Found?

Does your request look like /DJ.rewrite.htm?

jdMorgan

11:38 pm on Mar 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The following rule should work for anything except "artist names" that contain slashes. You simply cannot allow artist names to have slashes or this won't work. Also, the rule is a lot less efficient than the previous version, since the pattern has to be matched from the end ("\.htm$") back toward the beginning.

RewriteRule ^(.+/)?([^/]+)\.htm$ /artist.php?artist=$2 [L]
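Replaying that pattern in Python (again as a stand-in for PCRE; `artist_from` is a hypothetical helper) shows where the pieces land: the optional `(.+/)?` captures any leading directory part as group 1, and the artist name is group 2, i.e. `$2` in the substitution.

```python
import re

RULE = re.compile(r"^(.+/)?([^/]+)\.htm$")


def artist_from(path: str):
    """Return the artist name (group 2) for a request path, or None if it doesn't match."""
    m = RULE.match(path)
    return m.group(2) if m else None
```

`artist_from("DJ.rewrite.htm")` gives `DJ.rewrite` (group 1 matches nothing), and `artist_from("AC/DC.htm")` gives only `DC`, which is exactly the slash problem described above.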

I suggest you put a filter in your database manager to prevent band names with slashes from being entered. If you get a band name with a slash, replace the slash with a character that has no special meaning in a URL. Avoid trying to use any reserved characters in URLs [faqs.org] or query strings.
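A minimal sketch of such a filter, in Python for illustration: it replaces the characters that RFC 2396 lists as reserved in URLs with a hyphen. The function name `url_safe_name` and the choice of `-` as the replacement are assumptions; dots are left alone, since a rule like the one above copes with them.

```python
# Characters RFC 2396 lists as "reserved" in URLs.
RESERVED = set(";/?:@&=+$,")


def url_safe_name(name: str, replacement: str = "-") -> str:
    """Replace reserved URL characters in an artist name (hypothetical DB filter)."""
    return "".join(replacement if ch in RESERVED else ch for ch in name)
```

One option (an assumption, not something proposed in this thread) is to store the cleaned value in its own column next to the display name, so pages can still show "AC/DC" while links and lookups use "AC-DC".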

Jim

Skeleton

12:14 am on Mar 21, 2005 (gmt 0)

10+ Year Member



I think I am supposed to clean the dots and slashes out of my database; I have only just noticed the entries containing slashes :). Thanks for your responses.