Implementing a site categorization scheme a la del.icio.us

newbie wants to avoid big implementation mistake

goodwill14

7:43 pm on Jan 19, 2006 (gmt 0)

Hello,

I am new to Apache & Linux so please forgive me if this is basic. (Also I thought mentioning del.icio.us by name in this descriptive context was OK--I apologize if I'm wrong.)

I want to create a site categorization scheme similar to how del.icio.us displays tags to the user--in my case the categories are not user-driven, but they are like tags in that every category is only one level under the root and a category does not contain other categories.

So I plan to just create many subfolders under the public_html root like this:

mydomain.com/category1
mydomain.com/category2
mydomain.com/category3
...potentially 10,000 more subdirectories (in theory)...

Each has an index.php inside it, so that users type and see "mydomain.com/category1" in their address bar, rather than "mydomain.com/category1.php".
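In this directory-per-category plan, Apache's mod_dir is what makes a request for a directory serve the index.php inside it. Many PHP hosts already configure this globally. If yours does not, here is a minimal sketch for the root .htaccess, assuming the host permits it through AllowOverride Indexes:

```apache
# Serve index.php (falling back to index.html) when a directory URL is requested
DirectoryIndex index.php index.html
```

One side effect: when a real directory is requested without its trailing slash, mod_dir normally responds with a redirect to the slash form. The address bar therefore ends up showing "mydomain.com/category1/".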

Before doing this I thought it prudent to ask if my strategy is technically inadvisable for some reason (are thousands of subdirectories under the root public_html OK?), or whether there is an obviously superior way of implementing this. (From my admittedly tiny understanding of mod_rewrite it does not seem that mod_rewrite is in any way needed to implement this.)

(Note: I'm currently using a shared hosting environment using Apache 1.3.34 under Linux kernel 2.4.20)

Thank you,

Dan

jdMorgan

9:21 pm on Jan 19, 2006 (gmt 0)

Dan,

Welcome to WebmasterWorld!

I wouldn't want to have to deal with a top-level directory containing 10,000 subdirectories. This would make even simple ftp access difficult, as you'd have to wait for 10,000 directory names to be displayed before uploading a single file.

One method is to create 'virtual' subdirectories using only part of the requested category names, and then hide these virtual directories from the user by using URL rewriting.

As an example, take the first letter of the category name as the first-level subdirectory, and then the first two letters as the second-level subdirectory. Then use mod_rewrite to build these subdirectory paths for each request. In this way, it appears to the visitor that your site has 10,000 subdirectories below root, but this is not really the case.

The one thing to keep in mind about using mod_rewrite is that a URL and a filepath are not at all the same thing, and actually, they need have no common elements whatsoever if mod_rewrite is used. However, it *is* better if the filepath can be derived from the URL using a single or limited number of rules for transformation.

For example:


Requested URL               Server file path
example.com/apple-sauces    /a/ap/apple-sauces/index.php
example.com/applications    /a/ap/applications/index.php
example.com/aquatic-life    /a/aq/aquatic-life/index.php

Here, apple-sauces and applications share a second-level directory, but aquatic-life is in a separate second-level directory and shares only the first-level /a subdirectory path with the other two.

You could further break that down into one-, two-, and three-letter subdirectories, or maybe even four if required to keep each directory at a manageable size (try for fewer than 200 entries per directory).

The mod_rewrite code for implementing the scheme above is relatively simple:


# /apple-sauces -> /a/ap/apple-sauces/index.php
RewriteRule ^(([a-z])([a-z])[a-z-]+)/?$ /$2/$2$3/$1/index.php [L]

Done this way, your site will be much easier to maintain, and the visitor will see no difference whatsoever.
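For the deeper split with one-, two- and three-letter subdirectories, the same idea needs one more captured character. This is only a sketch. It assumes category names of at least four characters whose first three are lowercase letters, and it uses a hypothetical physical layout such as /a/ap/app/apple-sauces/index.php:

```apache
# Three-level split: /apple-sauces -> /a/ap/app/apple-sauces/index.php
# $1 = whole category name; $2, $3, $4 = its first three letters
RewriteRule ^(([a-z])([a-z])([a-z])[a-z-]+)/?$ /$2/$2$3/$2$3$4/$1/index.php [L]
```

The target ends in index.php rather than at the bare directory. Rewriting to a directory path without a trailing slash could make mod_dir answer with a redirect that exposes the internal /a/ap/app/... path to the visitor. Choose the depth up front, because changing it later means moving every category directory.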

URL references within your scripts will also remain unchanged. However, 'includes' and other mechanisms that access files on the server should use server-root-relative or absolute filepaths, for ease of maintenance and script 'sharability'.

Jim

goodwill14

11:41 pm on Jan 19, 2006 (gmt 0)

Jim,

Thanks a lot for the very helpful reply. I think you saved me about 3 weeks of grief.

Now if I could push my luck and abuse your time with a few follow-up questions...

-Just confirming: mod_rewrite's rewriting happens on the server before the page is served and is transparent to all users, including bots, so there are no search engine implications?

-Would the URL be case-insensitive? (Ideally I would want the user to be able to type in "example.com/apple-sauces" or "example.com/APPLE-Sauces".)

-I may have categories consisting of either one letter (e.g., for an alphabetized list of items, "example.com/a") or two letters. Plus, category names can contain digits. My regex powers are weak, but it seems I would therefore have to rewrite the regex. I'm thinking that if regex groups $2 and $3 are empty, as in the one-character case, the last two forward slashes would still be output, e.g., "example.com/a//". Alas, I don't know how to make the output of the 2nd and 3rd forward slashes conditional on whether group $2 and/or $3 is empty. Sorry for the regex question.

This is as close as I can get:
RewriteRule ^([a-z]|[0-9])([a-z]|[0-9])?([a-z]|[0-9])?[^/]*?/?$ /$1/$1$2/$1$2$3 [L]

(Maybe I need 3 rewrite rules, one for each case: 1 character; 2 characters; and 3 or more characters?)

-As for implementation, I can put something like the following in the .htaccess file in my site root:
RewriteEngine on
RewriteRule ^([a-z]|[0-9])([a-z]|[0-9])?([a-z]|[0-9])?[^/]*?/?$ /$1/$1$2/$1$2$3 [L]
(Later I'll study the AllowOverride and symlinks syntax, as per the 'Beginning Mod Rewrite' tutorial on this site.)

Your first reply was really all I needed, so I'll be able to answer these questions on my own if you can't reply.

In any case thank you again!

Dan

jdMorgan

12:18 am on Jan 20, 2006 (gmt 0)

> Would the URL be case insensitive?

No, not if your Apache server is running on a *nix machine -- Unix and its derivatives/clones have case-sensitive filesystems, so /apple-sauces and /APPLE-Sauces map to different paths. However, if you have httpd.conf access, you can use RewriteMap with its internal tolower function (int:tolower) to make everything lowercase. Also, if you are accepting user input using a script, then that script can convert to lowercase before creating the category URL.
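Here is a sketch of the RewriteMap approach for httpd.conf or the site's <VirtualHost> block. RewriteMap cannot be declared in .htaccess. The map name "lowercase" is an arbitrary choice, and int:tolower is mod_rewrite's built-in lowercasing function. This version issues a 301 redirect rather than a silent internal rewrite, so visitors and search engines end up on one canonical lowercase URL:

```apache
# Server or virtual-host context only
RewriteEngine on
RewriteMap lowercase int:tolower

# If the requested path contains any uppercase letter,
# redirect permanently to the all-lowercase version
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^/(.*)$ /${lowercase:$1} [R=301,L]
```

With this in place, a request for /APPLE-Sauces is redirected to /apple-sauces, which the category rules then handle as usual. Note that the rule lowercases every URL on the site, so any existing mixed-case file names would need renaming. On a shared host without httpd.conf access, the [NC] flag alone is not a substitute. It makes the pattern match regardless of case, but the backreferences keep the visitor's original capitalization, so the rewritten path would still miss the lowercase directory on disk.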

> I may have categories consisting of either one letter (e.g., for an alphabetized list of items, "example.com/a") or two letters. Plus, category characters can be numbers.
> ...
> (Maybe I need 3 rewrite rules, one for each case: 1 character; 2 characters; and 3 or more characters?)

I would suggest separate rules - at least for initial development.
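Here is a sketch of what separate rules might look like in the root .htaccess, allowing digits as well as lowercase letters. The physical layout is an assumption for illustration:

- a one-character category lives directly in its first-level directory (/a/index.php)
- a two-character category lives in its second-level directory (/a/ap/index.php)
- longer names sit one level further down, as in the earlier table

Each rule uses only the groups it actually captures, so no empty $2 or $3 ever reaches a target path. Options +FollowSymLinks is included because mod_rewrite in .htaccess needs FollowSymLinks (or SymLinksIfOwnerMatch) enabled. The host has to permit that through AllowOverride Options, and the rewrite directives themselves need AllowOverride FileInfo.

```apache
# .htaccess in the site root
Options +FollowSymLinks
RewriteEngine on

# One character:    /a            -> /a/index.php
RewriteRule ^([a-z0-9])/?$ /$1/index.php [L]

# Two characters:   /ap           -> /a/ap/index.php
RewriteRule ^(([a-z0-9])[a-z0-9])/?$ /$2/$1/index.php [L]

# Three or more:    /apple-sauces -> /a/ap/apple-sauces/index.php
RewriteRule ^(([a-z0-9])([a-z0-9])[a-z0-9-]+)/?$ /$2/$2$3/$1/index.php [L]
```

The patterns exclude dots and inner slashes, so requests for ordinary files such as robots.txt, or for files inside subdirectories, pass through untouched. However, a bare request for a real top-level directory whose name fits a pattern (say /images/) would be captured too. Keep such names out of the category namespace, or exclude them with a RewriteCond.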

Jim

goodwill14

12:43 am on Jan 20, 2006 (gmt 0)

Jim,

Thank you again, that's very helpful.

Dan