okay, my first post is about mod_rewrite
I have a forum and would like to make the urls more search engine friendly so they can be indexed easily
I'd like to convert, for example:
[mydomain.com...]
into:
[mydomain.com...]
so it doesn't have hard to read symbols for the spider, so the changes would be:
"foros/index.php?" into "foros_"
";" into "_"
"=" into "-"
add ".htm" at the end
I could make the friendly url even shorter, I know, but since I'm not familiar with all the different elements that could be used in the url, I'd rather simply change the symbols
I know I have to change the links in my page to the new format, what I need help with is the .htaccess code to change the friendly urls to the normal ones
I'm really new to this actually, so I'm having a hard time, although it may be quite easy for many of you
in advance, thank you very much for the trouble of replying :)
[edited by: DaveAtIFG at 2:30 pm (utc) on July 4, 2003]
[edit reason] Revised URLs [/edit]
Moving URL characters into the query string will require a bit more work in mod_rewrite. It would be a lot simpler if you could do that part in the PHP itself.
Here is some basic info:
First make sure it's not some other part of the site:
# In .htaccess the RewriteRule pattern is matched without the leading slash
RewriteCond %{REQUEST_URI} ^/foros_
RewriteRule ^foros_board-([0-9]*)_action-([^_]*)_threadid-([0-9]*)_start-([^.]*)\.htm$ /foros/index.php [E=QUERY_STRING:board=$1;action=$2;threadid=$3;start=$4]
I quickly threw this together from my own notes and this section in the docs:
[httpd.apache.org...]
Not tested but you get the idea.
Basically I make sure with the RewriteCond that it's really an access to the forum. Then I capture the relevant bits via a regex into variables, i.e. all the bits in round brackets (). Lastly I use E=VAR:VAL to build a new QUERY_STRING (normally the bit behind the ?) using back references, the $1-$4, to load the bits extracted from the URL.
If it doesn't work, let me know and I'll test it and fix it for ya... I love playing with this stuff.
SN
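For comparison, here is a minimal sketch of the same idea with the captured values written straight into the substitution's query string instead of going through an E= flag. This is a swapped-in technique, not the rule as posted. It assumes the rule sits in the .htaccess file in the document root and that the forum script is /foros/index.php. The variable names are taken from the rule above, the sample values in the comments are made up, and none of it has been tested against the actual forum.

```apache
RewriteEngine On
# Friendly (example values): foros_board-1_action-display_threadid-2_start-0.htm
# Real:                      /foros/index.php?board=1;action=display;threadid=2;start=0
RewriteRule ^foros_board-([0-9]+)_action-([^_]+)_threadid-([0-9]+)_start-([^.]+)\.htm$ /foros/index.php?board=$1;action=$2;threadid=$3;start=$4 [L]
```

Anything after the ? in the substitution becomes the query string of the rewritten request, so the script sees its usual variables.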
thank you :)
I tried searching Google, but after reading several of the pages that came up, I still don't understand this :P
killroy:
cool! thank you for your reply =)
the thing is that the forum generates different urls, with different vars depending on the action you'll take... the one I posted was just an example
what I'm saying is that I don't think that defining a rule for a certain url format would be the best option, because of the different formats there may be
I think the best would be to write only four rules, the ones I mentioned, can it be done? something like a find-and-replace:
"foros_" into "foros/index.php?"
"_" into ";"
"-" into "="
remove ".htm" at the end
this way, it doesn't matter what variables, or in what order, are passed with the url, the rename will still work properly... or am I wrong?
[edited by: Anguz at 1:14 pm (utc) on July 4, 2003]
Be careful with the complexity, read slowly ;) It's just the same pattern repeated. If all your accesses have only four variables, you need only that rule, of course:
# In .htaccess the RewriteRule pattern is matched without the leading slash
# ONE variable:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)\.htm$
RewriteRule ^foros_([^-]*)-([^_.]*)\.htm$ /foros/index.php [E=QUERY_STRING:$1=$2]
# TWO variables:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$
RewriteRule ^foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$ /foros/index.php [E=QUERY_STRING:$1=$2;$3=$4]
# THREE variables:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$
RewriteRule ^foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$ /foros/index.php [E=QUERY_STRING:$1=$2;$3=$4;$5=$6]
# FOUR variables:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$
RewriteRule ^foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$ /foros/index.php [E=QUERY_STRING:$1=$2;$3=$4;$5=$6;$7=$8]
# FIVE variables:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$
Sorry for the wide lines.
SN
hmm... you got me thinking... the thing is that there might be no variable at all, and there may be more than 5, it depends on the page...
but you gave me an idea! :)
what if a friendly url like the one in the example:
[mydomain.com...]
was handled with the format like you said first, but a bit different, so that it'd be converted to this:
[mydomain.com...]
and rename.php would do all the replacing to write the proper url and then redirect to:
[mydomain.com...]
so only one rename would be done in .htaccess every time, no matter how many variables... how would it be done?
[edited by: DaveAtIFG at 2:32 pm (utc) on July 4, 2003]
[edit reason] Revised URLs [/edit]
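A minimal sketch of the single hand-off rule described above, assuming a hypothetical rename.php in the document root that receives the friendly part of the URL in a hypothetical "u" parameter. rename.php itself, which would do the character replacement, is not shown, and the rule is untested.

```apache
RewriteEngine On
# Pass everything between "foros_" and ".htm" to the script,
# whatever variables it contains and however many there are.
# rename.php and the "u" parameter are placeholder names.
RewriteRule ^foros_(.*)\.htm$ /rename.php?u=$1 [L]
```

With this in place, a request for foros_board-1_action-display.htm would reach rename.php with u set to board-1_action-display.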
best would be if your script could handle:
index.php/var1=val1&var2=val2&var3=val3
and so forth instead of:
index.php?var1=val1&var2=val2&var3=val3
i.e. use PATH_INFO instead of QUERY_STRING
but you probably don't even access QUERY_STRING directly and use some fancy PHP stuff to deliver the variables into neat arrays or something... phew.
No need for an extra script, that would be messy, slow, and ugly.
SN
Welcome to WebmasterWorld [webmasterworld.com]!
There's a fundamental problem here. The problem is that mod_rewrite acts after an HTTP request is received by the server, and before any content is served. Therefore, the script itself must output "friendly" URLs for spiders and users to see, and mod_rewrite can be used to convert those friendly URLs requested by users and 'bots back into the "unfriendly" URLs needed to call the script. Therefore, the methods discussed previously in this thread are probably "backwards."
I'd suggest getting rid of everything that is not needed in the URL, using a standard static-URL format, and rewriting from something like
[mydomain.com...]
to
[mydomain.com...]
Something like:
RewriteRule ^board([0-9]+)/([^/]+)/([^/]+)/(.*) /foros/index.php?board=$1;action=$2;threadid=$4;start=$3 [L]
You might also want to dig around in the library here in Website Technology, and try a WebmasterWorld site search for "friendly URL" and similar. We've discussed this issue a lot, and the other threads might help you to avoid pitfalls.
HTH,
Jim
about keeping the url simple, I see your point guys...
I don't know if there's a case with more than 5 variables, but there might be... the thing is that after talking about this, I realize that the friendly urls are only needed to display the threads
edit, post, pm, admin, all the other links can remain unfriendly, my visitors won't care and the spiders shouldn't follow them anyway
so if I stick to thread display, the fixed format will be easier to figure out...
let's see, I'm gonna check the necessary links for the spider to be able to browse:
a) the index doesn't need to be renamed
b) the link to a board is:
[mydomain.com...]
c) the link to the different pages in a board is:
[mydomain.com...]
d) the link to a thread with one page is:
[mydomain.com...]
e) the link to a thread with more pages is:
[mydomain.com...]
$board and $threadid are always numerical, $action is "messageindex" or "display", $start could be numerical or "new" (although "new" wouldn't be needed by the spider, it'd be generated with the friendly format, so it has to be renamed or it won't work for my visitors)
I'd like the friendly url to have a file format more than subdirectories, like for example:
[mydomain.com...]
so I made the wise decision of learning it, I'm sure I won't bother you if I know that, at least not with this ;)
you can still suggest a regex if you feel like it :)
thank you all again, I am really impressed with the nice community here, I plan to stay for a long time and hope to gain enough knowledge so that I may be of help to someone in return
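One way the four link types listed above might map onto file-style names, as an untested sketch. The friendly names (board-N.htm and so on) are made up for illustration, and the query strings are assumed from the variable descriptions in this post rather than copied from the forum. The rules are meant for the .htaccess file in the document root.

```apache
RewriteEngine On
# b) a board:                 foros/board-1.htm
RewriteRule ^foros/board-([0-9]+)\.htm$ /foros/index.php?board=$1 [L]
# c) a later page of a board: foros/board-1-page-25.htm
RewriteRule ^foros/board-([0-9]+)-page-([0-9]+)\.htm$ /foros/index.php?board=$1;action=messageindex;start=$2 [L]
# d) a thread with one page:  foros/board-1-thread-5.htm
RewriteRule ^foros/board-([0-9]+)-thread-([0-9]+)\.htm$ /foros/index.php?board=$1;action=display;threadid=$2 [L]
# e) a later page of a thread; start is a number or "new"
RewriteRule ^foros/board-([0-9]+)-thread-([0-9]+)-start-([0-9]+|new)\.htm$ /foros/index.php?board=$1;action=display;threadid=$2;start=$3 [L]
```

Because each pattern is anchored at both ends and requires ".htm" right after its last value, a URL can match only one of the four rules.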
Writing regex is actually easier than reading it.
See this quick tutorial [etext.lib.virginia.edu].
And we have this good Introduction to mod_rewrite [webmasterworld.com]
HTH,
Jim
Also, I do not recommend the use of underscores for file naming structures. I prefer the hyphen and there have been confirmed sightings that hyphens may perform better than underscores. Apparently Google treats them differently with the hyphen representing a true space.
The goal when rewriting URLs is to eliminate as many variables as possible and also hide some. There are certain variables that do not need to appear in the string. The shorter the string, the more user friendly it becomes.
Make sure to bring your most important variables to the front of the string (keywords/phrases). Place the least important ones at the end of the string.
At the same time you are doing this, you want to make sure that the Server Header is returning the proper status codes for the new URLs. I'm sure you'll be utilizing 301s and 404s, make sure those are returning the proper status codes.
Check, double check, triple check and do one last final check before going live.
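As a rough illustration of the 301 part, here is a sketch that permanently redirects one old-style URL to a friendly one. It assumes a hypothetical friendly format (foros/board-N.htm) and an .htaccess file in the document root. It tests THE_REQUEST so that only URLs actually requested by the client are redirected, not requests that were internally rewritten back to index.php.

```apache
RewriteEngine On
# Old URL requested directly by the client: /foros/index.php?board=N
# Answer with a permanent redirect to the (hypothetical) friendly name;
# the trailing "?" drops the old query string from the new URL.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /foros/index\.php\?board=([0-9]+)\ HTTP/
RewriteRule ^foros/index\.php$ /foros/board-%1.htm? [R=301,L]
```

The %1 refers to the group captured in the RewriteCond, since the RewriteRule pattern never sees the query string.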
I even thought of mapping the ids of the boards and show the board name in the url instead of the id
the problem is that the url formats for YaBBSE are pretty random, and depending on the action, the variables change, not just the value, the var used too
I don't think that doing what I mentioned before would be very user friendly, so I guess I'll have to check every possible url format and then write a rule for each
about the hyphen, thx for the tip! what other symbols are spider-friendly? :)
I hope jdMorgan's post was clearer.
Gotta work on my simplification skills.
SN