
Apache Web Server Forum

    
search engine friendly url
yet another mod_rewrite question
Anguz




msg:1520852
 11:01 am on Jul 4, 2003 (gmt 0)

hello! I've just discovered these wonderful forums... expect to see me posting sometimes :)

okay, my first post is about mod_rewrite

I have a forum and would like to make the urls more search engine friendly so they can be indexed easily

I'd like to convert, for example:
[mydomain.com...]
into:
[mydomain.com...]

so it doesn't have hard to read symbols for the spider, so the changes would be:

"foros/index.php?" into "foros_"
";" into "_"
"=" into "-"
add ".htm" at the end

I could make the friendly url even shorter, I know, but since I'm not familiar with all the different elements that could be used in the url, I'd rather simply change the symbols

I know I have to change the links in my page to the new format, what I need help with is the .htaccess code to change the friendly urls to the normal ones

I'm really new to this actually, so I'm having a hard time, although it may be quite easy for many of you

in advance, thank you very much for the trouble of replying :)

[edited by: DaveAtIFG at 2:30 pm (utc) on July 4, 2003]
[edit reason] Revised URLs [/edit]

 

trillianjedi




msg:1520853
 12:15 pm on Jul 4, 2003 (gmt 0)

Welcome to WebmasterWorld Anguz!

I can't answer your question, but I know there are others here who can, so I'm going to bump your thread for you!

Also, you should try searching this site (I suggest you use Google rather than the inbuilt search facilities).

TJ

killroy




msg:1520854
 12:40 pm on Jul 4, 2003 (gmt 0)

Do you have access to the scripts themselves?

Moving URL characters into query strings will require a bit more work in mod_rewrite. It would be a lot simpler if you could do that part in the PHP itself.

Here is some basic info:

First make sure it's not some other part of the site:

RewriteCond %{REQUEST_URI} ^/foros_
RewriteRule ^/foros_board-([0-9]*)_action-([^_]*)_threadid-([0-9]*)_start-([^.]*)\.htm$ /foros/index.php [E=QUERY_STRING:board=$1;action=$2;threadid=$3;start=$4]

I quickly threw this together from my own notes and this section in the docs:

[httpd.apache.org...]

Not tested but you get the idea.

Basically I make sure with the RewriteCond it's really an access to the forum. Then I capture the relevant bits via a regex into variables, i.e. all the bits in round braces (). Lastly I use the E=VAR:VAL to build a new QUERY_STRING (normally the bit behind the ?) using back references, the $1-4, to load the bits extracted from the URL.

If it doesn't work, let me know and I'll test it and fix it for ya... I love playing with this stuff.

SN

Anguz




msg:1520855
 1:03 pm on Jul 4, 2003 (gmt 0)

TJ:

thank you :)

I tried searching Google, but after reading several of the pages that came up, I still don't understand this :P

killroy:

cool! thank you for your reply =)

the thing is that the forum generates different urls, with different vars depending on the action you'll take... the one I posted was just an example

what I'm saying is that I don't think that defining a rule for a certain url format would be the best option, because of the different formats there may be

I think the best would be to write only four rules, the ones I mentioned, can it be done? something like a find-and-replace:

"foros_" into "foros/index.php?"
"_" into ";"
"-" into "="
remove ".htm" at the end

this way, it doesn't matter what variables, or in what order, are passed with the url, the rename will still work properly... or am I wrong?

[edited by: Anguz at 1:14 pm (utc) on July 4, 2003]

killroy




msg:1520856
 1:06 pm on Jul 4, 2003 (gmt 0)

I see what you mean, that's some fancy regexing, give me a minute, I'll give it a shot.

The problem is you can't easily do repeating patterns in this case, so you'd have to do a filter, all the ones with 1, 2, 3, 4, 5 variables separately. I'll try it...

BRB

SN

killroy




msg:1520857
 1:13 pm on Jul 4, 2003 (gmt 0)

Ok, here is what I came up with.

Be careful with the complexity, read slowly ;) It's just the same pattern repeated. If all your accesses have only four variables you need only that rule, of course:

#ONE variable:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)\.htm$
RewriteRule ^/foros_([^-]*)-([^_.]*)\.htm$ /foros/index.php [E=QUERY_STRING:$1=$2]

#TWO variables:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$
RewriteRule ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$ /foros/index.php [E=QUERY_STRING:$1=$2;$3=$4]

#THREE variables:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$
RewriteRule ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$ /foros/index.php [E=QUERY_STRING:$1=$2;$3=$4;$5=$6]

#FOUR variables:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$
RewriteRule ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$ /foros/index.php [E=QUERY_STRING:$1=$2;$3=$4;$5=$6;$7=$8]

#FIVE variables:
RewriteCond %{REQUEST_URI} ^/foros_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)_([^-]*)-([^_.]*)\.htm$

Sorry for the wide lines.

SN

killroy




msg:1520858
 1:43 pm on Jul 4, 2003 (gmt 0)

Just wanted to add:

In my experience it's best to simply map the script in mod_rewrite, and then query the PATH_INFO environment variable in the script, instead of QUERY_STRING. But you'll need to know your way around the scripts.
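
A minimal sketch of that mapping, assuming the rule sits in a .htaccess file in the document root (so the pattern has no leading slash), that the friendly URLs keep the foros_....htm shape used earlier in this thread, and that the server accepts trailing path info for PHP scripts. The example URL in the comment is hypothetical, and the forum script would have to be changed to parse PATH_INFO itself:

```apache
RewriteEngine On

# Hypothetical example:
#   /foros_board-3_action-display.htm  ->  /foros/index.php/board-3_action-display
# Everything between "foros_" and ".htm" is handed to the script untouched,
# where it shows up in the PATH_INFO environment variable.
RewriteRule ^foros_(.+)\.htm$ /foros/index.php/$1 [L]
```

With this approach one rule covers any number of variables, because splitting the string on "_" and "-" happens in the script rather than in the regex.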

SN

Anguz




msg:1520859
 2:24 pm on Jul 4, 2003 (gmt 0)

wow! you really know a lot about this! :o

hmm... you got me thinking... the thing is that there might be no variable at all, and there may be more than 5, it depends on the page...

but you gave me an idea! :)

what if a friendly url like the one in the example:
[mydomain.com...]

was handled with the format like you said first, but a bit different, so that it'd be converted to this:
[mydomain.com...]

and rename.php would do all the replacing to write the proper url and then redirect to:
[mydomain.com...]

so only one rename would be done in .htaccess every time, no matter how many variables... how would it be done?

[edited by: DaveAtIFG at 2:32 pm (utc) on July 4, 2003]
[edit reason] Revised URLs [/edit]

killroy




msg:1520860
 2:33 pm on Jul 4, 2003 (gmt 0)

The conversion of the = ; and _ - is the easy bit, the only thing I'm not certain about is overwriting the environment variables.

best would be if your script could handle:

index.php/var1=val1&var2=val2&var3=val3

and so forth instead of:

index.php?var1=val1&var2=val2&var3=val3

i.e. use PATH_INFO instead of QUERY_STRING

but you probably don't even access QUERY_STRING directly and use some fancy PHP stuff to deliver the variables into neat arrays or something... puh.

No need for an extra script, that would be messy, slow, and ugly.

SN

killroy




msg:1520861
 2:34 pm on Jul 4, 2003 (gmt 0)

added: If you need more than 5 variables your site is seriously screwed up and you should do a long and hard rethink of your URL strategy ;) please feel free to do a site search on "cool URIs don't change"

SN

jdMorgan




msg:1520862
 3:09 pm on Jul 4, 2003 (gmt 0)

Anguz,

Welcome to WebmasterWorld [webmasterworld.com]!

There's a fundamental problem here. The problem is that mod_rewrite acts after an HTTP request is received by the server, and before any content is served. Therefore, the script itself must output "friendly" URLs for spiders and users to see, and mod_rewrite can be used to convert those friendly URLs requested by users and 'bots back into the "unfriendly" URLs needed to call the script. Therefore, the methods discussed previously in this thread are probably "backwards."

I'd suggest getting rid of everything that is not needed in the URL, using a standard static-URL format, and rewriting from something like
[mydomain.com...]
to
[mydomain.com...]

Something like:

RewriteRule ^board([0-9]+)/([^/]+)/([^/]+)/(.*) /foros/index.php?board=$1;action=$2;threadid=$4;start=$3 [L]

might work.

You might also want to dig around in the library here in Website Technology, and try a WebmasterWorld site search for "friendly URL" and similar. We've discussed this issue a lot, and the other threads might help you to avoid pitfalls.

HTH,
Jim

Anguz




msg:1520863
 3:57 pm on Jul 4, 2003 (gmt 0)

jdMorgan, thx for the welcome :)
I'll do the search you suggest

about keeping the url simple, I see your point guys...

I don't know if there's a case with more than 5 variables, but there might be... the thing is that after talking about this, I realize that the friendly urls are only needed to display the threads

edit, post, pm, admin, all the other links can remain unfriendly, my visitors won't care and the spiders shouldn't follow them anyway

so if I stick to thread display, the fixed format will be easier to figure out...

let's see, I'm gonna check the necessary links for the spider to be able to browse:

a) the index doesn't need to be renamed

b) the link to a board is:
[mydomain.com...]

c) the link to the different pages in a board is:
[mydomain.com...]

d) the link to a thread with one page is:
[mydomain.com...]

e) the link to a thread with more pages is:
[mydomain.com...]

$board and $threadid are always numerical, $action is "messageindex" or "display", $start could be numerical or "new" (although "new" wouldn't be needed by the spider, it'd be generated with the friendly format, so it has to be renamed or it won't work for my visitors)

I'd like the friendly url to have a file format more than subdirectories, like for example:
[mydomain.com...]
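
A rough sketch of fixed-format rules for cases b) through e), assuming a .htaccess file in the document root. The friendly names (board-N.htm, thread-N-N.htm) are hypothetical, and so is the exact set of variables each link carries, since the real URLs are not shown here; the variable names and allowed values are the ones described above:

```apache
RewriteEngine On

# b) board index (hypothetical friendly form: /foros/board-3.htm)
RewriteRule ^foros/board-([0-9]+)\.htm$ /foros/index.php?board=$1;action=messageindex [L]

# c) later pages of a board (/foros/board-3-20.htm)
RewriteRule ^foros/board-([0-9]+)-([0-9]+)\.htm$ /foros/index.php?board=$1;action=messageindex;start=$2 [L]

# d) thread with one page (/foros/thread-3-45.htm)
RewriteRule ^foros/thread-([0-9]+)-([0-9]+)\.htm$ /foros/index.php?board=$1;action=display;threadid=$2 [L]

# e) later pages of a thread; start is a number or the literal "new"
RewriteRule ^foros/thread-([0-9]+)-([0-9]+)-([0-9]+|new)\.htm$ /foros/index.php?board=$1;action=display;threadid=$2;start=$3 [L]
```

The forum templates would still have to print links in the friendly form; these rules only translate incoming requests back to what the script expects.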

Anguz




msg:1520864
 6:31 pm on Jul 4, 2003 (gmt 0)

while working on this, I noticed I have no idea of how to read or write regex lol

so I made the wise decision of learning it, I'm sure I won't bother you if I know that, at least not with this ;)

you can still suggest a regex if you feel like it :)

thank you all again, I am really impressed with the nice community here, I plan to stay for a long time and hope to gain enough knowledge so that I may be of help to someone in return

jdMorgan




msg:1520865
 6:44 pm on Jul 4, 2003 (gmt 0)

Anguz,

Writing regex is actually easier than reading it.

See this quick tutorial [etext.lib.virginia.edu].

And we have this good Introduction to mod_rewrite [webmasterworld.com]

HTH,
Jim

pageoneresults




msg:1520866
 7:05 pm on Jul 4, 2003 (gmt 0)

Anguz, I come at this from the IIS side. If you are going to rewrite the URLs, I would suggest rewriting all of them. Not just for the search engines, but your users also. We've got some recent threads floating around that refer to dirty URLs vs. clean URLs and the overall consensus is that cleaning all of them will be of great benefit.

Also, I do not recommend the use of underscores for file naming structures. I prefer the hyphen and there have been confirmed sightings that hyphens may perform better than underscores. Apparently Google treats them differently with the hyphen representing a true space.

The goal when rewriting URLs is to eliminate as many variables as possible and also hide some. There are certain variables that do not need to appear in the string. The shorter the string, the more user friendly it becomes.

Make sure to bring your most important variables to the front of the string (keywords/phrases). Place the least important ones at the end of the string.

At the same time you are doing this, you want to make sure that the Server Header is returning the proper status codes for the new URLs. I'm sure you'll be utilizing 301s and 404s, make sure those are returning the proper status codes.

Check, double check, triple check and do one last final check before going live.
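
On Apache, one way to return a 301 for an old address is sketched below. It assumes a .htaccess file in the document root and a hypothetical friendly form /foros/board-N.htm; the variable names are taken from earlier in this thread:

```apache
RewriteEngine On

# Match only what the client actually asked for (THE_REQUEST), not URLs that
# were rewritten internally, so this redirect cannot loop with a rule that
# maps the friendly URL back to index.php.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /foros/index\.php\?board=([0-9]+);action=messageindex\ HTTP/
# %1 is the board number captured by the RewriteCond; the trailing "?"
# drops the old query string from the redirect target.
RewriteRule ^foros/index\.php$ /foros/board-%1.htm? [R=301,L]
```

Friendly URLs that match no rule and no file should fall through to the server's normal 404 handling, which is worth confirming with a header checker before going live.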

Anguz




msg:1520867
 10:01 pm on Jul 4, 2003 (gmt 0)

hmm... I agree that it'll be better to have nicer urls for everything...

I even thought of mapping the ids of the boards and show the board name in the url instead of the id

the problem is that the url formats for YaBBSE are pretty random, and depending on the action, the variables change, not just the value, the var used too

I don't think that doing what I mentioned before would be very user friendly, so I guess I'll have to check every possible url format and then write a rule for each

about the hyphen, thx for the tip! what other symbols are spider-friendly? :)

killroy




msg:1520868
 1:23 am on Jul 5, 2003 (gmt 0)

I apologise, my bad, I just now tried it out and realised I could mod_rewrite the QUERY_STRING right there in the target URL. The messy setting of the QUERY_STRING environment variable is therefore completely unnecessary, and probably wouldn't have worked (as the QUERY_STRING env would prolly have been overwritten again anyways).
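
A minimal sketch of the simpler form, assuming a .htaccess file in the document root (hence no leading slash in the pattern) and the foros_name-value.htm format from earlier in this thread; the example URLs in the comments are hypothetical:

```apache
RewriteEngine On

# One variable:
#   /foros_board-3.htm  ->  /foros/index.php?board=3
RewriteRule ^foros_([^-_.]+)-([^_.]+)\.htm$ /foros/index.php?$1=$2 [L]

# Two variables:
#   /foros_board-3_action-display.htm  ->  /foros/index.php?board=3;action=display
RewriteRule ^foros_([^-_.]+)-([^_.]+)_([^-_.]+)-([^_.]+)\.htm$ /foros/index.php?$1=$2;$3=$4 [L]
```

The query string is written directly after the ? in the substitution, so no environment variable needs to be set. The ; separator is kept because that is what the forum URLs in this thread use.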

I hope jdMorgan's post was clearer.

Gotta work on my simplification skills.

SN
