Apache Web Server Forum

    
Mod_rewrite for making message board crawlable?
Google can't crawl my message board urls
larsj




msg:1511440
 4:55 pm on Nov 29, 2002 (gmt 0)

Googlebot doesn't like the URLs on my message board (ikonboard):

http*//domain.tld/cgi-bin/ikonboard.cgi?s=af0f848fce80fd2094425ebc6c8ad310;act=SF;f=6

Would a mod_rewrite rule like this make the board crawlable so Google can index the messages, or am I wrong here?

^(.*)ikonboard/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)\.htm$ $1cgi-bin/ikonboard.cgi?$2=$3;$4=$5;$6=$7

If this would work, is there anything more I need to do?

 

DaveAtIFG




msg:1511441
 4:50 pm on Dec 5, 2002 (gmt 0)

My apologies larsj, we're not ignoring you, this post just got overlooked for a while... :) Your question is beyond my meager mod_rewrite skills but I'll keep bugging our regular rewrite wizards until they agree to take a look! :)

jdMorgan




msg:1511442
 6:34 pm on Dec 5, 2002 (gmt 0)

larsj,

I don't think that's going to work, since the fields don't match. There are not six slashes in http://domain.tld/cgi-bin/ikonboard.cgi?s=af0f848fce80fd2094425ebc6c8ad310;act=SF;f=6, so the regular-expression pattern ^(.*)ikonboard/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)\.htm$ in your rule won't match that URL.
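
A quick way to see the mismatch is to try the pattern outside Apache. The sketch below uses Python's re module purely as a stand-in for mod_rewrite's regex engine (the two differ in details), and the helper name apply_rule and the slash-separated test URL are made up for illustration:

```python
import re

# larsj's proposed pattern and substitution, copied from the post
# ($N back-references written as \N for Python).
PATTERN = re.compile(r"^(.*)ikonboard/(.*)/(.*)/(.*)/(.*)/(.*)/(.*)\.htm$")
SUBSTITUTION = r"\1cgi-bin/ikonboard.cgi?\2=\3;\4=\5;\6=\7"


def apply_rule(url_path):
    """Return the rewritten URL, or None when the pattern does not match."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    return match.expand(SUBSTITUTION)


# The path ikonboard currently emits. A RewriteRule pattern is tested
# against the URL path only, not the query string.
print(apply_rule("cgi-bin/ikonboard.cgi"))

# A hypothetical slash-separated URL of the shape the pattern expects.
print(apply_rule("ikonboard/s/abc123/act/SF/f/6.htm"))
```

The first call returns None because the existing URL has neither the slash-separated fields nor the ".htm" ending; the second is rewritten because it has the shape the pattern was written for.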

The search engines don't like cgi with query strings anyway, so rewriting to a URL that still contains ".cgi?(anything)" isn't likely to help.

You might try making [domain.tld...] accessible from an apparently-static URL. That is, publish links that look like this:

http://domain.tld/board/SF/6/af0f848fce80fd2094425ebc6c8ad310.html

and then translate those internally to [domain.tld...] for use by ikonboard.

ikonboard would need to be modified to output links that contain no ".cgi" and no query strings (perhaps by wrapping ikonboard in another simple script). mod_rewrite would then provide "feedback" into ikonboard by rewriting these SE-friendly URLs back to ikonboard-style URLs:

RewriteRule ^board/([A-Z]{2})/([0-9])/([0-9a-f]+)\.html$ /cgi-bin/ikonboard.cgi?s=$3;act=$1;f=$2 [L]

I have assumed that the "s=" part is always numbers 0-9 and lowercase letters a-f (hexadecimal) but of varying length, and that the "act=" part "SF" is always a two-letter, uppercase string A-Z, and that it is then followed by ";f=" and a single digit for the purpose of this example.
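
The mapping described above can be sanity-checked before it goes anywhere near a server. This is only a sketch under the assumptions just stated (two uppercase letters for "act", a single digit for "f", lowercase hexadecimal of any length for "s"), with the session part written as a bracketed character class. Python's re module stands in for mod_rewrite here, and the function name rewrite is hypothetical:

```python
import re

# Static-looking URL -> ikonboard URL, per the stated assumptions.
RULE = re.compile(r"^board/([A-Z]{2})/([0-9])/([0-9a-f]+)\.html$")
TARGET = r"/cgi-bin/ikonboard.cgi?s=\3;act=\1;f=\2"


def rewrite(url_path):
    """Return the internal ikonboard URL, or None if the path does not match."""
    match = RULE.match(url_path)
    return match.expand(TARGET) if match else None


print(rewrite("board/SF/6/af0f848fce80fd2094425ebc6c8ad310.html"))
print(rewrite("board/SF/12/af0f.html"))  # two-digit forum id: no match
```

If forum ids can run past a single digit, or "act" values are not always two letters, the corresponding groups would need to be loosened to suit.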

The basic idea is that any URL "seen" by a search engine should look like a static html page. Therefore, any html output visible to the search engine should not contain ".cgi" and/or a query string. Remember that mod_rewrite executes in response to a URL request during the process of converting that URL into a local file pathname on your server. So, the search engine must request a SE-friendly URL, and mod_rewrite will translate it to the form that ikonboard expects; ikonboard must then output pages with more SE-friendly URLs in order for the SE to spider those pages. As such, the above process may be the reverse of what you thought it should be.
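
The other half of the loop, making the board's output carry the SE-friendly links, might look roughly like the sketch below. It is not ikonboard code: the function name friendly_links is hypothetical, and it assumes every link has exactly the shape of the example URL in this thread (s, then act, then f, separated by semicolons). Links that don't fit are left untouched.

```python
import re

# Assumed link shape, taken from the example URL in this thread:
#   /cgi-bin/ikonboard.cgi?s=<hex>;act=<two letters>;f=<single digit>
# The lookahead leaves multi-digit forum ids alone rather than
# rewriting them incorrectly.
CGI_LINK = re.compile(
    r"/cgi-bin/ikonboard\.cgi\?s=([0-9a-f]+);act=([A-Z]{2});f=([0-9])(?![0-9])"
)


def friendly_links(html):
    """Replace ikonboard-style links in a page with static-looking ones."""
    return CGI_LINK.sub(r"/board/\2/\3/\1.html", html)


page = (
    '<a href="http://domain.tld/cgi-bin/ikonboard.cgi'
    '?s=af0f848fce80fd2094425ebc6c8ad310;act=SF;f=6">Forum 6</a>'
)
print(friendly_links(page))
```

A wrapper script would run the board's generated page through something like this before sending it out, so that the spider only ever sees the /board/... form and mod_rewrite turns it back into the form ikonboard expects.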

I haven't worked with ikonboard, and I personally use only a very few, very simple script-cloaking rewrites, so I hope this helps!

Jim

sun818




msg:1511443
 6:42 pm on Dec 5, 2002 (gmt 0)

s= is the session id in Ikonboard. Your site won't get crawled as long as that string is present. An option is to convert to another forum like YaBB whose URLs are more search engine friendly.
