Change query string to lower case

Nosmada

5:25 am on Aug 5, 2004 (gmt 0)

10+ Year Member



I have the following .htaccess file in my cgi-bin. It takes a query of up to three words typed into my search engine, strips out spaces and other strange characters, and puts dashes between the words. If you type lowercase words, everything works perfectly. If you include any capitals, you get an error. I need users to be able to enter their query in whatever case they want, with the rewrite converting it to lowercase behind the scenes before processing. How should I modify or add to what is below?

# 3 words, separated by + or %20
RewriteCond %{QUERY_STRING} keywords=([^+]+)\+([^+]+)\+([^+]+)
RewriteRule ^search\.cgi /%1-%2-%3.shtml? [R=301,L]
RewriteCond %{QUERY_STRING} keywords=([^%]+)\%20([^%]+)\%20([^%]+)
RewriteRule ^search\.cgi /%1-%2-%3.shtml? [R=301,L]

# 2 words, separated by + or %20
RewriteCond %{QUERY_STRING} keywords=([^+]+)\+([^+]+)
RewriteRule ^search\.cgi /%1-%2.shtml? [R=301,L]
RewriteCond %{QUERY_STRING} keywords=([^%]+)\%20([^%]+)
RewriteRule ^search\.cgi /%1-%2.shtml? [R=301,L]

# 1 word
RewriteCond %{QUERY_STRING} keywords=([^+]+)
RewriteRule ^search\.cgi /%1.shtml? [R=301,L]

Birdman

5:34 am on Aug 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It may be better to handle the parsing of the query in the script itself. You will have more options available to you.

I don't think you can do it in httpd.conf or .htaccess.

Birdman

jdMorgan

5:38 am on Aug 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can use RewriteMap to invoke the internal tolower function (int:tolower), but RewriteMap can only be declared in httpd.conf (server or virtual-host context), not in .htaccess.
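
A minimal sketch of how that could look, assuming you have access to httpd.conf. The map name lc is hypothetical (any name works), and only the two-word rule from the first post is shown. The map is declared once in httpd.conf and can then be referenced from the existing .htaccess rules:

```apache
# httpd.conf (server or virtual-host context); RewriteMap cannot be declared in .htaccess.
# Declare it in the same virtual host that serves the site.
# "lc" is an arbitrary map name chosen for this sketch.
RewriteMap lc int:tolower

# .htaccess in cgi-bin: the same two-word rule as in the first post,
# with each captured word passed through the map before redirecting.
RewriteCond %{QUERY_STRING} keywords=([^+]+)\+([^+]+)
RewriteRule ^search\.cgi /${lc:%1}-${lc:%2}.shtml? [R=301,L]
```

Under these assumptions, a request for search.cgi?keywords=Gift+Baskets should redirect to /gift-baskets.shtml. The one-word and three-word rules would be changed the same way.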

Jim

Nosmada

6:04 am on Aug 5, 2004 (gmt 0)

10+ Year Member



Where is this httpd.conf?

If I did it in the script, I would end up with duplicate content because the URLs wouldn't be cleaned up:

e.g. domain.com/test-phrase.shtml
and domain.com/Test-phrase.shtml

Birdman

6:19 am on Aug 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Search engines can't index variables entered into search forms.

Are you creating the URLs from previous searches and displaying them?

<edit>nevermind ;)</edit>

[edited by: Birdman at 6:24 am (utc) on Aug. 5, 2004]

Nosmada

6:21 am on Aug 5, 2004 (gmt 0)

10+ Year Member



I can't have the above two URLs because of Google's duplicate content rules. If I can clean up spaces and other stuff and replace them with dashes, I should be able to clean up a capital or two. If you change the capital B in the dmoz URL below to a lowercase b, the site changes the URL back to a capital B.

[dmoz.org...]

Nosmada

6:25 am on Aug 5, 2004 (gmt 0)

10+ Year Member



I guess there are two problems:

1. If you type something into my search box with one or more capitals you get an error. So the person's entry needs to be cleaned of capitals.

2. If someone types out a URL with a capital and Google indexes it and the version with no capital then I have a duplicate URL problem.
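
For the second problem, one possible approach is a redirect that lowercases any .shtml URL containing a capital, so that only one version can end up indexed. This is a sketch only. It assumes a tolower map has been declared in httpd.conf under the hypothetical name lc, and that the rule sits in the .htaccess in the www root, ahead of whatever rule hands .shtml URLs back to the script:

```apache
# httpd.conf (server or virtual-host context); "lc" is a hypothetical map name.
RewriteMap lc int:tolower

# .htaccess in the www root: fire only when the requested path contains a capital
# letter (the pattern is case-sensitive because no [NC] flag is used).
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.+)\.shtml$ /${lc:$1}.shtml [R=301,L]
```

Under these assumptions, domain.com/Test-phrase.shtml would be redirected with a 301 to domain.com/test-phrase.shtml, while an all-lowercase URL does not match the condition and passes through untouched.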

Birdman

6:31 am on Aug 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DMOZ returns a 302 (Moved Temporarily) header if you use all lowercase. My money says it's a CGI script that's doing the redirect, not Apache.

Nosmada

6:33 am on Aug 5, 2004 (gmt 0)

10+ Year Member



Still, if I type in Gifts I get an error.

If I type in gifts, my rewrite handles things fine.

Birdman

6:39 am on Aug 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1. If you type something into my search box with one or more capitals you get an error. So the person's entry needs to be cleaned of capitals.

Your search engine is very limited if it cannot handle uppercase letters! A simple string-to-lowercase command (in the search script) and that's fixed.

2. If someone types out a URL with a capital and Google indexes it and the version with no capital then I have a duplicate URL problem.

Google cannot index URLs that are formed from user searches via a form. Just because you may be using a GET method on your form (bad), and you see the parameters in the address field of the browser, doesn't mean Google is indexing them.

Nosmada

6:48 am on Aug 5, 2004 (gmt 0)

10+ Year Member



1. It is not that simple.

I used to have the rewrite pointing to /gift-baskets/ to clean the URL, and all was fine.

Changing it to /gift-baskets.shtml now triggers a problem caused by the rewrite code, not the search engine. The engine in question is a separate application pulling from other places. The query is taken raw from the form and rewritten to clean the URL. Then another .htaccess file in the www root converts it back into a cgi request for processing, so that the URL remains clean. Please look at the original .htaccess above.

2. I am not using a GET method, bad. I meant that if someone actually put the URL on one of their web pages and Google then spidered it, that would be problematic.