Forum Moderators: phranque

Strip unwanted characters from url using htaccess

strip chars from url

         

frogz

11:02 pm on Dec 31, 2008 (gmt 0)

10+ Year Member



Hi Jim,

How can I strip (and/or replace) unwanted characters such as "," "!" "?" from a requested URL, and redirect accordingly through mod_rewrite?

Ty!

frogz

11:42 pm on Dec 31, 2008 (gmt 0)

10+ Year Member




cond: ^([^\ ]*)\ (?)$ $1-$2
rule: (.*) http://%{HTTP_HOST}/$1

hmm

frogz

11:53 pm on Dec 31, 2008 (gmt 0)

10+ Year Member



preg_replace is in order, I suppose.

g1smd

12:16 am on Jan 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Make sure it generates a 301 redirect.

Include the target host name in the redirect.
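
As a minimal sketch of those two points, assuming a .htaccess file in the document root and using www.example.com as a placeholder for the real canonical host name, a rule that strips a comma might look like this:

```apache
RewriteEngine On

# Strip the first comma found in the requested path, then redirect.
# R=301 makes the redirect permanent, and the substitution names the
# target host explicitly (www.example.com is a placeholder).
RewriteRule ^([^,]*),(.*)$ http://www.example.com/$1$2 [R=301,L]
```

Note that each request removes only one comma, so a URL containing several commas would still be redirected several times.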

patd

2:25 pm on Mar 20, 2009 (gmt 0)

10+ Year Member



Hi,
I have the same problem. Can you give me the complete code so I can copy it into my .htaccess file?

I want to remove all unwanted characters like ~!@#$%^&*()-+=_ and all the digits 0-9 from the URL.

Thanks in advance.

g1smd

2:45 pm on Mar 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What have you tried so far?

We can help you get your code working.

patd

2:52 pm on Mar 20, 2009 (gmt 0)

10+ Year Member



I am quite new to this .htaccess thing

I have tried this
#RewriteRule ^/(.*)\+(.*)$ [%{HTTP_HOST}...] [R=301,L]

If the URL contains characters like ~!@#$% etc. or digits 0-9, they should simply be removed.

For example, [sitename.com...]
will become
[sitename.com...]

Also, is there a good tutorial available? I haven't found one.

patd

2:53 pm on Mar 20, 2009 (gmt 0)

10+ Year Member



I also tried

RewriteCond ^([^\ ]*)\ (?)$ $1$2
RewriteRule (.*) [%{HTTP_HOST}...] [R=301,L]

This is also not working.

g1smd

2:58 pm on Mar 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule cannot see the leading / so omit that from your pattern.
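
To illustrate the leading-slash point, here is a sketch based on the "+" pattern posted above. The host www.example.com is a placeholder, and the rule is assumed to live in a .htaccess file in the document root:

```apache
RewriteEngine On

# In .htaccess the per-directory prefix, including the leading slash,
# is removed before the pattern is tested. A request for /foo+bar is
# therefore seen as "foo+bar", and a pattern starting with ^/ never matches.

# Works in .htaccess: no leading slash in the pattern.
RewriteRule ^(.*)\+(.*)$ http://www.example.com/$1$2 [R=301,L]
```

A pattern beginning with `^/?` is a common way to write a rule that matches both in .htaccess and in the main server configuration, where the leading slash is present.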

Removing path parts is easy if they are always in a fixed position in the URL, or are easily matched to a pattern.

If the extra characters are random in their position, and are random characters themselves, then the problem becomes a lot more difficult. Any .htaccess redirect code is likely to be very inefficient, and you absolutely do not want 'chained' redirects, where each redirect fixes a single character. For 10 fixes you would have 10 chained redirects, and the browser or user agent would likely give up after several of them and never reach the content at all.

In this case, you might be better off using a RewriteMap, but be aware that your host will likely need to set that up for you, because a RewriteMap has to be declared in the server configuration and cannot be declared in .htaccess.
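
A rough sketch of how a RewriteMap arrangement could be wired up follows. The map name "cleanpath", the script path, and the host www.example.com are all hypothetical, and the clean-up script itself is not shown:

```apache
# httpd.conf or <VirtualHost> only; RewriteMap is not allowed in .htaccess.
# The hypothetical external program reads one lookup key per line on stdin
# and prints the cleaned path on stdout.
RewriteMap cleanpath "prg:/usr/local/bin/clean-path"

# .htaccess: call the map only when an unwanted character is present,
# then issue a single 301 redirect to the cleaned URL.
RewriteEngine On
RewriteCond $1 [,!~]
RewriteRule ^(.*)$ http://www.example.com/${cleanpath:$1} [R=301,L]
```

The point of this design is that all characters are fixed in one lookup, so the client sees one redirect instead of a chain. The program must actually remove every character listed in the RewriteCond, otherwise the redirect would loop.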

jdMorgan

3:41 pm on Mar 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The code posted in this recent thread, Replacing Underscores with Dashes, past 9 [webmasterworld.com], can be used to replace characters, or it can easily be modified to simply remove them.

I would strongly advise anyone contemplating the use of mod_rewrite to read the Apache mod_rewrite documentation [httpd.apache.org] thoroughly. Mod_rewrite is a powerful module which modifies your server configuration; a single typo can take down your server (if you are lucky), or it can quietly destroy your search engine rankings over time. As such, it cannot be treated casually, and cutting and pasting code you do not fully understand is a sure recipe for disaster.

Additional resources are available in our Apache Forum Charter, and examples can be found in our Apache Forum Library. Links to these resources appear at the top left of every page in this forum.

Jim

g1smd

3:48 pm on Mar 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



While I would contemplate doing simple fixes for common issues such as:

- redirecting a request for:

/~jim
to
example.com/jim/

- redirecting a request for:
/thispage"sometext
to
example.com/thispage

I think I would simply send a 404 back for a request like:

/use~r&some@this$other"stuff
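
The two simple fixes above could be sketched roughly as follows, assuming a .htaccess file in the document root and http://example.com as the target. The patterns are illustrative guesses at what "simple" means here, not tested production rules:

```apache
RewriteEngine On

# /~jim  ->  http://example.com/jim/
RewriteRule ^~([^/]+)/?$ http://example.com/$1/ [R=301,L]

# /thispage"sometext  ->  http://example.com/thispage
# \x22 is the double-quote character, written as a hex escape so it
# cannot be confused with config-file quoting.
RewriteRule ^([^\x22]+)\x22 http://example.com/$1 [R=301,L]

# A request such as /use~r&some@this$other"stuff that should not be
# repaired needs no rule at all: if nothing matches and no such resource
# exists, the server returns its normal 404.
```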

patd

6:47 pm on Mar 20, 2009 (gmt 0)

10+ Year Member



Hi,

I can remove all the spaces from the URL with this:
# Remove spaces
RewriteRule ^([^\ ]*)\ (.*)$ $1$2 [E=rspace:yes,N]
# Redirect to update URL in search engine listings and browsers
RewriteCond %{ENV:rspace} yes
RewriteRule (.*) [%{HTTP_HOST}...] [R=301,L]

I can do the same with underscores by just replacing rspace with unscors.

But what about the rest of the characters, like 0-9 and ~!@# etc.?

jdMorgan

9:46 pm on Mar 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unless you want to put the client (and the server) through multiple redirects -- one for each character replaced, I strongly suggest that you re-evaluate your interpretation of the code I posted in that other thread.

Move any URL with any character you don't like into a server variable.
Replace all instances of the first unwanted character in the server variable (multiple RewriteCond/RewriteRule steps as shown in that other thread, enough to allow for all instances you might expect to replace).
Then repeat with a set of rules for the next unwanted character, updating the same variable as in the first replacement ruleset.
Continue with a ruleset for each unwanted character.
When all unwanted characters have been replaced in the server variable, then and only then do the external redirect.
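
The steps above can be sketched as follows. This is not the code from the other thread. It is a shortened illustration that handles only "," and "!", allows for two instances of each, and uses a hypothetical variable name "cleaned" and placeholder host www.example.com:

```apache
RewriteEngine On

# Step 1: copy any path containing an unwanted character into a variable.
RewriteRule ^(.*[,!].*)$ - [E=cleaned:$1]

# Step 2: remove commas from the variable, one per RewriteCond/RewriteRule
# pair. Repeat the pair as many times as commas are expected.
RewriteCond %{ENV:cleaned} ^([^,]*),(.*)$
RewriteRule ^ - [E=cleaned:%1%2]
RewriteCond %{ENV:cleaned} ^([^,]*),(.*)$
RewriteRule ^ - [E=cleaned:%1%2]

# Step 3: the same again for "!", updating the same variable.
RewriteCond %{ENV:cleaned} ^([^!]*)!(.*)$
RewriteRule ^ - [E=cleaned:%1%2]
RewriteCond %{ENV:cleaned} ^([^!]*)!(.*)$
RewriteRule ^ - [E=cleaned:%1%2]

# Step 4: only now, and only if the variable was set, redirect once.
RewriteCond %{ENV:cleaned} .
RewriteRule ^ http://www.example.com/%{ENV:cleaned} [R=301,L]
```

Even this two-character version needs nine directives, and a URL with more than two instances of either character would still be redirected a second time.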

This will be a ton of code, and very, very slow and inefficient. I suggest that you re-evaluate exactly what is causing this problem, fix the root cause, and then "repair" only the critical URLs which have errors, letting the rest return a 404 Not Found error.

If you depend on such a far-reaching URL "repair" solution, you are likely to have problems in the future in addition to severe server-performance-related problems. For example, you will not be able to successfully set up a Google or Yahoo Webmaster Tools account, because doing so requires that you place a file on your server with 16 digits in its name. The code we're talking about here would strip those digits from the URL and make it impossible to use that file to validate your site.
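
One possible way to limit that kind of collateral damage, offered here as an assumption rather than as part of the code in the other thread, is to leave alone any request that maps to a real file by placing a rule like this ahead of the clean-up rules:

```apache
RewriteEngine On

# If the requested path maps to an existing file (such as a search-engine
# verification file), stop rewriting and serve the file untouched.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
```

This does not make the clean-up rules any faster. It only keeps them from rewriting URLs that are already valid.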

Jim