
Apache Web Server Forum

    
Remove Long string with various symbols using .htaccess
mailhaz




msg:4572756
 4:13 pm on May 10, 2013 (gmt 0)

An old version of our website caused Google to discover multiple versions of our front page. It does not appear to be a query as there is no '=' anywhere in this.

All of the problematic strings start with: ?c%

How would I go about redirecting all requests starting with ?c% to the front page to eliminate this duplicate content?

Here is one of the strings in its entirety:
/?c%25253EenGtcHW6eHmu%25255BYNvZ3F%25253E'd
mjdlgsbve%25253E2'lfzxpset%25253EQs'sbol%2
5253E2'f%25253Evt%25253Cvt%25253C79%25253C2%
25253C2%25253C54136375%25253Cyofumboefsy%252
53C2%25253C29249%25253A'vsm%25253Eiuuq%25252
64B%2525263G%2525263Gxxx%25252Fopdmjdlz%2525
2Fdpn%2525263G'gffe%25253Eopqbz'qsjdf%25253E
'tbq%25253E82%25253Ad8d1979f%25253A87933dg28
e8b967f3759'zbsht%25253Exxx%25252Fopdmjdlz%25252Fdpn

[edited by: incrediBILL at 8:07 pm (utc) on May 10, 2013]
[edit reason] added line breaks [/edit]

 

phranque




msg:4572951
 2:31 am on May 11, 2013 (gmt 0)

welcome to WebmasterWorld, mailhaz!


those strings look like they were originally percent-encoded so that special or reserved characters were encoded.
this means all '[' characters were encoded as '%5B', '>' characters were encoded as '%3E', '<' characters were encoded as '%3C', etc.
the string was subsequently re-encoded so all the '%' signs were replaced with '%25'.
then the string was encoded a third time so all the '%' signs were again replaced with '%25'.
therefore a '>' in the original string eventually became '%25253E' (via '%3E' and then '%253E'), and a '<' became '%25253C'.

it might be worth unencoding that query string to see if it means anything to you afterward.

in general the solution is to use a RewriteCond with the QUERY_STRING as the TestString to see whether the request carries one of these query strings:

http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
If you wish to match against the hostname, port, or query string, use a RewriteCond with the %{HTTP_HOST}, %{SERVER_PORT}, or %{QUERY_STRING} variables respectively.


and then a RewriteRule with Substitution string that erases the requested query string:

http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
When you want to erase an existing query string, end the substitution string with just a question mark.
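
Putting those two pieces together, a minimal sketch might look like the following. It assumes the rules live in the .htaccess file at the document root, that mod_rewrite is enabled, that every unwanted query string starts with the literal characters c% (as described in the first post), and that example.com stands in for the real hostname. It has not been tested against this particular site, so try it somewhere safe first.

```apache
# Sketch only: assumes this .htaccess sits in the document root and
# mod_rewrite is available. example.com is a placeholder hostname.
RewriteEngine On

# QUERY_STRING is the raw text after the "?" and is not percent-decoded,
# so the pattern looks for the literal characters "c%" at the start.
RewriteCond %{QUERY_STRING} ^c%

# In a root-level .htaccess the front page is matched as the empty path.
# The trailing "?" on the target erases the old query string.
RewriteRule ^$ http://example.com/? [R=301,L]
```

Requests for any other path, or for the front page with no query string or a different one, are left alone. On Apache 2.4 and later the QSD flag, as in [R=301,L,QSD], is an alternative to the trailing question mark.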

mailhaz




msg:4572963
 4:01 am on May 11, 2013 (gmt 0)

I am a bit of a noob so I will do my best to view the links provided and come up with something.

These strings/queries are from the previous website owner (I purchased this domain), so I am not worried about salvaging them. I just want them removed from Google's index, so perhaps robots.txt is a better method of blocking these?

lucy24




msg:4572973
 4:39 am on May 11, 2013 (gmt 0)

"it might be worth unencoding that query string to see if it means anything to you afterward."

I fed it into my usual disencoding script and ended up with, well, GIGO:
>enGtcHW6eHmu[YNvZ3F>'dmjdlgsbve>
et cetera.

There are a suspicious lot of %3E and %3C -- but this can mean anything from an evil attempt at php injection to a clumsy robot misinterpreting html markup. As the OP said, = signs (%3D) are conspicuous by their absence. But so are all 8th-bit characters, so I don't think we're looking at something in a one-byte encoding made for a non-Roman script.

As phranque said, you can easily redirect them with a RewriteRule looking at the query string. But it may be more appropriate to use the same rule to serve up a flat 404. Or even a 410 ("It used to exist but is now gone"). It's a little dishonest, but may be appropriate here, if only to make the googlebot go away faster.
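
A sketch of the 410 variant, under stated assumptions: the rules go in the .htaccess file at the document root, mod_rewrite is enabled, and every unwanted query string starts with the literal characters c% as described in the first post. Treat it as a starting point to test, not a finished rule.

```apache
# Sketch only: assumes a root-level .htaccess with mod_rewrite available.
RewriteEngine On

# Raw (undecoded) query string beginning with the literal "c%".
RewriteCond %{QUERY_STRING} ^c%

# "-" means no substitution; the G flag answers 410 Gone.
RewriteRule ^$ - [G]

# For a plain 404 instead, swap the rule above for:
# RewriteRule ^$ - [R=404]
```

Only front-page requests carrying the junk query string get the error response; the clean front page keeps serving normally.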

Don't bother with robots.txt. That's strictly for controlling the behavior of honorable crawlers.

mailhaz




msg:4573389
 1:18 am on May 13, 2013 (gmt 0)

Thank you for the response. What I am really looking for is something I can copy/paste into my .htaccess file.

So, what would I put in my .htaccess file to redirect

http://example.com/?c%25253EenGtcHW6eHmu%25255BYNvZ3F%25253E'd
mjdlgsbve%25253E2'lfzxpset%25253EQs'sbol%2
5253E2'f%25253Evt%25253Cvt%25253C79%25253C2%
25253C2%25253C54136375%25253Cyofumboefsy%252

to

http://example.com

lucy24




msg:4573397
 3:19 am on May 13, 2013 (gmt 0)

"What I am really looking for is something I can copy/paste into my .htaccess file"

Ah. For that, I'm afraid you have come to the wrong forum.
