Remove Long string with various symbols using .htaccess

Help remove Long string with various symbols using .htaccess file

   
4:13 pm on May 10, 2013 (gmt 0)



An old version of our website caused Google to discover multiple versions of our front page. It does not appear to be a query as there is no '=' anywhere in this.

All of the problematic strings start with: ?c%

How would I go about redirecting all requests starting with ?c% to the front page to eliminate this duplicate content?

Here is one of the strings in its entirety:
/?c%25253EenGtcHW6eHmu%25255BYNvZ3F%25253E'd
mjdlgsbve%25253E2'lfzxpset%25253EQs'sbol%2
5253E2'f%25253Evt%25253Cvt%25253C79%25253C2%
25253C2%25253C54136375%25253Cyofumboefsy%252
53C2%25253C29249%25253A'vsm%25253Eiuuq%25252
64B%2525263G%2525263Gxxx%25252Fopdmjdlz%2525
2Fdpn%2525263G'gffe%25253Eopqbz'qsjdf%25253E
'tbq%25253E82%25253Ad8d1979f%25253A87933dg28
e8b967f3759'zbsht%25253Exxx%25252Fopdmjdlz%25252Fdpn

[edited by: incrediBILL at 8:07 pm (utc) on May 10, 2013]
[edit reason] added line breaks [/edit]

2:31 am on May 11, 2013 (gmt 0)

phranque (WebmasterWorld Administrator)



welcome to WebmasterWorld, mailhaz!


those strings look like they were originally percent-encoded so that special or reserved characters were escaped.
this means all '[' characters were encoded as '%5B', '>' characters were encoded as '%3E', '<' characters were encoded as '%3C', etc.
the string was subsequently re-encoded so all the '%' signs were replaced with '%25'.
then the string was encoded a third time so all the '%' signs were again replaced with '%25'.
therefore a '>' in the original string eventually became '%25253E' ('>' to '%3E' to '%253E' to '%25253E'), and likewise a '<' became '%25253C'.

it might be worth unencoding that query string to see if it means anything to you afterward.

in general the solution is to use a RewriteCond with the QUERY_STRING as the TestString to check whether the request carries one of these query strings:

http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
"If you wish to match against the hostname, port, or query string, use a RewriteCond with the %{HTTP_HOST}, %{SERVER_PORT}, or %{QUERY_STRING} variables respectively."


and then a RewriteRule with a Substitution string that erases the requested query string:

http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
"When you want to erase an existing query string, end the substitution string with just a question mark."
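
As a rough illustration of how those two pieces fit together, here is a minimal sketch, not a tested drop-in. It assumes the .htaccess file sits in the site root, that mod_rewrite is enabled, that example.com stands in for the real hostname, and that the unwanted URLs are all requests for the front page itself.

```apache
# Sketch only: assumes a root-level .htaccess and an enabled mod_rewrite.
RewriteEngine On

# QUERY_STRING holds the raw text after the "?", still percent-encoded,
# so "^c%" matches any query string that begins with "c%".
RewriteCond %{QUERY_STRING} ^c%

# "^$" matches a request for the front page only (empty path in .htaccess).
# The trailing "?" in the target erases the query string, and
# R=301 marks the redirect as permanent.
RewriteRule ^$ http://example.com/? [R=301,L]
```

If the front page is also reachable under a file name such as index.html, the RewriteRule pattern would need widening to cover it. Any rule like this is best checked against a few real requests before relying on it.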
4:01 am on May 11, 2013 (gmt 0)



I am a bit of a noob so I will do my best to view the links provided and come up with something.

These strings/queries are from the previous website owner (I purchased this domain), so I am not worried about salvaging these. I just want them removed from Google's index, so perhaps robots.txt is a better method of blocking these?
4:39 am on May 11, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



"it might be worth unencoding that query string to see if it means anything to you afterward."

I fed it into my usual disencoding script and ended up with, well, GIGO:
>enGtcHW6eHmu[YNvZ3F>'dmjdlgsbve>

et cetera.

There are a suspicious lot of %3E and %3C -- but this can mean anything from an evil attempt at PHP injection to a clumsy robot misinterpreting HTML markup. As the OP said, = signs (%3D) are conspicuous by their absence. But so are all 8th-bit characters, so I don't think we're looking at something in a one-byte encoding made for a non-Roman script.

As phranque said, you can easily redirect them with a RewriteRule preceded by a RewriteCond that looks at the query string. But it may be more appropriate to use the same rule to serve up a flat 404. Or even a 410 ("It used to exist but is now gone"). Claiming "gone" is a little dishonest, since these URLs never really existed as pages, but it may be appropriate here, if only to make the googlebot go away faster.
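
A minimal sketch of the 410 variant, under the same kind of assumptions as any .htaccess rule here: the file is in the site root, mod_rewrite is enabled, and the unwanted URLs are front-page requests whose query string begins with "c%". It would be used instead of a redirect for these requests, not alongside one.

```apache
# Sketch only: assumes a root-level .htaccess and an enabled mod_rewrite.
RewriteEngine On

# Match the raw (still percent-encoded) query string beginning with "c%".
RewriteCond %{QUERY_STRING} ^c%

# "-" means "no substitution"; the G flag answers with 410 Gone.
RewriteRule ^$ - [G]
```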

Don't bother with robots.txt. That's strictly for controlling the behavior of honorable crawlers.
1:18 am on May 13, 2013 (gmt 0)



Thank you for the response. What I am really looking for is something I can copy/paste into my .htaccess file.

So, what would I put in my .htaccess file to redirect

http://example.com/?c%25253EenGtcHW6eHmu%25255BYNvZ3F%25253E'd
mjdlgsbve%25253E2'lfzxpset%25253EQs'sbol%2
5253E2'f%25253Evt%25253Cvt%25253C79%25253C2%
25253C2%25253C54136375%25253Cyofumboefsy%252

to

http://example.com
3:19 am on May 13, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



"What I am really looking for is something I can copy/paste into my .htaccess file"

Ah. For that, I'm afraid you have come to the wrong forum.