Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 48 message thread spans 2 pages.
Spam words in a query string - do these backlinks hurt rankings?

 6:55 am on May 8, 2010 (gmt 0)

< This thread was split from another location [webmasterworld.com] >

to put it in a different perspective, if your URL can be dynamically changed to another URL which does not reflect the actual keyword you programmed for your URL and still resolves as a 200 header found, you've got a problem.

Informative post Dusky. Let me understand what you have posted here. If there is an additional keyword injected in the URL, you mean it should take you to a 404 error instead of displaying a page?

I have a Wordpress blog where I tested example.com/keyword-here/?q=spam-keyword and see that while the URL in the address remains the same (with the ?q= part included), the webpage has however resolved to example.com/keyword-here/

Do you mean this is a problem?

[edited by: tedster at 2:56 pm (utc) on May 8, 2010]



 3:57 pm on May 9, 2010 (gmt 0)

Yes, exactly. The fix only applies when the URLs are already rewritten with mod_rewrite, and only for certain types of CMS; otherwise it has to be modified to work for other types of pages.

In addition, best practices suggest that a full URL --protocol, domain, and URL-path-- should be used in any rule intended to invoke an external redirect.

Advice noted jdMorgan, yes I ought to know better I guess.


 2:14 pm on May 10, 2010 (gmt 0)

I need some advice here. I have searched until I am at a loss.
Our dynamic sites are fine, but my home ecommerce site is another issue.
I use Helicon ISAPI Rewrite for several applications. No URL rewriting.
The site is a static site except for my search function. The search is something like this: www.example.com/search.asp?searchterm=search term

When I tested the ?q=something, all my URLs returned a 200, so I have looked for a solution. I tested some of the code given, but it won't work for my site. Any suggestions would be greatly appreciated.


 4:49 pm on May 10, 2010 (gmt 0)

I'm using

RewriteCond %{QUERY_STRING} ^from.*$ [OR]
RewriteCond %{QUERY_STRING} ^ref.*$ [OR]
RewriteCond %{QUERY_STRING} ^q=$
RewriteRule ^/(.*) http://www.example.com/$1? [R=301,L]

### DROP ALL Qs Except Select Folders/Files
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !^/search$
RewriteCond %{REQUEST_URI} !^/widgets/search.html$
RewriteCond %{REQUEST_URI} !^/stats.*$
RewriteCond %{REQUEST_URI} !^/__utm.gif$
RewriteRule ^/(.*) http://www.example.com/$1? [R=301,L]
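
One caveat on the rules above: a pattern that starts with ^/ only matches in the main server or virtual host configuration. In a per-directory .htaccess file, mod_rewrite strips the leading slash before matching. The post does not say where the rules live, so the following is only a sketch of the first rule set for .htaccess, assuming www.example.com is the canonical host:

```apache
# .htaccess variant (assumption: rules sit in the document root's .htaccess).
RewriteEngine On

# Query strings that begin with "from" or "ref", or that are exactly "q=".
RewriteCond %{QUERY_STRING} ^from [OR]
RewriteCond %{QUERY_STRING} ^ref [OR]
RewriteCond %{QUERY_STRING} ^q=$
# No leading slash in the pattern here; the trailing "?" drops the query string.
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

The second rule set would need the same pattern change, but its REQUEST_URI conditions keep their leading slash, since that variable always holds the full URL-path.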


 6:32 pm on May 10, 2010 (gmt 0)

What's the question?


 6:44 pm on May 10, 2010 (gmt 0)

g1smd, the question is this.
I have a static .htm site and want to prevent example.com/folder/name.htm?q=something or example.com/?q=something from returning a 200, to keep this issue from happening to this static site. I am really stumped here and need some help if this is possible.

I have Helicon ISAPI Rewrite installed on the server, where I use it for 301s and other purposes, but have no URL rewriting taking place.


 7:38 pm on May 10, 2010 (gmt 0)

As a general rule, Helicon's ISAPI Rewrite for the IIS server [isapirewrite.com] now uses the same syntax as Apache's .htaccess. That means you can use Regular expressions for URL rewriting. However, I haven't been hands-on with this kind of query string issue, so I can't be 100% sure what works in this case.

I'd suggest beginning a thread in our Windows IIS forum [webmasterworld.com] for more experienced advice.


 8:45 pm on May 10, 2010 (gmt 0)

bwnbwn, I feel your pain, but I can't be of much help on MS or IIS stuff (I use them rarely, for local testing only). However, the problem is not originating on your site or created deliberately; it's just that it can be abused, and G* or other SEs aren't going to penalize you. So take your time and don't panic. Trust me, 50%+ of Internet sites have this problem and most are OK with it (most don't even know they have it anyway).

Starting a thread on the IIS forum would be the best bet, and for reference you can always point them to my discussion so they'll have the general idea. I'd test on a local intranet webserver to make sure the fixes are all good, and run a link checker as well.

If you have URLs that are constructed as yoursite.com/?.... i.e. with the ? after the domain.tld and slash, or after the end of the URL as in blabla.html?... be careful, as the fix will block those URLs or redirect them to a 404. Can't remember where I've seen some commercial solutions using this practice.


 1:13 am on May 11, 2010 (gmt 0)

Dusky, thanks. I started a post on the forum before posting here; no responses as of yet.
Tedster, I do use a .htaccess to control what functions I do use, for 301s and other things, just no URL rewrites.


 2:05 am on May 11, 2010 (gmt 0)

Well, Helicon named the module ISAPI Rewrite for a reason, right?


 2:22 am on May 11, 2010 (gmt 0)

I actually just looked at it for a minute thinking I should be able to come up with a starting point, but they do not have a 'THE_REQUEST' equivalent that I saw. So you could get most of them by checking for the QUERY_STRING server variable containing at least one character and then redirecting. While this would solve MOST of the issue, it would most likely miss http://example.com/? because the query string is empty.

I would probably go with checking for the query string containing a character and then using a canonical link relationship to specify the version without the ? as the correct version of the page. It would look like this in Apache:

RewriteCond %{QUERY_STRING} .
RewriteRule .? http://example.com%{REQUEST_URI}? [R=301,L]

In ISAPI the best I can guess at is:
RewriteCond Query-String: .
RewriteRule (.*) http://example.com/$1? [RP,L]


 1:09 pm on May 12, 2010 (gmt 0)

I have a fix for the problem. Thanks to all for the advice and assistance in this matter.

RewriteBase /
RewriteCond %{QUERY_STRING} ^(?!searchterm=.+).+$ [NC]
RewriteRule .* 404.htm? [R=301,L]

*Note: the 404.htm can be replaced with whatever the site owner wants the 301 to go to. I prefer it go to my custom 404 page.

Posted the fix for those that are in the same fix, and I hope this will help them.

Dusky, thanks for the heads up on the problem. I can't tell ya how many good tips I have gotten from this forum, then gone digging for a workable solution for the sites I own and help manage.


 2:56 pm on May 12, 2010 (gmt 0)

You are apparently rewriting to your 404 page, which will return the proper "content" but with a 200-OK response. That is likely NOT what you want.

Also, because the syntax may differ between ISAPI Rewrite and Apache mod_rewrite I cannot be sure, but in mod_rewrite the %{QUERY_STRING} variable does not include the "?" character -- It is a delimiter between the URL-path and the query string, and is not a part of either.

This is the reason that TheMadScientist referred to the THE_REQUEST variable above, because it *does* include the question mark, even if the following query string is blank.

The 200-OK response problem can probably be fixed in one of two ways, depending on which version of Apache mod_rewrite ISAPI Rewrite is compatible with. I'd suggest trying the following:

Apache 2.x compatible:

RewriteBase /
RewriteCond %{QUERY_STRING} ^([^&]*&)*searchterm=[^&]* [NC]
RewriteRule .* - [R=404,L]

Apache 1.3.x compatible:

RewriteBase /
RewriteCond %{QUERY_STRING} ^([^&]*&)*searchterm=[^&]* [NC]
RewriteRule .* /some-path-that-you-know-will-never-exist.html? [L]
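
To illustrate the THE_REQUEST point made earlier in this post: the sketch below catches a bare trailing question mark, which a QUERY_STRING test cannot see. It assumes Apache mod_rewrite and www.example.com as the canonical host. Whether ISAPI Rewrite offers an equivalent variable is not confirmed in this thread.

```apache
RewriteEngine On

# THE_REQUEST is the raw request line, for example "GET /page.htm? HTTP/1.1".
# The "?" is still visible there even when QUERY_STRING is empty.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?\ ]*\?
# Redirect to the same path; the trailing "?" drops the query string.
RewriteRule ^ http://www.example.com%{REQUEST_URI}? [R=301,L]
```

As written, this strips every query string, including legitimate search queries, so a real site would add exclusion conditions for the parameters it actually uses.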



 4:03 pm on May 12, 2010 (gmt 0)

JD, the code is correct, as I did a header check with http://www.example.com?q=something and the header response was:
#1 Server Response: http://www.example.com?q=something
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Date: Wed, 12 May 2010 15:58:44 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Location: http://www.example.com/404.htm
Content-Length: 247
Content-type: text/html
Redirect Target: http://www.example.com/404.htm
#2 Server Response: http://www.example.com/404.htm
HTTP Status Code: HTTP/1.1 404 Not Found
Date: Wed, 12 May 2010 15:58:45 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Content-Length: 8514
Content-Type: text/html
Cache-control: private

I also checked product pages that are in different folders and get the same header response as above for all URLs.

This is what I want, correct?

I really do appreciate all the help here, JD, as I have never been to a day of computer school and everything I learned has been a hard knock education.


 12:49 pm on May 13, 2010 (gmt 0)

Sorry to drag this thread out again, but I have finally found an answer.

This was taken from another post here last year but works.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
# This allows genuine URLs with a query string to be found.
RewriteCond %{QUERY_STRING} !^option=
RewriteRule !\.php$ http://www.example.com/%1? [R=301,L]

jdMorgan, I was able to combine two of your examples into a series of rules that work perfectly for me:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteCond %{QUERY_STRING} !^SID=
RewriteCond %{QUERY_STRING} !^np=
RewriteRule !\.php$ http://example.com/%1? [R=301,L]

Anyway, let me state what the above does... unless a query string has ?SID= or ?np= it removes the query string and the question mark with a 301 redirect, for both bots & users (unless it's a .php page).
[webmasterworld.com]

Now the only thing I need to modify is getting the server to give a 404 response directly rather than via a 301 redirect, which from what I've read so far is dangerous.

Could anyone help please.


 12:57 pm on May 13, 2010 (gmt 0)

Post #:4131405 has the answer you require.

Internally rewrite to a /non-existent-path and the ErrorDocument mechanism should directly return the correct 404 header and the content of the error page.

 1:10 pm on May 13, 2010 (gmt 0)

tessmac, you're not dragging it out, as this is important to many of us.
JD, if tessmac's question could be answered, even with an educated guess, that would be great, because I can't see why doing a 301 to a 404 page for this is not correct.
JD, thanks. I did try your code, but as I thought it would, it broke my site search and sent searches to a 404 page. I will play around with it to see if I can get it to 404 without breaking my site search function.
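
It looks as though the suggested condition matches query strings that do contain searchterm=, so the search URLs themselves were the ones sent to the 404. A minimal sketch with the test negated, assuming Apache 2.x mod_rewrite semantics (whether ISAPI Rewrite accepts R=404 is untested here):

```apache
RewriteEngine On
RewriteBase /

# Act only when the query string is non-empty...
RewriteCond %{QUERY_STRING} .
# ...and it does not carry a searchterm parameter (note the leading "!").
RewriteCond %{QUERY_STRING} !^([^&]*&)*searchterm=[^&]* [NC]
# A non-3xx status on the R flag drops the substitution and returns that status.
RewriteRule .* - [R=404,L]
```

This still lets searchterm= through on any URL, not just search.asp, so a further REQUEST_URI condition may be wanted if that matters.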


 11:29 am on May 17, 2010 (gmt 0)

I think you are overlooking this issue, and you risk breaking the functionality of a site by adding various rewrite rules that only hide a problem (if one exists).

First, any request may give a 200 header when various parameters are added to it. For example: http://www.example.com/index.php?i=1&j=2&k=3
where the values of i, j, k are, say, the spam keywords. Do you care if the server responds with 200 OK? No.

Second, does the link http://www.example.com/index.php?i=1&j=2&k=3 propagate inside your pages? In other words, if you view the HTML source after you access the site with that kind of link, does that link appear somewhere? Does it propagate? If it does, then yes, you have a problem with the application and you need to find it and fix it.

Lots of these attempts are probing for this type of weakness. If one exists it can be exploited and not just to propagate spam.


 1:32 pm on May 17, 2010 (gmt 0)

Post #:4131405 has the answer you require.

Internally rewrite to a /non-existent-path and the ErrorDocument mechanism should directly return the correct 404 header and the content of the error page.

Ok, so how would this code below then be written?

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
# This allows genuine URLs with a query string to be found.
RewriteCond %{QUERY_STRING} !^option=
RewriteRule !\.php$ http://www.example.com/%1? [R=301,L]
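
One possible answer, as a sketch only: keep the same conditions but replace the redirect target with "-" and a 404 status, along the lines of the Apache 2.x suggestion earlier in the thread. The REDIRECT_STATUS guard is an added assumption; it is meant to stop the rule from firing again if a custom ErrorDocument is served through an internal redirect.

```apache
RewriteEngine On

# Skip requests that are already the result of an internal redirect,
# such as the handoff to a custom ErrorDocument (assumed guard).
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# The raw request line contains a "?" after the path.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^?\ ]*\?
# Let genuine URLs whose query string starts with "option=" through.
RewriteCond %{QUERY_STRING} !^option=
# Answer 404 directly instead of redirecting; .php URLs are left alone.
RewriteRule !\.php$ - [R=404,L]
```

On Apache 1.3, where R=404 is not available, the internal rewrite to a path that never exists would be the equivalent route.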

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved