Spam words in a query string - do these backlinks hurt rankings?

     

anand84

6:55 am on May 8, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



< This thread was split from another location [webmasterworld.com] >

To put it in a different perspective: if your URL can be dynamically changed to another URL which does not reflect the actual keyword you programmed for it, and it still resolves with a 200 header, you've got a problem.

Informative post Dusky. Let me understand what you have posted here. If there is an additional keyword injected in the URL, you mean it should take you to a 404 error instead of displaying a page?

I have a WordPress blog where I tested example.com/keyword-here/?q=spam-keyword. The URL in the address bar remains the same (with the ?q= part included), but the page displayed is the one for example.com/keyword-here/

Do you mean this is a problem?

[edited by: tedster at 2:56 pm (utc) on May 8, 2010]

dusky

3:57 pm on May 9, 2010 (gmt 0)

10+ Year Member



Yes, that's exactly it. The fix applies when the URLs are already rewritten with mod_rewrite, and only for certain types of CMS; otherwise it has to be modified to work for other types of pages.

In addition, best practices suggest that a full URL --protocol, domain, and URL-path-- should be used in any rule intended to invoke an external redirect.

Advice noted, jdMorgan. Yes, I ought to know better, I guess.

bwnbwn

2:14 pm on May 10, 2010 (gmt 0)

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I need some advice here. I have searched until I am at a loss.
Our dynamic sites are fine, but my home ecommerce site is another issue.
I use Helicon ISAPI Rewrite for several applications, but no URL rewriting.
The site is a static site except for my search function. The search URL looks something like this: www.example.com/search.asp?searchterm=search term

When I tested ?q=something, all my URLs returned a 200, so I have looked for a solution. I tested some of the code given, but it won't work for my site. Any suggestions would be greatly appreciated.

levo

4:49 pm on May 10, 2010 (gmt 0)

10+ Year Member



I'm using


### DROP KNOWN BAD Qs
RewriteCond %{QUERY_STRING} ^from.*$ [OR]
RewriteCond %{QUERY_STRING} ^ref.*$ [OR]
RewriteCond %{QUERY_STRING} ^q=$
RewriteRule ^/(.*) http://www.example.com/$1? [R=301,L]

### DROP ALL Qs Except Select Folders/Files
RewriteCond %{QUERY_STRING} !^$
RewriteCond %{REQUEST_URI} !^/search$
RewriteCond %{REQUEST_URI} !^/widgets/search\.html$
RewriteCond %{REQUEST_URI} !^/stats.*$
RewriteCond %{REQUEST_URI} !^/__utm\.gif$
RewriteRule ^/(.*) http://www.example.com/$1? [R=301,L]

g1smd

6:32 pm on May 10, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



What's the question?

bwnbwn

6:44 pm on May 10, 2010 (gmt 0)

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 5+ Year Member



g1smd, the question is this.
I have a static .htm site and want to prevent example.com/folder/name.htm?q=something or example.com/?q=something from returning a 200, to keep this issue from happening to this static site. I am really stumped here and need some help, if this is possible.

I have Helicon ISAPI Rewrite installed on the server, where I use it for 301s and other purposes, but no URL rewriting is taking place.

tedster

7:38 pm on May 10, 2010 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



As a general rule, Helicon's ISAPI Rewrite for the IIS server [isapirewrite.com] now uses the same syntax as Apache's .htaccess. That means you can use Regular expressions for URL rewriting. However, I haven't been hands-on with this kind of query string issue, so I can't be 100% sure what works in this case.

I'd suggest beginning a thread in our Windows IIS forum [webmasterworld.com] for more experienced advice.

dusky

8:45 pm on May 10, 2010 (gmt 0)

10+ Year Member



bwnbwn, I feel your pain, but I can't be of much help on MS or IIS stuff (I use them rarely, for local testing only). However, the problem does not originate on your site and is not created deliberately; it is just that it can be abused. G* and other SEs aren't going to penalize you, so take your time and don't panic. Trust me, 50%+ of Internet sites have this problem and most are OK with it (most don't even know they have it anyway).

Starting a thread on the IIS forum would be your best bet, and for reference you can always point them to my discussion so they'll have the general idea. I'd test on a local intranet web server to make sure the fixes are all good, and run a link checker as well.

If you have URLs that are constructed as yoursite.com/?... (i.e. with the ? after the domain.tld and slash) or with the ? after the end of the URL, as in blabla.html?..., be careful, as the fix will block those URLs or redirect them to a 404. I can't remember where, but I've seen some commercial solutions using this practice.

bwnbwn

1:13 am on May 11, 2010 (gmt 0)

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Dusky, thanks. I started a post on that forum before posting here; no responses as yet.
Tedster, I do use a .htaccess to control what functions I do use, for 301s and other things, just no URL rewrites.

tedster

2:05 am on May 11, 2010 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Well, Helicon named the module ISAPI Rewrite for a reason, right?

TheMadScientist

2:22 am on May 11, 2010 (gmt 0)

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



I actually just looked at it for a minute, thinking I should be able to come up with a starting point, but they do not have a 'THE_REQUEST' equivalent that I saw. You could get most of them by checking whether the QUERY_STRING server variable contains at least one character and then redirecting. While this would solve MOST of the issue, it would most likely miss http://example.com/? because the query string there is empty.

I would probably go with checking for the query string containing a character and then using a canonical link relationship to specify the version without the ? as the correct version of the page. It would look like this in Apache:

RewriteCond %{QUERY_STRING} .
RewriteRule .? http://example.com%{REQUEST_URI}? [R=301,L]

In ISAPI the best I can guess at is:
RewriteCond Query-String: .
RewriteRule (.*) http://example.com/$1? [RP,L]

bwnbwn

1:09 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I have a fix for the problem. Thanks to all for the advice and assistance in this matter.

RewriteBase /
RewriteCond %{QUERY_STRING} ^(?!searchterm=.+).+$ [NC]
RewriteRule .* 404.htm? [R=301,L]

*Note: The 404.htm can be replaced with whatever the site owner wants the 301 to go to. I prefer it go to my custom 404 page.

I posted the fix for those who are in the same situation and hope it will help them.

Dusky, thanks for the heads up on the problem. I can't tell you how many good tips I have gotten from this forum, which then had me digging for a workable solution for the sites I own and help manage.
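
[Editor's note] For anyone who wants to check the condition off-server, here is a rough Python sketch of which query strings the pattern above would catch. It assumes Python's `re` module treats this pattern the same way the server's regex engine does, and it models the [NC] flag with `re.IGNORECASE`. The function name is made up for illustration.

```python
import re

# Pattern copied from the RewriteCond above; [NC] is modelled with re.IGNORECASE.
# Assumption: Python's `re` behaves like the server's regex engine for this pattern.
BLOCK_PATTERN = re.compile(r"^(?!searchterm=.+).+$", re.IGNORECASE)


def would_redirect(query_string: str) -> bool:
    """Return True if the condition matches, i.e. the redirect rule would fire."""
    return BLOCK_PATTERN.search(query_string) is not None


# Print the verdict for a handful of sample query strings.
for qs in ["q=something", "searchterm=blue widgets", "searchterm=",
           "searchterm=x&q=spam", ""]:
    print(repr(qs), would_redirect(qs))
```

Two things stand out when running it: a search with an empty term (`searchterm=`) is redirected like any other unwanted query string, and a query string that starts with a real search term but carries extra parameters (`searchterm=x&q=spam`) is let through.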

jdMorgan

2:56 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



You are apparently rewriting to your 404 page, which will return the proper "content" but with a 200-OK response. That is likely NOT what you want.

Also, because the syntax may differ between ISAPI Rewrite and Apache mod_rewrite I cannot be sure, but in mod_rewrite the %{QUERY_STRING} variable does not include the "?" character -- It is a delimiter between the URL-path and the query string, and is not a part of either.

This is the reason that TheMadScientist referred to the THE_REQUEST variable above, because it *does* include the question mark, even if the following query string is blank.
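
[Editor's note] The difference can be illustrated outside the server with a small Python sketch. It is only an analogy: `urlsplit(...).query` stands in for QUERY_STRING, a hand-written request line stands in for THE_REQUEST, and the pattern is the one that appears in the THE_REQUEST examples later in this thread. The helper names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# THE_REQUEST-style pattern ("\ " is an escaped space); group 1 is the URL-path
# between the leading slash and the question mark.
REQUEST_WITH_QMARK = re.compile(r"^[A-Z]{3,9}\ /([^?]*)\?")


def query_string(url: str) -> str:
    """Query part of a URL without the '?' delimiter (analogous to QUERY_STRING)."""
    return urlsplit(url).query


def has_question_mark(request_line: str) -> bool:
    """True if the raw request line contains a '?', even with nothing after it."""
    return REQUEST_WITH_QMARK.search(request_line) is not None


print(repr(query_string("http://example.com/?")))        # query part is empty
print(has_question_mark("GET /? HTTP/1.1"))              # the '?' is still visible
print(has_question_mark("GET /page.htm HTTP/1.1"))       # no '?' at all
```

A test on the query string alone cannot tell `/` from `/?`, while a test on the request line can.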

The 200-OK response problem can probably be fixed in one of two ways, depending on which version of Apache mod_rewrite ISAPI Rewrite is compatible with. I'd suggest trying the following:

Apache 2.x compatible:

RewriteBase /
RewriteCond %{QUERY_STRING} ^([^&]*&)*searchterm=[^&]* [NC]
RewriteRule .* - [R=404,L]

-or-
Apache 1.3.x compatible:

RewriteBase /
RewriteCond %{QUERY_STRING} ^([^&]*&)*searchterm=[^&]* [NC]
RewriteRule .* /some-path-that-you-know-will-never-exist.html? [L]

Jim

bwnbwn

4:03 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 5+ Year Member



JD, the code is correct: I did a header check with http://www.example.com?q=something and this is the header response:
#1 Server Response: http://www.example.com?q=something
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Date: Wed, 12 May 2010 15:58:44 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Location: http://www.example.com/404.htm
Content-Length: 247
Content-type: text/html
Redirect Target: http://www.example.com/404.htm
#2 Server Response: http://www.example.com/404.htm
HTTP Status Code: HTTP/1.1 404 Not Found
Date: Wed, 12 May 2010 15:58:45 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Content-Length: 8514
Content-Type: text/html
Set-Cookie: ASPSESSIONIDCQDSQDTB=IJDDJOMAJGBFAFIEMEPGBBAM; path=/
Cache-control: private

I also checked product pages that are in different folders and get the same header response as above for all URLs.

This is what I am wanting, correct?

I really do appreciate all the help here, JD, as I have never been to a day of computer school and everything I learned has been a hard-knocks education.

tessmac

12:49 pm on May 13, 2010 (gmt 0)

5+ Year Member



Sorry to drag this thread out again, but I have finally found an answer.

This was taken from another post here last year but works.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
# This allows genuine URLs with a query string to be found.
RewriteCond %{QUERY_STRING} !^option=
RewriteRule !\.php$ http://www.example.com/%1? [R=301,L]

jdMOrgan.. I was able to combine two of your examples into a series of rules that work perfectly for me:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
RewriteCond %{QUERY_STRING} !^SID=
RewriteCond %{QUERY_STRING} !^np=
RewriteRule !\.php$ http://example.com/%1? [R=301,L]

Anyway, let me state what the above does... unless a query string has ?SID= or ?np= it removes the query string and the question mark with a 301 redirect, for both bots & users (unless it's a .php page).
[webmasterworld.com]


The only thing I still need to modify is getting the server to give a 404 response directly rather than via a 301 redirect, which from what I've read so far is dangerous.

Could anyone help, please?

g1smd

12:57 pm on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Post #:4131405 has the answer you require.

Internally rewrite to a /non-existent-path and the ErrorDocument mechanism should directly return the correct 404 header and the content of the error page.

bwnbwn

1:10 pm on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member bwnbwn is a WebmasterWorld Top Contributor of All Time 5+ Year Member



tessmac, you're not dragging it out, as this is important to many of us.
JD, if tessmac's question could be answered, even with an educated guess, that would be great, because I can't see why doing a 301 to a 404 page for this is not correct.
JD, thanks. I did try your code but, as I thought it would, it broke my site search and sent searches to a 404 page. I will play around with it to see if I can get it to 404 without breaking my site search function.
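
[Editor's note] One likely explanation for the broken site search, not confirmed in the thread, is that the suggested condition matches query strings that DO contain searchterm=, so exactly those requests get the 404. The Python sketch below models the posted pattern and a hypothetical negated variant; it assumes Python's `re` behaves like the server's regex engine here, and both function names are made up for illustration.

```python
import re

# Condition pattern from the suggestion above; [NC] modelled with re.IGNORECASE.
HAS_SEARCHTERM = re.compile(r"^([^&]*&)*searchterm=[^&]*", re.IGNORECASE)


def returns_404_as_posted(qs: str) -> bool:
    """As posted: the 404 rule fires when the pattern matches."""
    return HAS_SEARCHTERM.search(qs) is not None


def returns_404_if_negated(qs: str) -> bool:
    """Hypothetical variant: fire on any non-empty query string lacking searchterm=."""
    return qs != "" and HAS_SEARCHTERM.search(qs) is None


# Compare both behaviours on a few sample query strings.
for qs in ["searchterm=widgets", "a=1&searchterm=widgets", "q=something", ""]:
    print(repr(qs), returns_404_as_posted(qs), returns_404_if_negated(qs))
```

In mod_rewrite terms the negated variant would correspond to a `!` in front of the pattern plus a second condition requiring a non-empty query string; whether ISAPI Rewrite accepts the same syntax would need to be tested.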

enigma1

11:29 am on May 17, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I think you are overlooking the real issue here, and you break the functionality of a site by adding various rewrite rules that do nothing but hide a problem (if one exists).

First, any request may give a 200 header when various parameters are added to it. For example:
http://www.example.com/index.php?i=1&j=2&k=3
where the values of i, j, k are, say, the spam keywords. Do you care if the server responds with 200 OK? No.

Second, does the link http://www.example.com/index.php?i=1&j=2&k=3 propagate inside your pages? In other words, if you view the HTML source after you access the site with that kind of link, does that link appear somewhere? If it does, then yes, you have a problem with the application and you need to find it and fix it.

Lots of these attempts are probing for this type of weakness. If one exists it can be exploited and not just to propagate spam.
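
[Editor's note] The propagation check described above can be roughed out in Python. This is a minimal sketch under stated assumptions: the HTML is passed in as a string (fetching it is left out), the function name and sample pages are invented for illustration, and only a verbatim match of the decoded parameter value is detected, so encoded or escaped copies would be missed.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit


def reflected_values(url: str, html: str) -> List[str]:
    """Return the query-string values from `url` that appear verbatim in `html`."""
    pairs = parse_qsl(urlsplit(url).query)
    return [value for _, value in pairs if value and value in html]


# Hypothetical page sources, made up for illustration.
clean_page = '<a href="/index.php">Home</a>'
leaky_page = '<a href="/index.php?i=spam-keyword">Home</a>'

url = "http://www.example.com/index.php?i=spam-keyword&j=2"
print(reflected_values(url, clean_page))
print(reflected_values(url, leaky_page))
```

Using a long, distinctive probe value instead of something like 1 or 2 keeps short values from matching unrelated markup by accident.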

tessmac

1:32 pm on May 17, 2010 (gmt 0)

5+ Year Member



Post #:4131405 has the answer you require.

Internally rewrite to a /non-existent-path and the ErrorDocument mechanism should directly return the correct 404 header and the content of the error page.


Ok, so how would this code below then be written?

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?
# This allows genuine URLs with a query string to be found.
RewriteCond %{QUERY_STRING} !^option=
RewriteRule !\.php$ http://www.example.com/%1? [R=301,L]