

Concern with spammy search words

     

el_roboto

8:24 am on Jul 9, 2007 (gmt 0)

5+ Year Member



I've been using awstats to monitor traffic etc to my website.

For at least the past 3 months I've noticed that a certain number of spammy search terms are appearing in the search keywords/phrases logs.

I'm wondering how this could be happening. Might someone else on the server be spamming somehow, since I'm on a VPS?

DXL

12:52 pm on Jul 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This happens to just about every site that I manage. I generally get keywords related to prescription drugs even though none of their names appear on any of the sites that I maintain.

el_roboto

12:57 pm on Jul 9, 2007 (gmt 0)

5+ Year Member



It's crazy!

Fioricet and gambling are the two that appear every month without fail. Likewise, I can't see anything that's been hacked into my source anywhere.

jdMorgan

2:42 pm on Jul 9, 2007 (gmt 0)

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



If these are like the ones I've seen, they're mostly (or all) coming from the "Tide" proxy servers at Microsoft. As such, they're fairly easy to block based on the REMOTE_ADDR and REMOTE_HOST server variables and the Via request header.

You should be able to confirm this by reviewing your server access log file, and searching for the 'spammy' terms.
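The log search suggested above could be scripted along these lines. This is only a sketch: it assumes the server writes Apache's combined log format and that the search engine passes the query in a "q" parameter of the referrer URL. The function names are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def spammy_hits(lines, terms):
    """Yield (ip, query) for lines whose referrer search query contains a term."""
    wanted = [t.lower() for t in terms]
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # line is not in the assumed format
        params = parse_qs(urlsplit(m.group("referer")).query)
        query = " ".join(params.get("q", []))  # "q" is an assumed parameter name
        if any(t in query.lower() for t in wanted):
            yield m.group("ip"), query

def count_by_ip(lines, terms):
    """Count matching requests per client IP address."""
    return Counter(ip for ip, _ in spammy_hits(lines, terms))
```

Passing an open log file and a list of terms to count_by_ip() gives a per-address tally, which shows whether the hits cluster on a small set of addresses.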

Jim

el_roboto

2:55 pm on Jul 9, 2007 (gmt 0)

5+ Year Member



hah! there it is

131.107.0.96 - - [08/Jul/2007:06:52:15 +0100] "GET / HTTP/1.1" 200 2290 "http://search.live.com/result.aspx?q=fioricet&mrt=en-us&FORM=LVSP" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Win64; x64; SV1; .NET CLR 2.0.50727)

With the information above, how would I then go about blocking these requests?

physics

11:51 pm on Jul 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jdMorgan is probably right. However, if I were you I would do a search like:
site:yoursite.com gambling
on Google to make sure that none of your pages have those terms on them.

jdMorgan

12:43 am on Jul 10, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yes, that IP address resolves to tide526.microsoft.com
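A reverse lookup like this one can be reproduced with Python's standard library. The sketch below assumes the proxies all resolve to names shaped like tideNNN.microsoft.com. The function name and the injectable reverse_lookup parameter are hypothetical, added so the check can be exercised without network access.

```python
import re
import socket

# Assumed hostname shape, based on the single name seen in this thread.
TIDE_HOST = re.compile(r"^tide[0-9]+\.microsoft\.com$", re.IGNORECASE)

def is_tide_proxy(ip, reverse_lookup=socket.gethostbyaddr):
    """Return True if the IP's reverse DNS name looks like tideNNN.microsoft.com."""
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record, or the lookup failed
    return bool(TIDE_HOST.match(hostname))
```

Calling is_tide_proxy("131.107.0.96") performs a live DNS query, so the answer depends on whatever the PTR record says at the time.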

I don't know what these people (or robots) are up to --or whether blocking these accesses will actually accomplish anything-- but if you're on Apache, something like this in .htaccess would return a 403-Forbidden response for these requests:


RewriteCond %{REMOTE_ADDR} ^131\.107\.0\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])$
RewriteCond %{HTTP:Via} SEA-PRXY
RewriteRule .* - [F]

This snippet assumes that you already have other working mod_rewrite rules. If not, you'll need to "set up" mod_rewrite (typically with a "RewriteEngine on" line ahead of the rules) before using this code.
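Before deploying an address pattern like the one in the first RewriteCond, it can help to list exactly which addresses it accepts. The sketch below copies that pattern into Python with solid pipe characters and enumerates the last octet. The helper name is hypothetical, and it assumes Python's re module treats this simple pattern the same way Apache's regex engine does.

```python
import re

# Same pattern as the REMOTE_ADDR condition, written with solid pipe characters.
TIDE_ADDR = re.compile(r"^131\.107\.0\.(6[4-9]|[789][0-9]|1[01][0-9]|12[0-7])$")

def matched_last_octets():
    """Return the last octets (0-255) of 131.107.0.x that the pattern accepts."""
    return [n for n in range(256) if TIDE_ADDR.match(f"131.107.0.{n}")]
```

The function returns every value from 64 through 127, so the alternation covers 131.107.0.64 to 131.107.0.127 with no gaps.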

An alternative rule, if your host allows/supports reverse DNS lookups, might be:


RewriteCond %{HTTP_REFERER} ^http://search\.live\.com/result\.aspx\?q=
RewriteCond %{REMOTE_HOST} ^tide[0-9]+\.microsoft\.com
RewriteRule .* - [F]

This second version, based on the tide.microsoft.com hostnames, eliminates the possible need to maintain the IP address "list" for the REMOTE_ADDR check in the first code snippet.

In the second version, the RewriteCond examining the referrer isn't strictly necessary, but I included it to limit the number of reverse DNS lookups, which are by nature 'expensive' in terms of CPU cycles and waiting server threads. You can make it even more specific by including the search terms that you find in the requests to your server, as in:


RewriteCond %{HTTP_REFERER} ^http://search\.live\.com/result\.aspx\?q=(widgets|wodgets|whatever)

If any snippet you copy from this thread shows broken pipe "¦" characters, replace them with solid pipe "|" characters before use; posting on this forum modifies the pipe characters.

These are just two example routines; adjust as desired to suit your needs.

Jim

 
