Forum Moderators: DixonJones


Is this legitimate Google.

or another nasty bot in disguise?

         

Angonasec

12:31 pm on Feb 4, 2008 (gmt 0)



Spotted a new type of suspicious activity in my access logs this week.
56 hits on 20 different pages in 2 days, at a rate of ten per second, claiming to be compatible; Google Keyword tool.

It has used 3 different Google IPs so far: 66.249.84.** then 74.125.16.** then 72.14.195.**

Is it a bot using Google, or legitimate activity?
It looks automated to me.

Should I block the IPs?

Here's the full user agent:
Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)

Our site does not use Adsense or Adwords.

Receptional Andy

12:35 pm on Feb 5, 2008 (gmt 0)



Hi Angonasec,

Those look like Google IPs, and if you visit the URL in the UA [adwords.google.com] it seems pretty clear what the activity is: people generating keywords based on the content of your URL.

So, be flattered, or block the bot if you don't like it ;)

Angonasec

1:59 am on Feb 6, 2008 (gmt 0)



Thanks Andy,

I realised it was use of the genuine G site and tool, but what concerned me was whether it was being used by "people" or by a bot. The rate of 10 calls a second, and the systematic hitting of our pages, looks suspicious.

G does try to block bots from using its search, and presumably its tools as well.

I'm not convinced this is human activity.
If it continues I'll definitely block it.
Btw. We don't even use G analytics on our site.

Receptional Andy

8:58 am on Feb 6, 2008 (gmt 0)



I'd assumed the large number of hits was due to the option 'Include other pages on my site linked from this URL'.

Would be easy enough to test by running the tool against a site that you have access to server logs for. I don't have time to try it myself, I'm afraid.

Receptional Andy

9:09 am on Feb 6, 2008 (gmt 0)



"I don't have time to try it myself, I'm afraid."

OK, I'm contradicting myself, but I got curious ;)

Here's the result of using the tool and choosing the 'other pages' option:

66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /example/ HTTP/1.1" 200 2896 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/ HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example HTTP/1.1" 200 2508 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example1 HTTP/1.1" 200 2827 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example2 HTTP/1.1" 200 3054 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example3 HTTP/1.1" 200 3017 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example4 HTTP/1.1" 200 2783 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category2/example HTTP/1.1" 200 2995 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example5 HTTP/1.1" 200 3000 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET / HTTP/1.1" 200 3882 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"

So, Google does request a large volume of pages a second via this tool. I hope there is some limit in order to prevent abuse. Interesting choice of pages to spider, too.
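For anyone who wants to measure this in their own logs, here is a minimal Python sketch that counts hits per second for this user agent. It assumes Apache combined log format like the lines above; the function name `hits_per_second` and the regular expression are my own illustration, not anything Google documents.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" format, as in the log lines quoted above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def hits_per_second(lines, ua_marker="Google Keyword Tool"):
    """Count requests per (ip, timestamp) whose user agent contains ua_marker.

    The timestamp in this log format has one-second resolution, so each
    counter value is the number of hits from that IP in that second.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and ua_marker in match.group("ua"):
            counts[(match.group("ip"), match.group("ts"))] += 1
    return counts


if __name__ == "__main__":
    # "access.log" is a placeholder path; point it at your own log file.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for (ip, ts), n in hits_per_second(handle).most_common(10):
            print(ip, ts, n)
```

Lines that do not match the assumed format are simply skipped, so a different log layout would need a different pattern.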

Incidentally, if you're inclined to, I would block this via robots exclusion (if it works for this bot) or a user-agent ban as opposed to by IP.
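As a rough illustration of a user-agent ban, here is a small Python WSGI middleware that answers 403 when the User-Agent header contains the tool's name. This is only a sketch: the class name `BlockByUserAgent` and the marker string are my own choices based on the UA shown in the logs, and on most sites the same rule would normally live in the web server configuration instead.

```python
# Substring taken from the user agent seen in the logs; matching on it is an assumption.
BLOCKED_UA_MARKER = "Google Keyword Tool"


def is_blocked(user_agent):
    """Return True when the user agent contains the blocked marker (case-insensitive)."""
    return BLOCKED_UA_MARKER.lower() in (user_agent or "").lower()


class BlockByUserAgent:
    """WSGI middleware: reply 403 for a matching User-Agent, otherwise pass through."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else, including Googlebot, reaches the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application is then `application = BlockByUserAgent(application)`. Because the match is on the user agent rather than the IP, other Google services using the same addresses are unaffected.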

[edited by: Receptional_Andy at 9:11 am (utc) on Feb. 6, 2008]

Angonasec

1:39 am on Feb 9, 2008 (gmt 0)



Thanks Andy, that confirms what I'm seeing in my logs.

So it could be a real person, but since I see the same sort of query every day, I still suspect it is a spam bot abusing their tool, which G has yet to close down.

If it persists, I'll follow your advice and use robots.txt to block it.

Not being an Adsense or Adwords user, I'm not familiar with what motivates people to use a tool like this anyway.

Why would a stranger look up keywords on our site using it, since they cannot place Adsense on our pages?

smallcompany

4:24 am on Feb 9, 2008 (gmt 0)




One could take a look into your site via keyword tool if:

- you run an affiliate program
- they do what you do (you may appear on page #1 for keywords or products they are after)
- they are checking which phrases Google sees on your pages (kind of connected to the case above)

cgrantski

1:44 pm on Feb 9, 2008 (gmt 0)




This is just Google looking at the words used on your site in order to add to its database of "what words tend to occur together on web pages", which is part of the service it offers to AdWords users. AdWords users enter a keyword in Google's KW suggestion tool and Google offers suggestions for additional words, based on what Google sees out there on sites like yours. It doesn't matter if you are using AdWords or AdSense or anything else. It is kind of overdoing it on the hitting, but I don't think it's intending any harm.

It might be hitting only selected pages on your site because it's looking to fill out the database only for certain terms.

Angonasec

2:51 am on Feb 10, 2008 (gmt 0)



Thanks for the clarification.

It is an NFP site, not commercial.

This domain has no affiliate programme, but links to one that has some Amzn books.

We do have several keywords and phrases appearing on page one of the serps.

I'll keep an eye on it, and block it if it continues because we derive no benefit from it, but it appears "our competitors" may.

ogletree

11:04 pm on Mar 5, 2008 (gmt 0)




I had a few visits from 66.249.84.xx. Strangely, the referrer was a site: search on Google. I ran a report on all traffic from that IP range, and I also saw a referrer from a site that links to me. The traffic came from 2 computers: one was Vista with IE7 and one was XP with IE6.

ogletree

7:42 pm on Mar 18, 2008 (gmt 0)




I'm still seeing traffic like this.


2008-03-17 13:20:30 W3SVC1683862455 servername 192.168.0.160 GET /page-name.asp - 80 - 72.14.193.134 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) - [google.com...] www.mydomain.com 301 0 0 440 352 125

I see one of these every day. Also, /page-name.asp does not exist any more; it 301 redirects to /page-name.htm. The next log entry is the same as the first, but with .asp changed to .htm.

Umbra

12:57 pm on Jul 4, 2008 (gmt 0)




I recently installed Google Web Accelerator and it looks like 66.249.84.** is one of the new IPs.

The old GWA IP ranges seem to have been defunct since approximately March 2008.

Receptional

2:00 pm on Jul 4, 2008 (gmt 0)



Well - I think there is much to be said for blocking it, but NOT by IP. Clearly (as Andy showed) the tool is being used to analyse your site - and that tool is being used by a person, not by Google (otherwise Andy wouldn't have duplicated the logfile pattern). So - you should look to do something that throttles based on speed of access, as long as you can clearly differentiate between the tool and Googlebot. That's really easy enough, because Google identifies it in the user agent as "http://adwords.google.com/select/KeywordToolExternal".

So - job done - you stop "external" snoops, but allow Google itself.
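A minimal Python sketch of that kind of throttle, assuming you can run a check per request: it limits only requests whose user agent contains "Google Keyword Tool" and leaves everything else (including Googlebot) alone. The names `SlidingWindowThrottle` and `should_serve`, and the default of 2 hits per second, are illustrative assumptions rather than recommended values.

```python
import time
from collections import defaultdict, deque

TOOL_MARKER = "Google Keyword Tool"  # substring of the tool's user agent


class SlidingWindowThrottle:
    """Allow at most max_hits per key within any window_seconds interval."""

    def __init__(self, max_hits=2, window_seconds=1.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        recent = self.hits[key]
        # Discard timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True


def should_serve(user_agent, throttle):
    """Return False only when the Keyword Tool exceeds the configured rate."""
    if TOOL_MARKER not in (user_agent or ""):
        return True  # Googlebot and ordinary visitors are not throttled here
    # One shared bucket for the tool, since it has been seen on several IP ranges.
    return throttle.allow("keyword-tool")
```

When `should_serve` returns False, the server could answer with a 403 or a 503; which one suits best is a judgment call for the site owner.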

Dixon.

dstiles

6:26 pm on Sep 11, 2008 (gmt 0)




Further to this thread, I got a few hits from this tool on the 74.125.16.* range today to one site.

Two hits were on contact form pages which are disallowed in robots.txt.

A total of ten pages in seven seconds but no robots.txt access (could have been read earlier but if so it should have been obeyed).

There are no adverts of any kind on the web site.

Tool now blocked by 403.