Forum Moderators: DixonJones
It has used 3 different Google IPs so far: 66.249.84.** then 74.125.16.** then 72.14.195.**
Is it a bot using Google, or legitimate activity?
It looks automated to me.
Should I block the IPs?
Here's the full user agent:
Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)
Our site does not use Adsense or Adwords.
Those look like Google IPs, and if you visit the URL in the UA [adwords.google.com] it seems pretty clear what the activity is: people generating keywords based on the content of your URL.
So, be flattered, or block the bot if you don't like it ;)
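For anyone who wants more than "those look like Google IPs", one common check is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the domain, then resolve the hostname back and see whether it returns the same IP. The Python below is only a rough sketch. The function name and the list of accepted domain suffixes are my own assumptions, and I have not confirmed that the Keyword Tool addresses resolve this way.

```python
import socket


def is_forward_confirmed(ip, allowed_suffixes=(".google.com", ".googlebot.com"),
                         reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to an allowed domain and that
    hostname resolves back to the same ip.

    allowed_suffixes is an assumption for illustration, not an official list.
    The lookup functions can be swapped out for offline testing.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        if not host.lower().endswith(allowed_suffixes):
            return False
        # Forward confirmation: the hostname must map back to the same IP.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

Calling is_forward_confirmed("66.249.84.11") does live DNS lookups, so the answer depends on what the resolver returns at the time.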
I realised it was the genuine G site and tool being used, but what concerned me was whether it was being used by "people" or by a bot. The rate of 10 calls a second and the systematic hitting of our pages look suspicious.
G does try to block bots from using its search, and presumably its tools as well.
I'm not convinced this is human activity.
If it continues I'll definitely block it.
Btw, we don't even use G analytics on our site.
Would be easy enough to test by running the tool against a site that you have access to server logs for. I don't have time to try it myself, I'm afraid.
OK, I'm contradicting myself, but I got curious ;)
Here's the result of using the tool and choosing the 'other pages' option:
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /example/ HTTP/1.1" 200 2896 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/ HTTP/1.1" 200 2967 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example HTTP/1.1" 200 2508 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example1 HTTP/1.1" 200 2827 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example2 HTTP/1.1" 200 3054 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example3 HTTP/1.1" 200 3017 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example4 HTTP/1.1" 200 2783 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category2/example HTTP/1.1" 200 2995 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET /category1/example5 HTTP/1.1" 200 3000 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] "GET / HTTP/1.1" 200 3882 "-" "Mozilla/5.0 (compatible; Google Keyword Tool; +http://adwords.google.com/select/KeywordToolExternal)"
So, Google does request a large volume of pages a second via this tool. I hope there is some limit in order to prevent abuse. Interesting choice of pages to spider, too.
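To put a number on "pages a second", the log can be tallied per IP and per timestamp. This is a minimal Python sketch that assumes the Apache combined log format shown above; the function name and the regular expression are mine, and the sample is just two of the lines from the excerpt.

```python
import re
from collections import Counter

# Matches the Apache "combined" log format used in the excerpt above.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def requests_per_second(lines, agent_substring):
    """Count hits per (ip, timestamp) for user agents containing agent_substring.

    The log timestamp has one-second resolution, so each key is one second.
    Lines that do not match the pattern are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and agent_substring in match.group("agent"):
            counts[(match.group("ip"), match.group("time"))] += 1
    return counts


UA = ("Mozilla/5.0 (compatible; Google Keyword Tool; "
      "+http://adwords.google.com/select/KeywordToolExternal)")
sample = [
    '66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] '
    '"GET /example/ HTTP/1.1" 200 2896 "-" "' + UA + '"',
    '66.249.84.11 - - [06/Feb/2008:09:03:19 +0000] '
    '"GET /category1/ HTTP/1.1" 200 2967 "-" "' + UA + '"',
]

for (ip, second), hits in requests_per_second(sample, "Google Keyword Tool").items():
    print(ip, second, hits)
```

Run against the full access log, any second with an unusually high count stands out straight away.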
Incidentally, if you're inclined to, I would block this via robots exclusion (if it works for this bot) or a user-agent ban as opposed to by IP.
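As a rough illustration of a user-agent ban, here is a small Python WSGI wrapper that answers 403 to any request whose user agent contains "Google Keyword Tool". It assumes a Python WSGI application, which is purely my assumption (nobody in this thread has said what they run), and the names block_user_agents and demo_app are hypothetical. The same idea applies to whatever rewrite or filter rules your own server offers.

```python
BLOCKED_AGENT_SUBSTRINGS = ("Google Keyword Tool",)


def block_user_agents(app, blocked=BLOCKED_AGENT_SUBSTRINGS):
    """Wrap a WSGI app so requests from blocked user agents get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if any(token in agent for token in blocked):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    body = b"OK"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = block_user_agents(demo_app)
```

Matching on the user agent string keeps working when the tool switches to another IP range, which is the advantage over an IP ban.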
[edited by: Receptional_Andy at 9:11 am (utc) on Feb. 6, 2008]
So it could be a real person, but since I see the same sort of query every day, I'm still suspicious that it is a spam bot abusing their tool, one that G have yet to close down.
If it persists, I'll follow your advice and use robots.txt to block it.
Not being an Adsense or Adwords user, I'm not familiar with what motivates people to use a tool like this anyway.
Why would a stranger look up keywords on our site using it, since they cannot place Adsense on our pages?
It might be hitting only selected pages on your site because it's looking to fill out the database only for certain terms.
It is a NFP site, not commercial.
This domain has no affiliate programme, but links to one that has some Amzn books.
We do have several keywords and phrases appearing on page one of the serps.
I'll keep an eye on it, and block it if it continues because we derive no benefit from it, but it appears "our competitors" may.
2008-03-17 13:20:30 W3SVC1683862455 servername 192.168.0.160 GET /page-name.asp - 80 - 72.14.193.134 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1) - [google.com...] www.mydomain.com 301 0 0 440 352 125
I see one of these every day. Also, /page-name.asp does not exist any more; it 301 redirects to /page-name.htm. The next log entry is the same as the first, but the .asp is changed to .htm.
So - job done - you stop "external" snoops, but allow Google itself.
Dixon.
Two hits were on contact form pages which are disallowed in robots.txt.
A total of ten pages in seven seconds but no robots.txt access (could have been read earlier but if so it should have been obeyed).
There are no adverts of any kind on the web site.
Tool now blocked by 403.
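To check which requested pages were actually disallowed, the paths from the log can be run through a robots.txt parser. This is a Python sketch using the standard library's urllib.robotparser. The robots.txt content, the /contact/ path and the helper name disallowed_hits are made up for illustration; substitute your own file and the paths from your log.

```python
from urllib import robotparser

# Hypothetical robots.txt, standing in for the real file on the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact/
"""


def disallowed_hits(robots_txt, user_agent, requested_paths):
    """Return the requested paths that robots_txt disallows for user_agent."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in requested_paths
            if not parser.can_fetch(user_agent, path)]


# Hypothetical list of paths pulled from the access log.
paths = ["/", "/category1/example", "/contact/form.htm"]
print(disallowed_hits(ROBOTS_TXT, "Google Keyword Tool", paths))
```

Any path this returns is one the tool fetched despite the exclusion, which supports answering with a 403 rather than relying on robots.txt.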