sftriman - 7:30 am on Mar 28, 2012 (gmt 0)
Hey Tedster, Googlebot is crawling hundreds of bizarre search pages on my site every day. They are not Google Search referral pages. Where Googlebot got all these weird searches, I have no idea! I've tried to find the source, but no luck. I estimate that of my 28,000 indexed pages, 8,000 are for search.php. In my opinion, that's too many, but what's worse is that, based on crawl percentages, a huge number of those 8,000 are bogus pages. The search terms have no meaning on my site and, yes, they return 0 results. So I want 'em gone!
A small sample of some weird words:
$ zcat acc* | grep -i bullseye | grep -i googlebot | wc -l
$ zcat acc* | grep -i bullseye | grep -i -v googlebot | wc -l
$ zcat acc* | grep -i akadema | grep -i googlebot | wc -l
$ zcat acc* | grep -i akadema | grep -i -v googlebot | wc -l
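The paired greps above read the logs twice per term. As a rough sketch only (count_term_hits is a made-up helper name, not something from my setup), the same split could be done in one pass with awk, assuming the log lines arrive on stdin:

```shell
# Hypothetical helper: count log lines that mention a term, split by
# whether the same line also mentions "googlebot" (case-insensitive).
# Mirrors the grep -i / grep -i -v pairs, but in a single pass.
count_term_hits() {
  awk -v term="$1" '
    BEGIN { term = tolower(term); bot = 0; other = 0 }
    {
      line = tolower($0)
      if (index(line, term) == 0) next          # term not on this line
      if (index(line, "googlebot") > 0) bot++   # line mentions Googlebot
      else other++                              # everything else
    }
    END { printf "googlebot=%d other=%d\n", bot, other }
  '
}

# Usage (same input as the commands above):
#   zcat acc* | count_term_hits bullseye
```

Like the greps, this matches the term anywhere on the line (request, referrer, or user agent), so it is only as precise as they are.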
So of those 13 "bullseye" hits that aren't Googlebot, here's one:
22.214.171.124 - - [11/Mar/2012:15:38:22 -0400] "GET /search.php?q=bullseye+crystal+clear+stained+glass HTTP/1.1" 200 18921 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; WOW64; SV1; .NET CLR 2.0.50727)"
That IP is:
inetnum: 126.96.36.199 - 188.8.131.52
status: ASSIGNED PA
remarks: ABUSE REPORTS:
source: RIPE # Filtered
role: Dedicated Server Contact Admin Role
address: Dedicated Server Contact
address: 2 Frater Gate Business Park
address: Aerodrome Road
address: PO13 0GW
address: UNITED KINGDOM
Not sure if that's good or bad, but that's what it is. I think it's bad, though. For March, they've crawled my site hitting 11,549 pages so far.
$ zcat acc* | grep -i 184.108.40.206 | wc -l
Of those, 5,398 are search.php requests. Scanning them, oddly, many look OK. But many look weird!
220.127.116.11 - - [11/Mar/2012:15:27:00 -0400] "GET /search.php?q=http%3A%2F%2Fqymdvpbatyml.com%2F&lp=cTYZUNMGJkrVSZak&hp=nKczBHUCBkuikNj HTTP/1.1" 200 16436 "http://www.mysite.com/search.php?q=cookie+sunglasse" "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)"
What the? Why is the referrer that weird search? And what is that q= value? Another weird one:
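For what it's worth, that q= value begins with http%3A%2F%2F, which is the URL-encoded form of http://, so the "search term" is itself a URL. A minimal sketch for decoding such values from the shell follows; urldecode is a hypothetical helper name, and it only aims to handle plain ASCII escapes:

```shell
# Hypothetical helper: decode a URL-encoded query value.
# "+" becomes a space and each %XX escape becomes the matching character.
# Intended for ASCII escapes; bytes above 0x7F may not round-trip in every awk.
urldecode() {
  printf '%s\n' "$1" | awk '
    BEGIN { hex = "0123456789abcdef" }
    {
      s = $0
      gsub(/\+/, " ", s)
      out = ""
      # Consume the string left to right, one %XX escape at a time.
      while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
        hi = index(hex, tolower(substr(s, RSTART + 1, 1))) - 1
        lo = index(hex, tolower(substr(s, RSTART + 2, 1))) - 1
        out = out substr(s, 1, RSTART - 1) sprintf("%c", hi * 16 + lo)
        s = substr(s, RSTART + 3)
      }
      print out s
    }
  '
}

# Usage:
#   urldecode 'http%3A%2F%2Fexample.com%2F'
#   urldecode 'cookie+sunglasse'
```

Running the suspicious q= values through something like this makes it easier to tell real product searches from URLs being stuffed into the search box.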
18.104.22.168 - - [11/Mar/2012:15:37:27 -0400] "GET /search.php?q=dichroic+primary+color+starter+pack+clear HTTP/1.1" 200 22943 "-" "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)"
So there's a whole lot that aren't from Googlebot. But my daily gathering of weird searches is specifically from Googlebot, at least going by the user-agent string.
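To get a ranked list of what is actually being requested, something along these lines might help. It is a sketch under assumptions: googlebot_queries is a made-up name, the logs are in the combined format shown above, and "Googlebot" is identified purely by that string appearing on the log line, which anyone can fake.

```shell
# Hypothetical helper: list the q= values that lines mentioning Googlebot
# requested from /search.php, most frequent first.
# Reads access-log lines on stdin; prints "count value" per line.
googlebot_queries() {
  grep -i 'googlebot' |
    # Keep only the q= value, stopping at a space, "&", or quote.
    sed -n 's/.*GET \/search\.php?q=\([^ &"]*\).*/\1/p' |
    sort | uniq -c | sort -rn
}

# Usage:
#   zcat acc* | googlebot_queries | head -50
```

The values come out still URL-encoded, and requests where q= is not the first parameter are skipped by this pattern.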