Hey Tedster, Googlebot is crawling hundreds of bizarre search pages on my site every day. They are not Google Search referral pages. Where Googlebot got all these weird searches, I have no idea! I've tried to find the source, but no luck. I estimate that of my 28,000 indexed pages, 8,000 are for search.php. In my opinion, that's too many, but what's worse is that, based on crawl percentages, a huge number of those 8,000 are bogus pages. The search terms have no meaning on my site and, yes, they return 0 results. So I want 'em gone!
A small sample of some weird words:
$ zcat acc* | grep -i bullseye | grep -i googlebot | wc -l
$ zcat acc* | grep -i bullseye | grep -i -v googlebot | wc -l
$ zcat acc* | grep -i akadema | grep -i googlebot | wc -l
$ zcat acc* | grep -i akadema | grep -i -v googlebot | wc -l
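For anyone who wants to repeat this for more terms, the paired commands above can be folded into one helper. This is only a sketch: the name `split_counts` is made up, and it assumes the logs are gzipped and that "Googlebot" means any line containing that string, case-insensitively, exactly as the greps do.

```shell
# Hypothetical helper: count log lines that mention a term, split by whether
# the same line also mentions Googlebot (case-insensitive, like the greps).
# Usage: split_counts TERM FILE.gz [FILE.gz ...]
split_counts() {
    term=$1
    shift
    # gzip -cd is what zcat does; grep -F treats the term as a fixed string
    gzip -cd -- "$@" | grep -i -F -- "$term" | awk '
        BEGIN { bot = 0; other = 0 }
        tolower($0) ~ /googlebot/ { bot++; next }
        { other++ }
        END { printf "googlebot=%d other=%d\n", bot, other }'
}
```

Something like `split_counts bullseye acc*` would then print both numbers on one line.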
So of those 13 "bullseye" hits that aren't Googlebot, here's one:
126.96.36.199 - - [11/Mar/2012:15:38:22 -0400] "GET /search.php?q=bullseye+crystal+clear+stained+glass HTTP/1.1" 200 18921 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; WOW64; SV1; .NET CLR 2.0.50727)"
That IP is:
inetnum: 188.8.131.52 - 184.108.40.206
status: ASSIGNED PA
remarks: ABUSE REPORTS:
source: RIPE # Filtered
role: Dedicated Server Contact Admin Role
address: Dedicated Server Contact
address: 2 Frater Gate Business Park
address: Aerodrome Road
address: PO13 0GW
address: UNITED KINGDOM
Not sure if that's good or bad, but that's what it is. I think it's bad, though. So far in March, that IP has hit 11,549 pages on my site.
$ zcat acc* | grep -i 220.127.116.11 | wc -l
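One caveat on that grep: the dots in the pattern match any character, and the address is matched anywhere on the line, so the count can come out slightly high. A minimal sketch of a stricter count follows, assuming gzipped logs in the combined format shown in this post (client IP in field 1, request path in field 7). The name `ip_summary` is hypothetical.

```shell
# Hypothetical helper: for one client IP, count all requests and the subset
# whose request path starts with /search.php.
# Usage: ip_summary IP FILE.gz [FILE.gz ...]
ip_summary() {
    ip=$1
    shift
    gzip -cd -- "$@" | awk -v ip="$ip" '
        BEGIN { total = 0; search = 0 }
        # Exact match on the first field only (the client address)
        $1 == ip {
            total++
            # In combined log format, field 7 is the request path
            if (index($7, "/search.php") == 1) search++
        }
        END { printf "total=%d search=%d\n", total, search }'
}
```

Run as `ip_summary <address> acc*`, it reports the total and the search.php share in one pass.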
Scanning the 5,398 search.php requests among those, oddly, many look OK. But many look weird!
18.104.22.168 - - [11/Mar/2012:15:27:00 -0400] "GET /search.php?q=http%3A%2F%2Fqymdvpbatyml.com%2F&lp=cTYZUNMGJkrVSZak&hp=nKczBHUCBkuikNj HTTP/1.1" 200 16436 "http://www.mysite.com/search.php?q=cookie+sunglasse" "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)"
What the? Why is the referrer that weird search? And what is that q= value? Another weird one:
22.214.171.124 - - [11/Mar/2012:15:37:27 -0400] "GET /search.php?q=dichroic+primary+color+starter+pack+clear HTTP/1.1" 200 22943 "-" "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0
So there's a whole lot that aren't from Googlebot. But my daily gathering of weird searches is specifically from Googlebot, at least going by the user-agent string.
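To see which search terms Googlebot itself is requesting, something along these lines could rank them. It is a rough sketch with a made-up function name, `googlebot_queries`. It assumes gzipped combined-format logs, treats any line containing "googlebot" as Googlebot (the user-agent can be spoofed, so that is only an approximation), and leaves the q= values URL-encoded.

```shell
# Hypothetical helper: list the q= values requested on /search.php by lines
# that mention Googlebot, most frequent first, as "count term".
# Usage: googlebot_queries FILE.gz [FILE.gz ...]
googlebot_queries() {
    gzip -cd -- "$@" | awk '
        # Field 7 is the request path in combined log format
        tolower($0) ~ /googlebot/ && index($7, "/search.php?") == 1 {
            qs = substr($7, length("/search.php?") + 1)
            n = split(qs, parts, "&")
            # Print the first q= parameter, still URL-encoded
            for (i = 1; i <= n; i++)
                if (substr(parts[i], 1, 2) == "q=") {
                    print substr(parts[i], 3)
                    break
                }
        }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Running `googlebot_queries acc* | head -n 50` would show the most-crawled terms, which might help spot a pattern in where they come from.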