fake Google

         

wilderness

11:17 pm on Oct 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A fake Googlebot grabbed about forty pages from two websites.

75.13.176.zz - - [10/Oct/2008:14:19:23 -0500] "GET / HTTP/1.1" 200 6096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]

Just a heads-up, as many have conditions in place that restrict Google access based on current Google IP ranges [iplists.com].

incrediBILL

12:28 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Maybe I'm missing something, but if this IP is fake why would it matter if we block on IP ranges?

Probably just a proxy Google tried to crawl thru.

wilderness

12:54 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bill,
The forum policy on obscuring IP addresses limits my ability to provide precise insight into this very small Class D range.

The range covers seven numbers in the Class D and is registered to an accounting firm. (Their website mentions nothing except accounting services.)

Why on earth an accounting firm is crawling/harvesting a widget website is beyond my comprehension.
Perhaps their server offers an open proxy for another to utilize?

Don

dstiles

12:57 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've seen seven access attempts during the past 10 days using a Googlebot UA or, in one case, a Yahoo mobile UA, all from broadband IPs. About half were from the USA. Previously it was only an occasional hit every few weeks.

The first couple I thought may have been an attempt to pirate our content to Google, although using broadband seems a bit odd. Now I'm wondering if someone has recommended looking at sites with a Googlebot UA as a means of finding out how competitors look to Google. Or something. Maybe it's just a new botnet trick.

wilderness

12:57 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Maybe I'm missing something, but if this IP is fake why would it matter if we block on IP ranges?"

Perhaps I should expand on something I mistakenly assumed most would understand ;)

Only allowing access from Google or any other search engine based on specific IP ranges shuts out fake use of the search engine's UA from any other address.
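
A minimal sketch of that kind of rule, assuming Python and its standard `ipaddress` module. The names `ALLOWED_CRAWLER_NETWORKS`, `claimed_crawler` and `is_allowed` are made up for illustration, and the single network listed is only a placeholder for whatever maintained range list you actually trust:

```python
import ipaddress

# Placeholder data: one illustrative network per crawler token.
# A real setup would load a maintained list of current ranges instead.
ALLOWED_CRAWLER_NETWORKS = {
    "googlebot": [ipaddress.ip_network("66.249.64.0/19")],
}


def claimed_crawler(user_agent):
    """Return the crawler token the UA claims to be, or None."""
    ua = user_agent.lower()
    for token in ALLOWED_CRAWLER_NETWORKS:
        if token in ua:
            return token
    return None


def is_allowed(remote_ip, user_agent):
    """Reject requests whose UA claims a crawler but whose IP is not listed."""
    token = claimed_crawler(user_agent)
    if token is None:
        # Not claiming to be a known crawler; other rules decide.
        return True
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in ALLOWED_CRAWLER_NETWORKS[token])
```

With this in place, a request like `is_allowed("203.0.113.7", "Mozilla/5.0 (compatible; Googlebot/2.1)")` is refused because the address is outside the listed network, while ordinary browser UAs pass through untouched.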

jdMorgan

1:01 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I had a fake googlebot from a cable broadband provider's customer the other day. In addition to the non-rDNS IP address, the request headers were all hosed up, too.

Jim

incrediBILL

1:35 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Only allowing access from Google or any other search engine based on specific IP ranges shuts out fake use of the search engine's UA from any other address."

OK, that makes sense now, got a cold today, a little slow...

Actually I use IP ranges plus full round-trip DNS (a reverse lookup followed by a forward lookup), cached for 24 hours for speed, so I think the bases are covered ;)

incrediBILL

1:40 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I got curious and checked how many times fake googlebots, or googlebot crawling thru a proxy, have hit my site, and my year-to-date total is 1054 bogus googlebots.

This is down significantly from 2700 fake googlebot attempts in 2007!

[edited by: incrediBILL at 1:40 am (utc) on Oct. 11, 2008]

GaryK

3:56 am on Oct 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Bill, what's the advantage in using IP address ranges AND full trip DNS? Right now I just use full trip DNS. Thanks. Oh, I see on average three fake googlebots daily on my browser project website. That's why I prefer to use inclusion-based rules instead of frantically trying to keep up with excluding new stuff when it shows up. :)