

Visitor pretending to be Google bot?

Who is gandalf.volutionmedia.com?


grandma genie

5:07 am on Oct 1, 2010 (gmt 0)

10+ Year Member



Hi Guys,
Found this in my server logs today. The IP belongs to gandalf.volutionmedia.com. They were requesting files I do not have, and the user agent said it was Googlebot. Is this someone pretending to be Googlebot? I was thinking of blocking them in .htaccess. -- Grandma genie

209.235.192.nn - - [30/Sep/2010:17:02:40 -0400] "GET / HTTP/1.1" 200 31375 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
209.235.192.nn - - [30/Sep/2010:17:02:41 -0400] "GET /old/ HTTP/1.1" 404 8747 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
209.235.192.nn - - [30/Sep/2010:17:02:41 -0400] "GET /forum/ HTTP/1.1" 404 8747 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
209.235.192.nn - - [30/Sep/2010:17:02:41 -0400] "GET /forums/ HTTP/1.1" 404 8747 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
209.235.192.nn - - [30/Sep/2010:17:02:41 -0400] "GET /vb/ HTTP/1.1" 404 8747 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
209.235.192.nn - - [30/Sep/2010:17:02:41 -0400] "GET /vbulletin/ HTTP/1.1" 404 8747 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Pfui

8:48 am on Oct 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



1.) Their server is compromised and probing for exploits. Last week, people posted about fake Googlebots seeking the same set of files. [webmasterworld.com...]

2.) A "Googlebot" not coming from a .googlebot.com host is block-worthy. Period. Block that way and you don't have to bother blocking individual hosts/IPs.

topr8

9:04 am on Oct 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You should block all fake Googlebots and all other fake search engine bots.
We use a routine that does a reverse DNS check on the IP address; all the major search engines support this.

[googlewebmastercentral.blogspot.com...]
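A minimal Python sketch of that idea, generalized to several engines as topr8 suggests. It assumes the hostname has already been obtained by a reverse DNS lookup on the visitor's IP (None if there was no reverse record). The `CRAWLER_DOMAINS` table and both function names are hypothetical, and the suffixes are the ones these engines have published (Google: googlebot.com/google.com, Yahoo! Slurp: crawl.yahoo.net, MSN/Bing: search.msn.com); treat them as an assumption and check each engine's current documentation. A name check alone can be fooled, because whoever controls an IP block sets its reverse records, so it should be paired with the forward lookup enigma1 describes further down.

```python
from typing import Optional

# Hypothetical table: UA token -> hostname suffixes a genuine crawler's
# reverse DNS name should end with. ASSUMED values; verify against each
# engine's current documentation before relying on them.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "slurp": (".crawl.yahoo.net",),
    "msnbot": (".search.msn.com",),
    "bingbot": (".search.msn.com",),
}


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler token the UA claims to be (case-insensitive), or None."""
    ua = user_agent.lower()
    for token in CRAWLER_DOMAINS:
        if token in ua:
            return token
    return None


def is_fake_crawler(user_agent: str, hostname: Optional[str]) -> bool:
    """True if the UA claims a known crawler but the host is outside its domains."""
    token = claimed_crawler(user_agent)
    if token is None:
        return False  # not claiming to be a major search engine bot
    if not hostname:
        return True  # no reverse DNS name at all: cannot be the real crawler
    host = hostname.lower().rstrip(".")
    # str.endswith accepts a tuple of suffixes
    return not host.endswith(CRAWLER_DOMAINS[token])
```

With this rule, the Googlebot UA from the log above paired with the host gandalf.volutionmedia.com counts as fake, while the same UA from a host like crawl-66-249-66-1.googlebot.com passes the name check.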

grandma genie

4:25 pm on Oct 1, 2010 (gmt 0)

10+ Year Member



Pfui - How do you block a Googlebot that is not coming from .googlebot.com? There is nothing in the user-agent string I can use to block it via .htaccess, and blocking by IP seems hopeless. Can my web host implement something that will block this type of exploit?

Pfui

5:34 pm on Oct 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I use htaccess and mod_rewrite. And I agree with topr8, you should block all fake majors, not just fake Googlebots. So-o-o, if there isn't one there already, how about starting a 'How to block fake Googlebots?' kind of thread in the Apache forum? TIA:)

grandma genie

6:10 pm on Oct 1, 2010 (gmt 0)

10+ Year Member



Hi Pfui - OK, I just posted in the Apache forum. Thank you.

dstiles

10:33 pm on Oct 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've been seeing loads of googlebot hits from non-G IPs all week. Looks like a botnet attack. Most hits are unique, get a 403 and never come back. Not really sure of the purpose of it all. :(

grandma genie

11:51 pm on Oct 1, 2010 (gmt 0)

10+ Year Member



Well, I'm lost. jd says to block the user agent, but the only user agent I can see is this: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)". Can't block that. The only other thing I can decipher is that the IP belongs to gandalf.volutionmedia.com. But again, I'm guessing whoever is using their server isn't them. How do you trace the origin of a visitor to your site if everything is spoofed or faked or hijacked?

wilderness

12:23 am on Oct 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



# deny when UA contains Googlebot EXCEPT from IP range.
RewriteEngine On
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteCond %{HTTP_USER_AGENT} Googlebot
# [F] answers matching requests with 403 Forbidden
RewriteRule .* - [F]

wilderness

2:17 am on Oct 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



gg,
As a general note, when participants here refer to the user agent (UA), they may mean only a distinctive part of it (such as "Googlebot"), not the entire UA string left by the visitor's browser.

In the various Apache modules used from .htaccess, you need to GRASP three simple concepts:

1) "Begins with" is designated by the character ^.

2) "Ends with" is designated by the character $.

3) "Contains" is designated by the absence of any leading or trailing anchor character.

Another item you will use regularly is "except" (negation), designated by a leading !.

You will also use the [NC] (no case, i.e. case-insensitive) and [OR] (combine with the next condition using OR instead of the default AND) end-of-line flags constantly, or even both together as [NC,OR].
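To see those rules in action, here is a small Python sketch. Python's `re` module stands in for the regex engine mod_rewrite uses; for simple patterns like these they behave the same. The UA string is from the log at the top of the thread; the IPs are illustrative (66.249.66.1 is the example address in Google's Googlebot docs, 192.0.2.10 is a reserved documentation address). `rule_forbids` is a hypothetical helper that mimics wilderness's two RewriteConds, not anything Apache provides.

```python
import re

UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# ^ = "begins with"
print(bool(re.search(r"^66\.249\.", "66.249.66.1")))       # True
print(bool(re.search(r"^66\.249\.", "192.0.2.10")))        # False
# $ = "ends with"
print(bool(re.search(r"bot\.html\)$", UA)))                 # True
# no anchor = "contains"
print(bool(re.search(r"Googlebot", UA)))                    # True
# [NC] = case-insensitive match
print(bool(re.search(r"googlebot", UA)))                    # False
print(bool(re.search(r"googlebot", UA, re.IGNORECASE)))     # True


def rule_forbids(remote_addr: str, user_agent: str) -> bool:
    """Mimic: RewriteCond !^66\\.249\\. + RewriteCond Googlebot -> [F]."""
    # RewriteCond %{REMOTE_ADDR} !^66\.249\.  (leading ! negates the match)
    outside_range = re.search(r"^66\.249\.", remote_addr) is None
    # RewriteCond %{HTTP_USER_AGENT} Googlebot  (no anchors: "contains")
    claims_googlebot = re.search(r"Googlebot", user_agent) is not None
    # Consecutive RewriteConds are ANDed unless a condition carries [OR]
    return outside_range and claims_googlebot


print(rule_forbids("192.0.2.10", UA))    # True  -> 403 Forbidden
print(rule_forbids("66.249.66.1", UA))   # False -> request allowed
```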

Hope this helps.

Don

grandma genie

2:22 am on Oct 2, 2010 (gmt 0)

10+ Year Member



Hi Wilderness,
Thank you for your help. Jim gave me some ideas, too. How do you link to a thread in another forum, to show what happened there? Also, I need to learn how to read and write this type of code. I think it is fascinating. - Jeannie

wilderness

2:40 am on Oct 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Saw Jim's reply in the Apache forum, as I'm sure others here did as well.

grandma genie

4:09 am on Oct 2, 2010 (gmt 0)

10+ Year Member



Hi Don, That is very helpful. My next move is to make a list of all the little codes and put it on one piece of paper, to help me read the code and write it, as well. Cool!

My memory is not what it used to be. Gotta write it all down, now.

enigma1

9:57 pm on Oct 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



# deny when UA contains Googlebot EXCEPT from IP range.

That isn't reliable. Better to do a reverse DNS (rDNS) lookup on the IP, then a forward lookup on the hostname it returns, and check that the two match.
[google.com...]

Here's an example from my logs of a visitor trying to come in with the Googlebot UA:
66.249.16.211 - - "GET / HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
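The reverse-then-forward check can be sketched in Python with the standard `socket` module. This is a minimal sketch, not a drop-in filter: the accepted suffixes (googlebot.com, google.com) are an assumption to verify against Google's current docs, `gethostbyname_ex` only returns IPv4 addresses, and the resolver functions are injectable so the logic can be tested without touching the network.

```python
import socket
from typing import Callable, Optional, Set

# Hostname suffixes Google documents for its crawler (ASSUMED; check current docs).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse (PTR) lookup: IP -> hostname, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def forward_lookup(hostname: str) -> Set[str]:
    """Forward (A record) lookup: hostname -> IPv4 addresses; empty on failure."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
    forward: Callable[[str], Set[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: PTR name must be Google's and map back to ip."""
    host = reverse(ip)
    if not host:
        return False  # no reverse record: not Googlebot
    host = host.lower().rstrip(".")
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # reverse name is outside Google's domains
    # Whoever controls an IP block can put any name in its PTR records,
    # so the name must also resolve forward to the same IP.
    return ip in forward(host)
```

Calling `is_real_googlebot(ip)` with the defaults makes two live DNS queries per check, so in practice you would cache the verdict per IP rather than look it up on every request.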

grandma genie

1:46 am on Oct 3, 2010 (gmt 0)

10+ Year Member



I hope I'm not off topic on this, but see the server log entries below. This visitor is from the University of San Diego. Is this someone just using Googlebot as a user agent? They came to the site twice with the same three entries, just at different times. Is this a blockable offense? Or is this just some college kid working on an assignment?

137.110.222.nnn - - [02/Oct/2010:04:16:51 -0400] "GET / HTTP/1.1" 200 31375 "http://www.google.com/search?hl=en&source=hp&btnG=Google+Search&q=bunny" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100403 Firefox/3.6.3"
137.110.222.nnn - - [02/Oct/2010:04:16:52 -0400] "GET / HTTP/1.1" 200 31375 "http://www.google.com/search?hl=en&source=hp&btnG=Google+Search&q=bunny" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
137.110.222.nnn - - [02/Oct/2010:04:16:52 -0400] "GET / HTTP/1.1" 200 31375 "http://www.google.com/search?hl=en&source=hp&btnG=Google+Search&q=bunny" "Mozilla/5.0 (compatible; Yahoo! Slurp; []help.yahoo.com/help/us/ysearch/slurp)"
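One IP cycling through a browser UA, Googlebot, and Slurp within a second is the pattern in the entries above. A rough Python sketch for spotting it in an Apache "combined" format log: it groups the distinct UAs each IP presented and flags any IP with at least `threshold` of them. The regex, function names, and threshold are illustrative assumptions, and this is a crude heuristic, not proof of bad intent.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)


def user_agents_by_ip(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the distinct User-Agent strings each IP presented."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:  # silently skip lines not in combined format
            seen[m.group("ip")].add(m.group("ua"))
    return seen


def ua_switchers(lines: Iterable[str], threshold: int = 2) -> Dict[str, Set[str]]:
    """IPs that used at least `threshold` different UAs (a crude heuristic)."""
    return {ip: uas for ip, uas in user_agents_by_ip(lines).items()
            if len(uas) >= threshold}
```

Fed the three lines above (masked IP and all), `ua_switchers` flags 137.110.222.nnn with three different user agents.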

grandma genie

1:48 am on Oct 3, 2010 (gmt 0)

10+ Year Member



By the way, enigma, I was shocked to see the IP you sent was not Google. I would have just assumed from the 66.249. prefix that it was. So I am checking more thoroughly now. Thank you.