Googlebots being spoofed?

How can you tell?

         

pendanticist

1:23 pm on Apr 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



161.65-104.adsl.ij.net - - [08/Apr/2003:04:54:46 -0700] "GET /Internet_Stuff.html HTTP/1.0" 200 11864 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


So, what then do you make of this one? The IP Number pings to 65.161.65.104 at ij.net.

Does this look like a spoofed googlebot?

I looked for that Googlebot IP Number post which was floating around here yesterday, but can't find it this morning.

Pendanticist.

edit_g

1:35 pm on Apr 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Looks like it. Why worry? It certainly isn't Google checking you out.

pendanticist

1:39 pm on Apr 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Suspected fraud always demands cause for pause...prudently speaking of course.

Pendanticist.

ga_ga

1:59 pm on Apr 8, 2003 (gmt 0)

10+ Year Member



> IP Number post which was floating around here yesterday

I think it was moved to the spider identification forum.

pendanticist

2:07 pm on Apr 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yepper ga_ga! That's the one [webmasterworld.com]. :)

<Didn't realize Google had soooo many of them.>

Thanks.

pendanticist

6:31 am on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ssoooo, I guess blocking by individual IP Number is the only way to ban?

Pendanticist.

vincevincevince

1:26 pm on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



now if everyone changed their mozilla useragent tags to GoogleBot then that sure would confuse the **** out of you paranoid bunch!

jdMorgan

2:30 pm on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For those who have had a problem:

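# Block all user-agents identifying as Googlebot/ from non-Google IP address ranges
RewriteEngine On
# Request does not come from either known Googlebot range...
RewriteCond %{REMOTE_ADDR} !^64\.68\.82\.
RewriteCond %{REMOTE_ADDR} !^216\.239\.46\.
# ...yet the user-agent claims to be Googlebot
RewriteCond %{HTTP_USER_AGENT} ^Googlebot/
# Answer with 403 Forbidden
RewriteRule .* - [F]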

The above can also be done with SetEnvIf and deny from, but I'll need more coffee first, since mod_rewrite is my native language and I only took mod_access as a second language...

Warning: When Google adds a new IP address range, you must add it to the code above!
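
The SetEnvIf / deny from variant mentioned above might look roughly like the sketch below. It is untested against any particular setup and assumes mod_setenvif plus the classic Order/Allow/Deny access directives (on Apache 2.4 those are only available through mod_access_compat). The variable name claims_googlebot is made up for illustration; the two IP ranges are the same ones used in the mod_rewrite version.

```
# Flag any request whose user-agent starts with "Googlebot/"
SetEnvIf User-Agent ^Googlebot/ claims_googlebot
# Clear the flag again when the request comes from a known Google range
SetEnvIf Remote_Addr ^64\.68\.82\. !claims_googlebot
SetEnvIf Remote_Addr ^216\.239\.46\. !claims_googlebot
# Let everyone in except requests still carrying the flag
Order Allow,Deny
Allow from all
Deny from env=claims_googlebot
```

The same maintenance caveat applies: each new Google range needs its own Remote_Addr line.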

Jim

pendanticist

4:57 pm on Apr 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Jim! :)

> now if everyone changed their mozilla useragent tags to GoogleBot then that sure would confuse the **** out of you paranoid bunch!

Paranoia has nothing to do with it vincevincevince.

See, there are those webmasters (such as myself) who garner no income from their work; rather, they do it for the 'altruistic glow' when they see individuals (as well as major institutions) perusing & enjoying that work. Getting some value out of it...so to speak.

Then, you have those timid folks who, for whatever reason, feel they need to lie to folks like me by falsifying their UA. As in life, when anyone falsely represents themselves to me, I am concerned.

I am concerned because they've already proven themselves unworthy of using the material I so freely provide them by 'hiding'.

Hence, when those spoofed UAs come up, it is up to us (as responsible webmasters) to ensure those visitors don't harvest or otherwise steal our work.

Hell, I had a student (from an .edu) rape my content overnight taking as many as 13 files per second! Eating up the bandwidth which I pay for.

Paranoid? Nah. Protective is a much better word.

Pendanticist.

vincevincevince

3:51 pm on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



interesting... maybe people spoof as googlebot so they can access cloaked content?

johannes

3:59 pm on Apr 10, 2003 (gmt 0)

10+ Year Member



vincevincevince that is a key.

There are some web services, though I have forgotten the URLs, that spoof the user-agent. They go by names like "see exactly what search engines see". I've tried these myself, so some of you may have Googlebot entries in your logs that are actually me :)

But these services are useless anyway. No pro cloaker cloaks by user-agent alone.

jdMorgan

4:16 pm on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The code I posted above fixes the problem. They could spoof by user-agent and IP address, but then they would not be able to receive the server's response, and the situation is moot.

Each webmaster has the right to determine who can and cannot access his/her site, and for what purposes. Once you've been site-jacked or had your server brought to its knees by a DoS attack, maybe you'll understand. At any rate, name-calling and ad-hominem attacks serve no purpose here on WebmasterWorld.

Jim

ncw164x

9:00 pm on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This site ripping/copying has been getting worse over the last 15 months. I for one do not like adding a UA or IP number to the ban list, but sometimes you have no choice: either 1000s of pages are being requested per hour from the same IP number, or your bandwidth quota has gone through the roof. I have a hobby site which does not make money, but as Pendanticist says:

"there are those webmasters (such as myself) who garner no income from their work, rather they do it for the 'altruistic glow' when they see individuals (as well as major institutions) perusing & enjoying that work. Getting some value out of it"

Nice choice of words.

As for being paranoid, try tailing your log file for a couple of hours and watching the same IP number take 1,000s of pages. Once you have stopped head butting the screen trying to stop the text shooting up faster than you can read it, you too would add the UA or IP number to your ban list.

I don't think paranoid is the correct word to use. You are just protecting what you have spent months or years building.
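
For anyone wondering what such a ban list can look like in .htaccess, here is a minimal sketch. It assumes mod_setenvif and the classic Order/Allow/Deny directives (on Apache 2.4, via mod_access_compat); the address 192.0.2.15 and the user-agent string ExampleSiteRipper are placeholders, not real offenders, and the variable name banned is arbitrary.

```
# Flag a (hypothetical) ripper by its user-agent, ignoring case
SetEnvIfNoCase User-Agent "ExampleSiteRipper" banned
# Everyone is allowed unless a Deny line below matches
Order Allow,Deny
Allow from all
# Ban one placeholder IP number outright
Deny from 192.0.2.15
# Ban anything flagged by user-agent above
Deny from env=banned
```

Add one Deny from line per IP number, or one SetEnvIfNoCase line per user-agent, as new offenders turn up in the logs.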

jimbeetle

9:41 pm on Apr 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's another current thread [webmasterworld.com] on googlebot spoofing. Think I'm going to use JD's code and keep 'em out before they get to me.

And as for being paranoid, this guy [webmasterworld.com] pulled down more than 50,000 pages from my site yesterday (I only have about 1,500) and sucked up more than a gig of bandwidth. It's my dime he's playing with!

Jim