
Search Engine Spider and User Agent Identification Forum

    
googlebot drink the brown acid?
Gobbledygook "GETs" from google
dupres01




msg:4477595
 6:49 pm on Jul 20, 2012 (gmt 0)

I am getting some strange requests from Google. For example:

66.249.68.97 - - [20/Jul/2012:12:16:45 -0600] "GET /eqbdyrmhxsgc.html HTTP/1.1" 404 298 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I have gotten about two dozen of these this morning. Each has a different gobbledygook string of letters following the GET. What is going on?

 

dstiles




msg:4477602
 7:35 pm on Jul 20, 2012 (gmt 0)

Yep! A known G intrusion attack. Got the same on my server today, worst I've seen so far.

We THINK that G thinks it should see how your site works when presented with garbage. If G got on with sorting out their own house instead of trying it on in ours, they might begin to get a better reputation. Although I doubt it.

Since the attacks come from real bot IPs with real bot UAs etc., there isn't much that can be done about it other than block using a 404 or, in my case, a 403 or 405, because an html extension is an illegal access attempt on the site.

I've been contemplating, this past couple of hours, removing G's access entirely from at least one of my sites. It's no business of theirs, and their actions and intrusions are getting worse.

wilderness




msg:4477604
 7:42 pm on Jul 20, 2012 (gmt 0)

Just a thought?

Why not build a rule using the same method that was used to eliminate the old "random UAs", but based on the "random request" instead?
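One way to read that suggestion, as a rough Python sketch rather than anyone's actual rule set: scan the access log for lines that look like the sample in the opening post. The names `LOG_RE`, `RANDOM_PATH_RE` and `looks_like_probe` are made up for illustration, and the "ten or more lower-case letters plus .html" pattern is an assumption based on the examples in this thread. A real page with a long one-word name would match the path test too, which is why the sketch also requires a 404 status.

```python
import re

# Apache "combined" log format, as in the sample line from the opening post.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"$'
)

# Assumption: a probe is a root-level name of 10+ lower-case letters ending in .html.
RANDOM_PATH_RE = re.compile(r'^/[a-z]{10,}\.html$')


def looks_like_probe(line: str) -> bool:
    """Return True if a log line resembles the random-name Googlebot requests."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return False  # not a combined-format line
    return (
        m.group("status") == "404"
        and "Googlebot" in m.group("ua")
        and RANDOM_PATH_RE.match(m.group("path")) is not None
    )


sample = (
    '66.249.68.97 - - [20/Jul/2012:12:16:45 -0600] '
    '"GET /eqbdyrmhxsgc.html HTTP/1.1" 404 298 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)
print(looks_like_probe(sample))
```

This only counts or flags the requests. Whether to answer them with anything other than a plain 404 is a separate decision.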

keyplyr




msg:4477642
 10:05 pm on Jul 20, 2012 (gmt 0)


I'd leave it alone

lucy24




msg:4477643
 10:08 pm on Jul 20, 2012 (gmt 0)

Hey, I met one of those not long ago. I'd recently changed a batch of individual 404s or 410s to page-specific redirects. So it made sense for Google to want to confirm that I hadn't stepped into Soft 404 territory by redirecting everything.

I realize that this makes it sound like blaming the victim. But to me it did seem like a reasonable action. I might have felt differently if it threw out lots and lots of them. It only takes one to get the message.

The requested URL is unambiguous gobbledygook, not at all like the garbage they routinely come up with when following sloppy links.

dstiles




msg:4478265
 8:24 pm on Jul 23, 2012 (gmt 0)

For the record: just accidentally discovered a new google IP range - 172.218.0.0/16 registered April this year. No known hits from it as yet. I've blocked it, mainly because I do not trust G and have all their other non-bot IPs blocked.

brotherhood of LAN




msg:4478266
 8:28 pm on Jul 23, 2012 (gmt 0)

How does WMT authentication work? Is it still a file like /eqbdyrmhxsgc.htm?

Was just thinking that a site's competitors would look for lax .htaccess rules and try to get an HTTP 200 to get access to some analytics.

lucy24




msg:4478305
 11:58 pm on Jul 23, 2012 (gmt 0)

For the record: just accidentally discovered a new google IP range - 172.218.0.0/16 registered April this year.

172 ? ! ? ! Aren't those supposed to be private registrations? Has the range been reclassified?

:: routine detour to raw logs ::

172.218.23.126 - - [26/Jun/2012:17:53:02 -0700] "GET /{administrative gif from a different site} HTTP/1.1" 200 318 "{the other site}" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1"

172.218.201.152 - - [02/Jul/2012:20:12:54 -0700] "GET /{same file} HTTP/1.1" 200 310 "{same site}" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2"

172.218? Are you sure? I'm getting Telus (whole /15 range), which is much more probable for this file and this site.
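On the "private" question: the private block under 172 is 172.16.0.0/12 (RFC 1918), which runs from 172.16.0.0 through 172.31.255.255, so 172.218.x.x is ordinary public address space. A quick check with Python's standard `ipaddress` module, using the two addresses from the log lines above (the helper name `in_private_172` is just for illustration):

```python
import ipaddress

# RFC 1918 private block that starts with 172.
RFC1918_172 = ipaddress.ip_network("172.16.0.0/12")


def in_private_172(addr: str) -> bool:
    """Return True if addr falls inside 172.16.0.0/12."""
    return ipaddress.ip_address(addr) in RFC1918_172


for addr in ("172.16.0.1", "172.31.255.255", "172.218.23.126", "172.218.201.152"):
    print(addr, in_private_172(addr))
```

This says nothing about who holds the range. It only shows that it is not one of the reserved private blocks.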




Edit:

I am getting some strange requests from Google. For example:

I got a bit behind on logs. In the course of about half an hour on the 20th-- around the same time as the OP, looks like-- I got eleven of these in a row. One I can understand: routine checking of suspected Soft 404s. Eleven and I think the thread title is right. Googlebot's been drinking something.

wilderness




msg:4478315
 12:36 am on Jul 24, 2012 (gmt 0)

172.218? Are you sure? I'm getting Telus


I agree.

How'd ya get Google?

wilderness




msg:4478316
 12:39 am on Jul 24, 2012 (gmt 0)

How does WMT authentication work


There's somebody hanging about here that has utilized those for a long while.
I seem to recall looking at it some years ago, and decided not to jump those hoops.

Google was no help on "WMT authentication".

brotherhood of LAN




msg:4478323
 1:25 am on Jul 24, 2012 (gmt 0)

Just to clarify, I meant Google Webmaster Tools, in case there was any confusion there. I somehow came to the acronym WMT. I remember the first (and maybe last) time I used it: you simply had to upload an empty file to the root of your site, and in some cases you would be able to peek at other people's data.

lucy24




msg:4478356
 7:16 am on Jul 24, 2012 (gmt 0)

Yup, it's a file with a nonsense name-- except that unlike the ones g### has been looking for recently, the authentication file's name is mixed alphanumerics. If you have file-upload access to someone else's site I guess you could easily claim to be the site's owner for gwt purposes. Don't know how else you'd do it.

Now, if googlebot were a human in a certain age bracket, it would turn out that it had simply forgotten the name of that authentication file and was wildly guessing in hopes of hitting the right one. But, even at the minimum range of ten lower-case letters, the odds of guessing right don't look good. (Calculator says it's a 15-digit number. I'll take its word for it.)
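The calculator figure checks out: ten lower-case letters give 26^10 = 141,167,095,653,376 possible names, which is a 15-digit number. The sample name in the opening post, eqbdyrmhxsgc, actually has twelve letters, so the space there is larger still. A small Python check (the function name `name_space` is only illustrative):

```python
import string


def name_space(length: int, alphabet: str = string.ascii_lowercase) -> int:
    """Number of distinct names of the given length over the alphabet."""
    return len(alphabet) ** length


for n in (10, 12):
    total = name_space(n)
    print(f"{n} letters: {total} possibilities ({len(str(total))} digits)")
```

A single blind guess at a ten-letter name therefore has roughly a 1 in 1.4 x 10^14 chance of hitting an existing file.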

dstiles




msg:4478513
 8:18 pm on Jul 24, 2012 (gmt 0)

Lucy - thanks for the correction.

It is, in fact, 172.217.0.0/16.

Telus was the one I was actually registering in my system; G was the accidental finding when I checked to see how big Telus was. I always try to check adjacent IP ranges when registering a naughty one.
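The "check the neighbours" habit is easy to sketch with Python's standard `ipaddress` module. The block boundaries below are the ones mentioned in this thread (a /15 starting at 172.218.0.0 and the /16 just below it). The code only does address arithmetic and does not look up who owns anything, and the helper names are hypothetical.

```python
import ipaddress


def previous_block(net_text: str, prefix: int) -> ipaddress.IPv4Network:
    """Block of the given prefix length that ends right before net_text starts."""
    net = ipaddress.ip_network(net_text)
    size = 2 ** (32 - prefix)
    # Assumes prefix is at least as long as net's own, so the start stays aligned.
    return ipaddress.IPv4Network((int(net.network_address) - size, prefix))


def next_block(net_text: str, prefix: int) -> ipaddress.IPv4Network:
    """Block of the given prefix length that starts right after net_text ends."""
    net = ipaddress.ip_network(net_text)
    return ipaddress.IPv4Network((int(net.broadcast_address) + 1, prefix))


print(previous_block("172.218.0.0/15", 16))
print(next_block("172.218.0.0/15", 16))
```

Each neighbouring block would then go through a whois lookup by hand before deciding whether to register it.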

Don't know about WMT authentication but to verify ownership of a domain (aka web URL) you have two options: one is a file in the root, the other is to paste the verification code into each page header - the one I preferred until I gave up playing G's games. Same methods for bing and (once upon a time) yahoo. The codes are much longer and more alpha-numeric than G's current attacks on web sites.

g1smd




msg:4478538
 9:08 pm on Jul 24, 2012 (gmt 0)

the other is to paste the verification code into each page header

AFAIR you need it only on the root page of the site.

In any case I use only the googlennnnnnnnnnnnnnn.html file.

dstiles




msg:4478874
 8:37 pm on Jul 25, 2012 (gmt 0)

You could be right about only 1st page - I'm just cautious. :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved