
Forum Moderators: Robert Charlton & goodroi


Strange Googlebot requests?

I have noticed requests to strange filenames that do not exist on my site

     
10:36 pm on Jul 1, 2005 (gmt 0)

New User

10+ Year Member

joined:July 1, 2005
posts:22
votes: 0


Hi, everybody!

Recently I found strange records in my access log:

66.249.66.240 - GoogleBot IP address.
===============
/jfveqcdgnkbr.html
Http Code: 404 Date: Jul 01 16:32:49 Http Version: HTTP/1.1 Size in Bytes: 1219
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/cyvhpthdm.html
Http Code: 404 Date: Jul 01 16:32:49 Http Version: HTTP/1.1 Size in Bytes: 1218
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/vihhrxmph.html
Http Code: 404 Date: Jul 01 16:32:50 Http Version: HTTP/1.1 Size in Bytes: 1218
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
===============

I got about 10 hits to such files today.

Has anyone seen anything like this?

As you can see, the file names are very random and virtually nonsense. I do not have such files on my site, nor do I have such or similar links.

So I was wondering if anyone here has experienced or noticed such strange occurrences?

I am at a loss as to what these can be. They showed up today.

Thank you in advance!

11:04 am on July 2, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


Yep, I have the exact same thing. Not the exact same string of characters, but the exact same layout of seemingly random characters.

I also have the G bot trying to access things like domain.com/jpg!

11:09 am on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2004
posts:1425
votes: 0


A wild guess: Is this a way to check .htaccess-type redirects,
or maybe that and/or other forms of cloaking? -Larry
12:01 pm on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Sounds like Google is turning over some rocks to see what's under them.
7:14 pm on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


There is another possibility, Reid; think for a minute and it may become clear.

Pick a site (most any site will do), make up a file name, say x1x1x1y.stuff, and place the resulting [domain...] into a header checker.

Try a few sites and see what return codes you get; you might be surprised. The end result, if followed by an s/e bot, might be a nice little URL in the index, and eventually, given enough of these nice little URLs for different names, that amounts to s/e spam.

Remember Rule 5 is in full force, YMMV, and Rule 6 has been invoked.
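The "header checker" experiment described above can be sketched in a few lines of Python. This is an illustrative sketch, not the poster's actual tool: the nonsense path, function names, and classification labels are all assumptions.

```python
# Illustrative sketch of the header-checker test: ask a server for a
# made-up filename and classify the status code it returns. A
# well-behaved server answers 404; a 200 or a redirect for a nonsense
# name is a "soft 404" that a search-engine bot could end up indexing.
import http.client

def check_status(host, path="/x1x1x1y.stuff", timeout=10):
    """HEAD-request an (almost certainly nonexistent) path; return the status code."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()

def classify(status):
    """Label how a server handled a request for a nonsense URL."""
    if status == 404:
        return "hard 404"
    if status in (301, 302, 303, 307, 308):
        return "redirect (possible soft 404)"
    if status == 200:
        return "200 OK (soft 404, indexable)"
    return "other (%d)" % status
```

Running `classify(check_status("example.com"))` against a handful of sites shows how differently servers answer the same nonsense request.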

7:44 pm on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 10, 2003
posts:929
votes: 13


Yep, I noticed them on June 30 too. They started out with about 4 to 6 letters, then grew to 7-14 letters or more.

At around 0654 PST on June 30 we got one that looked like this from 66.249.65.130:
/&&DI=293&IG=9172802615174b3fb95ad7b2a73b097f&POS=6&CM=WPU&CE=3&CS=AWP&SR=3

Then very little at all until 1400 PST, when we started getting a bunch of gobbledygook ones from 66.249.65.8.

Also noticed the first ones were to www.domain.com, while the last ones were to domain.com/, which were redirected and followed immediately by another request to www.domain.com.

Testing some 301 algorithm?

4:36 am on July 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 2, 2002
posts:87
votes: 0


Yep... I got them also. I checked a few sites on my server, and it seems it's not specific to any one domain. Actually, about 8 out of 10 sites received about 5 or 6 each of those vfdfvqwekvbo.htm hits, obviously resulting in 404s. Wonder what the big "G" is up to with these.
5:46 am on July 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


I was skeptical of this, but I checked and found a bunch from 2 days ago. I imagine they're forcing 404 errors to see how different sites handle them.
11:45 am on July 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 31, 2004
posts:43
votes: 0


I'm not getting the random letters. The last few odd entries in the logs look like offset=-265, offset=-525, offset=-31, etc., where offset is used as a paging function (i.e. offset=10 is page 2 of a list of 10 widget entries). I checked the whole site, checked for odd backlinks, and couldn't find any errors on the site itself. Got a non-answer from support. Time for a 302 redirect to catch anything ending in -99?
7:13 pm on July 3, 2005 (gmt 0)

Full Member

10+ Year Member

joined:June 15, 2003
posts:212
votes: 0


I thought it might be helpful to link these threads [webmasterworld.com]
7:32 pm on July 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Maybe not related, but all the URLs we excluded with robots.txt back in March (by submitting the robots.txt file URL to the Google URL console) are today all back in a site: search and all of them are shown as URL-only entries. [webmasterworld.com...]
7:59 pm on July 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Most of my sites have 404 redirecting to the home page.
Could it be that the competition does linking in that nonsense way to make Google penalize the site for duplicate content?
9:10 pm on July 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 29, 2004
posts:180
votes: 0


Simple answer: Google is trying to figure out what a missing page looks like.

See the academic paper "Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay" on the rate of page death. It contains a section on how to measure page death, going not by 404s (some domains "soft catch" 404s) but by figuring out what the page looks like when it REALLY shouldn't exist. So, you ask for a page composed of random letters and see what the server returns.

The academic paper used 25 random letters. Google is clearly varying the number, so people can't detect its probes merely by checking whether the length is 25.
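The probing scheme described in the paper (request a name of random letters that almost certainly cannot exist) can be sketched as follows; the function name and the .html suffix are illustrative assumptions, not anything Google has documented:

```python
# Sketch of the paper's probing scheme: build a path of random lowercase
# letters, which almost certainly does not exist on any site. The paper
# used 25 letters; Googlebot apparently varies the length.
import random
import string

def random_probe_path(length=25, rng=random):
    """Return e.g. '/jfveqcdgnkbr.html' with `length` random letters."""
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return "/%s.html" % name
```

Whatever the server returns for such a path is, by construction, its "page that really shouldn't exist."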

1:37 am on July 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


I noticed these in our logs two weeks ago.

Our site is clean.

It's G looking at your 404.

Q/
Most of my sites have 404 redirecting to the home page.
Could it be that competition does linking in that nonsense way to make Google penalize the site due to duplicate content?
/Q

No, you are your own enemy.
It's a beginner's mistake to redirect 404s to your home page. (I did it too last year.)

STOP it immediately, and use a plain 404 page.

Every lost page on your site is seen by G et al as a duplicate of your homepage.

It's your own fault.
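For Apache users, the standard way to follow this advice is an ErrorDocument directive in .htaccess: a local path serves a friendly page while keeping the real 404 status, whereas a full URL makes Apache answer with a 302 redirect instead, which is exactly the soft-404 behaviour criticized above. The file names here are illustrative:

```apache
# .htaccess sketch (Apache; file names are illustrative).
# A local path keeps the real 404 status while serving a friendly page:
ErrorDocument 404 /errors/not-found.html

# A full URL would instead make Apache answer with a 302 redirect,
# i.e. the "soft 404" behaviour criticized in this thread:
# ErrorDocument 404 http://www.example.com/
```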

9:52 am on July 4, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Every lost page on your site is seen by G et al as a duplicate of your homepage.

Maybe that's why they are checking 404?
To correct the problem.

1:58 pm on July 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


It's not their problem it's yours.

You have full control, and all the info you now need to fix it.

3:40 pm on July 4, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


You have full control, and all the info you now need to fix it.

I don't think I have to "fix" it.
First of all I don't depend on Google traffic.
Further, most of my domains are previously expired ones with tens, hundreds, or even thousands of different directories, and I have no problem redirecting them to the new home pages.
I don't need 404s producing dynamic scraper pages that would clearly invite potentially legitimate penalties, or redirects via some other codes that would require an additional click.
If I am missing something here, I would appreciate a clarification.

It is obvious that Google has been having a lot of problems with different redirects (302, www/non-www, meta refresh,...) for a long time, so I guessed they finally decided to do something about it.
To correct, uhm... still THEIR problem, at least in my case.

6:57 pm on July 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I would use a separate 404 errorpage, and include some site navigation on it. I wouldn't redirect to the actual index page. I also wouldn't redirect to a page with exactly the same content as the main index page either.
6:13 am on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


I think it's a bad idea to redirect 404s.
First of all, how is the person receiving the 404 supposed to know that it's a 404?
I get errors from time to time from amateurs linking to me. It takes them a few tries to get it right, leaving 404s in my log. Usually they are simple spelling mistakes or minor syntax errors (htm instead of html, etc.).
Imagine if I had 404s redirecting to my homepage: I would have all these garbage links pointing at me from idiots who can't spell a filename, and the idiots wouldn't even know there was a problem.
A 404 is an error code; if you hide the error, then you are going to have other problems, period.
My 404 page has a simple script that converts all uppercase characters in the URL to lowercase (Unix server), and if that comes back to the 404 again, they get a simple 404 page with a blurb about using the refresh button or contacting the webmaster, along with a static link to my homepage.
You requested a page that does not exist: deal with it, or if it's my fault, contact me. That is what 404 means.
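The lowercase-retry idea in the post above could look roughly like this in Python; the function shape and names are assumptions for illustration, not the poster's actual script:

```python
# Sketch of the lowercase-retry idea: on a 404, try the all-lowercase
# form of the requested path; if that file exists, redirect there,
# otherwise fall through to a plain 404 page. Useful on case-sensitive
# (Unix) filesystems where /Page.HTML and /page.html differ.
import os

def handle_404(doc_root, req_path):
    """Return ('redirect', lowercased_path) if that file exists, else ('404', None)."""
    lower = req_path.lower()
    if lower != req_path and os.path.isfile(os.path.join(doc_root, lower.lstrip("/"))):
        return ("redirect", lower)
    return ("404", None)
```

The important property is the fall-through: when the lowercase form does not exist either, the visitor still gets a genuine 404 rather than a redirect to the home page.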
6:16 am on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2002
posts:813
votes: 1


Why not give 404s only when it's Googlebot, but let regular visitors get redirected to the homepage? Of course, the 404 response code might delist those pages at some point.
7:34 am on July 5, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Reid:
Imagine if i had 404's redirecting to my homepage - I would have all these garbage links pointing at me from idiots who can't spell a filename and the idiots wouldn't even know there was a problem.

Sure, but many of them still would not correct the links, so you will have the garbage links no matter what you do with the 404.
But yes, that's a valid point.

g1smd:

I would use a separate 404 errorpage, and include some site navigation on it.

That's the proper way to do it, but I am afraid of losing visitors due to the additional click.
Besides, in the majority of cases the navigation would point to the home page.
Does anyone have stats on the percentage of visitors who continue browsing a site (from 404 pages with navigation links)?

9:36 am on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


Activeco wrote: "I don't think I have to "fix" it.
If I miss something here, I would appreciate a clarification."

You do need to fix it. You've certainly missed something.
Isn't it obvious? Sit back, read this thread again, and think.

And remember, it's not just Google, but ALL SE's your 404 redirect is effectively 'spamming'.

Not a wise thing to do.

Fix it or suffer.

9:40 am on July 5, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


...and think

That's the hard part.

Anyway, does anyone have more theories about why Google does it?

4:25 pm on July 5, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 29, 2004
posts:180
votes: 0


I really don't think there's any mystery: this is how Google figures out what a site's "dead page" looks like. The only mystery is how Google uses this information, for example, whether 404's that redirect to a homepage are perceived as spamming.
10:41 pm on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


<rhetorical question>If your "dead page" looks like your index page, does that mean that your index page is also dead, regardless that the status is "200" instead of "404"?
10:49 pm on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


g1smd, good one

<rant>

Why does Google restore removed pages:

1: To get us angry?

2: To get us angry?

You have 3 choices, the first 2 don't count.

</rant>

I think I'll start returning 555 as the return code on a page-not-found condition. It is a nice number, don't you think?

11:08 pm on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


666 might be better.

2:31 am on July 6, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Google does not remove 404s, because the URLs "exist":
the server is saying "page not found," but the URL "exists," and Google doesn't like to forget pages it "knows exist."
The way to remove URLs from Google is to return code 410 Gone.
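A minimal sketch of that 404-versus-410 distinction, assuming a hand-maintained list of permanently removed paths (the paths and handler shape are illustrative, not a real site's configuration):

```python
# Sketch of the 404-versus-410 distinction: answer 410 Gone for paths
# known to be permanently removed, 404 Not Found otherwise. The set of
# removed paths is an illustrative assumption.
REMOVED_PATHS = {"/old-page.html", "/retired-section/index.html"}

def status_for_missing(path):
    """410 tells crawlers the URL is gone for good; 404 is ambiguous."""
    return 410 if path in REMOVED_PATHS else 404
```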

It may seem trivial, but start analyzing an algo with that in mind, combined with the fact that pages sometimes return 404 even when they are not supposed to. The W3C definition of 404 says, "No indication is given of whether the condition is temporary or permanent."

What could they gain from the test results?
Google apparently just made some major shifts in its algo regarding the HTTP status codes 302 and 301; that would ultimately involve the other status codes too, especially 404, since it is the most common one aside from those two. "Page not found" could mean many things, but redirecting from there is a very unconventional thing to do as far as HTTP status codes are concerned.

404
This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
[w3.org...]