
Forum Moderators: Robert Charlton & goodroi


Strange Googlebot requests?

I have noticed requests to strange filenames that do not exist on my site

     
10:36 pm on Jul 1, 2005 (gmt 0)

New User

10+ Year Member

joined:July 1, 2005
posts:22
votes: 0


Hi, everybody!

Recently I found strange records in my access log:

66.249.66.240 - GoogleBot IP address.
===============
/jfveqcdgnkbr.html
Http Code: 404 Date: Jul 01 16:32:49 Http Version: HTTP/1.1 Size in Bytes: 1219
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/cyvhpthdm.html
Http Code: 404 Date: Jul 01 16:32:49 Http Version: HTTP/1.1 Size in Bytes: 1218
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

/vihhrxmph.html
Http Code: 404 Date: Jul 01 16:32:50 Http Version: HTTP/1.1 Size in Bytes: 1218
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
===============

I got about 10 hits to such files today.

Has anyone seen anything like this?

As you can see, the file names are very random and virtually nonsense. I do not have such files on my site, nor do I have such or similar links.

So I was wondering if anyone here has experienced or noticed such strange occurrences?

I am at a loss as to what these can be. They showed up today.

Thank you in advance!

11:04 am on July 2, 2005 (gmt 0)

Full Member

joined:Jan 12, 2004
posts:334
votes: 0


Yep, I have the exact same thing. Not the exact same string of characters, but the exact same layout of seemingly random characters.

I also have the G bot trying to access things like domain.com/jpg!

11:09 am on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2004
posts:1425
votes: 0


A wild guess: Is this a way to check .htaccess-type redirects,
or maybe that and/or other forms of cloaking? -Larry
12:01 pm on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Sounds like Google is turning over some rocks to see what's under them.
7:14 pm on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


There is another possibility, Reid; think for a minute and it may become clear.

Pick a site (most any site will do), make up a file name, say x1x1x1y.stuff, and place the resulting [domain...] into a header checker.

Try a few sites and see what return codes you get; you might be surprised. The end result, if followed by an s/e bot, might be a nice little URL in the index, and eventually, given enough of these nice little URLs for different names, that amounts to s/e spam.

Remember Rule 5 is in full force, YMMV, and Rule 6 has been invoked.
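The "header checker" experiment described above can be sketched in a few lines of Python. This is an illustrative sketch, not the poster's actual tool: the nonsense path, function names, and classification labels are all assumptions.

```python
# Illustrative sketch of the header-checker test: ask a server for a
# made-up filename and classify the status code it returns. A
# well-behaved server answers 404; a 200 or a redirect for a nonsense
# name is a "soft 404" that a search-engine bot could end up indexing.
import http.client

def check_status(host, path="/x1x1x1y.stuff", timeout=10):
    """HEAD-request an (almost certainly nonexistent) path; return the status code."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()

def classify(status):
    """Label how a server handled a request for a nonsense URL."""
    if status == 404:
        return "hard 404"
    if status in (301, 302, 303, 307, 308):
        return "redirect (possible soft 404)"
    if status == 200:
        return "200 OK (soft 404, indexable)"
    return "other (%d)" % status
```

Running `classify(check_status("example.com"))` against a handful of sites shows how differently servers answer the same nonsense request.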

7:44 pm on July 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 10, 2003
posts:929
votes: 13


Yep, I noticed them on June 30 too. They started out with about 4 to 6 letters, then grew to 7-14 letters or more.

At around 0654 PST on June 30 we got one that looked like this from 66.249.65.130:
/&&DI=293&IG=9172802615174b3fb95ad7b2a73b097f&POS=6&CM=WPU&CE=3&CS=AWP&SR=3

Then very little at all until 1400 PST, when we started getting a bunch of gobbledygook ones from 66.249.65.8.

Also noticed the first ones were to www.domain.com, while the last ones were to domain.com/, which were redirected and followed immediately by another request to www.domain.com.

Testing some 301 algorithm?

4:36 am on July 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 2, 2002
posts:87
votes: 0


Yep... I got them also. I checked a few sites on my server, and it seems it's not specific to any one domain. Actually, about 8 out of 10 sites received about 5 or 6 each of those vfdfvqwekvbo.htm hits, obviously resulting in 404s. Wonder what the big "G" is up to with these.
5:46 am on July 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


I was skeptical of this, but I checked and found a bunch from 2 days ago. I imagine they're forcing 404 errors to see how different sites handle them.
11:45 am on July 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 31, 2004
posts:43
votes: 0


I'm not getting the random letters. The last few odd entries in the logs look like offset=-265, offset=-525, offset=-31, etc., where offset is used as a paging function (i.e. offset=10 is page 2 of a list of 10 widget entries). I checked the whole site, checked for odd backlinks, and couldn't find any errors on the site itself. Got a non-answer from support. Time for a 302 redirect to catch anything ending in -99?
7:13 pm on July 3, 2005 (gmt 0)

Full Member

10+ Year Member

joined:June 15, 2003
posts:212
votes: 0


I thought it might be helpful to link these threads [webmasterworld.com]
7:32 pm on July 3, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Maybe not related, but all the URLs we excluded with robots.txt back in March (by submitting the robots.txt file URL to the Google URL console) are today all back in a site: search and all of them are shown as URL-only entries. [webmasterworld.com...]
7:59 pm on July 3, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Most of my sites have 404 redirecting to the home page.
Could it be that the competition does linking in that nonsense way to make Google penalize the site for duplicate content?
9:10 pm on July 3, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 29, 2004
posts:180
votes: 0


Simple answer: Google is trying to figure out what a missing page looks like.

See the academic paper "Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay" on the rate of page death. It contains a section on how to measure page death, going not by 404s (some domains "soft catch" 404s) but by figuring out what the page looks like when it REALLY shouldn't exist. So, you ask for a page composed of random letters and see what the server returns.

The academic paper used 25 random letters. Google is clearly varying the number, so people can't detect its probes merely by checking whether the length is 25.
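The probing scheme described in the paper (request a name of random letters that almost certainly cannot exist) can be sketched as follows; the function name and the .html suffix are illustrative assumptions, not anything Google has documented:

```python
# Sketch of the paper's probing scheme: build a path of random lowercase
# letters, which almost certainly does not exist on any site. The paper
# used 25 letters; Googlebot apparently varies the length.
import random
import string

def random_probe_path(length=25, rng=random):
    """Return e.g. '/jfveqcdgnkbr.html' with `length` random letters."""
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return "/%s.html" % name
```

Whatever the server returns for such a path is, by construction, its "page that really shouldn't exist."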

1:37 am on July 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


I noticed these in our logs two weeks ago.

Our site is clean.

It's G looking at your 404.

Q/
Most of my sites have 404 redirecting to the home page.
Could it be that competition does linking in that nonsense way to make Google penalize the site due to duplicate content?
/Q

No, you are your own enemy.
It's a beginner's mistake to redirect 404s to your home page. (I did it too last year.)

STOP it immediately, and use a plain 404 page.

Every lost page on your site is seen by G et al as a duplicate of your homepage.

It's your own fault.
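For Apache users, the standard way to follow this advice is an ErrorDocument directive in .htaccess: a local path serves a friendly page while keeping the real 404 status, whereas a full URL makes Apache answer with a 302 redirect instead, which is exactly the soft-404 behaviour criticized above. The file names here are illustrative:

```apache
# .htaccess sketch (Apache; file names are illustrative).
# A local path keeps the real 404 status while serving a friendly page:
ErrorDocument 404 /errors/not-found.html

# A full URL would instead make Apache answer with a 302 redirect,
# i.e. the "soft 404" behaviour criticized in this thread:
# ErrorDocument 404 http://www.example.com/
```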

9:52 am on July 4, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Every lost page on your site is seen by G et al as a duplicate of your homepage.

Maybe that's why they are checking 404?
To correct the problem.

1:58 pm on July 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


It's not their problem it's yours.

You have full control, and all the info you now need to fix it.

3:40 pm on July 4, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


You have full control, and all the info you now need to fix it.

I don't think I have to "fix" it.
First of all I don't depend on Google traffic.
Further, most of my domains are previously expired ones with tens, hundreds, or even thousands of different directories, and I have no problem redirecting them to the new home pages.
I don't need 404s producing dynamic scraper pages that would clearly invite potentially legitimate penalties, or redirects via some other codes that would require an additional click.
If I am missing something here, I would appreciate a clarification.

It is obvious that Google has been having a lot of problems with different redirects (302, www/non-www, meta refresh,...) for a long time, so I guessed they finally decided to do something about it.
To correct, uhm... still THEIR problem, at least in my case.

6:57 pm on July 4, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


I would use a separate 404 errorpage, and include some site navigation on it. I wouldn't redirect to the actual index page. I also wouldn't redirect to a page with exactly the same content as the main index page either.
6:13 am on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


I think it's a bad idea to redirect 404s.
First of all, how is the person receiving the 404 supposed to know that it's a 404?
I get errors from time to time from amateurs linking to me. It takes them a few tries to get it right, leaving 404s in my log. Usually they are simple spelling mistakes or minor syntax errors (htm instead of html, etc.).
Imagine if I had 404s redirecting to my homepage: I would have all these garbage links pointing at me from idiots who can't spell a filename, and the idiots wouldn't even know there was a problem.
A 404 is an error code; if you hide the error, then you are going to have other problems, period.
My 404 page has a simple script that converts all uppercase characters in the URL to lowercase (Unix server), and if that comes back to the 404 again, they get a simple 404 page with a blurb about using the refresh button or contacting the webmaster, along with a static link to my homepage.
You requested a page that does not exist: deal with it, or if it's my fault, contact me. That is what 404 means.
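The lowercase-retry idea in the post above could look roughly like this in Python; the function shape and names are assumptions for illustration, not the poster's actual script:

```python
# Sketch of the lowercase-retry idea: on a 404, try the all-lowercase
# form of the requested path; if that file exists, redirect there,
# otherwise fall through to a plain 404 page. Useful on case-sensitive
# (Unix) filesystems where /Page.HTML and /page.html differ.
import os

def handle_404(doc_root, req_path):
    """Return ('redirect', lowercased_path) if that file exists, else ('404', None)."""
    lower = req_path.lower()
    if lower != req_path and os.path.isfile(os.path.join(doc_root, lower.lstrip("/"))):
        return ("redirect", lower)
    return ("404", None)
```

The important property is the fall-through: when the lowercase form does not exist either, the visitor still gets a genuine 404 rather than a redirect to the home page.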
6:16 am on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2002
posts:813
votes: 1


Why not give 404s only when it's Googlebot, but let regular visitors get redirected to the homepage? Of course, the 404 response code might delist those pages at some point.
7:34 am on July 5, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


Reid:
Imagine if i had 404's redirecting to my homepage - I would have all these garbage links pointing at me from idiots who can't spell a filename and the idiots wouldn't even know there was a problem.

Sure, but many of them still would not correct the links, so you will have the garbage links no matter what you do with the 404.
But yes, that's a valid point.

g1smd:

I would use a separate 404 errorpage, and include some site navigation on it.

That's the proper way to do it, but I am afraid of losing visitors due to the additional click.
Besides, in the majority of cases the navigation would point to the home page.
Does anyone have stats on the percentage of visitors who continue browsing a site (from 404 pages with navigation links)?

9:36 am on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Oct 13, 2003
posts:705
votes: 0


Activeco wrote: "I don't think I have to "fix" it.
If I miss something here, I would appreciate a clarification."

You do need to fix it. You've certainly missed something.
Isn't it obvious? Sit back, read this thread again, and think.

And remember, it's not just Google, but ALL SE's your 404 redirect is effectively 'spamming'.

Not a wise thing to do.

Fix it or suffer.

9:40 am on July 5, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:June 13, 2004
posts:650
votes: 0


...and think

That's the hard part.

Anyway, does anyone have more theories about why Google does it?

4:25 pm on July 5, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 29, 2004
posts:180
votes: 0


I really don't think there's any mystery: this is how Google figures out what a site's "dead page" looks like. The only mystery is how Google uses this information, for example, whether 404's that redirect to a homepage are perceived as spamming.
10:41 pm on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


<rhetorical question>If your "dead page" looks like your index page, does that mean that your index page is also dead, regardless that the status is "200" instead of "404"?
10:49 pm on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 19, 2003
posts:804
votes: 0


g1smd, good one

<rant>

Why does Google restore removed pages:

1: To get us angry?

2: To get us angry?

You have 3 choices, the first 2 don't count.

</rant>

I think I'll start returning 555 as the return code on a page-not-found condition. It is a nice number, don't you think?

11:08 pm on July 5, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


666 might be better.

2:31 am on July 6, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2004
posts:693
votes: 0


Google does not remove 404s, because the URLs "exist":
the server is saying "page not found," but the URL "exists," and Google doesn't like to forget pages it "knows exist."
The way to remove URLs from Google is to return code 410 Gone.
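A minimal sketch of that 404-versus-410 distinction, assuming a hand-maintained list of permanently removed paths (the paths and handler shape are illustrative, not a real site's configuration):

```python
# Sketch of the 404-versus-410 distinction: answer 410 Gone for paths
# known to be permanently removed, 404 Not Found otherwise. The set of
# removed paths is an illustrative assumption.
REMOVED_PATHS = {"/old-page.html", "/retired-section/index.html"}

def status_for_missing(path):
    """410 tells crawlers the URL is gone for good; 404 is ambiguous."""
    return 410 if path in REMOVED_PATHS else 404
```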

It may seem trivial, but start analyzing an algo with that in mind, combined with the fact that pages sometimes return 404 even when they are not supposed to. The W3C definition of 404 says, "No indication is given of whether the condition is temporary or permanent."

What could they gain from the test results?
Google apparently just made some major shifts in its algo regarding the HTTP status codes 302 and 301; that would ultimately involve the other status codes too, especially 404, since it is the most common one aside from those two. "Page not found" could mean many things, but redirecting from there is a very unconventional thing to do as far as HTTP status codes are concerned.

404
This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
[w3.org...]