Forum Moderators: Robert Charlton & goodroi
Recently I found strange records in my access log:
66.249.66.240 - a Googlebot IP address.
===============
/jfveqcdgnkbr.html
Http Code: 404 Date: Jul 01 16:32:49 Http Version: HTTP/1.1 Size in Bytes: 1219
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/cyvhpthdm.html
Http Code: 404 Date: Jul 01 16:32:49 Http Version: HTTP/1.1 Size in Bytes: 1218
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/vihhrxmph.html
Http Code: 404 Date: Jul 01 16:32:50 Http Version: HTTP/1.1 Size in Bytes: 1218
Referer: -
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
===============
I got about 10 hits to such files today.
Has anyone seen anything like this?
As you can see, the file names are random, virtually nonsense. I do not have such files on my site, nor do I have any such or similar links.
So I was wondering if anyone here has experienced or noticed such strange occurrences?
I am at a loss as to what these could be. They showed up today.
Thank you in advance!
I also have the G bot trying to access things like domain.com/jpg!
Pick a site (most any site will do), make up a file name, say x1x1x1y.stuff, and place the following [domain...] into a header checker.
Try a few sites and see what return codes you get; you might be surprised. And the end result, if followed by a search-engine bot, might be a nice little URL in the index. Given enough of these nice little URLs for different names, it amounts to SE spam.
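The header check described above can be sketched with Python's stdlib; the host and made-up filename below are placeholders, not any real site from this thread. A correctly configured server should answer 404 here; one that answers 200 or 302 is exactly what the probe exposes:

```python
# Minimal "header checker" sketch: ask for a made-up filename and
# report only the status code the server returns, not the body.
from http.client import HTTPConnection

def check_status(host, path, port=80, timeout=10):
    """Return the HTTP status code the server sends for `path`."""
    conn = HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD is enough: we only care about the status line.
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()

# Placeholder usage; a well-behaved server should answer 404 here:
# print(check_status("example.com", "/x1x1x1y.stuff"))
```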
Remember Rule 5 is in full force, YMMV, and Rule 6 has been invoked.
At around 0654 PST on June 30 we got one that looked like this from 66.249.65.130:
/&&DI=293&IG=9172802615174b3fb95ad7b2a73b097f&POS=6&CM=WPU&CE=3&CS=AWP&SR=3
Then very little at all until 1400 PST, when we started getting a bunch of gobbledygook ones from 66.249.65.8.
I also noticed that the first ones went to www.domain.com, while the last ones went to domain.com/; those were redirected and immediately followed by another request to www.domain.com.
Testing some 301 algorithm?
See the academic paper "Sic Transit Gloria Telae: Toward an Understanding of the Web's Decay" on the rate of page death. It contains a section on how to measure page death, going not by 404s—some domains "soft catch" 404s—but by figuring out what the page looks like when it REALLY shouldn't exist. So, you ask for a page composed of random letters and see what the server returns.
The academic paper used 25 random letters. Google is clearly varying the number, so people can't catch its probes by merely checking whether the length is 25.
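The probing scheme described above can be sketched as follows. The length bounds are assumptions for illustration; only the paper's 25-letter figure is known from the thread:

```python
# Sketch of a "page death" probe: a path of random lowercase letters
# whose length varies, so the probes cannot be spotted by checking
# for a fixed 25-character name.
import random
import string

def random_probe_path(min_len=8, max_len=25):
    """Build a nonsense filename like /jfveqcdgnkbr.html."""
    length = random.randint(min_len, max_len)
    name = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return "/" + name + ".html"
```

Requesting such a path and getting anything other than a 404 tells the crawler the server "soft catches" missing pages.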
Our site is clean.
It's G looking at your 404.
Quote:
Most of my sites have 404 redirecting to the home page.
Could it be that the competition links in that nonsense way, to make Google penalize the site for duplicate content?
No, you are your own enemy.
It's a beginner's mistake to redirect 404s to your home page. (I did it too last year.)
STOP it immediately, and use a plain 404 page.
Every lost page on your site is seen by G et al as a duplicate of your homepage.
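The behaviour being recommended here can be sketched with Python's stdlib http.server; this is a minimal illustration, not anyone's actual setup. The point is to serve a friendly "not found" page while keeping the 404 status code, instead of 301/302-redirecting to the home page:

```python
# Sketch: unknown URLs get a custom error page WITH a 404 status,
# so search engines see the page as gone, not as a home-page duplicate.
from http.server import BaseHTTPRequestHandler, HTTPServer

ERROR_PAGE = b"<html><body><h1>Page not found</h1></body></html>"

class NotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A real site would serve its known paths first; everything
        # else falls through to the custom page with status 404.
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(ERROR_PAGE)))
        self.end_headers()
        self.wfile.write(ERROR_PAGE)

    def log_message(self, *args):  # keep the demo quiet
        pass
```

On Apache the equivalent is `ErrorDocument 404 /404.html` with a local path; note that giving ErrorDocument a full URL instead causes a redirect, which is exactly the mistake being discussed.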
It's your own fault.
You have full control, and all the info you now need to fix it.
I don't think I have to "fix" it.
First of all I don't depend on Google traffic.
Further, most of my domains are previously expired ones with tens, hundreds or even thousands of different directories, and I have no problem redirecting them to the new home pages.
I don't need a 404 that produces dynamic scraper-style pages, which would clearly invite legitimate penalties, or one that sends visitors to some other page and costs them an additional click.
If I miss something here, I would appreciate a clarification.
It is obvious that Google has been having a lot of problems with different redirects (302, www/non-www, meta refresh,...) for a long time, so I guessed they finally decided to do something about it.
To correct, uhm... still THEIR problem, at least in my case.
Imagine if I had 404s redirecting to my homepage: I would have all these garbage links pointing at me from idiots who can't spell a filename, and the idiots wouldn't even know there was a problem.
Sure, but many of them still would not correct the links, so you will have the garbage links no matter what you do with your 404s.
But yes, that's a valid point.
g1smd:
I would use a separate 404 errorpage, and include some site navigation on it.
That's the proper way to do it, but I am afraid of losing visitors due to the additional click.
Besides, in majority of cases navigation would point to the home page.
Does anyone have some stats about percentage of visitors continuing browsing a site (from 404 pages with navigation links)?
You do need to fix it. You've certainly missed something.
Isn't it obvious? Sit back, read this thread again, and think.
And remember, it's not just Google: your 404 redirect is effectively 'spamming' ALL search engines.
Not a wise thing to do.
Fix it or suffer.
It may seem trivial, but you start analyzing an algo with that in mind, combined with the fact that pages sometimes return 404 even when they are not supposed to. The W3C definition of 404 says: "No indication is given of whether the condition is temporary or permanent."
What could they gain from the test results:
Apparently Google just made some major shifts in their algo regarding HTTP status codes (302 and 301); that would ultimately involve the other HTTP status codes too, especially 404, since it is the most common one aside from those two. "Page not found" could mean many things, but redirecting from there is a very unconventional thing to do as far as HTTP status codes are concerned.
404
This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
[w3.org...]