The only possible reason for the banning I can see at the moment is the way they handle "Page Not Found" requests on the server. Because they get a lot of mistyped URLs and broken link referrals to old pages, they have the server serve the nearest page in the folder requested.
As this effectively presented duplicate content to spiders following inbound links to old pages, I recommended that they drop this strategy and serve a 404 page for requests to missing pages. However, they were reluctant to do this, so we compromised: the server still returns the nearest page's content for a missing URL, but with a 404 status code in the header.
Unfortunately, 2 weeks after this setup was implemented, Google dropped the site. What is the wisdom here? Do I hang in there till the next update to see if Google figures out the new 404 setup? Or should I get them to stop trying to be clever immediately?
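For what it's worth, the compromise described above (nearest page's content, but a 404 status) can be sketched roughly like this. The page table and the `difflib`-based "nearest" matching are my own placeholders, not the actual server setup:

```python
# Sketch of "serve the nearest page's content, but with a 404 status".
# PAGES and the similarity matching are hypothetical stand-ins.
import difflib

PAGES = {
    "/products.html": "<html>Products</html>",
    "/about.html": "<html>About</html>",
}

def handle_request(path):
    """Return an (http_status, body) pair for a requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    # Missing page: pick the closest existing URL by string similarity.
    # cutoff=0.0 guarantees a match as long as PAGES is non-empty.
    nearest = difflib.get_close_matches(path, PAGES, n=1, cutoff=0.0)[0]
    # Serve that page's content, but tell spiders the URL itself is bad.
    return 404, PAGES[nearest]
```

The point of the 404 status is that a spider should drop the bad URL from its index instead of treating the served content as a duplicate page.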
But for the most part, as you may already know, Googlebot ignores or doesn't find the "404.html"...
So maybe changing the 404s somehow made it so that Googlebot couldn't see the pages... which leads me to believe that waiting for the next update won't change anything if you don't change the website first.
> Why not serve a 301 to the nearest page?
> Thus you will effectively "correct" misspelled URLs.
Yes killroy, I would think this should be OK to do, especially since Google recommends this approach in their webmaster guidelines. However, in my experience Google handles these badly, often indexing the source URL and not the target. I also suspect that if it encounters a lot of 301s (for instance by entering the site through old broken external links), it flags the site as spamming. This is just a hunch though...
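killroy's suggestion would look something like this in the same spirit; again the page set and "nearest" matching are hypothetical, and note that the redirect carries a Location header rather than a body:

```python
# Sketch of the 301 alternative: "correct" a misspelled URL by redirecting
# permanently to the nearest real page. PAGES and the matching are
# hypothetical stand-ins, not anyone's actual server config.
import difflib

PAGES = {
    "/products.html": "<html>Products</html>",
    "/about.html": "<html>About</html>",
}

def handle_request(path):
    """Return (http_status, headers, body) for a requested path."""
    if path in PAGES:
        return 200, {}, PAGES[path]
    nearest = difflib.get_close_matches(path, PAGES, n=1, cutoff=0.0)[0]
    # Permanent redirect: the spider should index the target, not the
    # source (though, as noted above, that doesn't always happen cleanly).
    return 301, {"Location": nearest}, ""
```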
> Well, I am pretty sure a grey PR bar doesn't mean you were penalized...Grey usually indicates the website has not been found by googlebot yet and a completely white bar indicates a penalty.
hobbnet, the site has had PR 6 for over a year, and has a dmoz listing, so this looks like a ban to me.
I think I need to look at Googlebot's behaviour in the log. Any experience on dealing with 404s and Google would be gratefully received.
I've considered serving redirects to an alternative page in place of a 404, but I've always resisted because I'm unsure of the impact on my servers from the thousands of requests per day for formmail.pl, default.ida and the many other bogus requests that have nothing to do with a human visitor...
Any thoughts?
I used 301s on qualified addresses. To avoid serving large 404 pages to great numbers of bogus requests such as formmail and the other exploits, I simply serve up a 0-byte blank HTML page for those.
I have also noticed that Google would pick up the 301, and then come back a few days later for the target of the 301. So if you do it near the end of a deep crawl you might miss that one.
I am happy to report, though, that Google has picked up all the new URLs this deep crawl that were changed only two weeks ago.
Also, by serving up (internal redirect) the blank pages for the exploit requests, I save bandwidth as well as avoiding clutter in the error logs.
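The blank-page trick for exploit probes could look something like this; the pattern list is just an illustrative sample of common worm/exploit requests, not the actual setup:

```python
# Serve a 0-byte blank response for known exploit/worm probes so they
# don't eat bandwidth on a full 404 page or clutter the error logs.
# The pattern list and page table here are only illustrative samples.
EXPLOIT_PATTERNS = ("formmail", "default.ida", "cmd.exe", "root.exe")

PAGES = {"/index.html": "<html>Home</html>"}
ERROR_PAGE = "<html>404 Not Found</html>"

def handle_request(path):
    """Return (http_status, body) for a requested path."""
    if path in PAGES:
        return 200, PAGES[path]
    if any(pattern in path.lower() for pattern in EXPLOIT_PATTERNS):
        return 200, ""  # blank page: nothing to transfer, nothing to log
    return 404, ERROR_PAGE  # real visitors and spiders still get a proper 404
```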
I regularly check the error logs and try to keep them empty by redirecting appropriately.
sticky me if you want details.
SN