Is a 404 setup that redirects all erroneous URLs to the home page the best way to handle missing pages? That is, as opposed to the traditional 404 error page that says the usual "Sorry, you've typed in the wrong URL, or the page you entered no longer exists," etc., and offers a few main links to the site?
Here's the problem: The client was victimized by a black hat SEO. The site used to have sneaky redirects that got it banned from Yahoo's search results.
All the offending files were removed from the server a couple of years ago, before I ever heard of the issue. But there are still backlinks out there pointing to those deleted pages, coming from other sites that carry the same type of pages our client used to have. I found at least two backlinks using the same pattern, example.com/sneakyredirects/example-1.htm.
And under the current 404 redirect setup, those toxic backlinks are getting redirected to index.htm.
Here's the question:
If the current 404 setup automatically redirects all comers to the index page, including who knows how many toxic backlinks, won't Yahoo see the redirection of those black hat backlinks as complicity in the black hat linking scheme?
Wouldn't it be safer to use a conventional 404 error page rather than this redirect? (And wouldn't that always be a best practice for all sites anyway?)
I do know that it's unlikely that Yahoo would penalize a victimized site for one-way black hat backlinks, as long as there's no reciprocal link.
Is this a case where Yahoo is never going to forgive my client, or is there any hope that this issue can be fixed?
[edited by: martinibuster at 6:08 pm (utc) on Mar. 8, 2009]
[edit reason] example.com is reserved for examples. [/edit]
You can put a link on the page that goes to the home page or a sitemap or wherever you want, but never redirect there automatically. The user should have to click the link to leave the page. Don't meta-refresh away from the 404 page, either. Also, don't redirect to a custom 404 page that *says* "Page Not Found" but actually returns the page with a 200 status code.
What's important is that a Page Not Found error must return a response/status code of 404.
Search engines use 404 responses to determine what pages are on the site and what pages don't exist. If you redirect, it's like telling them that no matter what they ask for, they'll get a page; it's a sneaky way to try to get thousands or millions of nonexistent pages indexed. The search engines don't trust that behavior.
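To make the point concrete, here is a minimal sketch using Python's standard-library http.server (an assumption; the site in question is presumably served by Apache or IIS, where the usual equivalent is an ErrorDocument directive pointing at a local path, since a full http:// URL there makes Apache send a redirect instead). Known pages get 200; anything else gets the custom page, with a link home, served in place with a real 404 status and no redirect or meta refresh. KNOWN_PAGES and the page text are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical pages that really exist on the site.
KNOWN_PAGES = {"/": b"<html><body><h1>Home</h1></body></html>"}

# Hypothetical custom "not found" page: helpful text and a link the user
# has to click. No meta refresh, no script redirect.
NOT_FOUND_BODY = (
    b"<html><head><title>Page Not Found</title></head><body>"
    b"<h1>Sorry, that page doesn't exist.</h1>"
    b'<p><a href="/">Go to the home page</a></p>'
    b"</body></html>"
)


class CustomNotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = KNOWN_PAGES.get(self.path)
        if body is None:
            # Serve the custom page in place, but keep the real 404 status.
            self.send_response(404)
            body = NOT_FOUND_BODY
        else:
            self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), CustomNotFoundHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port, path):
    """GET path from the local demo server and return (status, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    status, body = fetch(server.server_port, "/sneakyredirects/example-1.htm")
    print(status)  # a real 404, not a 301/302 and not a 200
    server.shutdown()
    server.server_close()
```

The user still lands on a friendly page, but a spider asking for a dead spam URL is told plainly that it doesn't exist.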
[google.com...]
Enhance your custom 404 page
A 404 page is what a user sees when they try to reach a non-existent page on your site (because they've clicked on a broken link, the page has been deleted, or they've mistyped a URL).
While the standard 404 page can vary depending on your web host, it usually doesn't provide the user with much useful information, and users may just surf away from your site. Therefore, we recommend creating a custom 404 page that provides the user with more information about your site and its content. (You should still make sure that your webserver returns a 404 status code to users and spiders, so that search engines don't accidentally index your custom 404 page.)
The 404 widget is a quick and easy way to embed a search box on your custom 404 page and provide users with useful information designed to help them find the information they need. Where we can, we'll also suggest other ways for the user to find [snip]
SteveWh, thanks. The redirecting idea did seem a bit sketchy to me. You said, "What's important is that a Page Not Found error must return a response/status code of 404." It doesn't sound as if this redirecting scheme does that.
lavazza, thanks for the tip!
Both will let you see the line in the response where it says "HTTP/1.1 404" or "HTTP/1.1 302" or "HTTP/1.1 301" etc.
See HTTP/1.1: Status Code Definitions:
[w3.org...]
But what happens then? Is the page redirecting (in the browser) to the homepage using JavaScript or meta refresh, or is it just displaying the homepage content?
(The code below the header response code in WFetch should be the HTML sent to the browser, with \r\n for new lines, \t for tabs, etc.)
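For anyone without WFetch, here's a minimal Python sketch of the same check (an assumption; it is not a tool mentioned in the thread). http.client never follows redirects, so it reports the raw status and any Location header. The RedirectEverything handler is a hypothetical stand-in for the setup described in the question, assuming it answers every unknown URL with a 302 to /index.htm; the real site might use a 301 instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def check_status(url):
    """Send one GET and return (status, reason, Location header).

    http.client never follows redirects, so a 301 or 302 is reported
    as-is, much like the raw status line a header viewer shows.
    """
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.hostname, parts.port, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body; only the status and headers matter here
        return resp.status, resp.reason, resp.getheader("Location")
    finally:
        conn.close()


class RedirectEverything(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the setup in the question:
    every URL except the home page gets a 302 to /index.htm."""

    def do_GET(self):
        if self.path == "/index.htm":
            self.send_response(200)
            body = b"<html><body>Home</body></html>"
        else:
            self.send_response(302)
            self.send_header("Location", "/index.htm")
            body = b""
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), RedirectEverything)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/sneakyredirects/example-1.htm"
    print(check_status(url))  # (302, 'Found', '/index.htm')
    server.shutdown()
    server.server_close()
```

Pointed at the real site, a missing URL that comes back as 301 or 302 with a Location of the home page is the redirect being discussed here; a correct setup shows 404 (or 410) with no Location header.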
References:
[en.wikipedia.org...]
[httpd.apache.org...]
Here is sample code for .htaccess:
# You might already have these first 2 lines in your .htaccess
RewriteEngine On
RewriteBase /
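# These lines send the 410 (Gone) response.
# To handle a group of pages with similar names,
# use a more general regular expression on the PATHANDFILENAME line.
# [NC] makes the match case-insensitive.
RewriteCond %{REQUEST_URI} ^/PATHANDFILENAME\.html$ [NC]
# "-" leaves the URL unchanged; [G] returns 410 Gone (and implies [L]).
RewriteRule .* - [G,L]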
When I collapsed one page into another and wanted the old page deindexed from search engines, it wasn't enough to just do a 301 redirect from the old page to the new or to return a 404 for the old page. They kept coming back for the page. When I sent the 410, Google and Yahoo each requested the page once, got the 410, and never requested it again.
[edited by: SteveWh at 8:19 pm (utc) on Mar. 9, 2009]
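The decision those .htaccess lines encode can be sketched as a small Python function (a hypothetical helper, not part of Apache or any other server). The GONE pattern is an assumption built from the example URL in the question and, like [NC], ignores case. Real pages get 200, the removed spam pages get 410, and every other unknown URL gets a plain 404 with no redirect.

```python
import re

# Hypothetical pattern for the removed spam pages, e.g.
# /sneakyredirects/example-1.htm. IGNORECASE plays the role of [NC].
GONE = re.compile(r"^/sneakyredirects/.*\.html?$", re.IGNORECASE)

# Hypothetical list of pages that really exist on the site.
EXISTING = {"/", "/index.htm", "/contact.htm"}


def status_for(path: str) -> int:
    """Pick the status code to send for a request path (never a redirect)."""
    if path in EXISTING:
        return 200
    if GONE.match(path):
        return 410  # Gone: removed on purpose, not coming back
    return 404      # Not Found: any other unknown URL


if __name__ == "__main__":
    for p in ["/index.htm", "/sneakyredirects/example-1.htm",
              "/SneakyRedirects/Example-2.HTM", "/typo.htm"]:
        print(p, status_for(p))  # 200, 410, 410, 404
```

Keeping 410 for a known group of dead URLs and 404 for everything else matches the distinction in the post: 410 says the page is gone for good, while 404 only says it isn't there now.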
SteveWh, the 410 response code makes good sense.
H76: Using meta refresh to create an instant client-side redirect | Techniques for WCAG 2.0:
[w3.org...]
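Since the open question was whether the page redirects with a meta refresh (the H76 technique) or with JavaScript, here is a rough heuristic sketch in Python for scanning the HTML body that a tool like WFetch displays. The regular expressions are assumptions and will miss redirects written in other ways; an HTML parser would be more robust.

```python
import re

# Heuristic patterns only; real pages can hide redirects in other ways.
META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv\s*=\s*["']?refresh["']?[^>]*>""", re.IGNORECASE)
META_URL = re.compile(
    r"""content\s*=\s*["']?\s*\d*\s*;?\s*url\s*=\s*([^"'>\s]+)""", re.IGNORECASE)
JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.|top\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE)


def find_client_redirects(html):
    """Return (kind, target) pairs for client-side redirects found in html."""
    found = []
    for tag in META_REFRESH.finditer(html):
        m = META_URL.search(tag.group(0))
        found.append(("meta-refresh", m.group(1) if m else None))
    for m in JS_REDIRECT.finditer(html):
        found.append(("javascript", m.group(1) or m.group(2)))
    return found


if __name__ == "__main__":
    page = '<meta http-equiv="refresh" content="0; url=/index.htm">'
    print(find_client_redirects(page))  # [('meta-refresh', '/index.htm')]
```

An empty list means no obvious client-side redirect; combined with a 200 status, that would suggest the server is simply displaying the home page content at the missing URL.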
I'm not sure whether the TOS allows posting the full results, but here are some excerpts, in case they give a clue about how it works. Again, these are just the parts that looked relevant to me.
In the code below, example.com stands in for Microsoft's domain.
--------------------
Server: Microsoft-IIS/5.1\r\n
Content-Length: 4040\r\n
// in real bits, urls get returned to our script like this:\r\n
// res://shdocvw.dll/http_404.htm#http://www.DocURL.com/bar.htm \r\n
\r\n
\t//For testing use DocURL = "res://shdocvw.dll/http_404.htm#https://www.example.com/bar.htm"\r\n
\tDocURL = document.URL;\r\n
\t\t\r\n
\t//this is where the http or https will be, as found by searching for :// but skipping the res://\r\n
\tprotocolIndex=DocURL.indexOf("://",4);\r\n