My problem is as follows:
I use .htaccess to rewrite my links to this form:
mysite.com/category/129/article_title.html
The number "129" there is a dynamic id of the specific article I pull from DB.
But if you replace that number with something that is not in DB, like "98198" - my site will just display a page without the article elements, as that article ID does not exist.
The problem here happens when I delete some articles on purpose, like articles 12,13,14 and 15.
Google will still check for those and will find the same basic page for all four deleted articles :(
And by doing so it will find the same exact page four times!
Thus I ended up with many dupes.
My idea is to check with PHP whether an article exists, and if it doesn't, to inform Google of that by sending some kind of redirect. But what redirect would that be?
For example, using the first example at hand, this forum. If I erase the ID number of this topic and put in 999999999 it will give me this "Some times a 404 is just a 404...1:dv4
Go home..."
So should I redirect to a 404 page?
Can I do this only with PHP or do I have to add every deleted article to the .htaccess file?
Help me out of my dupe hole guys! :o
I came up with this code:
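// Assumes $conn is an open mysqli connection (the mysql_* functions were removed in PHP 7).
if (isset($_GET['article_id']) && $_GET['article_id'] != "") {
    // Cast to int so the request value cannot inject SQL.
    $article_id = (int) $_GET['article_id'];
    $sqlx = "SELECT title FROM articles WHERE article_id=" . $article_id . " LIMIT 1";
    $resultx = mysqli_query($conn, $sqlx)
        or die('Could not look up the article; ' . mysqli_error($conn));
    if (mysqli_num_rows($resultx) == 0) {
        // No such article: send the error status with this response, no redirect.
        header($_SERVER['SERVER_PROTOCOL'] . " 404 Not Found");
        include("../custom/404.php");
        exit;
    } else {
        // all ok, proceed with page display
    }
}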
Would this be the right way to deal with this situation?
I need some encouragement from you guys, otherwise I just might make the whole mess even messier :D
Basically, 410-Gone says, "We removed it on purpose and it won't be back," whereas from a practical standpoint, and because of its ambiguity, a 404-Not Found says, "The server can't find it, we don't know why. Could be our fault, could be yours. Not sure. If it's our fault, we might or might not find the problem and fix it. Or maybe we will, so keep trying this URL once in a while."
At present, most search engines treat 410 the same way they treat 404, but this might change as they improve over time. So I recommend going with the HTTP protocol specification [w3.org] on this, and 'doing it right.'
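To make the 404/410 distinction concrete, here is a minimal sketch in PHP. It assumes the site keeps a record of intentionally deleted article IDs, which the code posted above does not do. The names `article_status`, `$existingIds`, and `$removedIds` are hypothetical, and the arrays merely stand in for database lookups.

```php
<?php
// Hypothetical helper: map an article ID to the status code it should get.
// $existingIds and $removedIds stand in for database lookups (assumption).
function article_status(int $id, array $existingIds, array $removedIds): int
{
    if (in_array($id, $existingIds, true)) {
        return 200; // article is live
    }
    if (in_array($id, $removedIds, true)) {
        return 410; // removed on purpose, will not come back
    }
    return 404;     // never existed, or we don't know why it is missing
}
```

With the IDs from the first post, an existing article such as 129 would map to 200, a deliberately deleted one such as 12 to 410, and a made-up one such as 98198 to 404.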
Also, be aware that some user-agents (such as Googlebot) sometimes claim to be HTTP/1.0 agents (probably for compatibility reasons), but they are not. This is evidenced by the fact that they *do* send a Host header with their HTTP requests. They are referred to as "enhanced" or "extended" HTTP/1.0 user-agents. So the best way to determine whether you should send a 410 instead of a 404 is to check the HTTP Host request header; if it is blank and the request protocol indicates HTTP/1.0 or below, then use a 404.
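A minimal sketch of that Host-header check in PHP might look like the following. The function name `removed_content_status` is hypothetical, and the sketch assumes it is given an array shaped like PHP's `$_SERVER`, with the `SERVER_PROTOCOL` and `HTTP_HOST` keys.

```php
<?php
// Hypothetical helper: choose the status code for intentionally removed content.
// $server is expected to look like PHP's $_SERVER array (assumption).
function removed_content_status(array $server): int
{
    $protocol = $server['SERVER_PROTOCOL'] ?? 'HTTP/1.0';
    $host     = trim($server['HTTP_HOST'] ?? '');

    // Pull the version out of e.g. "HTTP/1.1"; treat anything unparsable as 1.0.
    $version = preg_match('#^HTTP/(\d+(?:\.\d+)?)$#i', $protocol, $m)
        ? (float) $m[1]
        : 1.0;

    // A true HTTP/1.0 (or older) client sends no Host header: fall back to 404.
    if ($version <= 1.0 && $host === '') {
        return 404;
    }

    // HTTP/1.1 clients, and "extended" HTTP/1.0 clients that send Host, get 410.
    return 410;
}
```

In a page script this could be used as `http_response_code(removed_content_status($_SERVER));` before any output is sent.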
Jim
Btw, a response code is a response code, no matter if it is produced by htaccess or php, right?
I checked the response codes with an online tool, and the results are exactly the same.
Also, now that all my dupes (deleted articles) have been redirected to 410 page. How long will Google nag me that I have duplicated pages?
Also, now that all my dupes (deleted articles) have been redirected to 410 page...
Let's hope that your dupes *have not* been redirected, because a 410 (or a 404, or a 403) response is not a redirect. One of the most common errors we see in Apache is badly-coded handlers that *do* redirect to an error page, and the result is that the response code is a 200-OK. And that is a very big problem.
With a properly-implemented error response, the server returns an error status code, and substitutes the contents of the error document for the client-requested URL's contents without invoking client redirection. That is, the visitor's browser address bar does not change.
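As a rough illustration of the difference, the sketch below sends the error status with the current response and includes the error document in place, rather than redirecting to it. The function name `send_gone` and its `$errorDocument` path parameter are hypothetical.

```php
<?php
// Hypothetical helper; $errorDocument is an assumed path to your own error page.
function send_gone(string $errorDocument): void
{
    // Wrong approach, shown for contrast only:
    //   header('Location: /custom/410.php');
    // That sends a redirect, and the error page itself then answers 200 OK.

    // Set the status on THIS response, before any output is sent.
    http_response_code(410);

    // Output the error document in place of the article; the URL in the
    // visitor's address bar does not change.
    include $errorDocument;
}
```

A page script would typically call `send_gone(...)` and then `exit;` so that none of the normal article template is rendered.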
Your 410 error document should explain that an error has occurred because the content for the requested URL has been intentionally removed. It should provide text links to your home page and your site search, site-map, and major category pages (as applicable) to help the visitor easily find what he/she was looking for. It is acceptable to put a ten- to twenty-second meta-refresh tag on this error page to cause the client browser to load your home or site map page after allowing the visitor plenty of time to read, understand, and act on the information you've provided.
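One possible shape for such an error document is sketched below. The function name `render_gone_page`, the wording of the message, the link list, and the 15-second default delay are all illustrative assumptions, not a prescribed layout.

```php
<?php
// Hypothetical renderer for a 410 error document. The link list, refresh
// target, and 15-second delay are illustrative assumptions.
function render_gone_page(array $links, string $refreshTarget = '/', int $delaySeconds = 15): string
{
    // Build plain text links (href => label) to help the visitor move on.
    $items = '';
    foreach ($links as $href => $label) {
        $items .= '<li><a href="' . htmlspecialchars($href) . '">'
            . htmlspecialchars($label) . "</a></li>\n";
    }

    // The delayed meta refresh gives the visitor time to read the page first.
    return "<!DOCTYPE html>\n<html>\n<head>\n<title>410 Gone</title>\n"
        . '<meta http-equiv="refresh" content="' . $delaySeconds . ';url='
        . htmlspecialchars($refreshTarget) . "\">\n"
        . "</head>\n<body>\n<h1>This content has been removed</h1>\n"
        . "<p>The page you requested was intentionally removed and will not return.</p>\n"
        . "<ul>\n" . $items . "</ul>\n</body>\n</html>\n";
}
```

For example, `echo render_gone_page(['/' => 'Home', '/sitemap.html' => 'Site map']);` would produce a page linking to the home page and site map and refreshing to `/` after 15 seconds.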
... how long will Google nag me that I have duplicated pages?
They will nag you until they have fetched each duplicate-content URL several times, and have satisfied themselves that the page is really, really, really gone.
This may take several years. Actually, we've had a report here in the past 24 hours that Google came back and asked for URLs that had been removed ten years ago. This may have been due to some database roll-back, but only Google knows. Leave your 410-generation code in place forever, and try to avoid changing URLs or removing content in the future.
With good planning, neither of these is necessary, and both search engines and users prefer sites which don't change and don't have resources constantly disappearing... See Hypertext Style: Cool URIs don't change [w3.org], by the inventor of the World Wide Web, Sir Tim Berners-Lee. They like it best if you treat your site like a library, not like a newspaper.
The HTTP/1.1 protocol specification I cited above defines all HTTP response codes, as well as many other aspects of using HTTP. It is well worth a review to prevent future problems.
Jim
All my *articles* are indeed in place and I never delete them, even if the news becomes obsolete.
The stuff I do delete is mostly user-generated or from my video page. I have a video page with various on-topic YouTube videos, but these seem to disappear over time because users sometimes take them down. I delete such stuff.