|Removing 410 response codes|
Regardless of what I do, Google will not stop visiting removed pages, despite a 410 response code.
This floods my GWT error reports with junk, making it impossible for me to find legitimate problems.
I was considering removing the response code: leaving the content in place and having a user-friendly message informing them that the content has been removed, and instead offering highly relevant alternatives.
Is there a downside to doing this that I've not thought of?
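One middle ground the question doesn't mention: HTTP allows a body on a 410 response, so you can keep the 410 status for crawlers while still showing human visitors the friendly message and alternatives. A minimal sketch in Python; the function name and widget URLs are made up for illustration:

```python
def removed_page_response(alternatives):
    """Build a response for a deleted page.

    Keeps the 410 status so crawlers learn the page is gone for good,
    while human visitors still get pointers to relevant alternatives.
    (Names and URLs here are illustrative, not from the thread.)
    """
    items = "".join(
        '<li><a href="{0}">{1}</a></li>'.format(url, title)
        for url, title in alternatives
    )
    body = (
        "<h1>This content has been removed</h1>"
        "<p>You might be looking for one of these instead:</p>"
        "<ul>" + items + "</ul>"
    )
    return "410 Gone", body
```

Whether the listed alternatives count as "essentially the same content" in Google's eyes is exactly the concern raised below.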
It sounds like many URLs would have essentially the same content under that alternative. What makes it hard to sift out the 410 responses?
Of course, the other question is why Google doesn't give up on them. How long has it been, and does Googlebot keep crawling them? Are they linked from other websites?
Hi Tedster, users sometimes delete content for various reasons. For example, a user might delete a blue widget they created. This leads to a 410 that sticks around for a long time. The idea would be for us to link to other users' blue widgets.
That way the visitor still gets what they came looking for. It will be a different blue widget, but a blue widget nonetheless.
It does concern me also that we have so many removed pages. Giving users the control they want over their content makes them happy, but I fear it may cause quality issues with G.
Some of the 410's are from early 2011, with no internal or external links. I've also checked sitemaps and cannot find a reference. This was only on the portion that I tested; there may be some that are linked to externally.
|I fear it may cause quality issues with G. |
Has it caused issues? Or are you just worried it might?
When Google sees your site is UGC, I'm pretty sure different criteria are applied.
Just worried, but hopefully you're right.
If you do that they'll show up as "soft 404" errors...
I assume that by "linking" to another blue widget, you mean through a 301, not really a "link"?
If you use 301's, Google will eventually forget about the old URIs.
If you replace the old content:
|Leaving the content in place and having a user friendly message, informing them that the content has been removed and instead offer highly relevant alternatives |
You would keep the URIs, but Google would see a lot of duplicate content.
Personally, I would do the 301s to better alternatives, and on the side do what I could to find out if there are links out there in the world referring to the old content. If there are, GoogleBot could keep coming back. So, check those referrers.
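Checking those referrers can be done straight from the access log. A rough sketch, assuming the server writes the "combined" log format (the one that includes a quoted Referer field); the paths and sample line are hypothetical:

```python
import re

# Matches the request, status, size, and quoted Referer field of a
# combined-log-format line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def referrers_for(log_lines, old_paths):
    """Collect external referrers still pointing at removed URLs."""
    hits = {}
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        path, ref = m.group("path"), m.group("referer")
        # "-" means the request carried no Referer header at all.
        if path in old_paths and ref not in ("-", ""):
            hits.setdefault(path, set()).add(ref)
    return hits
```

Any site that shows up here is worth asking to update its links; everything else hitting the old URLs with no referrer is mostly bots working off their own memory.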
Hunh; I have a boatload of 404s in my GWT from two years ago (and two versions ago) that just won't go away. Pretty sure it's from scrapers and what not. I was thinking about serving up 410s for them, but if they're not dropping your 410s, not much point in that. Really aggravating too. It's like that annoying ex you kicked out who leaves five boxes of junk in your basement and never comes and gets them.
|If you use 301's, Google will eventually forget about the old URIs. |
For a given definition of "eventually". I finally put my foot down and slapped 410s on some pages that I moved and 301'd a full year ago. Redirects are nice for humans who have old addresses bookmarked, but it's infuriating when search engines keep eating 301s. What do they think-- that you'll move back, but not bother to put in a single link to the old/new URL?
Honestly. Even the post office only forwards mail for 6 months.
Well, that is a different issue.
"Forgetting" simply means that if the page is in Google's archives (and hence will be retried), the 301 will fairly quickly make them forget the old URL. The new page will be archived instead.
If GoogleBot, during its scans of other sites, keeps finding links to the old page, it will obviously be revived again as an "interesting object to check out", so to speak. But because of the redirect, the new page stays the one indexed.
What did you think it meant that Google would "forget" the old URI? That, in addition to dropping the old URI from the index and indexing your new page instead, they would also implement some sort of firewall list of dead URIs to remind themselves never to check that path again if someone else tells them about it? That is likely too much to expect. :)
Plus, they don't know up front. You could have revived it yourself. They'll have to check. Even if you 410/404 the old pages, GoogleBot would still come back to check them, but would instead hit errors.
The only way to fix the "revivals", I think, is to ask the linking sites to change their links so GoogleBot can stop finding the old links over and over.
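The lifecycle described above -- 301 while content has a home elsewhere, 410 once it's deliberately gone -- can be sketched as a simple lookup. The URL maps below are hypothetical; a real site would load them from a database:

```python
# Hypothetical URL maps, for illustration only.
REDIRECTS = {"/widgets/blue-1": "/widgets/blue-2"}  # moved content: 301
GONE = {"/widgets/red-9"}                           # deleted content: 410

def dispatch(path):
    """Return (status, redirect_target) for a requested path."""
    if path in REDIRECTS:
        # Permanent redirect: crawlers should index the target instead.
        return "301 Moved Permanently", REDIRECTS[path]
    if path in GONE:
        # Gone: signals the removal is deliberate, not a temporary error.
        return "410 Gone", None
    return "200 OK", None
```

Even with this in place, as noted above, crawlers that keep rediscovering the old paths from external links will keep re-checking them.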
Yes, it seems like they never drop them. I see accesses for years now on ancient links, and even if they don't exist elsewhere Google still keeps a record.
The problem with serving the "highly relevant alternatives" is that you cannot tell in advance what kinds of requests will arrive; many of them are artificial, and running various queries to find suggestions for each one puts a strain on the server. A redirect filters many of them out, and if done correctly it can send the visitor to the nearest page relevant to the request made.
DeeCee, I'm not talking about pages with old addresses being reinforced from outside. These are pages that have no current existence outside the minds of search engines.
If you go to the electronics store and meet a sign saying "We've Moved!" with a new address, you might absent-mindedly go back to the old location once or twice. But it wouldn't take you a year to update your address book. Wipe the old one, put up the new one.
And if you did revive dead pages, you would be linking to them yourself, or putting them on a current site map. No need for g### to "remember" the URL; it will be given a new one.
Sounds strange, I'd have to admit, if there are no references to them from the outside. :)
I have noticed that Google in some cases has a hard time letting go.
There's a domain I previously had a website on. At some point maybe a year ago, I killed the website and turned the domain into a scammer trap. It then had a robots.txt with a Disallow for all user-agents, so GoogleBot stayed away.
The result was that WMT kept complaining to me that this domain had about 1100 pages "blocked".
The pages have not been in the indexes for likely more than a year, obviously cannot be found in search, and yet WMT kept claiming that I was blocking a lot of data. And on WMT's front page it lists the domain as having "Severe health problems" because of the blocking robots.txt. :)
I recently added an Allow for GoogleBot specifically, and am now waiting to see if it will start visiting all those old URLs that have not been seen for more than a year. They no longer exist in the new use of the domain, and will become 404s (hopefully). I'll wait to see whether the count of URLs starts dropping in WMT.
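For reference, a robots.txt like the one described -- block everyone, but let Googlebot back in -- might look like this. Note that the standard directive is `Disallow`, and Google obeys the most specific matching user-agent group, so the `Googlebot` group overrides the catch-all:

```
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
```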