| 10:06 pm on Mar 22, 2012 (gmt 0)|
It sounds like many URLs would have essentially the same content under that alternative. What makes it hard to sift out the 410 responses?
Of course, the other question is why Google doesn't give up on them. How long has it been, and does googlebot keep crawling them? Are they linked from other websites?
| 10:21 pm on Mar 22, 2012 (gmt 0)|
Hi Tedster, users sometimes delete content for various reasons. For example, a user might delete a blue widget they created. This leads to a 410 that sticks around for a long time. The idea would be for us to link to other users' blue widgets.
That way the visitor still gets what they came looking for. It will be a different blue widget, but a blue widget nonetheless.
It does concern me that we have so many removed pages. Giving users the control they want over their content makes them happy, but I fear it may cause quality issues with G.
Some of the 410s date from early 2011, with no internal or external links. I've also checked the sitemaps and cannot find any reference to them. That was only on the portion I tested, though; there may be some that are linked to externally.
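For what it's worth, the choice the thread keeps circling (serve the page, redirect to a comparable widget, or return 410) can be sketched as a small routing decision. This is a minimal, framework-free sketch with hypothetical names (`response_for`, `/widgets/...` paths); it is not how any particular site actually does it:

```python
from typing import Optional, Tuple

def response_for(widget_id: str,
                 live_widgets: set,
                 alternatives: dict) -> Tuple[int, Optional[str]]:
    """Decide the HTTP response for a widget URL (hypothetical helper).

    live_widgets: ids of widgets that still exist.
    alternatives: deleted id -> id of a comparable widget, if known.
    """
    if widget_id in live_widgets:
        return 200, None                    # still live: serve as usual
    alt = alternatives.get(widget_id)
    if alt in live_widgets:
        return 301, f"/widgets/{alt}"       # send visitors to a comparable widget
    return 410, None                        # gone for good, nothing comparable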
| 10:24 pm on Mar 22, 2012 (gmt 0)|
|I fear it may cause quality issues with G. |
Has it caused issues? Or are you just worried it might?
When Google sees your site is UGC, I'm pretty sure different criteria are applied.
| 11:11 pm on Mar 22, 2012 (gmt 0)|
Just worried, but hopefully you're right.
| 11:12 pm on Mar 22, 2012 (gmt 0)|
If you do that, they'll show up as "soft 404" errors...
| 12:13 am on Mar 23, 2012 (gmt 0)|
I assume that by "linking" to another blue widget you mean a 301 redirect, not literally a link?
If you use 301s, Google will eventually forget about the old URIs.
If you replace the old content as described here:
|Leaving the content in place and having a user friendly message, informing them that the content has been removed and instead offer highly relevant alternatives |
then you would keep the URIs, but Google would see a lot of duplicate content.
Personally, I would do the 301s to better alternatives, and on the side do what I could to find out whether there are links out there in the world referring to the old content. If there are, GoogleBot could keep coming back. So, check those referrers.
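Checking those referrers can be done straight from the access logs. A minimal sketch, assuming the common Apache/nginx "combined" log format; the `gone_referrers` helper name and the 410 filter are illustrative, not from any particular tool:

```python
import re

# Combined Log Format:
# ip - - [time] "METHOD path HTTP/x.y" status size "referrer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)"'
)

def gone_referrers(log_lines):
    """Map each 410'd path to the external referrers still linking to it."""
    hits = {}
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and m.group("status") == "410" and m.group("referrer") not in ("-", ""):
            hits.setdefault(m.group("path"), set()).add(m.group("referrer"))
    return hits
```

Any path that shows up with real referrers is one where asking the linking site to update is worth the trouble; hits with a `-` referrer are mostly bots working from their own archives.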
| 12:52 am on Mar 23, 2012 (gmt 0)|
Hunh; I have a boatload of 404s in my GWT from two years ago (and two versions ago) that just won't go away. Pretty sure it's from scrapers and whatnot. I was thinking about serving up 410s for them, but if they're not dropping your 410s, there's not much point in that. Really aggravating too. It's like that annoying ex you kicked out who leaves five boxes of junk in your basement and never comes and gets them.
| 9:41 am on Mar 23, 2012 (gmt 0)|
|If you use 301's, Google will eventually forget about the old URIs. |
For a given definition of "eventually". I finally put my foot down and slapped 410s on some pages that I moved and 301'd a full year ago. Redirects are nice for humans who have old addresses bookmarked, but it's infuriating when search engines keep eating 301s. What do they think: that you'll move back, but not bother to put in a single link to the old/new URL?
Honestly. Even the post office only forwards mail for six months.
| 10:00 am on Mar 23, 2012 (gmt 0)|
Well, that is a different issue.
"Forgetting" simply means that if the page is in Google's archives (and hence will be retried), the 301 will fairly quickly make them forget the old URL. The new page will be archived instead.
If GoogleBot, during its scans of other sites, keeps finding links to the old page, it will obviously be revived again as an "interesting object to check out", so to speak. But because of the redirect, the new page stays the one indexed.
What did you think it meant that Google would "forget" the old URI? That, in addition to dropping the old URI from the index and indexing your new page instead, they would also maintain some sort of firewall list of dead URIs to remind themselves never to check that path if someone else tells them about it once again? That is likely too much to expect. :)
Plus, they don't know up front. You could have revived it yourself, so they have to check. Even if you 410/404 the old pages, GoogleBot would still come back to check them, but would hit errors instead.
The only way to fix the "revivals", I think, is to ask the linking sites to change their links so that GoogleBot stops finding the old links over and over.
| 1:07 pm on Mar 23, 2012 (gmt 0)|
Yes, it seems like they never drop them. I've seen accesses to ancient links for years now, and even if the links don't exist anywhere else, Google still keeps a record.
The problem with serving the "highly relevant alternatives" is that you cannot tell in advance what kind of requests will come in; many of them are artificial, and they put a strain on the server, which has to run various queries to find suggestions. A redirect rule filters many of them out; done correctly, it can send the visitor to the nearest page relevant to the request made.
| 9:13 pm on Mar 23, 2012 (gmt 0)|
DeeCee, I'm not talking about pages with old addresses being reinforced from outside. These are pages that have no current existence outside the minds of search engines.
If you go to the electronics store and meet a sign saying "We've Moved!" with a new address, you might absent-mindedly go back to the old location once or twice. But it wouldn't take you a year to update your address book. Wipe the old one, put up the new one.
And if you did revive dead pages, you would be linking to them yourself, or putting them on a current site map. No need for g### to "remember" the URL; it will be given a new one.
| 10:05 pm on Mar 23, 2012 (gmt 0)|
Sounds strange, I'd have to admit, if there are no references to them from the outside. :)
I have noticed that Google in some cases has a hard time letting go.
I have a domain that previously had a website on it. At some point, maybe a year ago, I killed the website and turned the domain into a scammer trap. It then had a robots.txt with a Disallow for all user-agents, so GoogleBot stayed away.
The result was that WMT kept complaining to me that this domain has about 1100 pages "blocked".
The pages have likely not been in the indexes for more than a year, and obviously cannot be found in search, yet WMT keeps claiming that I am blocking a lot of data. And on WMT's front page, they list the domain as having "Severe health problems" because of the blocking robots.txt. :)
I recently added an Allow for GoogleBot specifically, and am now waiting to see whether it will start visiting all those old URLs that haven't been seen for more than a year. They no longer exist under the new use of the domain, so they should become 404s (hopefully). I'll wait and see whether the count of URLs starts dropping in WMT.
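One way to sanity-check a robots.txt like that (Disallow for everyone, Allow for GoogleBot only) without waiting on WMT is Python's stdlib parser. The file contents below are an assumed reconstruction of the setup described, not the actual file:

```python
import urllib.robotparser

# Assumed reconstruction: block every crawler except GoogleBot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under this robots.txt."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)
```

The specific `User-agent: Googlebot` group overrides the `*` group for Googlebot, which is why the Allow takes effect even though everyone else stays blocked.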