How to remove deleted forum pages from Google index

What would be the best solution?

youfoundjake

7:43 pm on Oct 3, 2006 (gmt 0)

I'm posting here because Yahoo and MSN don't have the pages cached, and really I worry about how Google views the site, since it is the dominant search engine for the time being.
I have 50 pages that were at one time in the forum, organized by state, but I deleted them all. In hindsight I should have just not published them, but ehh..
I am wondering what the best way is to get them out of Google's cache, as I don't want to be penalized. If you click on a link from Google Sitemaps, you land on a page with no content. I would almost be willing to remove the whole forum from the cache, since there are only 7 or 8 posts, but I'm wondering: will the URL removal tool wipe out the whole domain, or should a 301 be set up for each page that is no longer valid, pointing back to the root of the forum?
There are a bunch of entries that look like the one below:

www.example.com/forum/viewforum.php?f=4&mark=topics&sid=8675309allyourbasecfc86b3ca2d2ee38c660a24f828

What is the best way to handle this?

g1smd

11:54 pm on Oct 3, 2006 (gmt 0)

If they are shown as Supplemental Results then simply serve a "404" HTTP status code for those URLs, along with a custom error page containing a basic sitemap.

The URLs will continue to be listed for one year before being dropped out of the index. In the meantime, your custom error page will give the visitor an easy way to find the content that they require.
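
On Apache, the custom error page described above can be wired up with a single directive. This is only a minimal sketch, not something taken from the post itself: the file name /notfound.html is an assumed placeholder. The path is deliberately local (it starts with a slash), because the Apache documentation notes that giving ErrorDocument a full http:// URL makes the server send a redirect, so neither the visitor nor the crawler would see the 404 status.

```apache
# .htaccess in the site root.
# /notfound.html is a placeholder name for the page that holds
# the error message and the basic sitemap.
# Keep this a local path: a full URL here would turn the 404
# into a redirect.
ErrorDocument 404 /notfound.html
```

This only helps for URLs the server itself treats as missing. A script that still answers with an empty page would have to send the 404 status itself.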

youfoundjake

12:00 am on Oct 4, 2006 (gmt 0)

They are in fact showing up as supplemental. But the pages are there. Just no content on it. Navigational links and all, heck even my AdSense ads, but the actual link that is supposed to bring you to that page has been removed, unless you look at Google's cache....
I saw an earlier post (unfortunately after writing this) about using the URL removal tool with a wildcard, because of session IDs, which is what I'm thinking of. And again, since Google is the only one that has these pages, visitors will never be able to find them and see the custom 404 with a basic sitemap.

g1smd

12:03 am on Oct 4, 2006 (gmt 0)

>> But the pages are there. Just no content on it.

Serve an HTTP status of "404" for each one and Google will eventually drop them out of the visible index.

youfoundjake

1:15 am on Oct 4, 2006 (gmt 0)

okie dokie

Lobo

1:19 am on Oct 4, 2006 (gmt 0)

All publicity is good publicity?

Redirect to a welcome page inviting them to sign up and offering a more in-depth search of your forum. If it brings traffic to your site, why not try to capture it?

fjpapaleo

1:53 am on Oct 4, 2006 (gmt 0)

Actually, a "410 gone" would be the correct response to give. You can create a custom page telling your visitors to visit the other parts of your site. Just do what's right; if Google can't handle it, that's their problem, not mine. That's the way I look at it.

Marcia

2:04 am on Oct 4, 2006 (gmt 0)

>>Actually, a "410 gone" would be the correct response to give

As long as it's posted, how about telling how that's done.

fjpapaleo

2:19 am on Oct 4, 2006 (gmt 0)

"As long as it's posted, how about telling how that's done."

I guess it would depend on the type of site, server, language etc.
On Apache with .htaccess you would do this:

Redirect gone /page.html

ErrorDocument 410 /notfound.html

You can put whatever you like in the error document, e.g. "Sorry, this page no longer exists, please visit our homepage at www....."

Or you could do it in php or asp or whatever, just as you would do any other re-direct.

I'm sure there are better webmasters in here than I who can help with this.

jk3210

2:44 am on Oct 4, 2006 (gmt 0)

I've used 410 GONE on numerous pages and my experience is that it doesn't make any difference over doing it the way g1smd recommends with 404s.

Using either 404 or 410, the key element for Google seems to be the one year time period as g1 mentions. (ymmv)

tedster

2:53 am on Oct 4, 2006 (gmt 0)

In Feb 2006, Matt Cutts blogged that Google treats 404 and 410 identically.

[mattcutts.com...]

JackR

3:26 am on Oct 4, 2006 (gmt 0)

I have just added a 301 to around 15 old URLs, redirecting visitors to the homepage. The URLs were all listed as Supplemental Results.

Can someone please tell me if this is the best thing to do, and if not, whether a 404 is better?

I'd just like to see the pages dropped from the index entirely.

tedster

3:42 am on Oct 4, 2006 (gmt 0)

In my opinion, 404 is much better because it describes the real situation. The Home Page does not contain the same content that those 15 old urls did, so that content is not "301 moved permanently". Instead it is either "404 not found" or "410 gone".

I don't have enough data, yet, to say this next bit for sure -- but I do suspect that Google now looks at their historical version of a newly redirected page and "compares it" to the content of the new target page for the redirect. There certainly is something going on in some cases that slows down indexing of new 301 redirects, compared to links.

So I always recommend that a url-specific 301 be reserved for actually MOVED content. Of course, that's me. I'm very cautious most of the time, and your results may differ.
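
On Apache, the distinction drawn here maps onto two forms of the mod_alias Redirect directive. The following is a rough sketch with made-up paths, not a recipe from the thread:

```apache
# Content that really moved: send a 301 to its new home.
# Both paths are hypothetical.
Redirect permanent /old-article.html http://www.example.com/new-article.html

# Content that was deleted and has no replacement: answer 410 Gone.
# No target URL is given with "gone".
Redirect gone /deleted-article.html
```

Pointing the first form at the home page for content that no longer exists anywhere is the case argued against above.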

fjpapaleo

3:44 am on Oct 4, 2006 (gmt 0)

"In Feb 2006, Matt Cutts blogged that Google treats 404 and 410 identically."

That's probably true if you're only talking about Google but I said the "correct response". Google can't seem to get either one right so it really doesn't matter.

From the W3C:

>>If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.

The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.<<

The part that caught my attention was the "response is cacheable". I play it safe and go with the 410.

JackR

3:47 am on Oct 4, 2006 (gmt 0)

The 410 it is. Thank you both for your replies.

Marcia

4:15 am on Oct 4, 2006 (gmt 0)

I've read about 404 compared to 410 before, and according to the definition, quoted here, the 410 accurately describes exactly what the situation is with a removed page. 404 is too ambiguous and doesn't have the accuracy. That has nothing to do with Google, and doing it how they handle it is the tail wagging the dog, not the other way around as it should be.

youfoundjake

3:13 pm on Oct 4, 2006 (gmt 0)

A lot of good discussion on this, thanks...
So I have gone from a 301 to a 404 and ultimately to a 410, which is what I think would fit my situation best.
I'm running on an Apache server and my forum is phpBB on MySQL. I removed all the topics that weren't needed from the database around 6 months ago, but when you click on the link in Google's cache, it goes to a page on my forum that has no content, just the navigational breadcrumbs, log in, profiles, etc., so there is still a page there.
I've tried searching the db using the session id and found nothing. As I mentioned, I only have 7 or 8 posts. Would it be better to just blow out the whole forum, start a new one, and 410 the old one, or do I have to set up my .htaccess for each URL that Google has cached and direct it to a 410?
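
One way to avoid listing every cached URL is to match on the forum id in the query string and ignore the session id, since the sid part differs in every URL. What follows is only a sketch, assuming mod_rewrite is available and the .htaccess file sits in the site root. The id f=4 comes from the example URL in the first post; the other ids and the /gone.html page are invented for illustration.

```apache
RewriteEngine On

# Match viewforum.php requests for the deleted forums, whatever
# the sid value is. 5 and 6 are hypothetical forum ids.
RewriteCond %{QUERY_STRING} (^|&)f=(4|5|6)(&|$)
# "-" means no substitution; the G flag returns 410 Gone.
RewriteRule ^forum/viewforum\.php$ - [G]

# Hypothetical page with an error message and basic navigation.
ErrorDocument 410 /gone.html
```

If the whole forum were being retired instead, a single mod_alias line such as `RedirectMatch gone ^/forum/` should cover every URL under it, at the cost of also removing the few posts that still exist.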

g1smd

7:00 pm on Oct 4, 2006 (gmt 0)

Whatever HTTP status code the no-content-here pages return (301, 404, or 410), the key point I want to make is that Google will continue to show those URLs in the SERPs as Supplemental Results for one whole year. So your site should show the visitor some sort of error message and some basic site navigation to get them on their way whenever anyone arrives at that URL from another site or from a search result.