| 4:44 pm on Mar 8, 2004 (gmt 0)|
If you want it to go away, just remove it. It is not illegal to return a 404 Not Found response :-)
| 5:04 pm on Mar 8, 2004 (gmt 0)|
delete it and submit the previous URL to Google. It will eventually visit and find nothing.
| 10:01 pm on Mar 8, 2004 (gmt 0)|
Can take a while. I have an 11 Feb Google cache of a PR5 page that I deleted on 14 Feb. I have a link to the defunct URL from a PR6 index page, and it gets permanently redirected back to the index page. I thought the link would mean Google would drop it fast, since new pages get picked up in a matter of days.
| 10:44 pm on Mar 8, 2004 (gmt 0)|
For search terms that return fewer than ~50 results, the page could continue to show up as a Supplemental Result almost forever.
Google keeps those critters for a verrry long time.
| 10:28 am on Mar 9, 2004 (gmt 0)|
Yep, I can see orphans from November. Linking to the URL and returning a 404 from it is the quickest way to get rid of URLs you don't want.
| 10:33 am on Mar 9, 2004 (gmt 0)|
There seem to be quite a few 404s out there in the index at the moment :(
| 11:32 am on Mar 9, 2004 (gmt 0)|
sem4u, 404s normally remain in Google when the URLs are excluded by /robots.txt (in which case Googlebot cannot see the HTTP header) or when there is no link to the page (in which case it can take a long while for the robot to request it).
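To illustrate the first case: if the dead URLs fall under a Disallow rule, Googlebot never requests them at all, so it never sees the 404 and the stale listings linger. A made-up example (the path is hypothetical):

```
User-agent: *
Disallow: /old-section/
```

Any URL under /old-section/ will keep its old listing until the rule is lifted and the bot is allowed to fetch the 404 for itself.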
I don't think I've come across a case of Google fetching URLs that return 404, and then not removing them speedily.
| 3:41 pm on Mar 9, 2004 (gmt 0)|
I have seen similar behaviour, with Google continuing to index orphan pages.
There is likely a grace period during which the crawler is served a 404 but does not remove the page from the listings, in case the deletion was accidental.
I'm not sure how long that period is, though. Where major reconstructions have resulted in 404s being returned frequently until a reindex, what I have done is customise the 404 page to provide more helpful information, similar to a site map.
| 4:32 pm on Mar 9, 2004 (gmt 0)|
|If you want it to go away, just remove it. It is not illegal to return a 404 Not Found response :-) |
Better yet, remove it and configure the server to return a 410 Gone response for that URL. Google might drop the dead listing quicker that way, since a 410 makes clear that this is not a temporary error caused by forgetting to update a link.
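If your server lets you script responses, the idea might look something like this minimal Python WSGI sketch. The paths and the app name here are made up for illustration; Apache users would more likely use mod_alias's `Redirect gone /old-page.html` directive instead.

```python
# Sketch: answer deliberately removed URLs with "410 Gone" so the crawler
# knows the removal is permanent; everything else falls through to a 404.
# REMOVED_PATHS is a hypothetical set for illustration only.

REMOVED_PATHS = {"/old-page.html", "/defunct/widgets.html"}

def gone_app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED_PATHS:
        # 410 says: gone on purpose, permanently - stop asking.
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This page has been permanently removed."]
    # Anything we never listed is an ordinary 404.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]
```

The distinction matters because a 404 is ambiguous (it might be a broken link you'll fix tomorrow), while a 410 is an explicit statement of intent.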
| 7:47 pm on Mar 9, 2004 (gmt 0)|
I would be grateful for any clarification here.
I have several almost-duplicate pages up at the moment, as I don't want visitors hitting a 404 Page Not Found. So should I remove the content from the "almost duplicate" pages and do a redirect? Are there different flavours of redirect? Is there some flavour of redirect that Google doesn't like?
Thanks for any help.
| 9:44 pm on Mar 9, 2004 (gmt 0)|
Interesting that for a domain long gone, but with many links still pointing to it, a link:www.domain.com/ still brings up a valid list.
Putting the domain back online resulted in a #1 listing about 3 days later. I have now donated the domain to someone else (related subject) to put their new content on.
| 3:52 am on Mar 11, 2004 (gmt 0)|
>So should I remove the content from the "almost duplicate" pages and do a redirect? Are there different flavours of redirect?
Do a 301 permanent redirect from the URLs that you are moving to the pages you want to keep.
You definitely need to delete or rename pages that you don't want found; I have a few that have been orphans for a year and a half and still get traffic from Google for obscure searches.
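The 301 advice above can be sketched in the same terms: each moved URL maps to the page you are keeping, and the response carries the new address in a Location header. The URLs and app name below are hypothetical; on Apache the equivalent one-liner would be something like `Redirect permanent /almost-duplicate.html http://www.example.com/main-page.html`.

```python
# Sketch of a 301 permanent redirect in Python WSGI terms.
# MOVED maps old (hypothetical) URLs to their replacements.

MOVED = {"/almost-duplicate.html": "/main-page.html"}

def redirect_app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        # 301 tells Google the move is permanent, so it can transfer
        # the listing (and links) to the new URL.
        start_response("301 Moved Permanently",
                       [("Location", MOVED[path])])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]
```

A 301 (permanent) is the flavour you want here; a 302 (temporary) tells Google to keep the old URL in the index, which is exactly what you are trying to avoid.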
| 7:50 am on Mar 11, 2004 (gmt 0)|
Someone from Google says that Googlebot can find unlinked pages.
|If you have content you don't want the bot to find be sure to put the robots.txt file up to keep it out. The bot, says Craig, can find content that's unlinked. That's right, the Google bot can find single pages dangling unlinked in space. He didn't explain how this happens. |
Details can be found here.
| 2:21 pm on Mar 11, 2004 (gmt 0)|
I imagine the Google Toolbar might have something to do with that...
| 2:59 pm on Mar 11, 2004 (gmt 0)|
A nice potential spam technique (I give this a lot of thought these days, ever since Google started rewarding spammers with top positions).
Put up several dupe pages, each with different word stems (esp. in title) to ensure that at least one will rank highly. Wait for the new daft Google to pick them up, then put a robots exclude on those pages.
They'll be in the index for weeks, because G is very slow to orphan pages.
You can't be done for it either - after all - there's a no-index in place.
Do you think I approve of it? - well I don't.
Have I done it? - no.
So why am I suggesting it?
Simple - Google is turning us all into spammers. Small niche sites have suffered, but the big ones have risen, like turds in dark water, to the top.
Might as well subvert it.
| 4:37 pm on Mar 14, 2004 (gmt 0)|
>I have isolated a page that now has no inbound links from other sites and the site itself.
Are you absolutely, positively sure of that? Somebody could have posted that URL on some minor message board, and Google keeps finding the page that way. And even if there really are no inbound links now, that could change tomorrow: someone might find the URL via a Google search and post it on a message board somewhere on the Net. So long as you leave the page up, it may never disappear from Google.
| 11:57 am on Mar 15, 2004 (gmt 0)|
If you use a custom 404 error page, how do you make it return an actual 404 status code, or even the 410 code someone suggested?
I have just added the following code to our custom 404 error pages - hopefully this will remove all our old pages?
<meta name="robots" content="noindex">
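The meta tag alone may not be enough: if the "friendly" error page is served with a 200 OK status, search engines see a normal page, not an error. The crucial thing is that the custom page still goes out under a real 404 status line. On Apache, `ErrorDocument 404 /notfound.html` with a local path keeps the 404 status (a full http:// URL there causes a redirect and loses it). As a sketch in Python WSGI terms (the body and sitemap link are made up):

```python
# Sketch: a helpful custom error page is fine, as long as the status
# line is still "404 Not Found" and not "200 OK".

HELPFUL_BODY = (b"<html><body><h1>Page not found</h1>"
                b'<p>Try the <a href="/sitemap.html">site map</a>.</p>'
                b"</body></html>")

def not_found_app(environ, start_response):
    # The crucial part: send a real 404 status. With a 200, search
    # engines index the error page itself as ordinary content.
    start_response("404 Not Found", [("Content-Type", "text/html")])
    return [HELPFUL_BODY]
```

So the visitor still sees the friendly page with the sitemap-style links, but the crawler gets the 404 it needs to drop the URL.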