I don't think I've come across a case of Google fetching URLs that return 404, and then not removing them speedily.
There is likely to be a period of time where the crawler is served a 404 but does not remove the page from the listing in case it was an accidental deletion.
I'm not sure how long that period is, though. Where a major restructuring has left many URLs returning 404 until the next reindex, what I have done is customise the 404 page to provide more helpful information, similar to a sitemap.
If you want it to go away, just remove it. It is not illegal to return a 404 Not Found response :-)
Better yet, remove it and set up a redirect that returns a 410 Gone response. Google might drop the dead listing quicker that way, since this makes clear that it's not a temporary error caused by forgetting to update a link.
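For anyone who wants to see the difference between the two responses, here is a minimal sketch in Python, written as a small WSGI app. The paths in `LIVE_PAGES` and `GONE_PAGES` are made up for illustration, and this is only one way to wire it up, not how any particular server does it. Deliberately removed pages answer 410 Gone, and anything else that is unknown answers 404 Not Found.

```python
from http import HTTPStatus

# Hypothetical paths, for illustration only.
LIVE_PAGES = {"/index.html": "<h1>Home</h1>"}
GONE_PAGES = {"/old-widget.html"}  # pages removed on purpose


def app(environ, start_response):
    """Tiny WSGI app: 200 for live pages, 410 for removed ones, 404 otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in LIVE_PAGES:
        status, body = HTTPStatus.OK, LIVE_PAGES[path]
    elif path in GONE_PAGES:
        # 410 says the removal is deliberate and permanent.
        status, body = HTTPStatus.GONE, "<h1>This page has been removed.</h1>"
    else:
        # 404 only says that nothing was found at this URL.
        status, body = HTTPStatus.NOT_FOUND, "<h1>Page not found.</h1>"
    payload = body.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ]
    start_response(f"{status.value} {status.phrase}", headers)
    return [payload]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should serve it, after which the status line for each path can be inspected with any HTTP client.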
I have several almost duplicate pages up at the moment, as I don't want visitors hitting 404 Page Not Found errors. So should I remove the content from the "almost duplicate" pages and do a redirect? Are there different flavours of redirect? Is there some flavour of redirect that Google doesn't like?
Thanks for any help.
Putting the domain back online resulted in a #1 listing about 3 days later. I have now donated the domain to someone else (related subject) to put their new content on.
Do a 301 permanent redirect from the URLs that you are moving to the pages you want to keep.
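As a rough sketch of what that looks like in practice, assuming a simple lookup table of old URLs to the pages being kept (all the paths in `REDIRECTS` are invented for the example), the server answers with status 301 and a `Location` header pointing at the surviving page:

```python
# Hypothetical mapping from retired near-duplicate URLs to the page being kept.
REDIRECTS = {
    "/widgets-blue-2.html": "/widgets-blue.html",
    "/widgets-blue-old.html": "/widgets-blue.html",
}


def respond(path):
    """Return (status, headers) for a request path.

    Retired URLs get a 301 with a Location header; everything else is
    treated as a normal page here, to keep the sketch short.
    """
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, {"Location": target}
    return 200, {}
```

Pointing every old URL straight at its final destination, rather than chaining one redirect to another, keeps the table easy to check by eye.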
You definitely need to delete or rename pages that you don't want found. I have a few that have been orphans for a year and a half that still get traffic from Google for obscure searches.
If you have content you don't want the bot to find, be sure to put a robots.txt file up to keep it out. The bot, says Craig, can find content that's unlinked. That's right, the Google bot can find single pages dangling unlinked in space. He didn't explain how this happens.
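A quick way to sanity-check a robots.txt before uploading it is Python's standard `urllib.robotparser`. The rules and paths below are placeholders, not anyone's real file. Bear in mind that, as far as I know, a Disallow rule only asks crawlers not to fetch a URL; it is not the same thing as a noindex instruction.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="Googlebot"):
    """True if the rules above permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)
```

Calling `allowed("/private/notes.html")` and `allowed("/index.html")` shows which of the two the rules block for a crawler identifying itself as Googlebot.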
Details can be found here.
Put up several dupe pages, each with different word stems (esp. in title) to ensure that at least one will rank highly. Wait for the new daft Google to pick them up, then put a robots exclude on those pages.
They'll be in the index for weeks, because G is very slow to orphan pages.
You can't be done for it either - after all - there's a no-index in place.
Do you think I approve of it? - well I don't.
Have I done it? - no.
So why am I suggesting it?
Simple - Google is turning us all into spammers. Small niche sites have suffered, but the big ones have risen, like turds in dark water, to the top.
Might as well subvert it.
Are you absolutely, positively sure of that? Somebody could have posted that URL on some minor message board, and Google keeps finding this page that way. And it should be noted that even if there are really no inbound links now, this could change tomorrow. Someone might find that URL tomorrow searching Google, and post it on a message board on the Net somewhere. So long as you leave this page up, it may never disappear from Google.
I have just added the following code to our custom 404 error pages - hopefully this will remove all our old pages?
<meta name="robots" content="noindex">
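If it helps, here is a small sketch for checking that the tag really is present in the error page's HTML, using only Python's standard `html.parser`. The class and function names are my own. Note that it checks the markup only; it is probably also worth confirming that the server sends an actual 404 status with the custom page rather than a 200, since the status code is what the earlier replies are relying on.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html):
    """True if the page carries a robots meta tag that includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

For example, `has_noindex('<meta name="robots" content="noindex">')` checks the exact tag quoted above.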