Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Removing auto generated orphan pages (now supplemental) from Google


MrMoore

11:03 am on Jun 13, 2007 (gmt 0)

10+ Year Member



We have a website with a lot of auto-generated orphan pages in the Google supplemental index. What is the best way to remove them? I was thinking of using our robots.txt, or would it be better to simply make sure they return a 404 status code?

Silvery

7:18 pm on Jun 15, 2007 (gmt 0)

10+ Year Member



You haven't described much about what is happening. Technically, an orphaned page would be one that doesn't have any links to it, but one supposes that you do indeed have links to these, else Google wouldn't have indexed them.

In any case, there are a few ways to handle it. Perhaps these are duplicates that you don't want or need indexed, like the print versions of article pages or something?

If you have links on your site that point to duplicate pages or pages which are undesirable to have indexed, you might place the rel="NOFOLLOW" attribute within their A HREF tags.

Also, you could set up their META Robots tag to specify that the page should be NOINDEXed.

Finally, if they all occupy the same subdirectory on your server, you could specify in the site's robots.txt file that they not be crawled.
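To illustrate the three options side by side (the /print/ directory and the file name here are just hypothetical examples, substitute your own paths):

```
# robots.txt, placed at the site root -- blocks crawling of a directory
User-agent: *
Disallow: /print/

<!-- META Robots tag, placed in the <head> of each unwanted page -->
<meta name="robots" content="noindex">

<!-- rel="nofollow" on the links that point at those pages -->
<a href="/print/article1.html" rel="nofollow">Print version</a>
```

Note the methods aren't interchangeable: robots.txt stops crawling, NOINDEX stops indexing, and NOFOLLOW only affects how that particular link is treated.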

MrMoore

2:02 pm on Jun 18, 2007 (gmt 0)

10+ Year Member



Thanks for the reply. I’ll try to elaborate: for one reason or another we have loads of pages (2,000 plus) that we no longer want indexed, because they’re either duplicates or add very little value for the user.

At the moment they appear to be orphan pages and my guess is that they were linked to at some time in the past on a previous incarnation of the site.

The question is: what's the best way to get rid of them? Should we remove them from the index using the methods you outlined, or simply delete them from the site?

The robots.txt approach sounds like the best idea because they’re all in roughly the same directories. Do you think removing lots of pages in one go is a good idea, or should I try to phase the removal in gradually?

Halfdeck

4:54 am on Jun 25, 2007 (gmt 0)

10+ Year Member



Two possible solutions:

1. Stop linking to those pages.
2. Use robots.txt disallow.

As long as you aren't pointing your links at those pages, they aren't hurting your site.
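If you go the robots.txt route, it's worth sanity-checking the rule before deploying it. Python's standard urllib.robotparser can confirm that a Disallow line blocks the right paths and nothing else (the /old-pages/ directory and URLs below are hypothetical stand-ins for your own):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the directory holding the orphan pages
rules = """User-agent: *
Disallow: /old-pages/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Pages under the disallowed directory are blocked from crawling...
print(parser.can_fetch("*", "http://example.com/old-pages/dup1.html"))   # False
# ...while the rest of the site stays crawlable.
print(parser.can_fetch("*", "http://example.com/articles/index.html"))   # True
```

Keep in mind this only tells you what a compliant crawler may fetch; pages that are already indexed can linger until Google recrawls or you request removal.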

walkman

2:07 pm on Jun 25, 2007 (gmt 0)



Use the URL removal tool from Google (in Webmaster Tools).