
Getting Pages Removed

tedster

7:25 am on Jul 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What's the most efficient way to have Google remove pages that are already indexed, without creating "issues"?

One of my clients has a 2,000 page website and has begun developing another domain. But some <expletive> took a good chunk of the copy intended for the new domain and put it on the old domain.

So that content is now indexed and showing up in the SERPs - and their content editor is being more than a bit dense about allowing it to be removed/replaced.

When the new domain launches in a few weeks, there's no way I want it to compete against duplicate copy on the old PR7 domain.

In this situation, would you suggest:

1. robots.txt
2. robots meta tag
3. Google's removal process
4. some combination of 1,2,3
5. something else altogether
6. nothing at all

robotsdobetter

8:23 am on Jul 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would use both 1 and 2. Why don't you just redirect it to the new pages?

tedster

9:25 am on Jul 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks.

It's a bit too complex for a simple re-direct. The outlaw pages were placed in a highly integrated position in the site's information architecture - so we don't want to send those visitors to a different domain, we want them to have the full nav template for the existing domain.

Have you heard or seen problems using Google's "remove" process?

ciml

10:00 am on Jul 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> 1. robots.txt

This would save on bandwidth (not an issue I expect) and will (eventually) remove the problem of near-duplicate pages. It would still leave Google cluttered with URL-only listings if there are links to the pages.
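
For what it's worth, the /robots.txt entry would just be a Disallow line per path or directory (the path here is made up as an illustration):

```
# Keep all crawlers away from the misplaced copy
User-agent: *
Disallow: /misplaced-section/
```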

> 2. robots meta tag

Although you have to wait for the page to be re-crawled, this completely removes the listing from Google.
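
In other words, a tag like this in the head of each affected page (the noindex value is what removes the listing; follow lets link value keep flowing):

```html
<!-- remove this page from the index, but still let robots follow its links -->
<meta name="robots" content="noindex,follow">
```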

> 3. Google's removal process

This would be time consuming, and I would consider it overkill for cases where there isn't some legal/embarrassment issue with the content.

> 4. some combination of 1,2,3

The removal process requires either /robots.txt or robots meta exclusion.

There is no point using the robots meta tag if you have a /robots.txt exclusion, as the URLs will never be fetched and Google won't know to exclude them (as URL-only listings) from the index.

> 5. something else altogether

A 'This page has moved' page, with a large text link. You'd get to keep the PageRank (losing only one thirtieth of one notch on the Toolbar) but it's an inelegant approach and some small percentage of human visitors would not follow the link.
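
Such a page could be as simple as this (the destination URL is a placeholder):

```html
<html>
<head><title>This page has moved</title></head>
<body>
  <h1>This page has moved</h1>
  <!-- one large, obvious text link to the new location -->
  <p><a href="http://www.example-newdomain.com/new-page.html">Please
  click here for the new page</a></p>
</body>
</html>
```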

I'd use 301 redirects. Usually, Google do the right thing and list the redirect destination URL, assigning it the links and PR of the redirect source URL.
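
On an Apache server (just an assumption about the setup), per-URL 301s can go in .htaccess; the paths and domain below are placeholders:

```apache
# Permanent (301) redirects from the old domain to the new one
Redirect permanent /misplaced-page.html http://www.example-newdomain.com/misplaced-page.html
Redirect permanent /misplaced-section/ http://www.example-newdomain.com/misplaced-section/
```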
> 5. nothing at all

Not a disaster. The 'duplicate content penalty' is a myth IMO but you may find the pages on the old domain listed instead of the pages on the new domain. It fails to give the new site the link benefit of the old.

cabbie

10:01 am on Jul 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No. 2 has worked for me with Google. I guess you would want No. 1 for all SEs.
Google's removal submission has been reported to have been working too easily lately.
But I have used content from one of my sites on another without any problems with G, by using GOOGLEBOT NOINDEX, FOLLOW.
I didn't even bother to disallow following.
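
The Googlebot-specific version of the tag looks like this (same syntax as the generic robots tag, just a different name attribute):

```html
<!-- Googlebot only: don't index this page, but do follow its links -->
<meta name="googlebot" content="noindex,follow">
```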

EarWig

10:13 am on Jul 10, 2004 (gmt 0)

10+ Year Member



tedster
Hope this helps

GG has stated:
"If a page is in robots.txt, we won't crawl it, but we can still return it as a search result if we have good evidence that the page is relevant to a query. In this case, we'll return just the url (no title and no cached page because we didn't fetch the page itself).

If you don't want the page to show up at all, you can guarantee that by letting Google see the noindex meta tag by fetching that page."

Regards
Ray
EW

tedster

11:48 am on Jul 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks everyone. I've never had to deal with anything like this before, but you've helped clarify and confirm my thinking.

The main job right now is not to let the large, established site grab the rankings that we want to see for the new domain. I have no concern about a 'penalty' as such, just the competition where Google may choose only to list one domain.

Eventually (who knows how soon) we'll replace that content that should never have been published in the first place, and then lift the robots meta tag (that's all I think I'm going to do here). Probably we'll create new file names for the new content.

This was a strange one - the so-called content creator somehow found this content on the organization's network, buried in their CMS somewhere. He didn't know why it was being developed, and probably thought it was an abandoned project. So he published it claiming he wrote it, and the content manager didn't know any better. He was paid for it, but is no longer on the job.

All this proves my theory about CMS - no content management system can be any better than the content manager. And many times it's a better manager, and not a better CMS, that is the real need.

By the way - this is over 100 pages worth of content, essentially most of the new website which has been painstakingly developed over 18 months. It's intended to be an important site for this organization going forward for many years, and tied into print and broadcast marketing campaigns.