I may need to do this soon, but I have little enthusiasm for submitting pages to the removal tool, especially in cases where there are a lot of pages (necessary site redesign). I know, I know. :/
Redirect gone /folder/file.html

or, for a lot of files:

RedirectMatch gone ^/folder/(subfolder1|subfolder2|subfolder3|subfolder4)/.*

These are examples from pages that were really gone. I noticed Googlebot stops requesting files that are served a 410 Gone, and they stop showing up in the SERPs. So maybe there's a way, using mod_rewrite, to serve a 410 to Googlebot only? Something like this:
RewriteEngine On
# Only act on requests whose user-agent contains "Googlebot"
RewriteCond %{HTTP_USER_AGENT} Googlebot
# ...and only for this specific file (note the escaped dot in the regex)
RewriteCond %{REQUEST_URI} ^/folder/file\.html$
# Serve 410 Gone; [G] implies [L], so processing stops here
RewriteRule .* - [G]
I haven't tested this; it's just an idea.
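If anyone does try it, a quick way to check the behaviour before relying on it is to compare responses with and without a spoofed user-agent, e.g. with curl (the URL here is just a placeholder):

curl -I -A "Googlebot" http://www.example.com/folder/file.html
curl -I http://www.example.com/folder/file.html

If the rules are working, the first request should come back 410 Gone and the second 200 OK.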
From reading these forums, I understand Google also crawls from some not-so-well-known IPs and uses at least one other UA string (Python?), so be careful.
Anyway, the noindex,nofollow meta tag alone won't do the job here: Google has to fetch the page to read that tag.
Regarding cloaking for Googlebot and serving a 410 response: yes, it would probably work. But then you'd presumably have on-site and inbound links pointing to a page that returns 410 whenever Googlebot tries to spider it. While the obsolete inbound links won't matter for a long time, or maybe ever, links remaining on your own site that point to a 'dead page' might be one of those 'more than a hundred' indicators of page/site quality that Google uses.
Many people get in trouble when they (mis)use HTTP response codes and mechanisms as 'quick' or 'easy' ways to do something other than exactly what those codes were intended to do and mean. A classic disaster is the 'easy' method of invoking a PHP content-handler script by defining that script as the custom error document for 404 Not Found. Sure, it will serve pages, but each one goes out with a 404 Not Found status code, and then people wonder why the pages don't rank. I advise using HTTP server response codes and mechanisms in a simple, straightforward manner only.
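To make that anti-pattern concrete, here's a minimal sketch of what the broken setup looks like (the script name is only an illustration, not something from this thread):

# Looks convenient: every unmatched URL is handed to a PHP script...
ErrorDocument 404 /content-handler.php
# ...but Apache keeps the 404 status line on everything the script
# outputs, so visitors see pages while crawlers see "Not Found".

If a script really must generate pages for arbitrary URLs, one straightforward route is an ordinary rewrite to the script, which keeps the normal 200 status on successful pages instead of hijacking the error handler:

RewriteEngine On
# ^/? makes the pattern work in both server and .htaccess context
RewriteRule ^/?articles/ /content-handler.php [L]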
If you want to keep page content out of Google, Disallow the page in robots.txt. If you don't want Google to even mention the URL in search results, then don't Disallow it in robots.txt; put a <meta name="robots" content="noindex"> tag on the page instead.
Pages disallowed in robots.txt often get a URL-only listing if Googlebot finds a link to them: Google has complied with robots.txt by not fetching the page, but it can use the link text it finds as keywords and return the bare URL in search results.
On the other hand, if you put the noindex meta-tag on the page and allow Googlebot to fetch it (so it can read the tag), then the page and URL should stay out of the index.
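Putting the two mechanisms side by side (the path is just a placeholder):

To block fetching (the URL may still be listed from link text alone):

User-agent: *
Disallow: /private-page.html

To keep the URL out of the index entirely, leave the page fetchable and add this to its <head>:

<meta name="robots" content="noindex">

Whatever you do, don't combine them: if robots.txt blocks the page, Googlebot never gets to read the noindex tag.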
Anyway, if you've got pages you want removed fast and you want them to stay removed, then use the removal tool and also follow the Disallow or noindex procedure above.
If you're not in a hurry, then the same steps apply, but you needn't use the removal tool.
Jim
"If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, the webmaster must first insert the appropriate meta tags into the page's HTML code. Doing this and submitting via the automatic URL removal system will cause a temporary, 180-day removal of these pages from the Google index, regardless of whether you remove the robots.txt file or meta tags after processing your request."
"Please keep in mind that submitting via the automatic URL removal system will cause a temporary, six months, removal of your site from the Google index. You may review the status of submitted requests in the column to the right."
I assume that my whole site won't be removed, just the few URLs that I submit?