"NOINDEX, NOFOLLOW", how long?

     
7:41 am on July 11, 2005 (gmt 0)

Junior Member from IL

Just added <meta name="ROBOTS" content="NOINDEX, NOFOLLOW"> to a couple of pages.

How long does it take for pages to be dropped from Google's index?

1:55 pm on July 11, 2005 (gmt 0)

New User

I added it almost two months ago and nothing has happened. Google seems to be busy with other things right now. I finally got rid of those pages using the removal tool.

9:25 pm on July 11, 2005 (gmt 0)

New User

You definitely have to use the removal tool if you want to get rid of 'special' pages.

9:46 pm on July 11, 2005 (gmt 0)

Senior Member (g1smd)

Two years ago, adding that to a page would get Google to drop it within a month or so.

Nowadays, you have to force Google to remove stuff that you don't want indexed - but be aware that they will automatically add it back in again after 180 days.

10:03 pm on July 11, 2005 (gmt 0)

Senior Member (caveman)

I've not added NOINDEX to any pages (that were already indexed) in a long while. Is it now the widely held view that webmasters must use the removal tool, and that the pages will come back to haunt you after 180 days anyway? Or is this perhaps just related to burps associated with recent updates?

I may need to do this soon, but I have little enthusiasm for submitting pages to the removal tool, especially in cases where there are a lot of pages (necessary site redesign). I know, I know. :/

10:18 pm on July 11, 2005 (gmt 0)

Senior Member (g1smd)

Depending on exactly how you removed them, they come back after exactly 90 or 180 days - without fail.

10:19 pm on July 11, 2005 (gmt 0)

Senior Member (jdmorgan)

It has to do with how often those pages are spidered, followed by the delay until the changes make it into the next update.

Basically, if you are in a hurry, use the removal tool. Then leave the noindex directive on the page to prevent it from reappearing in the SERPs after 180 days.
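
In other words, keep something like the tag from the first post in the page's <head>. The noindex part alone is enough to keep the page out of the index; drop the nofollow part if you still want the links on the page followed:

<meta name="robots" content="noindex">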

Jim

10:27 pm on July 11, 2005 (gmt 0)

Preferred Member

I think the key is to add the noindex,nofollow tag before the page gets indexed in the first place. If you want an already-indexed page out, the fastest thing to do is change the URL and put your robots tag on the new page. Google should drop the old URL (now a 404) pretty fast.

10:47 pm on July 11, 2005 (gmt 0)

Senior Member

Would negative cloaking work? Serving only Google a 410 Gone? Something like this:

Redirect gone /folder/file.html

or, for a lot of files:

RedirectMatch gone ^/folder/(subfolder1|subfolder2|subfolder3|subfolder4)/.*

These are examples from pages that were really gone. I noticed that Googlebot stops asking for files for which it is served a 410 Gone, and they stop showing up in the SERPs. So maybe there's a way, using mod_rewrite, to serve only Gbot a 410? Something like this:


RewriteEngine On
# Only act on requests whose User-Agent string contains "Googlebot"
RewriteCond %{HTTP_USER_AGENT} Googlebot
# ...and only for this one URL (dot escaped so it matches literally)
RewriteCond %{REQUEST_URI} ^/folder/file\.html$
# "-" means no substitution; the [G] flag returns "410 Gone"
RewriteRule .* - [G]

I have not tested this; it's just an idea.
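
If you do want to try it, a quick command-line check (with your own hostname in place of example.com) should show the difference once the rules are in place:

curl -I -A "Googlebot" http://www.example.com/folder/file.html
curl -I http://www.example.com/folder/file.html

The first request should come back as "410 Gone" and the second as "200 OK".
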
From reading these forums I understand that Google also crawls from some not-so-well-known IPs and with at least one other UA string (Python?), so be careful.

Anyway, the noindex,nofollow meta tag alone won't do the job: Google has to fetch the page in order to read that tag.

10:56 pm on July 11, 2005 (gmt 0)

Senior Member (caveman)

g1smd/jdMorgan, thanks. So, if I'm not in a hurry, I can reliably use the NOINDEX to eventually get them out...and keep them out?

11:46 pm on July 11, 2005 (gmt 0)

Senior Member

<<Depending on exactly how you removed them, they come back after exactly 90 or 180 days - without fail>>

I just removed 4,000+ pages using the removal console; now I'm going to have to do it again?

Won't "Disallow: /removed-pages" in the robots.txt file keep them from coming back?

12:45 am on July 12, 2005 (gmt 0)

Senior Member (jdmorgan)

Google's quest for the 'hidden Web' has made things a bit more complicated than they used to be, but let's not add confusion to that complication.

Regarding cloaking for Googlebot and serving a 410 response: yes, it would probably work. But then you'd presumably have on-site and inbound links pointing to a page that returns 410 whenever Googlebot tries to spider it. While the obsolete inbound links may not matter for a long time, if ever, links remaining on your own site that point to a 'dead page' might be one of those 'more than a hundred' indicators of page/site quality that Google uses.

Many people get into trouble when they (mis)use HTTP response codes and mechanisms as 'quick' or 'easy' ways to do something other than exactly what those codes were intended to do and to mean. Disasters result, such as the 'easy' method of invoking a PHP content-handler script by defining that script as the custom error document for 404 Not Found. Sure, it will serve pages, but each one with a 404 Not Found status code! And then people wonder why the pages don't rank... I advise using HTTP server response codes and mechanisms in a simple, straightforward manner only.
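
To illustrate (the script name here is hypothetical), that misconfiguration boils down to a single line like this in httpd.conf or .htaccess:

# Anti-pattern: serving normal page content through the 404 handler.
# Unless the script explicitly overrides the status, every page it
# produces is sent with "404 Not Found" instead of "200 OK".
ErrorDocument 404 /content-handler.php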

If you want to keep a page's content out of Google, Disallow the page in robots.txt. If you don't want Google to even mention the URL in search results, then don't Disallow it in robots.txt; put a <meta name="robots" content="noindex"> tag on the page instead.

Pages disallowed in robots.txt often get a URL-only listing if Googlebot finds a link to them; Google has complied with robots.txt by not fetching the page, but it can use the link text it finds as keywords and return the bare URL in search results.

On the other hand, if you put the noindex meta-tag on the page and allow Googlebot to fetch it (so it can read the tag), then the page and URL should stay out of the index.
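
To make the distinction concrete (the paths are hypothetical): blocking the fetch is done in robots.txt,

User-agent: *
Disallow: /folder/file.html

while keeping both the content and the URL out of the results is done on the page itself, which must remain fetchable:

<meta name="robots" content="noindex">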

Anyway, if you've got pages you want removed fast and you want them to stay removed, then use the removal tool and also follow the Disallow or noindex procedure above.

If you're not in a hurry, then the same steps apply, but you needn't use the removal tool.

Jim

5:14 am on July 12, 2005 (gmt 0)

Full Member

From [google.com...]

"If you believe your request is urgent and cannot wait until the next time Google crawls your site, use our automatic URL removal system. In order for this automated process to work, the webmaster must first insert the appropriate meta tags into the page's HTML code. Doing this and submitting via the automatic URL removal system will cause a temporary, 180-day removal of these pages from the Google index, regardless of whether you remove the robots.txt file or meta tags after processing your request."

7:19 am on July 12, 2005 (gmt 0)

Junior Member from IL

This is what the removal tool says:

"Please keep in mind that submitting via the automatic URL removal system will cause a temporary, six months, removal of your site from the Google index. You may review the status of submitted requests in the column to the right."

I assume that my whole site won't be removed, just the few URLs that I submit?

8:38 am on July 12, 2005 (gmt 0)

Preferred Member

<<Then leave the noindex directive on the page to prevent it from reappearing in the SERPs after 180 days>>

Are you sure that will work? In my case, returning a 404 wasn't enough to prevent the page from reappearing, so is it better to return a 200 along with the noindex tag?