
NoIndex Clarification

Trying to Stop Duplicate Content Indexing.

   
9:24 pm on Sep 27, 2006 (gmt 0)

10+ Year Member



We are trying to stop any duplicate content caused when the page parameter is missing from the URL.

We set up a VB sub that checks for these parameters in the URL. If the parameters are MISSING, it outputs <META NAME="Robots" CONTENT="noindex">. If the parameters are INCLUDED in the URL, it outputs <META NAME="Robots" CONTENT="index, follow">.
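For illustration, a minimal sketch of that kind of switch in Classic ASP / VBScript, assuming a single hypothetical parameter named "page" (the real parameter names and platform may differ):

<%
' Sketch only - assumes the expected parameter is called "page".
' Output a noindex meta tag when the parameter is missing from the
' query string, and a normal index tag when it is present.
Sub WriteRobotsMeta()
    If Len(Request.QueryString("page")) = 0 Then
        Response.Write "<META NAME=""Robots"" CONTENT=""noindex"">"
    Else
        Response.Write "<META NAME=""Robots"" CONTENT=""index, follow"">"
    End If
End Sub
%>

Calling WriteRobotsMeta from inside the <head> of the template means each page emits exactly one robots meta tag.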

Will this stop ONLY the unwanted URL from being listed in Google, or will it cause the entire page file to be dropped?

After reading many posts by g1smd, I think this setup will work, but this seems important enough to ask before implementation.

10:24 am on Sep 28, 2006 (gmt 0)

10+ Year Member



We are using the "noindex, nofollow" tag on many things. Things will eventually delist. The keyword is "eventually". Not sure how long it actually takes (weeks? months?); I would guess it will be months.

10:35 am on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If the URLs are in the normal index, they will be dropped within just a few weeks.

If they are Supplemental, they will take a lot longer to disappear, but Google will get rid of them eventually. It may be months. It might be a year.

Only the URLs that serve a noindex tag will be dropped. Others will remain.

10:37 am on Sep 28, 2006 (gmt 0)

10+ Year Member



Hi Kelcor

In my experience, that will work quite well. That said, if you can find another solution it might be better: Googlebot has to read all of your documents first, and then they need to be processed. That takes some time...

If some of your documents are already marked as Supplemental, it may take a very long time to remove them.

Do you have lots of URLs?

Regards

itloc

10:48 am on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm doing pretty much the same thing, except that I'm trying to stop any duplicate content caused when a parameter is present in the URL. I want only the plain URL to be indexed and my pages throw up the noindex meta when there's a parameter.
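As a rough sketch (assuming the same sort of Classic ASP / VBScript setup as the earlier post; my actual platform isn't stated here), the check is simply reversed: noindex whenever any query string is present.

<%
' Sketch only - noindex any URL that carries a query string,
' so only the plain URL gets the normal index tag.
Sub WriteRobotsMeta()
    If Len(Request.QueryString) > 0 Then
        Response.Write "<META NAME=""Robots"" CONTENT=""noindex"">"
    Else
        Response.Write "<META NAME=""Robots"" CONTENT=""index, follow"">"
    End If
End Sub
%>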

However, I've noticed that Googlebot is crawling the parametered URLs on a daily basis. The pages aren't indexed, since they were set up this way from the start, but I'm surprised at the regular ongoing crawling of pages with 'noindex'.

MSN, incidentally, has completely ignored the noindex and has indexed all the parametered URLs.

[edited by: Patrick_Taylor at 10:51 am (utc) on Sep. 28, 2006]

10:51 am on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Once Google knows about a URL, they will crawl it forever, checking whether the status they have for that URL is still correct.

It must work that way, otherwise changes that you make will never be picked up.

The set of URLs they crawl is larger than the set they index content for, and the set they index is larger than the set they show in the search results.

10:55 am on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It must work that way, otherwise changes that you make will never be picked up.

Yes, I suppose so. Thanks. The odd thing is that these are the pages most frequently crawled (at present).

 
