Is noindex,follow better than robots.txt?

Because Google sees that you are hiding nothing?

3:18 am on Mar 29, 2005 (gmt 0)

10+ Year Member



I have a vague feeling that it would be safer to use noindex follow in the meta tag rather than robots.txt.

The idea is that it gives Google a chance to be sure that I have nothing to hide, so that no filter will be applied to my site.

Is that really so, or am I wrong? Does anyone have any evidence for or against?

On the other hand, if Google does not index a page with noindex, does it still use this page to determine the topic of the site? Does anyone know anything about this?

Thank you,
Vadim

10:06 am on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



if Google does not index a page with noindex, does it still use this page to determine the topic of the site?

Google doesn't determine the topic of sites, it determines the topic of pages.

10:13 am on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



if Google does not index a page with noindex, does it still use this page to determine the topic of the site?
If Google doesn't spider the page, how would they really know anyway?

10:47 am on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It is perfectly legitimate to ban sections of a site using robots.txt.

Check out how huge sites such as Amazon use robots.txt for this purpose.

11:11 am on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's two different things.

robots.txt will tell Google not to spider a page. If that page is already in the index, it will not be spidered anymore; however, putting it in robots.txt will not remove it from the index.

Noindex, otoh, removes the page from the index.
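
To make the distinction concrete, here is a minimal Python sketch of a simplified crawler (an assumption for illustration, not Googlebot's actual logic): robots.txt is checked before a page is fetched, while the meta robots tag can only be read after fetching. `urllib.robotparser` is Python's standard robots.txt parser; the function `crawl_outcome`, its return format and the `/drafts/` path are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawl_outcome(robots_lines, url, meta_robots):
    """Simplified model (hypothetical): robots.txt decides whether a URL is
    fetched at all; the meta robots tag, readable only after fetching,
    decides indexing and link following."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    if not rp.can_fetch("Googlebot", url):
        # The page is never fetched, so a noindex on it is never seen.
        return {"fetched": False, "index": None, "follow": None}
    # meta_robots is the tag's content attribute, e.g. "noindex,follow".
    directives = {d.strip().lower() for d in meta_robots.split(",") if d.strip()}
    return {
        "fetched": True,
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
    }


robots = ["User-agent: *", "Disallow: /drafts/"]  # hypothetical rules
print(crawl_outcome(robots, "http://www.example.com/drafts/old.html", "noindex,follow"))
print(crawl_outcome(robots, "http://www.example.com/2005/03/29", "noindex,follow"))
```

In this model the disallowed URL is never fetched, so its noindex tag is never read, while the allowed page is fetched, kept out of the index, and its links are still followed.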

1:14 pm on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



"noindex,follow" is much more flexible than robots.txt. Here is one example of how I use it.

I have a personal blog where I write about my daily life, roughly one new post per day. One day I noticed that the front page and the date pages (for example www.example.com/2005/03/29) were present in the Google SERPs, but the actual articles were not. Because I write about one post per day, the content of each day list is almost identical to the content of the article itself. Google applied the dupe content filter and decided that the date pages were more important than the articles because they had more incoming links.

I didn't like Google's decision, because it meant pages were indexed in Google without a proper path name and title.

So I changed the weblog software so that "noindex,follow" is added to all date pages, front pages, category lists etc. Now all date pages, aggregate pages etc. have disappeared from the SERPs, and they have been replaced by the articles with proper titles.

It would have been very difficult to do this with robots.txt because the content of the site changes on a daily basis. Furthermore, robots.txt stops spidering, so disallowing the weblog root would probably cause the whole weblog to disappear from the SERPs. "noindex,follow" only stops indexing; deeper pages are still accessible and Google indexes them without problems.
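
A minimal sketch of the kind of template hook described above, assuming the blog's URLs are `/` for the front page, `/YYYY`, `/YYYY/MM` or `/YYYY/MM/DD` for date archives, and `/category/<name>` for category lists. The patterns and the `robots_meta_tag` function are hypothetical; the actual weblog software and its URL scheme aren't named in the post.

```python
import re

# Hypothetical URL patterns for aggregate pages (assumed layout).
AGGREGATE_PATTERNS = [
    re.compile(r"^/$"),                           # front page
    re.compile(r"^/\d{4}(/\d{2}(/\d{2})?)?/?$"),  # year, month or day archive
    re.compile(r"^/category/[^/]+/?$"),           # category list
]


def robots_meta_tag(path: str) -> str:
    """Return the meta robots tag a template could emit for `path`."""
    if any(p.match(path) for p in AGGREGATE_PATTERNS):
        # Keep the aggregate page out of the index, but let the
        # crawler follow its links through to the articles.
        content = "noindex,follow"
    else:
        # Individual articles stay indexable.
        content = "index,follow"
    return f'<meta name="robots" content="{content}">'


print(robots_meta_tag("/2005/03/29"))
print(robots_meta_tag("/2005/03/29/my-day-at-work.html"))
```

Since index,follow is Google's default when no tag is present, a template could equally emit nothing for article pages; the explicit tag just makes the choice visible.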

1:31 pm on Mar 29, 2005 (gmt 0)

10+ Year Member



Noindex, otoh, removes the page from the index.

Sort of. I have a few 'dead' pages that I overwrote with a simple 'this page has moved to [new url]'* and added a robots 'no index,follow' meta tag. This change was 14th Dec 2004. Today Google still includes the URL in a site:www.mydomain.com search, and its title and snippet are from the OLD page content. The cache link returns a 'Your search did not match any documents'. Looks like Google is obeying the 'no index' for the current content, but still has the URL indexed and retains the old, pre-noindex, content for titles and snippets.

*Don't ask why I didn't just 301 it; there was a reason, but I don't recall it now. Also, this was just a simple link, no on-page redirect.

1:38 pm on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It would have been very difficult to do this with robots.txt because the content of the site changes on a daily basis. Furthermore, robots.txt stops spidering, so disallowing the weblog root would probably cause the whole weblog to disappear from the SERPs

disallow: /2003
disallow: /2004
disallow: /2005
etc.

Wouldn't that be easier?

2:02 pm on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



disallow: /2003
disallow: /2004
disallow: /2005

Wouldn't that be easier?

No, my weblog software by default stores the posts in a directory structure with the date attached, for example

www.example.com/2005/03/29/my-day-at-work.html
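
A quick way to check this is Python's standard `urllib.robotparser`, which applies Disallow rules as path prefixes. The sketch below feeds it the rules suggested above (with a `User-agent: *` line added, since Disallow lines must belong to a user-agent record) and tests a day page and an article URL.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested earlier in the thread, plus a User-agent line.
rules = """\
User-agent: *
Disallow: /2003
Disallow: /2004
Disallow: /2005
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

for url in ("http://www.example.com/2005/03/29",
            "http://www.example.com/2005/03/29/my-day-at-work.html"):
    # Disallow matches any path that starts with the rule's value.
    print(url, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```

Both URLs come out as blocked, because /2005/03/29/my-day-at-work.html also starts with /2005, so these rules would hide the articles along with the day pages.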

3:12 pm on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



www.example.com/2005/03/29/my-day-at-work.html

Ah, sorry, I thought those were the files you wanted excluding.

So what are these www.example.com/2005/03/29 pages? What's their purpose and how is Googlebot finding them?

3:24 pm on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



My weblog software automatically creates threads with all posts from one day. These day-threads are named like www.example.com/2005/03/29. They are accessible from a calendar in the margin, and I do not want to remove the calendar because it is an easy way to browse the weblog.

That's no problem if I post many times a day, but with one post per day, the day-thread and the post itself differ only in filename and title. Because many pages have the calendar in the margin, there are many links to the day-thread, but only one or two links point to each individual post. So the dupe content filter marks the individual post as the duplicate and indexes the day-thread.

8:06 pm on Mar 29, 2005 (gmt 0)



"Noindex, otoh, removes the page from the index."

If I noindex /page.html and remove the noindex a month later, will G index it again, without problems or penalties because of the previous noindex?

I am replacing/moving some content to a new domain and need something like this.

10:02 pm on Mar 29, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm sorry if I was wrong about noindex. It seems to have worked for me, but perhaps I also requested removal; I tend to do that occasionally with obsolete pages.

2:13 am on Mar 30, 2005 (gmt 0)

10+ Year Member



If Google doesn't spider the page, how would they really know anyway?

With noindex,*follow*, Google presumably at least reads the content of the page in order to follow the links. Naturally Google doesn't use the content for its main index, but it may still use it to determine the topic of the page, and maybe the topic of the site.

Google may also use the content to check that there is no cloaking.

The question is: does Google really do this?

Vadim.

6:22 am on Mar 30, 2005 (gmt 0)

10+ Year Member



Google has cached multiple versions of my pages (upper-case/lower-case characters). To clean it up, I put the NOFOLLOW tag on and submit the unwanted URLs to the Google removal tool.

This works well but I'm concerned about what will happen if Googlebot visits while I'm doing this. Can anyone tell me if this might cause Google to stop crawling my site?

Sorry if this is a bit off topic.

7:48 am on Mar 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> if this might cause Google to stop crawling my site?

NOFOLLOW, NOINDEX, and NOARCHIVE are tags for individual pages. Having them on one page does not stop Googlebot from crawling and/or indexing another.
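
As a minimal illustration that these directives are per page, here is a hypothetical helper that reads the `<meta name="robots">` tag from one page's HTML using Python's standard `html.parser`. The class and function names are made up for this sketch; the directives it returns describe only the page that was parsed.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute values may be None
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip())


def robots_directives(html: str) -> set:
    """Return the robots directives declared in this one page's HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page_a = '<head><meta name="robots" content="NOINDEX, NOFOLLOW, NOARCHIVE"></head>'
page_b = '<head><title>Another page</title></head>'
print(robots_directives(page_a), robots_directives(page_b))
```

Page B has no meta robots tag, so it gets no directives, regardless of what page A declares.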

1:22 pm on Mar 30, 2005 (gmt 0)



"NOFOLLOW, NOINDEX, and NOARCHIVE are tags for individual pages."

What if a week later you remove the NOINDEX tag? Will Google index the page then?

2:53 pm on Mar 30, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dunno, I have actually never tried that, but my guess is "yes". Anybody tried it?

*bump*

11:55 am on Mar 31, 2005 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Great question.
> have a vague feeling that it would be safer to use
> noindex follow in the meta tag rather than robots.txt.

I think the bottom line answer is that there is no functional difference in the real world.

From what we have seen here, talking with so many people, Google believes that the robots.txt settings are a NO INDEX setting. They do not believe that "disallow" means they cannot legitimately spider that data and use it. That means they feel that spidering and using any data from your site that they wish, but not listing it, is OK.

We've seen a lot of comments over the years from people saying that Gbot does not obey robots.txt: Google will not index pages listed in robots.txt, but it will still spider and visit them.

I have never seen a page that is "delisted" any differently because of a robots.txt entry or because of a NO INDEX tag added to a page. I will seek clarification on this point.

> remove noindex

Yes, I recently did that on an entire site, and Google picked right up on it and started listing the site within a couple of weeks.

2:57 pm on Mar 31, 2005 (gmt 0)



Thank you for the clarification, Brett.

12:35 am on Apr 1, 2005 (gmt 0)

10+ Year Member



Thanks a lot to all who answered my post.

Special thanks to Brett, who cleared up the question.

Vadim.

 
