
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Is noindex, follow better than robots.txt?
Because Google sees that you hide nothing.
Vadim




msg:744042
 3:18 am on Mar 29, 2005 (gmt 0)

I have a vague feeling that it would be safer to use noindex follow in the meta tag rather than robots.txt.

The idea is that it gives Google a chance to be sure that I have nothing to hide and no filter will be applied to my site.

Is that really so, or am I wrong? Do you have any evidence for or against?

On the other hand, if Google does not index a page with noindex, does it still use this page to determine the topic of the site? Does anyone know anything about this?

Thank you,
Vadim

 

mrMister




msg:744043
 10:06 am on Mar 29, 2005 (gmt 0)

if Google does not index a page with noindex, does it still use this page to determine the topic of the site?

Google doesn't determine the topic of sites, it determines the topic of pages.

robotsdobetter




msg:744044
 10:13 am on Mar 29, 2005 (gmt 0)

if Google does not index a page with noindex, does it still use this page to determine the topic of the site?
If Google doesn't spider the page how would they really know anyway?
topr8




msg:744045
 10:47 am on Mar 29, 2005 (gmt 0)

it is perfectly legitimate to ban sections of a site using robots.txt

check out how huge sites such as amazon use robots.txt for this purpose

claus




msg:744046
 11:11 am on Mar 29, 2005 (gmt 0)

It's two different things.

robots.txt will tell Google not to spider a page. If that page is already in the index, it will not be spidered anymore. However, it will not be removed from the index just because you put it in robots.txt.

Noindex, otoh, removes the page from the index.
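
A minimal sketch of the distinction described above, assuming a generic well-behaved crawler rather than Googlebot's actual internals. The helper names (`RobotsMetaParser`, `crawl_decision`) are hypothetical and only illustrate the order of checks: robots.txt decides whether a page may be fetched at all, and a robots meta tag can only take effect once the page has been fetched.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def crawl_decision(robots_txt, url, html, agent="Googlebot"):
    """Return (may_fetch, may_index, may_follow) for a hypothetical polite crawler."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # Blocked by robots.txt: the page is never fetched,
        # so any meta tag on it is never seen.
        return (False, None, None)
    meta = RobotsMetaParser()
    meta.feed(html)
    return (
        True,
        "noindex" not in meta.directives,
        "nofollow" not in meta.directives,
    )
```

Under this model, a page that is disallowed in robots.txt never has its "noindex" read, which is one reason the two mechanisms are not interchangeable.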

lammert




msg:744047
 1:14 pm on Mar 29, 2005 (gmt 0)

"noindex,follow" is much more flexible than a robots.txt. Here is one example of how I use it.

I have a personal blog where I write about my daily life: about one new post each day. One day I noticed that the front page and the date pages (for example www.example.com/2005/03/29) were present in the Google SERPs, but the real articles were not. Because I write about one page per day, the content of the day-list is almost identical to the content of the real article. Google used the dupe content filter and decided that the date pages were more important than the real articles because of the higher number of incoming links.

I didn't like Google's decision, because now pages were indexed in Google without a proper path name and title.

Therefore I changed the weblog software in such a way that "noindex,follow" is added to all date pages, front pages, category lists etc. Now all date pages, aggregate pages etc. have disappeared from the SERPs and they are replaced by the articles with proper titles.

It would have been very difficult to do this with a robots.txt because the content of the site changes on a daily basis. Furthermore, robots.txt stops spidering, so denying the weblog root would probably cause the total weblog to disappear from the SERPs. The "noindex,follow" tag only stops indexing; deeper pages are still accessible and Google indexes them without problems.
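
The approach described in this post could be sketched roughly as follows. This is an illustration only, not the poster's actual weblog software: the function name `robots_meta_for` and the `/category/...` path are assumptions, and the date and article URL shapes are modeled on the examples given in the thread.

```python
import re

# Hypothetical URL shapes, modeled on the examples in this thread:
#   /2005/03/29                      -> day listing (aggregate page)
#   /2005/03/29/my-day-at-work.html  -> individual article
AGGREGATE_PATTERNS = [
    re.compile(r"^/$"),                      # front page
    re.compile(r"^/\d{4}(/\d{2}){0,2}/?$"),  # year, month, or day listing
    re.compile(r"^/category/[^/]+/?$"),      # category list (assumed path)
]


def robots_meta_for(path):
    """Return a robots meta tag for aggregate pages, or '' for articles."""
    if any(pattern.match(path) for pattern in AGGREGATE_PATTERNS):
        return '<meta name="robots" content="noindex,follow">'
    return ""
```

A template would call `robots_meta_for(request_path)` inside the page head, so new date pages pick up the tag automatically without anyone editing a robots.txt file each day.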

quiet_man




msg:744048
 1:31 pm on Mar 29, 2005 (gmt 0)

Noindex, otoh, removes the page from the index.

Sort of. I have a few 'dead' pages that I overwrote with a simple 'this page has moved to [new url]'* and added a robots 'noindex,follow' meta tag. I made this change on 14th Dec 2004. Today Google still includes the URL in a site:www.mydomain.com search, and its title and snippet are from the OLD page content. The cache link returns a 'Your search did not match any documents'. It looks like Google is obeying the 'noindex' for the current content, but still has the URL indexed and retains the old, pre-noindex content for titles and snippets.

*Don't ask why I didn't just 301 it, there was a reason but I don't recall now. Also, this was just a simple link, no on-page redirect.

mrMister




msg:744049
 1:38 pm on Mar 29, 2005 (gmt 0)

It would have been very difficult to do this with a robots.txt because the content of the site changes on a daily basis. Furthermore, robots.txt stops spidering, so denying the weblog root would probably cause the total weblog to disappear from the SERPs

disallow: /2003
disallow: /2004
disallow: /2005
etc..

Wouldn't that be easier?

lammert




msg:744050
 2:02 pm on Mar 29, 2005 (gmt 0)

disallow: /2003
disallow: /2004
disallow: /2005

Wouldn't that be easier?

No, my weblog software by default stores the posts in a directory structure with the date attached, for example

www.example.com/2005/03/29/my-day-at-work.html
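
The reason the Disallow suggestion does not help here is that robots.txt rules match by path prefix, so a rule covering the day listing also covers every article stored beneath it. A small check with Python's standard `urllib.robotparser` shows this; the `User-agent: *` line and the `allowed` helper are assumptions added to make the example complete.

```python
from urllib.robotparser import RobotFileParser

# The rules proposed in the thread, with an assumed User-agent line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /2003
Disallow: /2004
Disallow: /2005
"""


def allowed(path, agent="Googlebot"):
    """Return True if the rules above let the given agent fetch the path."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    return rules.can_fetch(agent, "http://www.example.com" + path)


print(allowed("/2005/03/29"))                      # day listing
print(allowed("/2005/03/29/my-day-at-work.html"))  # article under the same prefix
```

Both calls report the path as blocked, so the articles would be excluded from crawling together with the date pages.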

mrMister




msg:744051
 3:12 pm on Mar 29, 2005 (gmt 0)

www.example.com/2005/03/29/my-day-at-work.html

Ah, sorry, I thought those were the files you wanted excluding.

So what are these www.example.com/2005/03/29 pages? What's their purpose and how is Googlebot finding them?

lammert




msg:744052
 3:24 pm on Mar 29, 2005 (gmt 0)

My weblog software automatically makes threads with all posts from one day. These day-threads are named like www.example.com/2005/03/29. These threads are accessible from a calendar in the margin, and I do not want to remove the calendar because it is an easy way to browse the weblog.

This is no problem if I post many times a day, but with one post each day the content of the day-thread and the post itself differ only in the filename and the title. The dupe content filter sees that many pages have the calendar in the margin, so there are many links to the day-thread, but only one or two links point to each individual post. Therefore the individual post is marked duplicate and the day-thread is indexed.

walkman




msg:744053
 8:06 pm on Mar 29, 2005 (gmt 0)

"Noindex, otoh, removes the page from the index."

If I noindex /page.html, and 1 month later I remove the noindex, will G index it again, with no problems or penalties because of the previous noindex?

I am replacing or moving some content to a new domain and need something like this.

claus




msg:744054
 10:02 pm on Mar 29, 2005 (gmt 0)

I'm sorry if i was wrong about noindex. It seems to have worked for me, but perhaps i have requested removal as well, i tend to do that occasionally with obsolete pages.

Vadim




msg:744055
 2:13 am on Mar 30, 2005 (gmt 0)

If Google doesn't spider the page how would they really know anyway?

With noindex *follow* Google probably at least reads the content of the page to follow the links. Google naturally doesn't use the content for its main index, but Google may still use it to determine the topic of the page and maybe the topic of the site.

Google may also use the content to see that there is no cloaking.

The question is: does Google really do this?

Vadim.


Vec_One




msg:744056
 6:22 am on Mar 30, 2005 (gmt 0)

Google has cached multiple versions of my pages (upper-case/lower-case characters). To clean it up, I put the NOFOLLOW tag on and submit the unwanted URLs to the Google removal tool.

This works well but I'm concerned about what will happen if Googlebot visits while I'm doing this. Can anyone tell me if this might cause Google to stop crawling my site?

Sorry if this is a bit off topic.

claus




msg:744057
 7:48 am on Mar 30, 2005 (gmt 0)

>> if this might cause Google to stop crawling my site?

NOFOLLOW, NOINDEX, and NOARCHIVE are tags for individual pages. Having them on one page does not stop Googlebot from crawling and/or indexing another.

walkman




msg:744058
 1:22 pm on Mar 30, 2005 (gmt 0)

"NOFOLLOW, NOINDEX, and NOARCHIVE are tags for individual pages."

what if a week later you remove the NOINDEX tag? will google index the page then?

claus




msg:744059
 2:53 pm on Mar 30, 2005 (gmt 0)

Dunno, i have actually never tried that, but my guess is "yes". Anybody tried it?

*bump*

Brett_Tabke




msg:744060
 11:55 am on Mar 31, 2005 (gmt 0)

Great question.
> have a vague feeling that it would be safer to use
> noindex follow in the meta tag rather than robots.txt.

I think the bottom line answer is that there is no functional difference in the real world.

From what we have seen here talking with so many people, Google believes that the robots.txt settings are a NO INDEX setting. They do not believe that "disallow" means they cannot legitimately spider that data and use it. That means they feel that spidering and using any data from your site that they wish, but not listing it, is ok.

We've seen a lot of comments over the years from people saying that Gbot does not follow robots.txt, because Google will not index, but will still spider and visit, pages listed in robots.txt.

I have never seen a page that is "delisted" any differently because of a robots.txt entry or because of a NO INDEX tag added to a page. I will seek clarification on this point.

> remove noindex

Yes, I recently did that on an entire site and Google picked right up on it and started listing the site within a couple of weeks.

walkman




msg:744061
 2:57 pm on Mar 31, 2005 (gmt 0)

thank you for the clarification Brett.

Vadim




msg:744062
 12:35 am on Apr 1, 2005 (gmt 0)

Thanks a lot to all who answered my post.

Special thanks to Brett, who cleared up the question.

Vadim.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved