I have a question for you guys. If I want to stop Google from crawling and indexing some of my pages, which attribute is better to use - nofollow or noindex? I can't seem to work out which one.
What about noodp, noydir?
not2easy
2:02 pm on Apr 4, 2014 (gmt 0)
Crawling and indexing are two different goals. You can use a noindex meta tag on pages you do not want indexed. Those pages may still be crawled, especially if there are links to them anywhere online, including in your sitemaps. The most certain way to discourage both crawling and indexing by Google is to disallow the page, folder or URL pattern in your robots.txt file and use the noindex tag. I don't know of any way to guarantee it won't happen anyway; there have been several discussions here on the topic.
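To put the noindex part in concrete terms, here is a sketch of what the tag looks like in a page's <head> (this is the standard robots meta tag; the rest of the page is omitted):

```html
<!-- Tells compliant crawlers not to include this page in their index -->
<meta name="robots" content="noindex">
```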
n0tSEO
4:58 pm on Apr 4, 2014 (gmt 0)
You can robot out those pages and be done with it. :)
lucy24
7:57 pm on Apr 4, 2014 (gmt 0)
When you say "nofollow", do you mean links on the pages you want to exclude, or links leading to those pages? Either way, rel="nofollow" doesn't mean "pretend you haven't seen this link"; it only means "don't tell them I sent you".
m112
8:28 pm on Apr 4, 2014 (gmt 0)
I always do noindex + blocking them in the robots.txt to be sure
phranque
1:13 am on Apr 5, 2014 (gmt 0)
nofollow is about the links on a page. noindex is about the content on a page. you want noindex.
if you exclude a bot from crawling with robots.txt then it won't see the noindex because it won't request the url.
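As a side note, the same noindex directive can also be sent as an HTTP response header (X-Robots-Tag), which is useful for non-HTML files such as PDFs. Like the meta tag, it only works if the bot is allowed to crawl the URL and see the response:

```
X-Robots-Tag: noindex
```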
sven123
1:32 pm on Apr 5, 2014 (gmt 0)
Use nofollow if you want Google not to follow a link on your site. Otherwise, use noindex, and block the page in robots.txt as well if you're still paranoid. Better safe than sorry.
aakk9999
10:57 pm on Apr 5, 2014 (gmt 0)
I can see there is lots of confusion around this subject. Let's try to explain it with a library-and-books analogy...
If you block it via robots.txt, I am not allowed to go to the shelf and open that book. In fact, I cannot even check whether the book really exists on that shelf. But if someone else (not you) is talking about this book, I may catalogue it and have it in my library index based on what others have said about it, and I may offer it to my visitors if they ask me about the subject and I think your book is a good resource (because I was not allowed to go and have a look myself). In fact, I may catalogue it anyway, just in case someone asks about this exact book's location. I can then say: yes, I know it exists, but I do not know what is in it.
If you add a meta robots noindex directive, I know where the book is and I am allowed to get it. Only when I open the book can I see whether I am allowed to add it to the index, and if there is a noindex directive, I am not. But I am allowed to, and will, still read it, and if this book mentions other books, I will make a note of them so that I can check those too.
If you add meta robots nofollow or rel="nofollow", I know where the book is. I can open it up and read it. I am allowed to catalogue this book, so I can add it to my index. But when I find that this book talks about other book(s), I should not pay attention. Importantly, though, despite the "nofollow" I may still go and check those other books if someone else mentions them. The fact that you have nofollow does not mean "not allowed to go there"; it means more like "you did not hear about it from me, and if I am the only one mentioning it, pretend you did not hear it".
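To tie the analogy back to markup, here is roughly what the two flavours look like (example.com and the link text are placeholders):

```html
<!-- Page-level: applies to every link on this page -->
<meta name="robots" content="nofollow">

<!-- Link-level: applies only to this one link -->
<a href="https://example.com/other-book" rel="nofollow">another book</a>
```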
n0tSEO
7:37 am on Apr 6, 2014 (gmt 0)
aakk9999, that's a brilliant way to put it. ;-)
andrewc, if you need the stronger type of exclusion, go for the robots.txt method.
ergophobe
4:19 pm on Apr 6, 2014 (gmt 0)
>>stronger exclusion
It isn't "stronger" (quantitative difference) but "different" (qualitative difference). The robots exclusion is not a stronger version of noindex, it is a completely different thing with a different purpose.
What's stronger, forged steel or cayenne pepper?
As aakk9999 has pointed out, you must *first* noindex and wait for Google to take those pages out of the index before you block the crawl.
If I add the noindex and the robots.txt exclusion at the same time, Google will never crawl those pages again, which means it will never read them, which means it will never see the noindex tag, which means the page will stay semi-permanently in the index.
First noindex and test periodically to see when the page has fallen from the index.
Once it has, then you can add your robots exclusions.
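A sketch of that second step, once the pages have dropped out of the index (the /private/ path is just a placeholder; robots.txt supports # comments):

```
# robots.txt - add this only AFTER the noindexed pages are out of the index
User-agent: *
Disallow: /private/
```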
andrewc
9:51 am on Apr 7, 2014 (gmt 0)
aakk9999 thank you for your great response! I got it now!