Forum Moderators: goodroi


Force the spider to not index pages


cfmtravel

9:29 pm on Aug 26, 2014 (gmt 0)

10+ Year Member



Hi,
I'm thinking to force the spider to not index pages like "contact" and similar (for example, pages where users can modify information, send me suggestions, and so on).

The idea was to place the following tag in the HEAD of those pages:
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">

I presume this instruction will be respected by the serious spiders, so I know that the "bad guys" will still pick up those pages.

What do you think?
It does not seem important to me to have those pages indexed in Google, Yahoo, and so on.

Waiting for your opinion

Thanks a lot

krishseo

12:02 pm on Sep 5, 2014 (gmt 0)



Instead of using the meta robots tag, use robots.txt. Place your pages in the robots.txt file with disallow:

disallow: /contact/

Sure, by the above process your page will not be indexed in any of the top search engines.

Once your page is updated in the robots.txt file, submit the robots.txt file in Webmaster Tools.
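For what it's worth, the way a compliant crawler applies a Disallow rule can be sketched with Python's standard-library robots.txt parser (the domain and rules here are illustrative examples, not the poster's actual file):

```python
from urllib import robotparser

# Hypothetical robots.txt contents
rules = [
    "User-agent: *",
    "Disallow: /contact/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A compliant bot checks robots.txt before fetching and skips blocked paths
print(rp.can_fetch("*", "https://example.com/contact/"))  # False
print(rp.can_fetch("*", "https://example.com/about/"))    # True
```

Note that this governs crawling only; whether it also keeps a page out of the index is exactly what the replies below dispute.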

not2easy

3:18 pm on Sep 5, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Do not use robots.txt instead of the "noindex" meta tag, or the page will stay in the index. If you block the pages in robots.txt without first adding the noindex meta tag you outlined above, you will prevent compliant robots like Google's from ever seeing that you want the pages noindexed. Blocking a page or directory in robots.txt prevents the robot from reading the noindex meta tag; their last (previous) crawl of the page did not have that instruction, so the page will still be in the index, but with a description saying that your robots.txt file prevents an accurate description. You don't want that.

IF it is just a few pages, add the meta tag, then go to your GWT account and use the "Fetch" tool to make sure they have seen the noindex tag. There is no need to block the pages in robots.txt, but if you are going to do that, use the proper syntax: Disallow: (not disallow:). Remove the noindexed pages from your GWT sitemap.
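For reference, a robots.txt block with that capitalization would look like this (using the /contact/ path from the earlier reply as the example):

```
User-agent: *
Disallow: /contact/
```

Again, per the above, only add this after the noindex tag has already been seen, if you add it at all.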

lucy24

7:13 pm on Sep 6, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm thinking to force the spider to not index pages

Fundamental error here: the spider doesn't index anything. The spider only crawls. Indexing is a completely separate process.

If you use robots.txt to block crawling, then the search engine will never see the "noindex" directive. There are a couple of recent threads that explain the crawl vs. index difference in great detail. (not2easy? How's your memory?)
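The crawl-versus-index interaction described above can be sketched in Python: a compliant engine only sees a noindex meta tag on pages it is allowed to crawl, so a robots.txt block hides the directive. The paths, rules, and page markup below are all made up for illustration:

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks /contact/
rules = ["User-agent: *", "Disallow: /contact/"]

# Hypothetical pages; only /contact/ carries the noindex tag
pages = {
    "/contact/": '<meta name="robots" content="noindex,follow">',
    "/about/": "<p>About us</p>",
}

rp = robotparser.RobotFileParser()
rp.parse(rules)

def crawl_outcome(path, html):
    """What a compliant engine would do with each page."""
    if not rp.can_fetch("*", "https://example.com" + path):
        # Never fetched, so the noindex tag is never seen and a
        # previously indexed copy can remain in the index.
        return "blocked: noindex tag unseen"
    if "noindex" in html.lower():
        return "crawled: noindex honored"
    return "crawled: eligible for indexing"

for path, html in pages.items():
    print(path, "->", crawl_outcome(path, html))
```

The point of the sketch: adding a Disallow for /contact/ does not drop the page from the index, because the engine is no longer allowed to read the tag that would drop it.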