How to include a single file for crawling?

manomahendran

6:12 am on Jan 18, 2003 (gmt 0)

Hello,
I would like to crawl
[mydomain.com...]
I want to crawl just that one file (index.htm) in my domain, so that it appears in search results where appropriate; all the other files must not be crawled. I have more than 40 such domains and files (one file per domain).
Thanks,
M

jomaxx

8:08 am on Jan 18, 2003 (gmt 0)

This has already been answered once (by me) in the last thread you started, but anyway here is a link to a page where Google answers this question thoroughly:
[google.com...]

Chico_Loco

8:35 am on Jan 18, 2003 (gmt 0)

Jo Maxx

You're kinda right, but then again you're not.

Using the NOARCHIVE tag will eliminate the cached snapshot, but the page will still be indexed.

Normally I'd suggest the robots meta tag with NOINDEX,NOFOLLOW, but that isn't a great solution to this problem either, since you specifically want to know how to prevent bots from crawling pages, not just how to keep them out of the index.
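Why the meta tag can't stop crawling: a crawler only sees the tag after it has already downloaded the page. A minimal Python sketch using the standard html.parser module shows that reading the directives requires the fetched HTML. The class and function names are hypothetical, and the sample page is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# The page must already have been fetched (crawled) for this to work,
# so NOINDEX/NOFOLLOW only take effect after the crawl has happened.
page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head><body>...</body></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```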

The only answer is to use the robots.txt file and ban each file separately (or perhaps ban the single directory in which all the files that should not be crawled reside).
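A variation on banning each file separately: disallow everything and allow just the one page. This relies on the Allow directive, an extension to the original robots.txt convention that not every crawler honors. The sketch below checks such a file with Python's standard urllib.robotparser. The domain and the robots.txt contents are assumptions for illustration. Like any robots.txt approach, it only works if you control the site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for one domain: allow only /index.htm, block the rest.
# Allow comes first because Python's parser applies the first matching rule,
# while Google uses the most specific (longest) match; this order suits both.
ROBOTS_TXT = """\
User-agent: *
Allow: /index.htm
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/index.htm", "/about.htm", "/images/logo.gif"):
    url = "http://www.example.com" + path
    print(path, parser.can_fetch("Googlebot", url))
```

Rules are prefix matches, so "Allow: /index.htm" would also admit a URL such as /index.html.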

jomaxx

4:42 pm on Jan 18, 2003 (gmt 0)

Did I say NOARCHIVE? I obviously should have said NOINDEX in that part of the response and I apologize for that.

manomahendran

4:41 pm on Jan 20, 2003 (gmt 0)

Hi All,
I want Google to crawl this URL and 40 others like it, with just one page from each. Also, I do not control the 40 URLs I am trying to crawl, so I cannot use a robots.txt file.
I was wondering if there is a way in the appliance to crawl just one file in a site.
Thanks,
M
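Without control of the sites, any restriction has to live in the crawler's own configuration. The idea is to start from the exact list of URLs and accept only those URLs, so links found on the pages are never followed. The Python sketch below is a generic, hypothetical illustration of that exact-match rule. It is not the appliance's configuration syntax. The class, the function and the example URLs are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lower-case scheme and host and drop any #fragment so equivalent URLs compare equal."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class ExactUrlFilter:
    """Accepts a URL for crawling only if it is one of the listed start URLs."""

    def __init__(self, start_urls):
        self.allowed = {normalize(u) for u in start_urls}

    def accept(self, url):
        return normalize(url) in self.allowed


# Hypothetical list: one page per domain (the real list would have 40+ entries).
crawl_filter = ExactUrlFilter([
    "http://www.example.com/index.htm",
    "http://www.example.org/index.htm",
])

for link in ("http://www.example.com/index.htm", "http://www.example.com/about.htm"):
    print(link, crawl_filter.accept(link))
```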