Using robots.txt with a FrontPage Search Bot

         

jenlynn

7:44 pm on Jan 13, 2004 (gmt 0)

10+ Year Member



I would like to exclude a 'test' directory of HTML files from my web site's search form. The search form uses a FrontPage search web bot. I have attempted to do so using a robots.txt file that reads:

User-agent: *
Disallow: /test/

However, the pages in the test directory still appear in the search results. The robots.txt file is located in the root directory along with the default page. I also recalculated all hyperlinks in FrontPage.

I've wasted half the day searching for a solution. Can anyone help me out?

Also, if anyone would like to share any knowledge of another simple site search tool, it would be greatly appreciated.

Thank you.

troels nybo nielsen

8:12 pm on Jan 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld, jenlynn.

Not sure that I completely understand your problem. Might it be that search engines simply need some time to let that listing disappear from their databases? Global search engines do that.

I'd suggest that you move your test files to a new directory. You might create a small labyrinth of directories inside directories and only use one of these sub-sub-...-directories. And do remember to disallow those directories at once.

<added>One of the things that confuses me is this question: Is it a remotely hosted service or is the search tool on your own server?</added>

pageoneresults

8:26 pm on Jan 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello jenlynn, Welcome to WebmasterWorld!

You can exclude your /test/ sub-directory from the FP search results by turning it into a sub-web. While viewing your folder list, right click the /test/ sub-directory and select Convert to Web. This makes /test/ a sub-web. When you edit the sub-web, a new FP window will open and treat that sub-web as a separate site.

P.S. The robots.txt protocol has no bearing on the FP bots.
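For context, here is a minimal Python sketch of how a crawler that does honor robots.txt would read the rules from the first post. It uses the standard library's urllib.robotparser; the host name www.example.com and the page paths are placeholders, not the poster's real site. It suggests the file itself is fine and that the rule only matters to bots that choose to consult it.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the original post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""


def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs, assumed for illustration only.
    for url in (
        "http://www.example.com/test/index.shtml",
        "http://www.example.com/index.htm",
    ):
        print(url, "allowed" if is_allowed(url) else "disallowed")
```

A compliant crawler would skip everything under /test/, but a tool that never reads robots.txt, as described above for the FP search bot, is unaffected by it.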

wickydoodah

10:01 pm on Jan 13, 2004 (gmt 0)

10+ Year Member



Yes, "pageoneresults" is correct. The FP search bot does not use the robots.txt file to determine what gets indexed.

However, rather than converting your /test/ sub-directory into a sub-web, if I remember correctly you can simply rename that sub-directory to begin with an underscore character and the FP search bot will not index it. For example, rename your /test/ sub-directory to /_test/. That's what I did (some years ago with FP2000) and I was able to exclude the contents of those sub-directories from my FP search pages. It should also work for individual files. Anything that begins with an underscore will be skipped by the search bot.
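The underscore rule as described in this post can be sketched in a few lines of Python. This is only a model of the behavior reported here, not FrontPage's actual indexing code; the function names and the sample paths are made up for illustration.

```python
from pathlib import PurePosixPath


def is_skipped(site_path: str) -> bool:
    """Return True if any folder or file name in the path starts with '_'.

    Assumption: this mirrors the rule as described in the post above,
    not the real FP search bot implementation.
    """
    return any(part.startswith("_") for part in PurePosixPath(site_path).parts)


def searchable(paths):
    """Keep only the paths the described rule would leave in the index."""
    return [p for p in paths if not is_skipped(p)]


if __name__ == "__main__":
    # Hypothetical site contents.
    pages = [
        "/index.htm",
        "/test/index.htm",
        "/_test/index.htm",
        "/docs/_draft.htm",
    ]
    print(searchable(pages))
```

Under this model, /test/index.htm stays searchable until the folder is renamed to /_test/.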

troels nybo nielsen

10:20 pm on Jan 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops! After this no-one needs to doubt my ignorance about FrontPage. Must be one of the most irrelevant posts that I ever made at WebmasterWorld. Mods and admins please feel free to delete it (and this one too) if you think that this would be the best thing to do.

jenlynn

3:41 pm on Jan 14, 2004 (gmt 0)

10+ Year Member



Thanks to all who responded.

pageoneresults -

Your suggestion worked. However, now I am unable to view the pages in the test subdirectory through a browser, using 'www.domainname.com/test/index.shtml'.

wickydoodah -

When I renamed the test sub-directory to _test, the page names still appeared in my search result list, although the actual pages did not display ('The page cannot be displayed'), only the page URL. I also was not able to view the pages through a browser.

My main objective is to create a secondary web site within the 'test' directory with a non-FrontPage editor (i.e. FirstPage 2000). I believe I will need to transfer the files with FTP rather than FrontPage to avoid corrupting the current FrontPage site that exists in the root web. Therefore, I don't think I should use the sub-web route.

Does anyone know if it is even possible to do such a thing, or will I need to host two separate accounts with my ISP?

Thanks again.

pageoneresults

3:55 pm on Jan 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jenlynn, unfortunately the only way to work with the FP searchbot is by using the sub-web method. This is documented in the help features of FP.

Right click that sub-web /test/ and go to properties. Make sure that both of the boxes are ticked...

Allow scripts to be run.
Allow files to be browsed.

P.S. As a side note, make sure the host has set up index.shtml as a recognized home page for that sub-directory. From what I've seen, index.shtml is not a standard entry in the default document list, so it will need to be added.

wickydoodah

8:16 pm on Jan 14, 2004 (gmt 0)

10+ Year Member



jenlynn, not sure why adding an underscore to your sub-directory didn't work. It should operate the same as the standard _private sub-directory (remain hidden).

According to the Help section in FP2003:

To add a hidden subfolder, type the name of the new subfolder, preceding its name with an underscore (for example, _database), and then press ENTER.

This is what I did and those folders and their contents don't show up in my web bot search results. I have over 100 pages hidden from the web bot. This worked for me in FP2000, FP2002, and FP2003. Unfortunately, I can't tell you why it's not working in your case.

pageoneresults

8:46 pm on Jan 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It should operate the same as the standard _private sub-directory (remain hidden).

Not really. The _private folder is just that: a private folder that cannot be accessed by visitors, so you should not be able to browse to its contents. It is the only folder that does not allow browsing by default. If you precede any other sub-directories with an underscore, you can still browse to those files.

In this case, when you use the term hidden, it refers to the fact that the folder is hidden from your folder list view while editing the site. You can change your preferences to view all hidden files and then you will see those folders with the preceding underscore.

I've not used the FP search function in years because it does not produce consistent results and is a pain to format. According to the documentation, making your /test/ sub-directory a sub-web is probably the best alternative. According to the above post from wickydoodah, using hidden sub-directories also seems to work. I just don't like using underscores in URI paths.

wickydoodah

9:14 pm on Jan 14, 2004 (gmt 0)

10+ Year Member



If you precede any other sub-directories with an underscore, you can browse to those files.

Yes, you are right about still being able to browse those files, even if preceded by an underscore. My mistake. Age has a way of screwing with a person's memory!

But the underscore does keep those files from appearing on the FP search page. I just ran a bunch of searches again on one of my sites and indeed none of the "hidden" files are listed on the search page. It's what I've always used to exclude non-relevant pages from the FP search results.

And I agree that the FP search bot is not the greatest, but it serves its purpose for what little searching our users do.

Cheers!