Forum Moderators: goodroi


Disallow ALL folders?

but allow top level html


Reno

3:26 pm on Jan 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As I mentioned in my recent posting about a problem with the msnbot, I want to control the amount of bandwidth that bots eat up every day.

This morning I've been reading some online tutorials but cannot find the answer to this question, so thought I'd better post it here:

Is there a way to indicate with one line of text that ALL sub-folders are to be disallowed? The key here is to not block the bot from indexing the top level html pages, as that is where the important content resides.

For example, would this work:

Disallow: /*/

Or does that block all the top level html files as well? I realize I can individually list each and every folder, but am hoping for a more efficient solution.

As always, any advice is appreciated...

ps. Am sorry to say that after my initial success in getting msnbot down to only 4 MB of bandwidth a day (down from 40+), it is now up to 14 MB/day, with very little new content added to the site during this time period. Very perplexing.

.......................................

Lord Majestic

3:34 pm on Jan 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can't do it that way and have it work with all bots -- you will have to list the subdirectories separately. Some bots support pattern matching (i.e. the wildcard you used), but this is not universal.
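To make the contrast concrete, here's a sketch of both approaches side by side (the directory names are made up for illustration; check each crawler's own documentation before relying on the wildcard form):

```
# Portable: list each existing subdirectory explicitly.
# Works with any robots.txt-aware crawler.
User-agent: *
Disallow: /images/
Disallow: /archive/
Disallow: /scripts/

# Wildcard form: only honored by bots that support
# pattern matching, so it can't be relied on universally.
User-agent: msnbot
Disallow: /*/
```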

Reno

4:28 pm on Jan 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That is what I suspected, since no tutorial mentioned it as an option. Thanks for the confirmation....

Lord Majestic

4:30 pm on Jan 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just list the existing subdirectories, and make sure you create new subdirectories only inside existing ones -- that way you won't have to change your robots.txt.
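This works because a Disallow rule is a path-prefix match, so a blocked directory covers everything created beneath it later. For example (hypothetical directory names):

```
User-agent: *
Disallow: /images/
# A new folder such as /images/2006/ needs no new rule --
# it already falls under the /images/ prefix.
```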

Pfui

8:57 pm on Jan 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi again, Reno:)

A really good way to see how robots.txt files are done properly is to check major sites' files. E.g.:

eBay [ebay.com]
craigslist [craigslist.com]

And here's a whopper:

Google [google.com]

And last but not least, a blogger!:)

WebmasterWorld [webmasterworld.com]
(actual [webmasterworld.com]; original [webmasterworld.com])

.
P.S. to Newcomers
The Web Robots FAQ [robotstxt.org]

Reno

9:14 pm on Jan 9, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Thanks LM and Pfui for the suggestions. It never occurred to me to simply put "/robots.txt" after a major site domain -- very illuminating!
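In case it helps anyone else: a quick way to sanity-check a rule set before uploading it is Python's standard-library robots.txt parser. A minimal sketch, with made-up directory names, confirming that top-level pages stay crawlable while subfolders are blocked:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: block two subdirectories,
# leave top-level pages crawlable.
rules = """\
User-agent: *
Disallow: /images/
Disallow: /archive/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("msnbot", "/index.html"))        # True  (top level allowed)
print(rp.can_fetch("msnbot", "/archive/old.html"))  # False (subfolder blocked)
```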