Is there a way to do this?

inverted robots

     
2:11 pm on Mar 4, 2015 (gmt 0)

New User

5+ Year Member

joined:Dec 28, 2013
posts: 2
votes: 0


I am trying to reconfigure a robots.txt file. I know this approach may be impossible, but I want to exclude everything except certain specified directories (instead of allowing everything except certain paths/files).

Consider this block:

User-agent: *
Disallow: /
Allow: /Dir1/
Allow: /Dir2/
Allow: /Dir3/
Allow: /Dir4/


This works except for one fatal flaw: it blocks the default home page, the one reached through the domain name alone, such as:

www.domainname.com


Since the 'index.htm' (or whatever default file the web server returns) is implied and not explicit, the rule fails for the domain name by itself. I don't care much for the idea of allowing everything by default and then having to hunt down everything I don't want indexed/crawled. Whoever came up with this idea was creating crawlers.

I know you can allow subdirs after a disallow statement, but how then can you handle anything in the root? Hell, that's the one place I want to limit. It seems like it would be much simpler to be able to just list the areas of a site you want crawled, not the other way around. Am I crazy? Or is this just stupid?

Any workarounds I can't see?
2:19 pm on Mar 4, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 8, 2004
posts: 565
votes: 1


The "Disallow: /" blocks bots from going anywhere on the site. The "Allow:" statements are useless. The only thing that should be in the robots.txt file is what is NOT allowed.
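Whether the "Allow:" lines really are ignored depends on the parser reading the file. One way to see how an order-sensitive parser treats the block is Python's standard-library urllib.robotparser, which applies the first rule that matches. This is only a rough sketch: the helper may_fetch, the page name page.htm, and the trimmed-down rule sets are made up for illustration, and www.domainname.com is the placeholder host from the first post.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Same ordering as the block in the first post: the blanket Disallow comes first.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /
Allow: /Dir1/
"""

# Same rules with the Allow line moved above the blanket Disallow.
ALLOW_FIRST = """\
User-agent: *
Allow: /Dir1/
Disallow: /
"""

PAGE = "http://www.domainname.com/Dir1/page.htm"  # hypothetical page
HOME = "http://www.domainname.com/"

print(may_fetch(DISALLOW_FIRST, PAGE))  # False: "Disallow: /" matches first
print(may_fetch(ALLOW_FIRST, PAGE))     # True: "Allow: /Dir1/" matches first
print(may_fetch(ALLOW_FIRST, HOME))     # False: only "Disallow: /" matches
```

With a first-match parser like this one, listing the Allow lines before the blanket Disallow opens up the directories, but the bare home page stays blocked in both orderings.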

From - [robotstxt.org...]
To exclude all files except one

This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:
2:33 pm on Mar 4, 2015 (gmt 0)

Administrator from US

WebmasterWorld Administrator not2easy, Top Contributor of All Time

10+ Year Member

joined:Dec 27, 2006
posts: 4161
votes: 262


Google does recognize "Allow:", but only to modify a prior "Disallow:" setting. Other bots may not recognize it at all.
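For the home page problem, Google also documents a "$" end-of-URL anchor, so "Allow: /$" matches the bare root URL and nothing else. Google picks the most specific (longest) matching rule rather than the first one, and on a tie the Allow wins. Below is a simplified sketch of that evaluation. It is not Google's code: the functions is_allowed and _to_regex, the RULES list, and the sample paths are all made up for illustration, and other bots may not honor "$" or "Allow:" at all.

```python
import re


def _to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$' anchor) to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; on a tie Allow beats Disallow; no match means allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if _to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


# Hypothetical rule set: the first post's block plus an anchored rule for the root.
RULES = [
    ("Disallow", "/"),
    ("Allow", "/$"),      # matches the bare root URL only
    ("Allow", "/Dir1/"),
    ("Allow", "/Dir2/"),
]

for path in ("/", "/Dir1/page.htm", "/private.htm", "/index.htm"):
    print(path, is_allowed(RULES, path))
```

Under these assumptions "/" and "/Dir1/page.htm" come out allowed while "/private.htm" stays blocked. Note that "/index.htm" requested by name is still blocked, since only the bare root matches "/$".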
 
