Blan

msg:3981945 | 1:13 pm on Sep 1, 2009 (gmt 0) |
IMO, it will disable the page http://www.example.com/pages only. :)
# For most search engines' bots...
User-agent: *
Disallow: /pages/subfolder_1/
# subfolder_1 contains the pages you want to block.

# Google supports some pattern matching. The following blocks the pages in the subdirectories of the directory *pages*, but allows the pages that sit directly in the directory pages.
User-agent: *
Disallow: /pages/*/*
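For anyone curious how that kind of wildcard rule behaves, here is a rough Python sketch. It assumes the commonly documented Googlebot semantics, where `*` matches any run of characters and a trailing `$` anchors the end of the URL path. The function name `rule_matches` is made up for illustration, and this is not Google's actual matcher.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Google-style robots.txt pattern matches the URL path.

    Assumed semantics: '*' matches any sequence of characters (including
    none), a trailing '$' anchors the end of the path, and everything else
    is matched literally from the start of the path (prefix matching).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*', one per wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors only at the start, which gives prefix matching.
    return re.match(regex, path) is not None


rule = "/pages/*/*"
for path in ["/pages/", "/pages/page.html", "/pages/subfolder_1/page.html"]:
    print(path, "blocked" if rule_matches(rule, path) else "allowed")
```

Under these assumptions, /pages/page.html stays crawlable while anything at least one directory deeper is blocked.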
I haven't tested them. Google Webmaster Tools gives us a wonderful tool to test robots.txt. Why don't you test it yourself? :)
|
SwipeTheMagnets

msg:3984241 | 6:56 pm on Sep 4, 2009 (gmt 0) |
I have a question for the group - Does anyone see a problem with the following:

# Allow Google
User-agent: googlebot
Disallow: /example.html

# Allow Yahoo
User-agent: Slurp
Disallow: /example.html

# Allow MSN
User-agent: msnbot
Disallow: /example.html

# Restrict All Crawlers But The Ones Above
User-agent: *
Disallow: /

Any feedback is greatly appreciated!
|
seomonster

msg:3984682 | 12:25 pm on Sep 5, 2009 (gmt 0) |
I've just tested this in GWT and Googlebot is allowed, so I would assume the other major SE bots are also allowed and that Disallow: / is keeping all other bots out. I don't see a problem with it; even with the specific page restrictions, it appears to work fine.

Does anybody know of any other good tool to test robots.txt syntax, other than the one in GWT?
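One option for checking a file like this locally, offered as a rough sketch only, is Python's standard-library `urllib.robotparser`. The robots.txt text below is the one posted earlier in the thread; the `MadeUpBot` name and the example.com URLs are placeholders. This parser does plain prefix matching, so it won't necessarily agree with how Googlebot treats wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt under test, copied from the post above.
ROBOTS_TXT = """\
# Allow Google
User-agent: googlebot
Disallow: /example.html

# Allow Yahoo
User-agent: Slurp
Disallow: /example.html

# Allow MSN
User-agent: msnbot
Disallow: /example.html

# Restrict All Crawlers But The Ones Above
User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# "MadeUpBot" is a hypothetical crawler that has no record of its own.
for agent in ["Googlebot", "Slurp", "msnbot", "MadeUpBot"]:
    for url in ["http://www.example.com/", "http://www.example.com/example.html"]:
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:10} {url:40} {verdict}")
```

The named bots should come back as allowed everywhere except /example.html, and the unnamed bot as blocked everywhere.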
|
jdMorgan

msg:3984696 | 1:43 pm on Sep 5, 2009 (gmt 0) |
Responding to the initial post, Disallow: /pages will disallow the file called "/pages", the directory called "/pages/", and all URL-paths below that directory. Anything that starts with "/pages" will be disallowed.

Robots.txt uses prefix-matching, so any URL that matches the prefix that you put in the Disallow directive will be disallowed.

While Googlebot and a few other search engines' robots support limited pattern-matching, and even an "Allow" directive in some cases, there is no 'universal' solution to this problem other than to fix the structure of your site, and to prevent the duplicate-content problems in the first place.

Also, I'm not sure why you say that "301 redirects are out of the question," but if this is a limitation imposed by your hosting, then it's time to get a new host.

Jim
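To make the prefix-matching point concrete, here is a minimal Python sketch of how a basic crawler might apply Disallow lines. The `is_disallowed` helper and the sample paths are illustrative assumptions. Real robots also select a record per user-agent, and some add Allow and wildcard support on top.

```python
from typing import Iterable


def is_disallowed(path: str, disallow_prefixes: Iterable[str]) -> bool:
    """Plain prefix matching: a path is blocked if it starts with any
    non-empty Disallow value. An empty Disallow value blocks nothing."""
    return any(bool(prefix) and path.startswith(prefix) for prefix in disallow_prefixes)


rules = ["/pages"]  # i.e. "Disallow: /pages"
sample_paths = [
    "/pages",
    "/pages/",
    "/pages/subfolder_1/a.html",
    "/pages-archive.html",  # hypothetical file that merely shares the prefix
    "/other/",
]
for path in sample_paths:
    print(f"{path:30} {'disallowed' if is_disallowed(path, rules) else 'allowed'}")
```

Note that a file such as /pages-archive.html is caught too, because it shares the "/pages" prefix. Writing the rule as /pages/ would limit it to the directory and everything below it.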
|
Robert Charlton

msg:3989520 | 3:51 am on Sep 15, 2009 (gmt 0) |
To block a specific page only... use the meta robots tag instead of robots.txt. It goes in the head section of the page you don't want indexed. In this case, the syntax would be:

<meta name="robots" content="noindex,follow">
|
AnkitMaheshwari

msg:3989530 | 4:11 am on Sep 15, 2009 (gmt 0) |
One easy way is to use the following code to block /pages/ only.

User-agent: *
Disallow: /pages/index.html

This assumes that index.html is the main page for the /pages/ directory.
|