Forum Moderators: goodroi


Robots.txt: Disallow everything except these pages


ron_ron

6:36 am on Sep 1, 2014 (gmt 0)

10+ Year Member



I have more pages I want to disallow than pages I want to allow, so I made a robots.txt file like this:

User-agent: *
Allow: /index.asp
Allow: /index-1.asp
Allow: /index-2.asp
Allow: /index-3.asp
Disallow: /
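One quick way to sanity-check how a file like this evaluates is Python's standard-library robots.txt parser. Note that `urllib.robotparser` applies rules in file order (first match wins) rather than Google's longest-match precedence, but for this particular file, with the Allow lines before the catch-all Disallow, both approaches agree. A minimal sketch:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post, fed to the parser line by line
robots_txt = """\
User-agent: *
Allow: /index.asp
Allow: /index-1.asp
Allow: /index-2.asp
Allow: /index-3.asp
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "/index.asp"))    # True: matches an Allow line
print(rp.can_fetch("Googlebot", "/index-2.asp"))  # True: matches an Allow line
print(rp.can_fetch("Googlebot", "/contact.asp"))  # False: caught by Disallow: /
```

This only answers the *crawling* question, which is separate from the indexing question raised in the post: a blocked page can still be indexed from links alone.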

But now in Google searches, Google shows this instead of a description:

A description for this result is not available because of this site's robots.txt

The site shows up in the SERPs, but without my intended description. Why is Google doing this?

Does it matter if I put the Disallow rule before the Allow rules?

not2easy

7:03 am on Sep 1, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Google does not know you don't want those other pages indexed. It won't crawl them any more because you have blocked it, but it won't stop indexing them, because the last time it was able to crawl, it did not see a noindex tag.

You don't mention how many pages are involved. If it is a dozen or so, you can add the noindex tags, then unblock them from crawling. If there is a reason they need to be removed from the index right away, you can use the URL removal tool in your GWT account. That will take them out immediately. The pages will come back after a while (I believe it is 90 days, but it could be more or less) if Google is not allowed to see your noindex tags.

ron_ron

8:08 am on Sep 1, 2014 (gmt 0)

10+ Year Member



So Allow does not work in Google? Only Disallow?

I prefer to Disallow all pages on the site except the few I want to Allow. There are many more pages I do not want indexed than the handful I want indexed.

I prefer not to use those tools, as I do not want to create any accounts with Google. I am using robots.txt, not meta robots tags.

not2easy

2:27 pm on Sep 1, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Your robots.txt file is fine and will do what you want, except that it will never keep those pages out of the index with the "no description" snippet until you tell Google not to index them.

jay5r

1:01 pm on Sep 4, 2014 (gmt 0)

10+ Year Member Top Contributors Of The Month



There is no such thing as Allow in the robots.txt standard. You can only Disallow things…

[robotstxt.org...]

To exclude all files except one:
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory.


Which, of course, is no solution at all.

not2easy

3:03 pm on Sep 4, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Google accepts some syntax that is not included in the standard, and they spell out how to use it. Allow is an acceptable way to have Google crawl a few specific files when all of them are disallowed by a default such as Disallow: /*.php. If it didn't work like that, they would not be able to retrieve my php sitemaps, which they do regularly.

This is how you can disallow "all" and allow "some" for Google:
Disallow: /directory/*.asp
Allow: /directory/filename.asp
Allow: /directory/filename-2.asp
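For Google, the order of these lines does not actually matter: Google documents that for a given URL, the most specific (longest) matching rule wins, with Allow winning a length tie. A small Python sketch of that longest-match logic (my own illustration of the documented precedence rule, not Google's code):

```python
import re

def _to_regex(pattern):
    # robots.txt patterns: '*' matches any run of characters,
    # and a trailing '$' anchors the match at the end of the path
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def allowed(rules, path):
    """rules: (verdict, pattern) pairs, verdict 'allow' or 'disallow'.
    Longest matching pattern wins; 'allow' wins a length tie."""
    matches = [(len(p), v == "allow") for v, p in rules
               if _to_regex(p).match(path)]
    if not matches:
        return True  # no rule matches: crawling is allowed by default
    return max(matches)[1]

rules = [
    ("disallow", "/directory/*.asp"),
    ("allow", "/directory/filename.asp"),
]

print(allowed(rules, "/directory/filename.asp"))  # True: the Allow is longer
print(allowed(rules, "/directory/other.asp"))     # False: only Disallow matches
print(allowed(list(reversed(rules)), "/directory/filename.asp"))  # True: same
```

Because the winner is chosen by pattern length rather than file position, reversing the two rules gives the same answers, which is why putting Disallow before Allow makes no difference to Google.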

not2easy

3:30 pm on Sep 4, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I looked it up so you can go and read more about it if you like: [support.google.com...]