Google's treatment of allow and disallow combined in robots.txt?

naveen10

11:30 am on Dec 14, 2018 (gmt 0)

5+ Year Member



My robots.txt file looks like this:

User-agent: Googlebot
Disallow: /service/
Allow: /service/seo.html

Will Google crawl and index the SEO service page or not?

phranque

8:48 am on Jul 29, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



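Yes. Google applies the most specific matching rule, meaning the one with the longest path, so Allow: /service/seo.html overrides the shorter Disallow: /service/ for that one URL.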

however:
A robotted page can still be indexed if linked to from from (sic) other sites
While Google won't crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results. To properly prevent your URL from appearing in Google Search results, you should password-protect the files on your server or use the noindex meta tag or response header (or remove the page entirely).

(source: Understand the limitations of robots.txt [support.google.com])
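A minimal Python sketch of the precedence rule behind the "yes" for the file in the question. It assumes Google's documented behaviour (the longest matching path wins, and Allow beats Disallow on a tie), supports only the `*` and `$` wildcards, and uses a made-up /service/ppc.html to show a sibling page that stays blocked. The function names are illustrative, not part of any library.

```python
import re
from typing import List, Tuple

# A rule is ("allow" | "disallow", path_pattern), e.g. ("disallow", "/service/").
Rule = Tuple[str, str]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the path; everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Return True if `path` may be crawled under `rules`.

    Longest-match precedence: the matching rule with the longest pattern
    wins; on a tie, "allow" beats "disallow". No match means allowed.
    """
    best_len = -1
    best_allow = True  # default when no rule matches
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" blocks nothing
        if not pattern_to_regex(pattern).match(path):
            continue
        allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len = len(pattern)
            best_allow = allow
    return best_allow


if __name__ == "__main__":
    rules = [("disallow", "/service/"), ("allow", "/service/seo.html")]
    for p in ["/service/seo.html", "/service/ppc.html", "/about.html"]:
        print(p, is_allowed(rules, p))
```

Python's standard `urllib.robotparser` applies the first matching line in file order instead, so for this file it would report /service/seo.html as blocked; it is not a faithful stand-in for Googlebot.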

tangor

7:37 am on Jul 30, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



robots.txt is a "request", not a "rule"... and only well-behaved bots will pay heed.

It's just one of the tools in the box, and the least effective one for controlling bad actors. The good guys will honor it, but the rest are going to ignore it.

Look to .htaccess (on Apache) to actually control access as needed.

shashankx10

12:17 pm on Jul 30, 2019 (gmt 0)

5+ Year Member



Yes, Google will crawl /service/seo.html, but if in future you want any other page under /service/ crawled, you will have to add an Allow line for it to robots.txt manually.
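If many pages under /service/ need to stay crawlable, the Allow lines can be generated rather than typed by hand. Below is a sketch with a hypothetical helper, `build_robots_group`, that writes one group disallowing a directory and re-allowing the listed pages; /service/ppc.html is a made-up example page.

```python
from typing import Iterable


def build_robots_group(user_agent: str, blocked_dir: str,
                       allowed_pages: Iterable[str]) -> str:
    """Emit one robots.txt group that blocks a directory but re-allows pages.

    Hypothetical helper: because Google picks the longest matching rule,
    each Allow line for a page inside `blocked_dir` overrides the Disallow.
    """
    lines = [f"User-agent: {user_agent}", f"Disallow: {blocked_dir}"]
    for page in sorted(set(allowed_pages)):
        # Reject pages outside the blocked directory; an Allow there is pointless.
        if not page.startswith(blocked_dir):
            raise ValueError(f"{page} is not under {blocked_dir}")
        lines.append(f"Allow: {page}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_group("Googlebot", "/service/",
                             ["/service/seo.html", "/service/ppc.html"]), end="")
```

The pages are sorted and de-duplicated so the generated file is stable between runs; the order of Allow lines does not affect Google's longest-match evaluation.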