Forum Moderators: goodroi
At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry trumps the less specific (shorter) rule. In case of conflicting rules, including those with wildcards, the least restrictive rule is used.
To evaluate if access to a URI is allowed, a robot MUST match the paths in allow and disallow rules against the URI. The matching SHOULD be case sensitive. The most specific match found MUST be used. The most specific match is the match that has the most octets. If an allow and disallow rule is equivalent, the allow SHOULD be used. If no match is found amongst the rules in a group for a matching user-agent, or there are no rules in the group, the URI is allowed. The /robots.txt URI is implicitly allowed.
Octets in the URI and robots.txt paths outside the range of the US-ASCII coded character set, and those in the reserved range defined by RFC3986, MUST be percent-encoded as defined by RFC3986 prior to comparison.
If a percent-encoded US-ASCII octet is encountered in the URI, it MUST be unencoded prior to comparison, unless it is a reserved character in the URI as defined by RFC3986 or the character is outside the unreserved character range. The match evaluates positively if and only if the end of the path from the rule is reached before a difference in octets is encountered.
To evaluate if access to a URL is allowed, a robot must attempt to match the paths in Allow and Disallow lines against the URL, in the order they occur in the record. The first match found is used. If no match is found, the default assumption is that the URL is allowed.
The /robots.txt URL is always allowed, and must not appear in the Allow/Disallow rules.
The matching process compares every octet in the path portion of the URL and the path from the record. If a %xx encoded octet is encountered it is unencoded prior to comparison, unless it is the "/" character, which has special meaning in a path. The match evaluates positively if and only if the end of the path from the record is reached before a difference in octets is encountered.
[edited by: phranque at 2:35 am (utc) on Jun 15, 2021]
Disallow: /wp-includes/
Allow: /wp-includes/js/
Allow: /wp-includes/css/
Allow: /*.css
Allow: /*.js Googlebot is the only one that reliably uses the Allow syntax within a Disallow, and then the Allow must follow the Disallow
At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry trumps the less specific (shorter) rule. In case of conflicting rules, including those with wildcards, the least restrictive rule is used.