This does not work:

User-Agent: this means you
User-Agent: and you
User-Agent: Knowledge
Disallow: /

and neither does this (in a block by itself):

User-Agent: Knowledge
Disallow: /

but this does:

User-Agent: The Knowledge AI
Disallow: /
... because, apparently, there are so many robots with “Knowledge” in their names that they couldn’t possibly have guessed that I meant them. (If they had had a proper UA string with contact/www information and so on, would they have demanded that I match the whole thing to the letter?) There have been no page requests since about a week ago, when I made this final change.
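For what it's worth, a parser that follows the old spec's substring rule disagrees with this bot. Python's standard-library urllib.robotparser tests whether the token from robots.txt appears, case-insensitively, inside the visiting robot's name, so the shorter block works fine there. (The URL below is invented for illustration; this shows what a spec-following parser does, not what that crawler actually does.)

import urllib.robotparser

# The block that "should" have worked under substring matching.
rules = """\
User-Agent: Knowledge
Disallow: /
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# robotparser checks whether the robots.txt token occurs,
# case-insensitively, inside the requesting robot's name.
print(rp.can_fetch("The Knowledge AI", "https://example.com/page"))  # False: blocked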
Of course you need to use the exact name. Always been that way...
User-agent
The value of this field is the name of the robot the record is describing access policy for.
...
The robot should be liberal in interpreting this field. A case insensitive substring match of the name without version information is recommended.
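Read literally, that recommendation is only a couple of lines of code. A minimal sketch in Python, assuming (as CPython's urllib.robotparser also assumes) that the token from the User-agent line is tested as a substring of the visiting robot's name:

def record_applies(ua_line_value, robot_name):
    # "without version information": drop anything from a "/" on.
    token = ua_line_value.split("/", 1)[0].strip().lower()
    # "*" applies to any robot; otherwise the recommended
    # case-insensitive substring match.
    return token == "*" or token in robot_name.lower()

print(record_applies("Knowledge", "The Knowledge AI"))      # True
print(record_applies("knowledge/1.0", "The Knowledge AI"))  # True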
3.2.1 The User-agent line
Name tokens are used to allow robots to identify themselves via a simple product token. Name tokens should be short and to the point. The name token a robot chooses for itself should be sent as part of the HTTP User-agent header, and must be well documented.
These name tokens are used in User-agent lines in /robots.txt to identify to which specific robots the record applies. The robot must obey the first record in /robots.txt that contains a User-Agent line whose value contains the name token of the robot as a substring.
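Note that the draft states the match in the opposite direction from the sketch above: here it is the User-Agent line's value that must contain the robot's short name token, and the first such record wins. A rough sketch of that selection; the record format and names are mine, not the spec's:

def select_record(records, name_token):
    # records: list of (user_agent_values, rules) pairs in file order.
    # Return the rules of the first record whose User-Agent value
    # contains the robot's name token as a case-insensitive substring;
    # fall back to the "*" record, if any.
    fallback = None
    for ua_values, rules in records:
        for value in ua_values:
            if value == "*":
                if fallback is None:
                    fallback = rules
            elif name_token.lower() in value.lower():
                return rules  # first matching record wins
    return fallback

records = [
    (["googlebot"], ["Disallow: /private/"]),
    (["*"], ["Disallow:"]),
]
print(select_record(records, "Knowledge"))  # no named match, so the "*" rules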
5.5.3. User-Agent
...
User-Agent = product *( RWS ( product / comment ) )
The User-Agent field-value consists of one or more product identifiers, each followed by zero or more comments (Section 3.2 of [RFC7230]), which together identify the user agent software and its significant subproducts. By convention, the product identifiers are listed in decreasing order of their significance for identifying the user agent software. Each product identifier consists of a name and optional version.
product = token ["/" product-version]
product-version = token
...
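In other words, the name a robots.txt matcher should care about is the product token, not the whole header. A rough sketch of splitting a User-Agent field along the grammar above (comments are simply discarded, and nested parentheses, which the RFC allows inside comments, are not handled here):

import re

def parse_products(user_agent):
    # Drop "(...)" comments, then split the rest into
    # name/version product tokens.
    no_comments = re.sub(r"\([^)]*\)", " ", user_agent)
    products = []
    for tok in no_comments.split():
        name, _, version = tok.partition("/")
        products.append((name, version or None))
    return products

print(parse_products("Mozilla/5.0 (compatible; ExampleBot/2.1) Safari/537.36"))
# [('Mozilla', '5.0'), ('Safari', '537.36')]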
If you want to block or allow all of Google's crawlers from accessing some of your content, you can do this by specifying Googlebot as the user-agent.
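A quick way to sanity-check a file like that is the same stdlib parser used above. The file and URLs here are invented, and Google's own group-selection rules (it prefers the most specific matching group) are more involved than this sketch:

import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/private/x"))     # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/private/x"))  # True: falls to "*"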
It was almost 10 years ago that the "big 3" search engines (at the time) agreed to support a common set of directives. Then, as stated above and at the link I posted, they went their different ways. Beyond a few commonly supported directives, the engines that support robots.txt each do it differently. I work with this every day.