? and * in robots.txt

What does a question mark in robots.txt stand for?

         

solandre

5:23 pm on Feb 9, 2006 (gmt 0)



Hi,

We are currently coding a web crawler, and I am searching this forum for problems and solutions regarding robots.txt and the robots meta tag.
One question I still could not answer myself: what does a question mark (?) in Allow/Disallow lines stand for?
At this point we are thinking of ignoring it, because it is not in the robotstxt.org specification.

Can somebody give me a hint?
An example from Google's robots.txt would be:

User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
etc...

This example was downloaded as plain text, so there should be no problem with encoding standards, I guess. As long as we don't know what the question mark means, we ignore all lines that contain one.
Thanks for that, and yes, I have got another question:
What do webmasters mean by using an asterisk after the name of the user-agent?

Example:

User-agent: Xbot*

At this point we would just trim the asterisk away and regex-match the remainder against our bot name.

read you

Lord Majestic

5:50 pm on Feb 9, 2006 (gmt 0)



Treat the question mark just like any other character. It is the start of the query string; some people like having no filename and use only a query string.
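To make that concrete, here is a minimal Python sketch of plain prefix matching in which "?" gets no special treatment. The function name is_disallowed and the sample paths are made up for illustration, and the sketch deliberately ignores Allow lines and per-agent records.

```python
def is_disallowed(url_path: str, disallow_prefixes: list[str]) -> bool:
    """Return True if url_path starts with any non-empty Disallow prefix.

    '?' is compared literally, like any other character.
    An empty Disallow value blocks nothing, so it is skipped.
    """
    return any(
        bool(prefix) and url_path.startswith(prefix)
        for prefix in disallow_prefixes
    )


# A few of the Disallow values from the robots.txt quoted above.
rules = ["/search", "/?", "/addurl/image?"]

print(is_disallowed("/?q=robots", rules))         # True: begins with "/?"
print(is_disallowed("/", rules))                  # False: bare root has no "?"
print(is_disallowed("/addurl/image?x=1", rules))  # True
print(is_disallowed("/addurl/images", rules))     # False: "s" is not "?"
```

Read this way, "Disallow: /?" blocks the root page only when it carries a query string, which is why skipping such lines would let a crawler fetch URLs the site asked it to avoid.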

The wildcard * is only valid on its own, as in:

User-agent: *

If it is used in any other context, it should be treated as a normal symbol. Any webmaster who assumes that it will pattern-match the user-agent is mistaken.
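A rough Python sketch of that rule for the User-agent line follows. The function name record_applies and the bot name "Xbot" are hypothetical, and the case-insensitive substring comparison is an assumption based on what the original robotstxt.org text recommends, not something stated in this thread.

```python
def record_applies(user_agent_value: str, bot_name: str) -> bool:
    """Decide whether a 'User-agent:' record applies to our crawler.

    A lone '*' matches every crawler. Any other value is compared as
    literal text, so a '*' inside it is just an ordinary character.
    Assumption: case-insensitive substring match against the bot name.
    """
    value = user_agent_value.strip()
    if value == "*":
        return True
    if not value:
        return False
    return value.lower() in bot_name.lower()


print(record_applies("*", "Xbot"))      # True: lone wildcard
print(record_applies("xbot", "Xbot"))   # True: case-insensitive match
print(record_applies("Xbot*", "Xbot"))  # False: '*' is literal here
```

Under this reading a record such as "User-agent: Xbot*" simply never matches a crawler named Xbot. Trimming the asterisk, as proposed in the question, is a more lenient choice that guesses at the webmaster's intent.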