? and * in robots.txt

Hi,

We are currently writing a web crawler, and I have been searching this forum for problems and solutions regarding robots.txt and the robots meta tag.
One question I still could not answer myself: what does a question mark (?) in Allow/Disallow lines stand for?
At this point we are thinking of ignoring it, because it is not in the robotstxt.org specification.

Can somebody give me a hint?
An example from Google's robots.txt would be:

User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
Disallow: /catalogs
Disallow: /catalogues
Disallow: /news
Disallow: /nwshp
Disallow: /?
Disallow: /addurl/image?
Disallow: /pagead/
Disallow: /relpage/
Disallow: /sorry/
Disallow: /imgres
Disallow: /keyword/
Disallow: /u/
etc...

This example was downloaded as plain text, so there should be no problem with encodings, I guess. Until we know what the question mark means, we ignore all lines containing one.
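
For comparison, here is a minimal sketch of the alternative to skipping those lines. It assumes the question mark is nothing special, just the literal character that starts the query string, and that each rule is a plain prefix compared against the URL's path plus query. The names match_target and is_allowed are made up for illustration, and "first matching rule in file order wins" is an assumption too; some crawlers prefer the longest matching rule instead.

```python
from urllib.parse import urlsplit


def match_target(url):
    """Return the part of the URL that rules are compared against (path plus query)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def is_allowed(rules, target):
    """rules: (directive, prefix) pairs in file order; '?' is treated as a literal character.

    Assumption: the first rule whose prefix matches decides the outcome.
    """
    for directive, prefix in rules:
        if prefix and target.startswith(prefix):
            return directive.lower() == "allow"
    return True  # no rule matched, so the URL may be fetched


# A few of the rules from the example above
rules = [
    ("Allow", "/searchhistory/"),
    ("Disallow", "/search"),
    ("Disallow", "/?"),
    ("Disallow", "/addurl/image?"),
]

print(is_allowed(rules, match_target("http://www.google.com/?q=robots")))
print(is_allowed(rules, match_target("http://www.google.com/")))
```

Read this way, "Disallow: /?" would only cover the start page when it is called with a query string, and the plain start page stays crawlable.
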
Thanks for any help on that, and yes, I have another question:
What do webmasters mean by putting an asterisk after the name of the user-agent?

Example:

User-agent: Xbot*

At this point we would just trim the asterisk away and match the remainder against our bot name with a regex.
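
Roughly what we have in mind, as a small sketch. It uses a case-insensitive substring test instead of a real regex, which is the liberal matching the robotstxt.org text recommends for this field. The function name user_agent_matches and the bot name "Xbot-Crawler" are made up for illustration, and treating a trailing asterisk as "anything may follow" is our assumption, not something from the specification.

```python
def user_agent_matches(record_value, bot_name):
    """Decide whether a User-agent line applies to our crawler.

    Assumption: a trailing '*' after a name only means 'anything may follow',
    so it is trimmed before comparing.
    """
    token = record_value.strip()
    if token == "*":
        return True  # the record applies to every robot
    token = token.rstrip("*").strip().lower()
    if not token:
        return False
    # Case-insensitive substring match against our own bot name
    return token in bot_name.lower()


print(user_agent_matches("Xbot*", "Xbot-Crawler"))
print(user_agent_matches("Otherbot*", "Xbot-Crawler"))
```
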

read you


solandre

Lord Majestic
