For verified sitemaps, Google will show you stats and information on each sitemap, plus some other details about your site. It checked my robots.txt and reported an error that amounted to the file being "over 2000 characters".
Has anyone heard of this? I searched a while and came up empty.
There is no character limit. Plenty of live robots.txt files are far bigger (see the sketch after the list for a quick way to check a file's size yourself):

2.5 MB: [lld.dk...]
800 KB: [vm.ibm.com...]
600 KB: [lifesite.net...]
500 KB: [chop.edu...]
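Here's a minimal Python sketch for checking a file against that threshold; the URL is a placeholder, so swap in any of the sites above:

import urllib.request

# Placeholder URL -- substitute any site's robots.txt.
url = "http://www.example.com/robots.txt"

with urllib.request.urlopen(url) as resp:
    body = resp.read().decode("utf-8", errors="replace")

print(f"{url}: {len(body):,} characters")
# 2000 characters is the undocumented threshold the checker reported.
if len(body) > 2000:
    print("Over 2000 characters -- would apparently trip the warning.")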
I ran their error check on my file, which is basically the old WebmasterWorld robots.txt with some additions, and it came up in red: "over 2000 characters". You can play with it there and retest. When I chopped the file down to under their limit, the message was gone.
Why have something like this without documentation?
IBM's file lists directory contents down to the individual files. It would be a LOT shorter if the Disallows 'ended' at the directory level. Is there some advantage to the full-path format?
LLD's, the really, REALLY big one, is the same way -- a gazillion directories, a gazillion contents -- like an inverse SiteMap up to six levels deep! Benefit?
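As far as the original robots exclusion standard goes, no: Disallow is a simple path-prefix match, so a directory-level rule already blocks everything beneath it. A minimal sketch with Python's stdlib parser (the rules and paths are made up for illustration):

from urllib import robotparser

# Hypothetical rules: a single directory-level Disallow.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The directory rule is a prefix match, so it already blocks every
# file beneath /private/ -- per-file entries would add nothing.
print(rp.can_fetch("*", "http://www.example.com/private/a/report.html"))  # False
print(rp.can_fetch("*", "http://www.example.com/public/report.html"))     # True

At least for parsers that follow the standard, spelling out every file is redundant once the parent directory is disallowed; it only buys you a bigger file.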
CHoP's includes a mix of instructions. Some are Disallows, but they're missing the space after the colon and use full URLs instead of paths (e.g. Disallow:http://www...). It also lists directory contents down to the individual files, plus a ton of Allows (albeit for dynamic URLs) in full-path format. Wouldn't the latter be redundant? Or do you think they're included because they're dynamic and, in the old days at least, didn't get spidered?
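On that full-URL form: Python's stdlib parser still recognizes the field even without the space, but Disallow values are matched against the URL path, so an absolute URL never matches anything -- the rule is a silent no-op. A quick check (example.com stands in for CHoP's actual entries):

from urllib import robotparser

# A rule written as a full URL, mimicking the pattern described above.
broken = """\
User-agent: *
Disallow:http://www.example.com/private/
"""

rp = robotparser.RobotFileParser()
rp.parse(broken.splitlines())

# Disallow values are compared against the URL *path*, so the absolute
# URL never matches and the "blocked" directory stays fetchable.
print(rp.can_fetch("*", "http://www.example.com/private/page.html"))  # True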
LifeSite's contains no robots instructions whatsoever, 'just' 14,561 URLs. Wouldn't that have the opposite effect -- specifically offering up those URLs for spidering? (Interesting idea, that...)
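Probably not: lines that aren't recognized field:value pairs are simply skipped, so a URL-only robots.txt is effectively empty -- everything is allowed by default, and nothing in the standard treats bare URL lines as a crawl hint (that's what the sitemap itself is for). A sketch with invented stand-in URLs:

from urllib import robotparser

# A "robots.txt" that is nothing but URLs, like LifeSite's.
# None of these lines is a recognized field, so parsers ignore them all.
urls_only = """\
http://www.example.com/page1.html
http://www.example.com/page2.html
"""

rp = robotparser.RobotFileParser()
rp.parse(urls_only.splitlines())

# With no rules parsed, everything falls through to the default: allowed.
print(rp.can_fetch("*", "http://www.example.com/anything.html"))  # True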