Why ever didn't I think of doing a search like that?
fmonk
9:36 am on Apr 8, 2003 (gmt 0)
I just installed MovableType on my host and ran ht://Dig to update my site search. It returned many errors when it hit the .rdf and .xml files that MT created, and listed many of them as "search-engine spamming."
It has me wondering how Googlebot and other spiders will react. Should I exclude these file types in my robots.txt file?
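If you do decide to exclude them, one way to check a robots.txt before uploading it is to run it through Python's standard `urllib.robotparser`. This is a minimal sketch, not a recommendation either way. It assumes MT writes its feeds to `/index.xml` and `/index.rdf` at the site root (adjust to your install), and `www.example.com` and `/archives/000123.html` are hypothetical placeholders. The original robots.txt convention matches path prefixes, so each file is listed explicitly. Some crawlers, Googlebot among them, also accept wildcard patterns, but Python's parser does not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. /index.xml and /index.rdf are ASSUMED locations
# of the MT feed files; change them to match what your install generates.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.xml
Disallow: /index.rdf
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string instead of fetching a URL


def is_allowed(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the rules above."""
    # www.example.com is a placeholder host; only the path is matched against the rules.
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.xml", "/index.rdf", "/archives/000123.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Run as-is, it should report the two feed files as blocked and the ordinary archive page as allowed.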
chiyo
10:15 am on Apr 8, 2003 (gmt 0)
There are hundreds of MT sites with the default XML and RDF files (usually used for RSS delivery) indexed on Google. Because of their nature (they just list your last 10 or so entries as links with short descriptions), ht://Dig may interpret these multiple index files as "duplicate content". I don't think Google would, but don't take my word for it. It's a guess.
sullen
10:23 am on Apr 8, 2003 (gmt 0)
I posted a thread on xml and google yesterday.
The answer seems to be that Google reads and indexes them, but doesn't cache or decode them (i.e. it won't be able to tell if there are any links in the docs).
fmonk
10:24 am on Apr 8, 2003 (gmt 0)
Yeah, but it's an educated guess... As you said, there are hundreds of MT sites indexed, so maybe Google is smart enough to overlook these file types.