Now I'm wondering whether all spiders read it the same way. I found out the hard way that Google sees a "minor" difference between index.cfm, InDeX.CfM and INDEX.CFM, and I had used different spellings all over the website. So I put every spelling of "Index.cfm" I had used before, except index.cfm itself, into the robots.txt file. Googlebot understands this well and avoids all the other spellings.
But other spiders just seem to read robots.txt, find "INDEX.CFM" forbidden and go away.
What would you do? Drop the robots.txt, since Google has probably got the point and stopped fetching the same file under multiple spellings?
You should go back and use the same spelling "all over the website".
That's exactly what I do now (but I mixed spellings in my pre-WW-Life). Unfortunately Google keeps old copies of e.g. "index.CFM", "iNdeX.cFm" etc. That means:
1. people get old Google caches
2. Google might treat these old versions as duplicate content.
So I want Google to drop all the old copies, which is why I put them into the robots.txt file. As I said, this works just fine with Google.
But I'm not sure how e.g. AltaVista would react if it finds "iNdEX.CFM" disallowed. Would it read "index.cfm" anyway or just go away?
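For anyone who wants to see how a case-sensitive parser reads such a file, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the "ExampleBot" agent name are made up for illustration, and this only shows how that one parser behaves; it says nothing certain about AltaVista or any other real spider.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the old mixed-case spellings are disallowed,
# while the canonical lowercase index.cfm is deliberately not listed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /INDEX.CFM
Disallow: /InDeX.CfM
Disallow: /iNdEX.CFM
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if this parser would let `agent` fetch `path`."""
    return parser.can_fetch(agent, path)


# Compare the canonical spelling against the disallowed variants.
for path in ("/index.cfm", "/INDEX.CFM", "/InDeX.CfM", "/iNdEX.CFM"):
    print(path, "allowed" if is_allowed(path) else "blocked")
```

A parser that compares paths case-sensitively, like this one, should keep fetching the lowercase file while skipping the listed variants. A spider that ignores case when matching would block every spelling, which would explain the "go away" behaviour.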