


robots.txt case sensitive?

   
8:03 am on Aug 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I recently learnt what to do with a robots.txt file. (Where? Here, of course :) )

Now I'm wondering whether all spiders read it the same way. Before I found out the hard way that Google sees a difference between index.cfm, InDeX.CfM and INDEX.CFM, I had used different spellings all over the website. So I listed every spelling of "Index.cfm" I had used before, except index.cfm itself, in the robots.txt file. Googlebot understands this well and avoids all the other spellings.

But other spiders just seem to read robots.txt, find "INDEX.CFM" forbidden and go away.

What would you do? Drop the robots.txt since Google probably got the point and stopped reading the multiple filenames of the same file?

3:13 pm on Aug 18, 2003 (gmt 0)

10+ Year Member



What exactly are you trying to do? Do you want the robots to NOT index index.cfm, or do you want it indexed?

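Filenames are case sensitive as far as spiders are concerned: each spelling counts as a separate URL, even if your server returns the same file for all of them. You should go back and use the same spelling "all over the website".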
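
For what it's worth, here is a minimal sketch of that case sensitivity, using Python's standard urllib.robotparser as a stand-in spider. The host www.example.com, the agent name ExampleBot and the two disallowed spellings are only assumptions for illustration. It shows how this one parser compares paths, not how Googlebot or any other spider is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the old, mixed-case spellings are disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /INDEX.CFM
Disallow: /InDeX.CfM
"""


def allowed(path, agent="ExampleBot"):
    """Return True if this parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder host, not the poster's site.
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.cfm", "/INDEX.CFM", "/InDeX.CfM"):
        print(path, "allowed" if allowed(path) else "disallowed")
```

With this parser the lowercase path stays fetchable because a Disallow line is matched as a plain, case-sensitive prefix of the requested path.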

7:48 am on Aug 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Ryan,

You should go back and use the same spelling "all over the website".

That's exactly what I do now (but I mixed spellings in my pre-WW life). Unfortunately Google keeps old copies of e.g. "index.CFM", "iNdeX.cFm" etc. That means:
1. people get old Google caches
2. Google might treat these old versions as duplicate (multiple) content.

So I want Google to drop all the old copies, which is why I put them into the robots.txt file. As I said, this works just fine with Google.

But I'm not sure how e.g. AltaVista would react if it finds "iNdEX.CFM" disallowed. Would it read "index.cfm" anyway, or just go away?
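
One way to keep such a list from accidentally blocking the live page is to generate it from the old spellings and always skip the canonical one. A rough sketch, assuming a hypothetical helper build_robots_txt and made-up spellings. It only controls what ends up in the file; how a particular spider compares the paths is still the open question.

```python
def build_robots_txt(old_spellings, canonical="index.cfm"):
    """Build robots.txt text that disallows old spellings but never `canonical`."""
    lines = ["User-agent: *"]
    seen = set()
    for name in old_spellings:
        # Exact, case-sensitive comparison: only the canonical spelling is spared,
        # and repeated entries are written once.
        if name == canonical or name in seen:
            continue
        seen.add(name)
        lines.append("Disallow: /" + name)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    spellings = ["INDEX.CFM", "InDeX.CfM", "index.cfm", "INDEX.CFM"]
    print(build_robots_txt(spellings), end="")
```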