Google indexing robots.txt file

     
11:19 am on Oct 5, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:July 2, 2003
posts:112
votes: 0


Interesting to see that Google has indexed and cached the robots.txt files of reputable websites such as nytimes, BBC, and Google itself...

[google.com...]

11:38 am on Oct 5, 2006 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38060
votes: 13


Yeah, it's been mentioned a lot in the four years they've been doing it.
12:02 pm on Oct 5, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 30, 2004
posts:42
votes: 0


Is that because the file is served with the text/html MIME type instead of the text/plain it should be?
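
For anyone who wants to check what their own server declares, here is a minimal sketch in Python, assuming you already have the raw Content-Type header value in hand (for example, copied from a header viewer). The function names `media_type` and `served_as_plain_text` are made up for illustration; this only inspects the header string and says nothing about how Google decides what to index.

```python
def media_type(content_type: str) -> str:
    """Return the bare media type from a Content-Type header value."""
    # Drop parameters such as "; charset=UTF-8", then normalize case and whitespace.
    return content_type.split(";", 1)[0].strip().lower()


def served_as_plain_text(content_type: str) -> bool:
    """True when the header value declares text/plain."""
    return media_type(content_type) == "text/plain"
```

Calling `served_as_plain_text("text/html; charset=UTF-8")` returns `False`, which would flag a robots.txt served with the wrong type.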
7:47 pm on Oct 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


It's because someone somewhere links to that file, so they treat it as content as well as its true purpose.
2:08 am on Oct 6, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Jan 14, 2006
posts:222
votes: 0


"It's because someone somewhere links to that file, so they treat it as content as well as its true purpose."

The best example is in the search results you posted. #1 is Wikipedia explaining robots.txt. #2 is the White House robots.txt itself.

Look again at the Wiki article and you'll see they link to the White House robots.txt.

3:23 am on Oct 6, 2006 (gmt 0)

New User

10+ Year Member

joined:Dec 30, 2004
posts:28
votes: 0


Why does Google's database show the robots.txt file?
3:38 am on Oct 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


"Why does Google's database show the robots.txt file?"

That sure seems like an obvious question to me too. It seems like it would be simple enough for Google to leave these files out of its results. Is anyone really interested in seeing the contents of a robots.txt file in their SE results?
4:00 am on Oct 6, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 10, 2006
posts:627
votes: 0


Number 4 is BT's robots.txt blog :)
Number 5 is Google's own robots.txt file.


Webmasterworld: Robots.txt
Brett Tabke experiments with writing a weblog in a text file usually read only by robots. Trenchant commentary on the world of search engine marketing.
www.webmasterworld.com/robots.txt - 2k - Cached - Similar pages

google's robots txt - [ Translate this page ]
User-agent: * Allow: /searchhistory/ Disallow: /news?output=xhtml& Allow: /news?output=xhtml Disallow: /search Disallow: /groups Disallow: /images Disallow: ...
www.google.com/robots.txt - 3k - Cached - Similar pages
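
To see what the rules in that snippet actually do, here is a rough sketch using Python's standard `urllib.robotparser`. It assumes the directives shown in the search result are laid out one per line, and it covers only the rules visible before the "..." cutoff, so it is not the full file. The helper names `build_parser` and `may_crawl` are invented for this example. Note also that Python's parser applies the first matching rule in file order, which is not necessarily how Google itself resolves Allow/Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# Only the rules visible in the search snippet; the real file continues past the "...".
VISIBLE_RULES = """\
User-agent: *
Allow: /searchhistory/
Disallow: /news?output=xhtml&
Allow: /news?output=xhtml
Disallow: /search
Disallow: /groups
Disallow: /images
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, path: str, agent: str = "*") -> bool:
    """Ask whether the given agent may fetch a path on www.google.com."""
    return parser.can_fetch(agent, "http://www.google.com" + path)


parser = build_parser(VISIBLE_RULES)
```

With these rules, `may_crawl(parser, "/search")` comes back `False` while `may_crawl(parser, "/searchhistory/")` comes back `True`, because the Allow line appears before the broader Disallow.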

4:23 am on Oct 6, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 1, 2006
posts:112
votes: 0


It's one of the few ways that Brett will have his blog found.