
Google indexing robots.txt file

     

kunwarbs

11:19 am on Oct 5, 2006 (gmt 0)

10+ Year Member



Interesting to see that Google has indexed and cached the robots.txt files of reputable websites like nytimes, BBC and Google itself...

[google.com...]

Brett_Tabke

11:38 am on Oct 5, 2006 (gmt 0)

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



Ya, it's been mentioned a lot in the 4 years they've been doing it.

NedProf

12:02 pm on Oct 5, 2006 (gmt 0)

10+ Year Member



Is that because the file is served with the text/html MIME type instead of the text/plain that it should be?

g1smd

7:47 pm on Oct 5, 2006 (gmt 0)

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



It's because someone somewhere links to that file, so they treat it as content as well as its true purpose.

Jordo needs a drink

2:08 am on Oct 6, 2006 (gmt 0)

5+ Year Member



It's because someone somewhere links to that file, so they treat it as content as well as its true purpose.

The best example is in the search results you posted. #1 is Wikipedia explaining robots.txt. #2 is the White House robots.txt itself.

Look again at the Wiki article and you'll see they link to the White House robots.txt.

etgsgroup

3:23 am on Oct 6, 2006 (gmt 0)

10+ Year Member



Why does Google's database show the robots.txt file?

GaryK

3:38 am on Oct 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why does Google's database show the robots.txt file?

That sure seems like an obvious question to me too. Leaving robots.txt files out of the results seems like it would be simple enough for Google to implement. Is anyone really interested in seeing the contents of a robots.txt file in their SE results?

Tastatura

4:00 am on Oct 6, 2006 (gmt 0)

5+ Year Member



Number 4 is BT's robots.txt blog :)
Number 5 is Google's own robots.txt file


Webmasterworld: Robots.txt
Brett Tabke experiments with writing a weblog in a text file usually read only by robots. Trenchant commentary on the world of search engine marketing.
www.webmasterworld.com/robots.txt - 2k - Cached - Similar pages

google's robots txt - [ Translate this page ]
User-agent: * Allow: /searchhistory/ Disallow: /news?output=xhtml& Allow: /news?output=xhtml Disallow: /search Disallow: /groups Disallow: /images Disallow: ...
www.google.com/robots.txt - 3k - Cached - Similar pages
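
For anyone curious how a crawler would read the rules quoted in that snippet, here is a rough sketch using Python's standard urllib.robotparser module. The rule list is only the part visible in the search result above (the snippet is truncated, and the two query-string rules for /news are left out for simplicity), the allowed() helper is a made-up name for illustration, and Python's parser applies the first matching rule, which may not be how Googlebot itself resolves Allow/Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# Subset of the rules visible in the quoted search snippet.
# The real file is longer; this is not the full google.com/robots.txt.
RULES = """\
User-agent: *
Allow: /searchhistory/
Disallow: /search
Disallow: /groups
Disallow: /images
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network request is made here.
parser.parse(RULES.splitlines())


def allowed(path, agent="*"):
    """Hypothetical helper: may `agent` fetch `path` under the rules above?"""
    return parser.can_fetch(agent, "http://www.google.com" + path)


if __name__ == "__main__":
    for path in ("/search", "/searchhistory/", "/groups", "/intl/en/about.html"):
        print(path, "allowed" if allowed(path) else "disallowed")
```

Note that none of these rules say anything about /robots.txt itself, so nothing in the file stops a crawler from fetching and indexing it as an ordinary page once something links to it.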

smells so good

4:23 am on Oct 6, 2006 (gmt 0)

5+ Year Member



It's one of the few ways that Brett will have his blog found.
 
