
Forum Moderators: Robert Charlton & aakk9999 & andy langton & goodroi


Google indexes ROBOTS.TXT

     
10:11 am on Sep 8, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member kaled is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 2, 2003
posts:3710
votes: 0


I know posting specific searches is frowned upon, but try searching for <snip> from google.co.uk

The second result is their robots.txt file - well, it made me laugh.

Kaled.

[edited by: Brett_Tabke at 1:36 pm (utc) on Sep. 9, 2005]
[edit reason] lets not do specifics... [/edit]

1:37 pm on Sept 9, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38060
votes: 13


Ya, a lot of sites get their robots.txt indexed if other pages link to it.

Look at #4:
[google.com...]

1:46 pm on Sept 9, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Oct 6, 2003
posts:234
votes: 0


Having asked about this before, the real question is why ... a txt file should surely be seen and not heard. It is almost as bad as putting an xml file in the serps.

txt = no formatting
xml = so much formatting you can't read it

Both are silly results imo

1:50 pm on Sept 9, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


> Both are silly results imo

They don't have to rank highly, and if there are no other good matches, isn't it better to have a text file or even XML than no matches at all?

2:03 pm on Sept 9, 2005 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38060
votes: 13


if it is digital, online, and accessible - google is going to index it no matter what.

2:05 pm on Sept 9, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 11, 2004
posts:147
votes: 0


I've heard G-men state that they actually consider .doc files to be highly relevant because they generally consist of nothing but text. I imagine the same could be true for .txt files.

6:57 pm on Sept 12, 2005 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
posts:4847
votes: 0


There are some very informative documents in .txt, particularly product and software manuals / FAQ files / Release/Change notes, etc.

7:09 pm on Sept 12, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 4, 2002
posts:1314
votes: 0


Reserving robots.txt as a file only for web bots is a convention. Its usage is neither compulsory nor universal.

If a site has a link to its robots.txt, then it is directing non-bot visitors to it. That makes it fair game for search engines to index.
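
To illustrate the point about convention, here is a rough sketch using Python's standard urllib.robotparser. The rules, the example.com URLs and the is_allowed helper are all made up for illustration and are not taken from any real site. It only shows that a typical robots.txt does not exclude itself, so a crawler that honours the convention is still allowed to fetch it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if a convention-following crawler may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# The robots.txt file itself is not covered by any Disallow line here.
print(is_allowed(EXAMPLE_RULES, "http://example.com/robots.txt"))
# A path under /private/ is excluded for crawlers that respect the file.
print(is_allowed(EXAMPLE_RULES, "http://example.com/private/page.html"))
```

Nothing here is enforced by the server. The file is only a request, and each crawler decides for itself whether to honour it.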

7:18 pm on Sept 12, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member caveman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 17, 2003
posts:3744
votes: 0


> online, and accessible

Not everyone understands what this implies. A lot of things get indexed that surprise people.

We're much more careful now than we used to be, having got a few nasty surprises ourselves.

7:53 pm on Sept 12, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 1, 2003
posts:1203
votes: 0


>Not everyone understands what this implies. A lot of things get indexed that surprise people.

Use a program like Teleport and a lot of webmasters may be surprised at what it finds that they didn't think was accessible. I assume gbot is far more efficient at finding things.
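
For anyone wondering how such tools turn things up, the core step is just collecting every link on a page and following it. Below is a minimal sketch of that step with Python's standard html.parser. The sample page, the example.com URLs and the names LinkCollector and extract_links are invented for illustration. It is not how Teleport or Googlebot is actually implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href of every <a> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content, for illustration only.
SAMPLE_PAGE = (
    '<p><a href="/robots.txt">robots</a> '
    '<a href="old/backup.txt">backup</a></p>'
)
print(extract_links(SAMPLE_PAGE, "http://example.com/docs/index.html"))
```

A real crawler repeats this on every page it fetches, which is why one forgotten link is enough for a file to be found.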

7:55 pm on Sept 12, 2005 (gmt 0)

New User

10+ Year Member

joined:Dec 21, 2004
posts:40
votes: 0


Plug brett's search into google... I wonder why the whitehouse doesn't want ALL that content crawled?

2:05 pm on Sept 13, 2005 (gmt 0)

New User

10+ Year Member

joined:June 21, 2005
posts:8
votes: 0


Here's a question I thought of while doing a google search on "robots.txt" file - why does the whitehouse site (which comes up about third or fourth) have a /text and /iraq extension on every url they're asking google to ignore? What's THAT about?

Henry