| 1:37 pm on Sep 9, 2005 (gmt 0)|
Yeah, a lot of sites get their robots.txt indexed if it gets linked to.
Look at #4:
| 1:46 pm on Sep 9, 2005 (gmt 0)|
Having asked about this before, the real question is why ... a txt file should surely be seen and not heard. It is almost as bad as putting an XML file in the SERPs.
txt = no formatting
xml = so much formatting you can't read it
Both are silly results imo
| 1:50 pm on Sep 9, 2005 (gmt 0)|
|Both are silly results imo |
They don't have to rank highly, and if there are no other good matches, isn't it better to have a text file or even XML than no matches at all?
| 2:03 pm on Sep 9, 2005 (gmt 0)|
If it is digital, online, and accessible, Google is going to index it no matter what.
| 2:05 pm on Sep 9, 2005 (gmt 0)|
I've heard G-men state that they actually consider .doc files to be highly relevant because they generally consist of nothing but text. I imagine the same could be true for .txt files.
| 6:57 pm on Sep 12, 2005 (gmt 0)|
There are some very informative documents in .txt, particularly product and software manuals / FAQ files / Release/Change notes, etc.
| 7:09 pm on Sep 12, 2005 (gmt 0)|
Reserving robots.txt as a file only for web bots is a convention. Its usage is neither compulsory nor universal.
If a site has a link to its robots.txt, then it is directing non-bot visitors to it. That makes it fair game for search engines to index.
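To illustrate the convention point, here is a minimal sketch of how a well-behaved bot might consult the rules, using Python's standard urllib.robotparser. The rules text, the bot name "ExampleBot", and the example.com URLs are all made up for illustration. Note that nothing in this rule set mentions /robots.txt itself, so a bot that honours the file is still allowed to fetch it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, not taken from any real site.
RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/page.html", "/index.html", "/robots.txt"):
        url = "http://example.com" + path
        print(path, is_allowed(RULES, "ExampleBot", url))
```

A bot that never runs a check like this is not bound by the file at all, which is why robots.txt is a request rather than access control.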
| 7:18 pm on Sep 12, 2005 (gmt 0)|
> online, and accessible
Not everyone understands what this implies. A lot of things get indexed that surprise people.
We're much more careful now than we used to be, having got a few nasty surprises ourselves.
| 7:53 pm on Sep 12, 2005 (gmt 0)|
>Not everyone understands what this implies. A lot of things get indexed that surprise people.
Use a program like Teleport and a lot of webmasters may be surprised by what it finds that they didn't think was accessible. I assume Googlebot is far more efficient at finding things.
| 7:55 pm on Sep 12, 2005 (gmt 0)|
Plug Brett's search into Google... I wonder why the White House doesn't want ALL that content crawled?
| 2:05 pm on Sep 13, 2005 (gmt 0)|
Here's a question I thought of while doing a Google search on "robots.txt": why does the White House site (which comes up about third or fourth) have a /text and /iraq extension on every URL they're asking Google to ignore? What's THAT about?