spiders and .txt files

Will they show in the serps?

Stefan

8:02 pm on Nov 26, 2003 (gmt 0)

I've just put a database online: 1053 entries. It started as a .dbf, was converted to comma-delimited text, had some editing done on it so it could be run through a geographic translator as a .dat file, and finished up as a .txt file, which is now on the site and linked to from an HTML database page. I can use Word or some such thing to turn the .txt into an HTML page ('cause there's no way I'm spending a couple of days hand-coding it), but it's over 200k as .htm versus 70k for the .txt, which loads very fast.

My question, if anyone managed to follow the preceding: because the page is a .txt and so has no <a href> links, will it be seen as an "orphan" page and ignored in the serps, even if spiders find and crawl it? I could chop the .txt into a few pieces and turn those into a few HTML files, but that's more bandwidth when they get downloaded. It's not critically important that the page gets into the serps, but I wouldn't mind.
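For anyone weighing the "chop it into a few HTML files" option without hand-coding, here is a minimal sketch in Python. It assumes the .txt is plain comma-delimited text with one entry per line; the function names, the sample cave rows, and the example.com URL are all hypothetical placeholders, not anything from Stefan's site. It is one possible approach, not a recommendation over serving the .txt as-is.

```python
import csv
import html
import io


def rows_to_html(rows, title, index_url):
    """Render rows as a bare HTML table with a link back to the index page."""
    lines = [
        "<html><head><title>%s</title></head><body>" % html.escape(title),
        '<p><a href="%s">Back to index</a></p>' % html.escape(index_url, quote=True),
        "<table>",
    ]
    for row in rows:
        # Escape each cell so stray <, > or & in the data cannot break the markup.
        cells = "".join("<td>%s</td>" % html.escape(cell.strip()) for cell in row)
        lines.append("<tr>%s</tr>" % cells)
    lines.append("</table></body></html>")
    return "\n".join(lines)


def split_into_pages(text, rows_per_page, title, index_url):
    """Split comma-delimited text into several small HTML pages."""
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        page_title = "%s (entries %d-%d)" % (title, start + 1, start + len(chunk))
        pages.append(rows_to_html(chunk, page_title, index_url))
    return pages


# Hypothetical sample data standing in for the real 1053-entry file.
sample = "Cave A,18.1,-77.5\nCave B,18.2,-77.6\nCave C,18.3,-77.7\n"
pages = split_into_pages(sample, 2, "Caves", "http://www.example.com/")
```

Each page carries a real link back to the index, so a spider has somewhere to go, and keeping the markup this bare should keep the size overhead well below what a Word export adds.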

jdMorgan

8:31 pm on Nov 26, 2003 (gmt 0)

Stefan,

There are approximately 57,000 results for "info.txt" for a search on Google [google.com], and many of them are plain-text files. So, I think your .txt pages will be indexed just fine.

Jim

Stefan

8:49 pm on Nov 26, 2003 (gmt 0)

Many thanks, jd. Guess I should have tried that myself... :-)

It's the first .txt I've put on the site... I realized as I was doing it, "hey, no links, this thing doesn't go anywhere..." Apparently it doesn't make any difference then. I put the site's email addy on it... think I'll add the URI of the index page then they can copy and paste it into the address bar.

Funny, I'd heard about orphan pages early on and was always careful to let a spider crawl on through a page to somewhere else, but a .txt is a dead-end. You have to wonder about the whole orphan page concept.

ADDED: Where is Google getting the titles from? Anchor text?

kevinpate

9:00 pm on Nov 26, 2003 (gmt 0)

> think I'll add the URI of the index page then
> they can copy and paste it into the address bar.

I do that for our .TXT files simply because it's a
much smaller file for dial-uppers to grab. If a
person happens to peek at the version in the
Google cache, the URL in the cached version of
the TXT file is clickable.

jdMorgan

9:29 pm on Nov 26, 2003 (gmt 0)

> ADDED: Where is Google getting the titles from? Anchor text?

I didn't dig very deeply, but the title text seems to be selected from one of the first few lines of text on the page itself.

Jim
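If the title really is picked from the first few lines, as Jim's quick look suggests (an observation, not documented behaviour), one low-cost step is to lead the .txt with a descriptive line plus the index page URI mentioned earlier in the thread. A minimal sketch in Python; the function name, the header wording, and the example.com URL are hypothetical.

```python
def add_header(text, title, index_url):
    """Prepend a descriptive title line and the index page URI to a text file's contents."""
    # Title first, since the first lines appear to be what gets shown in the results.
    header = [title, "Index page: " + index_url, ""]
    return "\n".join(header) + "\n" + text


# Hypothetical stand-in for the real file contents.
out = add_header("a,b\n", "Cave positions", "http://www.example.com/")
```

The original data is left untouched below the header, so anyone parsing the file only has to skip the first three lines.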

Stefan

10:26 pm on Nov 26, 2003 (gmt 0)

> I do that for our .TXT files simply because it's a
> much smaller file for dial-uppers to grab.

For sure, Kevin. Plus, in this instance, because it's included with some other databases on the page it's linked from, anyone who wants the info will probably be happy enough to get it in .txt. It's actually the only source for the info in existence (a list of caves with the positions given in the GPS datum rather than the original local datum... I hope the specific is ok... it's not a money-maker).

> the title text seems to be selected from one of the first few lines of text on the page itself.

Yeah, Jim, that's what I thought must be happening when I checked the first few pages in your example, but I came across a couple that didn't seem to be doing that. I was thinking maybe anchor text but I didn't dig down into it. I'll figure it out later.