Forum Moderators: bakedjake

My pico-made, ftp-ed text files get served as html to bots?

Apache mischief

berli

3:44 am on Jun 21, 2003 (gmt 0)

10+ Year Member



I found out from the Google cache that my 100% hand-typed ASCII text files, which I FTPed to my Apache server, are being served as html files with the text in "pre" tags and special characters escaped into lovely stuff like &quot;. Why?

It makes me look really stupid when someone views the Google cache. I'm sure it looks really dumb in some browsers, too. After all, the extension is .txt. I don't know what the MIME type is, but judging from the Greek my server gives me, it thinks they're text files.

So what do I do about it? If I wanted html, I'd write it myself, dammit.

On the flip side of this, in html files my server helpfully turns my &lt; and &gt; back into < and >, which means that I can't use < in my body text unless I vigorously "fix" it every time I make an edit, and it turns &amp; into &. On some pages I may have to use umlauted vowels (e.g. &auml; for ä) and I'm worried it will "fix" these too. I've had some real trouble with this before using certain server software. In time my umlauts simply disappeared . . .

Duckula

7:07 am on Jun 21, 2003 (gmt 0)

10+ Year Member



Go to the "control panel" here, click on "server headers" on the "plugins" section and check your .txt there.

If the "Content-Type:" is not "text/plain" but "text/html", you can be sure it's Apache.
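For anyone without access to the header plugin, the same check can be sketched in a few lines of Python. This is only a minimal sketch: the function names `media_type` and `fetch_content_type` are made up for illustration, and it assumes the server answers HEAD requests.

```python
from urllib.request import Request, urlopen


def media_type(content_type_header):
    """Return the bare media type from a Content-Type header value.

    Parameters such as "; charset=..." are dropped and the result is
    lowercased, so "Text/HTML; charset=ISO-8859-1" becomes "text/html".
    """
    return content_type_header.split(";", 1)[0].strip().lower()


def fetch_content_type(url):
    """Send a HEAD request and return the media type the server reports.

    Returns an empty string if the server sends no Content-Type header.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return media_type(response.headers.get("Content-Type", ""))
```

Calling `fetch_content_type()` with the address of one of your own .txt files should give "text/plain" if the server is behaving; "text/html" means the server itself is labelling the file as html.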

berli

1:31 am on Jun 22, 2003 (gmt 0)

10+ Year Member



Thanks.

I have a new problem, though . . . I seem to have blocked myself from cPanel using mod_rewrite. D'oh!

berli

11:18 pm on Jun 24, 2003 (gmt 0)

10+ Year Member



Got into cPanel and rooted around. It claims that my files are all text/plain, which it defines in the MIME types as ASCII text. If I ask the server to show me the file, I see text. Apparently Google was fed something else entirely. This is really bizarre.

I don't have the same control panel options that you do, apparently. However, my online file manager lets me choose text/plain or text/html. I think I'll fiddle with some new files and see what happens and report back.

Duckula

11:19 am on Jun 25, 2003 (gmt 0)

10+ Year Member



:) I said here, as in the navigation bar at the top of *this* page. If you managed to find that in cPanel, you've invented something.

"I seem to have blocked myself from cPanel using mod_rewrite"

Then check it again. If you messed it up enough to lock yourself out, you probably made more mischief elsewhere. Try moving your rewrite rules out of the way for a minute, just to test.

Kackle

11:42 am on Jun 25, 2003 (gmt 0)



Same here. I have 134 files with the extension .txt. I checked my headers and Apache is sending them out as text/plain, as it should. It's Google that is sticking the text inside "pre" tags and showing the cache as an html page. I had some URLs in the files and they now show up as links and anchors.

Google is doing this parsing themselves on text files. It's probably because they want to suck out the URLs and feed them into their system. Another reason may be that they have to stick their blurb on top in the form of a table, and they cannot do this without converting the whole thing to html.
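To make the effect concrete, here is a rough Python sketch of the kind of conversion described above: escape the special characters, turn URLs into links, and wrap everything in "pre" tags. It is a guess at the general shape only, not Google's actual code; the function name `wrap_text_as_html` and the URL pattern are assumptions made for illustration.

```python
import html
import re

# Deliberately simple pattern (an assumption): http/https URLs up to the
# next whitespace, angle bracket or double quote.
URL_PATTERN = re.compile(r'https?://[^\s<>"]+')


def wrap_text_as_html(text):
    """Wrap plain text in <html><pre>, escaping specials and linking URLs."""
    pieces = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Escape the plain text that precedes this URL.
        pieces.append(html.escape(text[last:match.start()]))
        escaped_url = html.escape(match.group(0))
        pieces.append('<a href="%s">%s</a>' % (escaped_url, escaped_url))
        last = match.end()
    # Escape whatever follows the last URL (or the whole text if none).
    pieces.append(html.escape(text[last:]))
    return "<html><pre>" + "".join(pieces) + "</pre></html>"
```

Run on a line containing a double quote, this produces the same &quot; escapes that show up in the cached copies, which fits the idea that the markup is added after the file leaves the server.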

No, I don't like it. Normally I have the NOARCHIVE set for Googlebot on the rest of the site, but for these special text files I cannot do that since there's no place for a header in a text file. So the cache copy comes out looking rather dumb, with Google's markup plainly visible on it.

Yet another tiny transgression on copyright by good ol' Google.

berli

12:36 am on Jun 26, 2003 (gmt 0)

10+ Year Member



Duckula,

Obviously I was in need of some coffee, or a clue. I just took a look at my headers using the WW plugin, and they come back text/html. Ouch!

Kackle,

You might be right about Google's part in all this . . . Some of my text files have the <html> and <pre> tags, while others don't, but ALL of them are text/html according to Apache. Interesting, no?

It would be really nice, actually, if Google could follow properly formed urls in text files. I put urls in them so the savvy user can find other pages on my site, but having Google follow them would be great! This business of displaying html tags for text files, though, makes me look like an idiot and I don't care for it at all.

A quick search for ".txt" on Google shows that the <pre> tags are showing up on lots of text files (in the Google cache, that is).