Text files visible as html through wordpress

A small bug can cause major headaches and 404 errors

         

JS_Harris

7:30 pm on Nov 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The way WordPress handles category hierarchy can cause problems when your server has a text file on it with the same name as one of your categories. Two things happen: #1 - any article written under that category cannot be found and returns a 404 error, and #2 - the text file can be viewed as html.

It takes some pretty specific conditions, such as pretty permalinks containing the %category% parameter, but if you run into the problems this creates it will feel like a wild goose chase unless you notice the server has a text file with the same name as a category. There's nothing written about this online.

Here is what to look for if you come across unexplained 404 messages.

- a text file with the same name as one of your categories.
- the category uri is all in small letters (WordPress defaults to this) while the text file has a capital letter to begin with, i.e. Example instead of example
- the text file extension is missing or stripped (example instead of example.txt)
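
A minimal sketch of that checklist as a script, assuming you can list your web root and already know your category slugs. The function name, the docroot path and the slug list are all hypothetical; nothing here comes from WordPress itself.

```python
import os

def find_category_collisions(docroot, category_slugs):
    """Return (filename, slug) pairs for files directly inside docroot whose
    name, ignoring case and any extension, equals a category slug."""
    slugs = {s.lower() for s in category_slugs}
    hits = []
    for name in sorted(os.listdir(docroot)):
        path = os.path.join(docroot, name)
        if not os.path.isfile(path):
            continue  # only plain files are of interest here
        stem = os.path.splitext(name)[0].lower()
        if stem in slugs:
            hits.append((name, stem))
    return hits
```

Calling it with something like find_category_collisions("/var/www/html", ["example", "news"]) would list any file such as Example or example.txt sitting next to your WordPress install.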

What happens:
- You write an article - example.com/categoryname/articlename
- Your browser tries to visit example.com/categoryname/articlename via any link on the site
- Wordpress finds the example.com/Textfilethatmatchescategoryname
- WordPress ignores the capital letter and uses example.com/textfilethatmatchescategoryname/articlename as default, but since the category hierarchy used the wrong file, any article in that category returns a 404.
- If you intentionally visit example.com/Textfilethatmatchescategoryname/anythingatall you will see the text article as full html wrapped in a WordPress header.

I thought I'd write about this since most people who run into this problem won't think of trying a capitalized category name in the uri and will assume a permalinks issue. It's not only text files that display; any file without an extension, or with its extension hidden by settings, will cause this, but text files are a particular pain.

Although very unlikely, this can also cause a weakness in security if the site allows uploads and allows a file without an extension to be uploaded. (mod-remove or edit this as you see fit for security reasons)
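
A minimal sketch of the kind of upload check that would rule this out, assuming the upload handler can inspect the file name before saving it. The function name and the allowed list are hypothetical, and this is not how WordPress itself validates uploads.

```python
import os

# Hypothetical list of extensions the site is willing to accept.
ALLOWED_EXTENSIONS = {".txt", ".jpg", ".png"}

def is_safe_upload_name(filename, allowed=ALLOWED_EXTENSIONS):
    """Accept only names that carry one of the allowed extensions.
    A name with no extension at all (e.g. "Example") is rejected."""
    ext = os.path.splitext(os.path.basename(filename))[1].lower()
    return ext in allowed
```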

[edited by: JS_Harris at 7:34 pm (utc) on Nov. 3, 2008]

ergophobe

6:30 pm on Nov 4, 2008 (gmt 0)




Thanks JS. That's pretty strange. I think there's something strange about your server setup.

I looked through some of the relevant WP code (/wp-includes/rewrite.php, /wp-includes/canonical.php, /wp-includes/pluggable.php) and didn't see anything that would cause this.

When I try to replicate it on one of my sites, I can't. The behavior I get is exactly as I expect. I have a category named "junk" which appears in my menus as "Junk". I put a text file on my server named "junk" (and then tried again with it named "Junk") and entered a lot of different URLs. I cannot get WP to behave in any unexpected way.


http://example.com/junk/something => 404 page
http://example.com/junk/ => 404 page
http://example.com/junk => displays the text from the text file, with no WP wrapper, assuming the file is "junk". If it is "Junk" it displays a 404. This works this way regardless of case.

That's as expected and I could not replicate the behaviour you mention at all.

There are some questions I have based on a couple of things you mention in your post. First though, let me give some background on how WP does rewrites to bring other members who might be trying to follow this up to speed. Then I'll come back to the specifics.

Background

WP takes any incoming URL and rewrites it to index.php, with certain exceptions.

So for WP, if you're using pretty permalinks or whatever WP calls them, the default .htaccess file that WP generates is


# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

So basically, this
- turns the rewrite engine on
- tells it to use site root as the base URL (this would be different if you had WP in a subdirectory like example.com/blog)
- tells it not to rewrite if the requested URL is an existing file or directory
- rewrites everything else to /index.php

The expected behavior would be that if you have a file named myfile (no extension) and you request

http://example.com/myfile

that request would not invoke WP at all, but would serve up the file, because it would be an exact match.

If the request does not precisely match a filename or directory name (case, extension, etc), it hands the request off to WP which parses that request. At this point, everything WP generates should come from the database. It shouldn't be including any raw files, except insofar as the generated HTML has links to files (as in the src attribute of an img tag).
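
To make that decision concrete, here is a rough Python sketch of what the two RewriteCond lines amount to. It assumes a case-sensitive filesystem and only covers simple paths; trailing slashes and PATH_INFO are handled differently by a real Apache and are ignored here. The function names are made up for illustration.

```python
import os

def exists_exact_case(docroot, url_path):
    """True if url_path names an existing file or directory under docroot,
    matching the case of every path segment exactly."""
    current = docroot
    for segment in [s for s in url_path.split("/") if s]:
        if not os.path.isdir(current) or segment not in os.listdir(current):
            return False
        current = os.path.join(current, segment)
    return True

def route(docroot, url_path):
    """Mimic the !-f / !-d conditions: an existing file or directory is
    served directly, anything else is rewritten to index.php."""
    if exists_exact_case(docroot, url_path):
        return "served directly"
    return "index.php"
```

With a file named myfile in the web root, route(docroot, "/myfile") comes back as "served directly", while "/Myfile" or "/myfile.txt" falls through to "index.php" and WP gets to parse the request.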

What JS is saying is that something whacky is happening that interferes with this process.

Specifics

You mention


"any file without an extension or with it's extension hidden by settings"

Of course, it makes sense that "any file without an extension" would get sent to the browser if the url ended in that filename with no slash or nothing following it. It's the fact that in your case it ends up wrapped in a WP template that is bizarre.

The "extension hidden by settings" part is confusing to me. What settings hide an extension?

"Wordpress ignores the capital letter"

I poked around. All I could see is that WP will basically treat upper and lower case hostnames the same (as it should), but I don't think it does anything to the case of the URL. Obviously, I haven't looked everywhere and I certainly don't know the WP code well enough to say it absolutely wouldn't do that, but I think again that this is something set on your server (e.g. in your rewrites).

Am I following correctly that your file is named "Myfile.txt" with an uppercase initial letter and a .txt extension, but when you request "http://example.com/myfile" (with no extension and all lower case), you're getting that file included as though it were a WP page? (er "post" I suppose in WP jargon).


"you will see the text article as full html wrapped in a wordpress header"

That part is strange to me too. I wonder, do you have a custom 404 page that tries in some way to include the requested file?

Security

Just one last note. I understand the frustrating debugging issue - I had a similar headache recently on a site where the maintainer has a mishmash of static HTML files and Drupal. He added a file that caused a namespace collision and half his dynamic pages went 404.

That said, this isn't really a security issue. The fact that a text file in the public_html directory on your server can get displayed is hardly cause for alarm. If it were a config.php file that could get displayed as straight, unparsed text, *that* would be an issue. But in this case, all that's happening is that the server is sending the requested file to the browser. If you don't want that file sent to the browser, it shouldn't be in public_html.