Internet Explorer 6 will parse a .jpg file as an HTML file if the .jpg contains HTML content.
I renamed a .html file to a .jpg and called the file through Internet Explorer. I was able to view the HTML content in the .jpg file.
This does not occur on Firefox.
Is there an .htaccess rule that will prevent all browsers from reading HTML content from a JPG?
TIA
The end result is that you did NOT have a JPG.
Many programs have an option to read a file as the type it actually is rather than the type its name suggests.
I don't see how you'd expect Apache to override the user's software viewing options, which are beyond your control.
Renaming a PDF to .rtf doesn't change the format of the original PDF in any way.
This should mean that both Firefox and IE6, along with most other browsers, will render the HTML you saved as .jpg as HTML. No more broken display in Firefox.
Now, understand this:
1) .*** in a filename does not define a filetype; it is just part of the filename, which can sometimes be used to indicate filetype on a DOS/Windows system
2) http://example.com/something.*** is not a filename, it is a URI, an identifying string which refers to a request for something, not necessarily called something.***, not even necessarily a document at all
Basically you are relying on two things which are not true and claiming that the browser implementation is incorrect. If you want to ensure only jpeg files end in .jpg then you need to determine the filetype of the file by inspection (magic characters work well here, many libraries available) before you allow the upload, or before you permit the download.
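To illustrate the inspection step, here is a minimal sketch in Python that checks the leading "magic" bytes of a file. The function names are made up for this example. It only looks at the first three bytes, so it rejects a renamed HTML file but says nothing about what follows the signature.

```python
# JPEG files begin with the start-of-image marker FF D8, followed by
# another marker that starts with FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the data starts with the JPEG signature."""
    return data[:3] == JPEG_MAGIC


def file_looks_like_jpeg(path: str) -> bool:
    """Read only the first three bytes of the file and check them."""
    with open(path, "rb") as fh:
        return looks_like_jpeg(fh.read(3))
```

You would call file_looks_like_jpeg() on the temporary upload before moving it anywhere public, and ignore whatever extension the client supplied.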
Next, when it comes to serving the data, you need to configure your server to send the right Content-Type header. Frequently it will handle Windows-style extensions well, but in general it is best to set these headers yourself.
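As a rough sketch of setting the header yourself, here is a tiny WSGI application in Python. The name make_image_app is hypothetical, and a real setup would more likely do this in the server configuration. It only shows the header being stated explicitly instead of being guessed from the file name.

```python
def make_image_app(image_bytes: bytes):
    """Build a WSGI app that serves the given bytes as image/jpeg."""

    def app(environ, start_response):
        headers = [
            # Declared explicitly; not derived from any file extension.
            ("Content-Type", "image/jpeg"),
            ("Content-Length", str(len(image_bytes))),
        ]
        start_response("200 OK", headers)
        return [image_bytes]

    return app
```

Note that this only declares the type. It does not control what a client chooses to do with the bytes once it has them.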
In all probability it is not your .jpg extension which is being ignored, it is your server's default configuration which is serving a .jpg as Content-type: image/jpeg which is being ignored; and for good reason! The Content-type header is wrong and browsers do their best with what they get. If you send what's obviously HTML with a Content-type of image/jpeg then don't be surprised if the browser assumes you made a mistake and renders it as HTML.
However, as you say, the problem is caused by Internet Explorer ignoring the Content-Type header sent by the server, and instead trying to figure out what the content-type actually is, based on the content itself. Since this behaviour is controlled by a little-known client-side configuration setting, there really is no cure except to scan uploaded .jpg files for HTML content and reject those .jpg uploads that seem to contain HTML.
This in itself is potentially difficult, because the content would have to be judged "valid enough" to render as HTML, but not necessarily completely-valid, in order to prevent false positives due to the (apparently-random) contents of a valid .jpg image, while avoiding false negatives when a spoofed .jpg file containing HTML is uploaded.
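A crude version of such a scan might look like the Python sketch below. The tag list and the optional window parameter are assumptions made for the example, not a copy of any browser's real sniffing rules, so expect both false positives and false negatives.

```python
import re
from typing import Optional

# Hypothetical list of tag openers treated as evidence of HTML.
HTML_HINTS = re.compile(
    rb"<(?:!doctype|html|head|title|body|script|iframe|table)\b",
    re.IGNORECASE,
)


def contains_html_hint(data: bytes, window: Optional[int] = None) -> bool:
    """Return True if an HTML-looking tag opener appears in the data.

    If window is given, only the first `window` bytes are scanned.
    """
    scan = data if window is None else data[:window]
    return HTML_HINTS.search(scan) is not None
```

Requiring a known tag name right after the "<" is one way to cut down on accidental matches in genuine image data, at the cost of missing HTML that uses only tags outside the list.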
Jim
As a developer, you have no valid reason to upload HTML as a .jpg (other than testing, as you've done).
So, I'm presuming you are having trouble with hackers abusing some system in which they are uploading html as a .jpg to spread viruses or phishing spoofs of some kind?
If this is the case, then as Jim says,
"there really is no cure except to scan uploaded .jpg files for HTML content and reject those .jpg uploads that seem to contain HTML."
A good approach, as always: don't try to reject what is "bad"; only allow what is "good" data. For example, one of the benefits of the ImageMagick module is that when you upload an image, you can tell it to do what JD suggests. It reads in the image and, if it's not a valid image type, returns an error. It doesn't matter what the extension is.
Another scenario: if someone's uploading HTML via your system, they can just as easily upload an executable with a .jpg extension. Read the file with ImageMagick and it will reject it because it's not a valid image format.
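The same "only allow what is good" idea can be sketched without ImageMagick. The Python below is a standard-library stand-in, not ImageMagick's method: it walks the JPEG marker segments and accepts the file only if it reaches a frame header with sane dimensions. The function name is made up for the example.

```python
import struct

# Start-of-frame markers (baseline, progressive, etc.).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_dimensions(data: bytes):
    """Return (width, height) if data parses as a JPEG up to its frame
    header, otherwise None."""
    if data[:2] != b"\xff\xd8":            # must open with start-of-image
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:              # every segment starts with FF
            return None
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte, skip it
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                       # standalone marker, no length
            continue
        if marker in (0xD9, 0xDA):         # end or scan before any frame
            return None
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return None
        if marker in SOF_MARKERS:
            if length < 8 or pos + 9 > len(data):
                return None
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            # Zero dimensions are rejected here for simplicity.
            return (width, height) if width > 0 and height > 0 else None
        pos += 2 + length                  # skip to the next segment
    return None
```

A renamed HTML file or an executable returns None here regardless of extension. This only checks the header structure, though. A full decode, which is what reading the file with an image library amounts to, is a stronger test, since a file could carry a valid header followed by something else.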