Apache Web Server Forum

    
mod pageSpeed seems incompatible with anti-hotlinking code
MichaelBluejay




msg:4421054
 7:55 pm on Feb 23, 2012 (gmt 0)

Ever since installing mod_pagespeed, my anti-hotlinking code doesn't seem to work. My images load just fine on other people's sites.

mod_pagespeed optimizes my images and rewrites the url, so the new url might look like this:

http://example.com/originalFileName.gif.pagespeed.ce.9-w9vHbyfP.gif

Here's my .htaccess code:

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !example\.com [NC]
RewriteRule \.(gif|jpg|png|ico)$ ht tp://example2.com/istealimages.gif [L]

(I added a space to "ht tp" so this forum wouldn't interpret it as a url.)

Any ideas?

 

lucy24




msg:4421170
 12:55 am on Feb 24, 2012 (gmt 0)

I added a space to "ht tp" so this forum wouldn't interpret it as a url.

You could have achieved the same result by calling it simply "example.com".

If the rewritten url still ends in the appropriate extension, it shouldn't make any difference.
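For reference, a minimal sketch of the same hotlink rules with the Referer test anchored to the site's own host name and the redirect made explicit. It assumes the site is example.com and the replacement image lives on example2.com, as in the placeholders above. The pattern only looks at the end of the path, so a name such as originalFileName.gif.pagespeed.ce.9-w9vHbyfP.gif still matches it, provided mod_rewrite gets to see the request at all.

```apache
# Sketch only: example.com and example2.com are the thread's placeholder hosts.
RewriteEngine On

# Let requests with an empty Referer through (direct visits, some proxies and crawlers).
RewriteCond %{HTTP_REFERER} !^$
# Let requests through when the Referer is this site, with or without "www.".
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com(/|$) [NC]
# Anything else ending in an image extension is sent to the replacement image.
# The target is on another host, so this is an external redirect.
RewriteRule \.(gif|jpg|png|ico)$ http://example2.com/istealimages.gif [R=302,L]
```

Keeping the replacement image on a different host avoids having the rule redirect requests for the replacement image itself.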

But this is a third-party module, right? So you'll need to give a little more information. Start by studying their own documentation.* Breaking hotlink routines is a pretty big drawback; the module itself should have something to say about it. If all they say is how to install, you may have to write the second chapter of the documentation yourself :)


* Sounds nicer than "rtfm" doesn't it ;)

MichaelBluejay




msg:4424637
 3:42 am on Mar 4, 2012 (gmt 0)

Why would you think I didn't look through the docs? If I'd found what I was looking for in the docs [code.google.com] I wouldn't have posted here. Maybe it's there and I missed it, but I couldn't find it.

To clarify: If you load one of my pages in a browser, you see the images just fine, but the image url's have been silently rewritten to the optimized version. When someone hotlinks one of my images, they're using the longer, rewritten url, because that's what they see. I can get .htaccess to catch links to the physical files on my server, but that's not what others are linking to; they're linking to the rewritten urls, which I can't seem to catch with .htaccess.

My access log does show the requests for the rewritten files, and correctly shows the referrer as the foreign site. However, the log shows a status code of 304 (Not Modified), which means the requester already had a cached copy and my server only confirmed that it was still current. Editing the file doesn't help, because mod_pagespeed then just optimizes the new file and gives it a *new* rewritten url, and the old image is still served up at the old rewritten url. (For how long, I have no idea.)

Any ideas of a test I can try?

wilderness




msg:4424645
 4:27 am on Mar 4, 2012 (gmt 0)

Just a thought.

Is it possible you could use the "random UA" syntax (denial) for a "random referer",
adding an exception for when the requests come from your own domain?

MichaelBluejay




msg:4424653
 5:45 am on Mar 4, 2012 (gmt 0)

wilderness, thanks for the suggestion, but I don't understand at all what you're suggesting.

lucy24




msg:4424654
 5:47 am on Mar 4, 2012 (gmt 0)

If you load one of my pages in a browser, you see the images just fine, but the image url's have been silently rewritten to the optimized version. When someone hotlinks one of my images, they're using the longer, rewritten url

Wait. Something isn't adding up here. Possibly it's got something to do with that g-word in the address of the docs. What does "silently rewritten" mean? Are the images being rewritten, or are they being redirected?

If it's a rewrite, then the hotlinkers don't know what the "real" URL is; they only know the file's official name. (I detoured here to verify this with my own hotlinking routine. The address bar says X while the content I'm seeing is Y.)

If it's a redirect, does that mean that every request for an image is really two separate requests? First the "official" name (as given in your html) and then the "real" name (the long one)?

It would be a ### of a lot easier to code if the mod used rewrites rather than redirects, because then all you'd have to do is block external requests for the long complicated name.

Can you disable the page speed module for selected directories within your site? If so, we will be able to figure something out. It may involve intercepting redirects.

Oh, wait. IMPORTANT. Does the page speed module run before or after mod_rewrite? If you don't know offhand, you can make up a test to figure it out. It generally isn't hard.

MichaelBluejay




msg:4425267
 10:25 pm on Mar 5, 2012 (gmt 0)

Thanks lucy24, and I realize I should have explained better how mod_pagespeed works.

mod_pagespeed does lots of various things to speed up page-loading times. One of those things is to optimize image files, making their filesizes smaller so they load faster in browsers. It generates a new url for the rewritten images, and inserts that url into the <img src> of the pages that are being served, replacing the original url. For example, when I upload pages to my server, the image code looks like this:

<html>
...
<img src=originalFileName.gif>
...
</html>

With mod_pagespeed on, the pages are actually *served* like this:
<html>
...
<img src=originalFileName.gif.pagespeed.ce.9-w9vHbyfP.gif>
...
</html>

Anyway, I posted my issue on the Google Group for mod_pagespeed, and the developers said that mod_pagespeed is incompatible with anti-hotlinking code, but they'll consider a code change to mod_pagespeed that will make anti-hotlinking code work in the future. (I think that's what they said; I'm not technical enough to understand the reply 100%.)

lucy24




msg:4425367
 3:31 am on Mar 6, 2012 (gmt 0)

How does this work with rewrites? Does the mod simply prevent image files from being rewritten? Or only if there's a change in extension?

Back at the beginning, you had hotlinkers being rewritten to

example.com/istealimages.gif

(Is that what the image itself says? I like it! :)) Does the rewrite simply not happen?

Incidentally, belated question: How do you know that the "real" images are loading? Testing hotlinks can be tricky; I usually have to switch off the null-referer exception, take a quick look and then switch it on again before the next lawful search engine comes through. And identifying rewrites in logs is equally tricky, especially if you don't know how big the "right" file is. If the original is a 3MB jpg and your logs say that 2K was served up, you can safely assume it's your no-hotlinking image and not any kind of file compression.

MichaelBluejay




msg:4426824
 6:32 am on Mar 9, 2012 (gmt 0)

lucy24, normal image files are rewritten just fine. But RewriteRule simply doesn't act on any of the images that have gone through the pagespeed compressor. It doesn't match them, it doesn't see them. According to what the developers said, that's intentional, to avoid compatibility problems. mod_pagespeed doesn't kill .htaccess, it doesn't kill mod_rewrite in general, but it doesn't allow mod_rewrite for the images it optimizes and gives the long, cryptic filenames to.
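One possible workaround, sketched here as an assumption rather than something the developers recommended in this thread: tell mod_pagespeed not to rewrite image URLs at all, so the served HTML keeps the original file names and the ordinary RewriteRule sees every image request. ModPagespeedDisallow takes a wildcard pattern; whether it may be used in .htaccess or only in the server config, and how it behaves in a given release, should be checked against the documentation for the installed version.

```apache
# Sketch only: verify each directive against the docs for your mod_pagespeed version.
# Resources matching these wildcards are left alone by mod_pagespeed, so the HTML
# keeps the original image URLs and mod_rewrite hotlink rules apply to them.
ModPagespeedDisallow "*.gif"
ModPagespeedDisallow "*.jpg"
ModPagespeedDisallow "*.png"
ModPagespeedDisallow "*.ico"
```

The trade-off is that the images are no longer optimized or cache-extended, so this gives up part of what the module was installed for.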

Yes, my filename is "istealimages.gif". And the image itself is a huge black box with yellow letters saying, "I steal other peoples' images."

I'm not sure I understand your last question, but maybe this answers it: If the original file was "originalFileName.gif", what's served (and what I see in the log files) is "originalFileName.gif.pagespeed.ce.9-w9vHbyfP.gif".

lucy24




msg:4426840
 7:24 am on Mar 9, 2012 (gmt 0)

I wondered whether they're getting the "real" originalFileName.gif.pagespeed.ce.9-w9vHbyfP.gif or a rewritten version. But if the mod is made to circumvent rewrites, that's an answer on its own.

If you ever look at "normal" logs that don't have the mod installed, you'll notice that hotlinks don't show up as requests for "Istealimages". That's the difference between a redirect and a rewrite. They come through as successful requests for the original picture-- but the filesize (the number right after the 200) matches the hotlink image instead.
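For anyone unsure which log field is "the number right after the 200": in Apache's usual combined log format it is %b, the size of the response body in bytes. The definition looks like the sketch below. It belongs in the server or virtual-host config rather than .htaccess, and the log path is a placeholder.

```apache
# Server or virtual-host config; CustomLog is not permitted in .htaccess.
# %r      = the request line as the client sent it (the original URL)
# %>s     = final status code
# %b      = response body size in bytes
# Referer = the page that embedded or linked to the resource
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
# The path below is a placeholder.
CustomLog "logs/access_log" combined
```

Because %r records the URL that was requested, an internally rewritten hotlink shows the original image name next to the byte count of the replacement image.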

Same thing happens when people are blocked from a particular page. It's unnerving to see such-and-such given as a referer when they were never allowed to go there-- or the page simply doesn't exist-- in the first place. (I'm talking here about real humans following real links, not robots with forged referers.) They did it by clicking a link in the 403 or 404 page, just the way they're supposed to.

Conclusion: rewrites are weird ;)

MichaelBluejay




msg:4426978
 2:00 pm on Mar 9, 2012 (gmt 0)

I wondered whether they're getting the "real" originalFileName.gif.pagespeed.ce.9-w9vHbyfP.gif or a rewritten version.


Well, it's easy enough to see what the requesting page is getting. In my browser, I just control-click on the image and choose "Copy image url" (or "Open image in a new window").

I would never see a request for "istealimages.gif" on my first domain, because that image is on a separate domain.

I do see both rewrites and redirects in my log files, as 301's and 302's respectively.

lucy24




msg:4427024
 4:03 pm on Mar 9, 2012 (gmt 0)

I do see both rewrites and redirects in my log files, as 301's and 302's respectively.

Uh.... No. What you've got there is a permanent redirect (301) as opposed to a temporary redirect (302). Rewrites don't appear in logs.

Make sure those 302s are intentional. A temporary redirect tends to be the default, but it's hardly ever what you want.

Well, it's easy enough to see what the requesting page is getting. In my browser, I just control-click on the image and choose "Copy image url"

That's what the requesting page thinks it's getting. What it's really getting is the picture you see with your eyeballs. It works just like a 403 page: the address bar gives the name of the page you tried to get to (for humans, this is most likely a directory that has no index), but the screen shows the 403 text.

MichaelBluejay




msg:4429718
 9:33 pm on Mar 15, 2012 (gmt 0)

Uh.... No. What you've got there is a permanent redirect (301) as opposed to a temporary redirect (302). Rewrites don't appear in logs.

Okay, so not everything invoked with the RewriteRule command is actually a rewrite. That's certainly confusing.

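For those trying to follow along, if you put [R=301,L] after your RewriteRule command, then it's a redirect, and both the original request and the request for the file redirected to show up in a logfile (the second one on whichever server hosts the target).

If you put only [L] after your RewriteRule command and the target is a local path, then it's a true rewrite: only the original request shows up in the logfile, with the status and size of the file that was actually served. If the target is a full url on another host, as in my hotlinking rule, Apache sends a redirect even without the R flag.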
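The three cases can be put side by side in a short .htaccess sketch. The file names are hypothetical and exist only to keep the rules from overlapping; example2.com is the placeholder host used earlier in the thread.

```apache
RewriteEngine On

# 1. External redirect: the client receives a 301 and makes a second request,
#    so the access log gets two entries (one for each URL).
RewriteRule ^old-redirected\.gif$ /new.gif [R=301,L]

# 2. Internal rewrite: the client never learns about /new.gif. The access log
#    gets one entry, for old-rewritten.gif, with the size of /new.gif.
RewriteRule ^old-rewritten\.gif$ /new.gif [L]

# 3. Full URL on another host and no R flag: mod_rewrite cannot serve this
#    internally, so it issues an external redirect (302 by default).
RewriteRule ^old-offsite\.gif$ http://example2.com/istealimages.gif [L]
```

Requesting each of the three names once and then reading the access log is a quick way to see the difference for yourself.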

