
Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Attempting to fix hotlinking
LunaC

5+ Year Member



 
Msg#: 4550378 posted 11:06 am on Mar 2, 2013 (gmt 0)

Could someone please take a look at this and see if there are any errors or things that will cause problems? Htaccess isn't my strong point, and the odds of me doing something terribly wrong with it are remarkably high. :-/

What I'm trying to do is this:

Thumbnails:
Allow any embedded (hotlinked) thumbnails, from certain sites, to be shown normally.
(The actual image is watermarked, the watermark strip is hidden on my site with negative margins in css.)

Preview images:
Indexed, allow hotlinking in Google images (if needed), but send direct links to bring-them-home.php
By direct links I mean how the image is shown alone like when "View Original Image" is clicked. url bar= example.com/image.png.
(These images have a "Preview" stamp on my site, which makes sense: it lets the visitor see in print preview whether the real version (high-res, no stamp or watermark of any kind) is the one going to print. If they see "Preview", something went wrong and they know not to send it to the printer.)

High-res and zips:
Always send (except from selected sites and user-agents) to bring-them-home.php for sorting.


bring-them-home.php basically just sorts them: it grabs the image url and looks for a matching parent page.
If found, 302 the visitor to the parent (302 since the image is not really moved, just not allowed to be seen as is, right?).
If the script fails, send to the 404 page (header sends proper 404).
If no matching parent page is found, redirect to the 404 page (not graceful, it 302's to the 404, but I'm not a php coder so this is the best I've come up with so far).



How the images are set up:
name-thumbnail.png, name-preview.png, name-high-res.png, plus a name.zip.


The high-res copy and zips are disallowed from indexing with:

<filesMatch "high-res\.(png|jpg)$">
Header set X-Robots-Tag "noindex,noimageindex,noarchive"
</filesMatch>
<filesMatch "\.(xml|txt|zip)$">
Header set X-Robots-Tag "noindex,noarchive"
</filesMatch>



Here's what I've written so far for the htaccess:

RewriteEngine On

# Allow blank referrer + web & test servers
# Lucy mentioned in another post that this is better than !^$ .. but why, what exactly does it mean?
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !example\.dev:8888
RewriteCond %{HTTP_REFERER} !example\.com

# A few others...

# Send Google direct links (stand-alone) image traffic to bring-them-home.php
RewriteCond %{HTTP_REFERER} google\.(.*)/blank\.html

# But allow other google traffic? Will that work with the rule above?
RewriteCond %{HTTP_REFERER} !google\.

# Allow Cache
RewriteCond %{HTTP_REFERER} !search\?q=cache

# allow some user-agents
RewriteCond %{HTTP_USER_AGENT} !Googlebot
RewriteCond %{HTTP_USER_AGENT} !msnbot
RewriteCond %{HTTP_USER_AGENT} !bingbot
RewriteCond %{HTTP_USER_AGENT} !Slurp
RewriteCond %{HTTP_USER_AGENT} !Teoma


# allow certain images no matter what
RewriteCond %{REQUEST_FILENAME} !-thumbnail\.jpg$
RewriteCond %{REQUEST_FILENAME} !-thumbnail\.png$

# for now until I understand this better, allow previews to be hotlinked etc.?
# will the previous rule for Google be run and this skipped or will they trigger this?
# Not sure about this at all.. feels icky to hand these over to hotlinkers / scrapers
# advice please?
RewriteCond %{REQUEST_FILENAME} !-preview\.jpg$
RewriteCond %{REQUEST_FILENAME} !-preview\.png$

RewriteRule ^([^.]+\.(png|jpg|zip)) /bring-them-home.php?file=/$1
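
A point about the conditions above: RewriteCond lines in front of a RewriteRule are ANDed unless they are marked [OR]. A referer that has to match google.(.*)/blank.html and also has to not match google. can never satisfy both, so the single rule would not fire for anyone. What follows is only a rough sketch of splitting the logic into two rule blocks. The Google domain pattern, the grouped user-agent list and the use of REQUEST_URI are assumptions, and none of it has been tested on a live server.

```apache
RewriteEngine On

# Block 1 (assumption: the blank.html referer really arrives on a
# "View Original Image" click). Thumbnails are left alone; everything
# else goes to the sorting script.
RewriteCond %{HTTP_REFERER} google\.[a-z.]+/blank\.html [NC]
RewriteCond %{REQUEST_URI} !-thumbnail\.(png|jpg)$
RewriteRule ^([^.]+\.(png|jpg|zip))$ /bring-them-home.php?file=/$1 [L]

# Block 2: all other requests. Blank referers, the site itself, other
# Google referers, cache views and the listed crawlers are let through.
# Thumbnails and previews stay hotlinkable for now.
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !example\.dev:8888
RewriteCond %{HTTP_REFERER} !example\.com
RewriteCond %{HTTP_REFERER} !google\.
RewriteCond %{HTTP_REFERER} !search\?q=cache
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|msnbot|bingbot|Slurp|Teoma)
RewriteCond %{REQUEST_URI} !-(thumbnail|preview)\.(png|jpg)$
RewriteRule ^([^.]+\.(png|jpg|zip))$ /bring-them-home.php?file=/$1 [L]
```

Because Block 1 ends in [L], a request it matches never reaches Block 2, so the two sets of referer tests no longer contradict each other.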


What I'm really fuzzy on:

1) Is there any way (with php or htaccess) to tell if an image is hotlinked (src=image.png) vs. directly linked (href=image.png)?


2) Is google.*/blank.html only a referrer on a direct (stand alone) image or is it also triggered with hotlinking (image embedded in external page)?

So would the code say, "if you came from google images, go to the sorting page no matter what" (which means embedded images would be broken :-/), or does this allow the hotlinking and only send them if the image is directly linked in a stand-alone window (i.e. url bar shows image.png)?


Will Google accept the redirect for visitors or am I playing with fire? ie, risking getting kicked out of image, and maybe even web search?


Really, the user experience is terrible right now. "view original image" just opens the image.. no navigation, no context... so all they see is a web-resolution image with "Preview" stamped on it. Totally unusable... yet the content they were looking for is on the site... and no, I have no interest in just giving up and allowing Google to take the print versions.

My 403's have gone through the roof by people going "up" a level from the current image folder (Options All -Indexes, no interest in letting people wander image folders). I'm guessing they're trying to get to the content... So yeah, Thanks Google and Bing for making a terrible visitor experience, a mess in the logs and a huge loss of traffic. >:( (Oh it feels good to let a bit of that out!)

This stuff is breaking my head, I'm more of a graphics / css kind of person... so any help is extremely appreciated.

 

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4550378 posted 1:20 am on Mar 3, 2013 (gmt 0)

this is better than !^$ .. but why, what exactly does it mean?

You mean the -? tucked into the middle? It's insurance. Look at your logs and you will see that any blank element is recorded as "-" rather than, uh, nothing at all. This is probably just the logging software's way of noting "This category has no content". But one time I found a bona fide "" in logs so I figured it's safer to leave the -? in place.

In general the two forms
^$
and
!.
are synonymous.
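
A minimal sketch of those forms side by side, assuming mod_rewrite is enabled. The HAS_REFERER variable name is made up for illustration only.

```apache
RewriteEngine On

# Equivalent tests for "referer is empty" (use one, not both):
#   RewriteCond %{HTTP_REFERER} ^$
#   RewriteCond %{HTTP_REFERER} !.
# Same test, but also true when the value is a literal "-":
#   RewriteCond %{HTTP_REFERER} ^-?$

# Negated form, as used in a hotlink rule: continue only when the
# referer is neither empty nor "-".
RewriteCond %{HTTP_REFERER} !^-?$
# Hypothetical marker: sets an environment variable, changes nothing else.
RewriteRule ^ - [E=HAS_REFERER:1]
```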

2) Is google.*/blank.html only a referrer on a direct (stand alone) image or is it also triggered with hotlinking (image embedded in external page)?

Somewhere in the middle of that long long thread on image search I went off to experiment and came back with a depressing discovery: the image is pre-loaded, not just for Chrome but for everyone. The pre-load comes with the human user's IP and UA, but no referer. So there's no way for htaccess by itself to tell what's going on.

The only way to elicit the /blank.html referer-- the dead giveaway that you've got a "View Original Image" click --is if you start by serving up an uncached image. And, of course, hope that the user's browser obeys. You can then serve up a different image to users who click the link.
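
Roughly, that idea could look like the sketch below. It assumes mod_headers and mod_rewrite are available, that the browser honors the cache headers, and that the referer arrives in the google.*/blank.html form. The exact Cache-Control value and the domain pattern are guesses, not something tested against live image search.

```apache
# Ask browsers not to reuse a cached copy of preview images, so a
# "View Original Image" click has to come back to the server.
<FilesMatch "-preview\.(png|jpg)$">
    Header set Cache-Control "no-cache, no-store, must-revalidate"
</FilesMatch>

RewriteEngine On

# When the request does carry the blank.html referer, hand the preview
# to the sorting script instead of serving the bare image.
RewriteCond %{HTTP_REFERER} google\.[a-z.]+/blank\.html [NC]
RewriteRule ^([^.]+-preview\.(png|jpg))$ /bring-them-home.php?file=/$1 [L]
```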

My 403's have gone through the roof by people going "up" a level from the current image folder

This belongs in large bold type, though not necessarily in this thread. People who have been in the www business for a long time forget that to an ordinary human, 403 doesn't mean "You evil robot, get back to the Ukraine and stay there!" It simply means "Sorry, no dice, I'm not letting you see the index of this directory". There's no straightforward way to make different 403 pages depending on the nature of the request. ("Straightforward" = without resorting to php.) But if it's practical you can make directory-specific 403 pages that include links to the gallery pages that you want your humans to visit.
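
A directory-specific 403 page can be set with an ErrorDocument line in that directory's own htaccess. A minimal sketch, where the directory and the error page path are hypothetical placeholders:

```apache
# .htaccess inside a hypothetical /images/gallery-a/ directory

# No auto-generated index: a request for the bare directory gets a 403.
Options -Indexes

# Serve a friendly 403 page for this directory only; the page itself
# would carry links to the matching gallery pages.
ErrorDocument 403 /errors/403-gallery-a.html
```

The ErrorDocument path is a local URL-path, so the 403 status is still sent along with the custom page.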

LunaC

5+ Year Member



 
Msg#: 4550378 posted 8:42 am on Mar 5, 2013 (gmt 0)

So that's what the -? means, thanks, that makes sense. :)

As for the blank.html thing.. well that is depressing. There are WordPress plugins that seem to work; problem is I don't use WordPress. The redirect part is written and working fairly well according to local server tests (that alone took a week, I'm not a php coder). I just need a way to be able to tell if they are heading to/on a stand-alone image page.

You're right about forgetting that all 403's aren't evil. My first reaction was panic, thought the site was under attack or something. The warrior woman in me came out with swords gleaming... until the realization that they aren't all evil bots, many are just lost people trying to find their way home. :/

That site now has a very friendly 403 page, so some are finding their way back... doesn't help the mess in the logs, but at least if they go "up" a level they can navigate around. I'll do more with that later if I can't find a way to redirect stand-alone images, probably 301 them to the matching (real) page folder. Still though, even with many going "up", it seems far more are just hitting the back button (guessed from looking at the logs).


Quite off-topic, but it needs to be said...

Yesterday I got an angry email from a visitor saying my site "used to be easy" to use, but now they come in from google and "sometimes" get sent to "just a stupid picture that can't be printed because it has words on it and no help".

Yes, they could click "visit site" instead of "view original image" but many aren't. Some don't know the difference, and really why should they be expected to? The web should be simple to use. It's up to designers and developers to make it user-friendly. Visitors shouldn't have to think, they should just enjoy the content people have created for them.

The hotlinking is inexcusable, but sending people directly to a solitary image, with no page and no context, makes me red with anger. Making it difficult (perhaps impossible) to redirect visitors where they were (often) trying to go... polite words fail to describe.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4550378 posted 9:17 am on Mar 5, 2013 (gmt 0)

Yah, that riled me too. How can it possibly be good for the user experience to send people into a dead end where they can do absolutely nothing except hit the Back button?

Which, hm, sends you back to g### search-- and makes your site look bad according to their own everyday formulas. ("How quickly did user return to Search after visiting requested site?") Did someone just not think this through all the way?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved