Could someone please take a look at this and see if there are any errors or things that will cause problems? Htaccess isn't my strong point, and the odds of me doing something terribly wrong with it are remarkably high. :-/
What I'm trying to do is this:
Thumbnails:
Allow any embedded (hotlinked) thumbnails, from certain sites, to be shown normally.
(The actual image is watermarked, the watermark strip is hidden on my site with negative margins in css.)
Preview images:
Indexed, allow hotlinking in Google images (if needed), but send direct links to bring-them-home.php
By direct links I mean the image shown on its own, as when "View Original Image" is clicked (URL bar shows example.com/image.png).
(These images have a "Preview" stamp on my site. That's so the visitor can check in print preview that the real version (high-res, with no stamp or watermark of any kind) is the one going to print. If they see "Preview", something went wrong and they know not to send it to the printer.)
High-res and zips:
Always send (except from selected sites and user-agents) to bring-them-home.php for sorting.
bring-them-home.php basically just sorts them: it grabs the image url and looks for a matching parent page.
If found, 302 visitor to parent (302 since the image is not really moved, just not allowed to be seen as is, right?).
If the script fails, send to 404 page (header sends proper 404).
If no matching parent page is found, it redirects to a 404 (not graceful, it 302's to the 404, but I'm not a php coder so this is the best I've come up with so far).
How the images are set up:
name-thumbnail.png, name-preview.png, name-high-res.png, plus a name.zip.
The high-res copy and zips are disallowed from indexing with:
<filesMatch "high-res\.(png|jpg)$">
Header set X-Robots-Tag "noindex,noimageindex,noarchive"
</filesMatch>
<filesMatch "\.(xml|txt|zip)$">
Header set X-Robots-Tag "noindex,noarchive"
</filesMatch>
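A slightly more defensive way to write the two blocks above, as a sketch only: it assumes mod_headers may not be loaded on every server this file ends up on, in which case a bare Header line causes a 500 error. The leading hyphen in the first pattern is my assumption, based on the name-high-res.png naming.

```apache
<IfModule mod_headers.c>
  # High-res images: keep out of web and image indexes
  <FilesMatch "-high-res\.(png|jpg)$">
    Header set X-Robots-Tag "noindex, noimageindex, noarchive"
  </FilesMatch>
  # Zips and other non-page files
  <FilesMatch "\.(xml|txt|zip)$">
    Header set X-Robots-Tag "noindex, noarchive"
  </FilesMatch>
</IfModule>
```

With the IfModule wrapper the headers are silently skipped when mod_headers is missing, so it is worth checking the response headers once to confirm they are actually being sent.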
Here's what I've written so far for the htaccess:
RewriteEngine On
# Allow blank referrer + web & test servers
# Lucy mentioned in another post that this is better than !^$ .. but why, what exactly does it mean?
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !example\.dev:8888
RewriteCond %{HTTP_REFERER} !example\.com
# A few others...
# Send Google direct links (stand-alone) image traffic to bring-them-home.php
RewriteCond %{HTTP_REFERER} google\.(.*)/blank\.html
# But allow other google traffic? Will that work with the rule above?
RewriteCond %{HTTP_REFERER} !google\.
# Allow Cache
RewriteCond %{HTTP_REFERER} !search\?q=cache
# allow some user-agents
RewriteCond %{HTTP_USER_AGENT} !Googlebot
RewriteCond %{HTTP_USER_AGENT} !msnbot
RewriteCond %{HTTP_USER_AGENT} !bingbot
RewriteCond %{HTTP_USER_AGENT} !Slurp
RewriteCond %{HTTP_USER_AGENT} !Teoma
# allow certain images no matter what
RewriteCond %{REQUEST_FILENAME} !-thumbnail\.jpg$
RewriteCond %{REQUEST_FILENAME} !-thumbnail\.png$
# for now until I understand this better, allow previews to be hotlinked etc.?
# will the previous rule for Google be run and this skipped or will they trigger this?
# Not sure about this at all.. feels icky to hand these over to hotlinkers / scrapers
# advice please?
RewriteCond %{REQUEST_FILENAME} !-preview\.jpg$
RewriteCond %{REQUEST_FILENAME} !-preview\.png$
RewriteRule ^([^.]+\.(png|jpg|zip))$ /bring-them-home.php?file=/$1 [L]
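One thing worth spelling out about the block above: consecutive RewriteCond lines are ANDed by default, so a referrer that has to match google.(...)/blank.html and also has to not match google. can never satisfy both, and the single rule would never fire for Google traffic. A possible way around that is to split it into two rules. This is only a sketch: the blank.html referrer pattern, the list of exempt user-agents, and the choice to send previews home from Google but not from elsewhere are all assumptions taken from the description above, not tested behaviour.

```apache
RewriteEngine On

# Rule 1: stand-alone image views coming from Google Images.
# Assumption: these carry a referrer like google.<tld>/blank.html.
# Thumbnails are still left alone; previews, high-res and zips are sorted.
RewriteCond %{HTTP_REFERER} google\.[^/]+/blank\.html [NC]
RewriteCond %{REQUEST_URI} !-thumbnail\.(png|jpg)$
RewriteRule ^([^.]+\.(png|jpg|zip))$ /bring-them-home.php?file=/$1 [L]

# Rule 2: any other request with a non-blank, off-site, non-Google referrer.
# All conditions below must be true at once (implicit AND).
RewriteCond %{HTTP_REFERER} !^-?$
RewriteCond %{HTTP_REFERER} !example\.dev:8888 [NC]
RewriteCond %{HTTP_REFERER} !example\.com [NC]
RewriteCond %{HTTP_REFERER} !google\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|msnbot|bingbot|Slurp|Teoma) [NC]
RewriteCond %{REQUEST_URI} !-(thumbnail|preview)\.(png|jpg)$
RewriteRule ^([^.]+\.(png|jpg|zip))$ /bring-them-home.php?file=/$1 [L]
```

As for !^-?$: the pattern ^-?$ matches a referrer that is either completely empty or a single hyphen, so negating it means "the referrer is neither blank nor just a dash". The rewritten target ends in .php, so it does not match either rule again and should not loop.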
What I'm really fuzzy on:
1) Is there any way (with php or htaccess) to tell if an image is hotlinked (src=image.png) vs. directly linked (href=image.png)?
2) Is google.*/blank.html only a referrer on a direct (stand alone) image or is it also triggered with hotlinking (image embedded in external page)?
So would the code say, "if you came from google images, go to the sorting page no matter what" (which means embedded images would be broken :-/), or does this allow the hotlinking and only send them on if the image is directly linked in a stand-alone window (i.e. url bar shows image.png)?
Will Google accept the redirect for visitors or am I playing with fire? I.e., am I risking getting kicked out of image search, and maybe even web search?
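For question 1, one possible heuristic is the Accept request header rather than the referrer. This is an untested sketch, not something from my current setup, and the assumption behind it is that a browser navigating straight to an image URL usually lists text/html in Accept, while a fetch for an embedded img src usually does not. Headers can be missing or faked, so it could only ever be a hint.

```apache
RewriteEngine On

# Assumption: "text/html" in Accept suggests the image was opened on its own
# (address bar or href link) rather than embedded via <img src>.
RewriteCond %{HTTP_ACCEPT} text/html [NC]
RewriteCond %{REQUEST_URI} -(preview|high-res)\.(png|jpg)$
RewriteRule ^([^.]+\.(png|jpg))$ /bring-them-home.php?file=/$1 [L]
```

Note that this would also catch visitors on my own site who open a preview directly, which may or may not be what I want.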
Really, the user experience is terrible right now. "view original image" just opens the image.. no navigation, no context... so all they see is a web-resolution image with "Preview" stamped on it. Totally unusable... yet the content they were looking for is on the site... and no, I have no interest in just giving up and allowing Google to take the print versions.
My 403's have gone through the roof by people going "up" a level from the current image folder (Options All -Indexes, no interest in letting people wander image folders). I'm guessing they're trying to get to the content... So yeah, Thanks Google and Bing for making a terrible visitor experience, a mess in the logs and a huge loss of traffic. >:( (Oh it feels good to let a bit of that out!)
This stuff is breaking my head, I'm more of a graphics / css kind of person... so any help is extremely appreciated.