
allowing Bing Image Search in .htaccess

not working since they changed names

         

Lame_Wolf

1:03 am on Aug 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi all,

Sorry for sounding dumb, but .htaccess is way above my layman's head.

I recently added a framebuster because I was getting rather tired of people not visiting the site and just grabbing the URL of the image. I have noticed that in the last few weeks about 400 images have been dropped by Google.

So I thought I would see how I was doing in Bing. Only a fraction of my images were listed, and when I clicked on a thumbnail, I got a 403 page. This is not what I wanted, so I looked at my .htaccess file.

I thought it would be a simple case of changing "live" to "bing" but that did not work. I still got a 403.

Here is my .htaccess file. I know it looks messy. Someone else made it for me ages ago, but I am dumber than dumb on this side of things.


ErrorDocument 404 /404.shtml

<Files 403.shtml>
order allow,deny
allow from all
</Files>

RewriteEngine On

RewriteCond %{REQUEST_FILENAME} .*jpg$|.*gif$|.*png$ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !www.example\.us [NC]
RewriteCond %{HTTP_REFERER} !example\.us [NC]
RewriteCond %{HTTP_REFERER} !google\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
RewriteCond %{HTTP_REFERER} !images.search.yahoo.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !images.search.yahoo.com$ [NC]
RewriteCond %{HTTP_REFERER} !yahoo\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
RewriteCond %{HTTP_REFERER} !live\. [NC]
RewriteCond %{HTTP_REFERER} !search\?q=cache [NC]
RewriteCond %{HTTP_REFERER} !search.live.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !search.live.com$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example.us
RewriteRule (.*) http://www.example.us/$1 [R=301,L]

RewriteEngine on
# Options +FollowSymlinks

RewriteCond %{HTTP_REFERER} spaces.live.com [NC]
RewriteRule .* - [F]


What it is trying to achieve is to stop hotlinking while still allowing the major search engines to work properly.

Thank you for any assistance. Help me keep what hair I have left!

jdMorgan

1:15 pm on Aug 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Look in your raw server access log, and copy the exact image-request referrer strings from Google, Bing, and any other image search that you want to allow to have access.

Only with the correct referrer strings can we even begin to discuss modifying/fixing/cleaning up your code...
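For anyone digging through a large log by hand, a small Node.js script can pull out just the image requests and the referrers that came with them. This is a rough sketch, not a tested tool: it assumes the server writes Apache's standard "combined" log format (check your host's LogFormat), and the function name `imageReferrers` and the sample log lines are made up for illustration.

```javascript
// One line of Apache's "combined" LogFormat (assumed; adjust for a custom format):
// host ident user [time] "request" status bytes "referrer" "user-agent"
const COMBINED =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+ "([^"]*)" "([^"]*)"/;

// The same image types the rewrite rules protect.
const IMAGE_PATH = /\.(jpe?g|gif|png|bmp)$/i;

// Return {path, status, referrer} for every image request that carried a referrer.
function imageReferrers(lines) {
  const hits = [];
  for (const line of lines) {
    const m = COMBINED.exec(line);
    if (!m) continue;                     // not a combined-format line
    const path = m[4].split("?")[0];      // ignore any query string on the image URL
    const status = Number(m[5]);
    const referrer = m[6];
    if (!IMAGE_PATH.test(path)) continue; // only image requests matter here
    if (referrer === "-") continue;       // "-" means no referrer was sent
    hits.push({ path, status, referrer });
  }
  return hits;
}

// Hypothetical sample lines; in practice, read them from your raw access log.
const sample = [
  '203.0.113.5 - - [23/Aug/2010:01:02:03 +0000] "GET /images/wolf.jpg HTTP/1.1" 403 512 "http://www.bing.com/images/search?q=wolf" "Mozilla/4.0"',
  '203.0.113.6 - - [23/Aug/2010:01:05:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
];
console.log(imageReferrers(sample));
```

Filtering the result for status 403 shows which referrers the current rules are rejecting, which is exactly the list needed to decide what to add to the exceptions.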

Jim

Lame_Wolf

2:11 pm on Aug 22, 2010 (gmt 0)




Thanks jdMorgan,

The referrer strings are correct for Google etc. The problem is only with Bing: when you click on the thumbnail for the second time, it returns a 403.

All I need to know is what the referrer string for Bing is. I thought it would be a simple question, because many webmasters allow Google etc.

No worries. I'll put up with it. Thanks.

jdMorgan

5:06 pm on Aug 22, 2010 (gmt 0)




Yes, but if you'll look up and post that referrer string, it sure would make things easier for us -- and faster for you...

Jim

Lame_Wolf

2:17 am on Aug 23, 2010 (gmt 0)




Sorry for the delay, but I've spent a long time going through the raw logs. I eventually found one that was coming from Bing:

http://www.bing.com/images/search

The thing is, it only happens with certain images. Depending on which image I find on Bing, it reacts differently.

It has been an extremely long time since I used Live/Bing Image Search.

This is what I have been seeing...

Go to Bing Image Search and enter Keyword.
Page displays lots of thumbnails.
Click on thumbnail
New page displays, showing a thumbnail of the image file [and points to the jpg] and below it, the URL to the page.
Clicking on that thumbnail will give a 403

Now...
Go to Bing Image Search and enter a different Keyword.
Page displays lots of thumbnails.
Click on thumbnail
My site automatically loads up, bypassing the other page.

Both pages have the following framebuster code...

<script type="text/javascript">
<!--
if (top != self) top.location.replace(self.location.href);
//-->
</script>

Both pages validate etc.
Both well established pages and images.

So both should work when breaking out of the search engines' frames.
.htaccess seems to be allowing some, but not all, images.

I am more confused than before.

jdMorgan

2:47 pm on Aug 23, 2010 (gmt 0)




Well, if the referrer is "http://www.bing.com/images/search", then that's what needs to be "allowed" in the exception list.

Here's the mod_rewrite code (only) with many clean-ups, corrections, and optimizations. Be sure to completely delete your browser cache before testing any new server-side code.

Also, if you test by 'spoofing' various referrers, be sure to delete your cache each time you switch between testing the allowed and not-allowed referrers, so that your browser doesn't show you stale, previously cached images and server responses.

Sometimes it is easiest/most-convenient to simply disable your browser cache for the duration of the testing session. This is usually done with a check-box in the options, or by setting the browser's cache size to zero.

RewriteEngine On
#
# Return 403-Forbidden for unauthorized hotlinked image requests
RewriteCond %{HTTP_REFERER} ^https?://(.+)$
RewriteCond %1 !^(www\.)?example\.us [NC]
RewriteCond %1 !^([^./]+\.)*google\. [NC]
RewriteCond %1 !^([^./]+\.)*yahoo\.com [NC]
RewriteCond %1 !^([^./]+\.)*bing\.com/images/search [NC]
RewriteCond %1 !^([^./]+\.)*live\.com [NC]
RewriteCond %1 !^([^/]+/)+search\?q=cache [NC]
RewriteRule \.(jpe?g|gif|png|bmp)$ - [NC,F]
#
# Return 403-Forbidden for all requests referred from spaces.live.com
RewriteCond %{HTTP_REFERER} ^https?://spaces\.live\.com [NC]
RewriteRule ^ - [F]
#
# Externally redirect all non-blank non-canonical hostname requests to canonical hostname
RewriteCond %{HTTP_HOST} !^(www\.example\.us)?$
RewriteRule ^(.*)$ http://www.example.us/$1 [R=301,L]

This puts the rules in proper order, gets rid of many redundancies, escapes all literal periods in patterns, improves efficiency by removing the "http://" from the beginning of each referrer to be tested and anchoring the patterns to be tested against, and eliminates the original code's ambiguity that would have allowed any image to be successfully hotlinked simply by including (for example) "google.com" in the URL-path or query string of the referring page.
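To see that last point concretely, here is a rough JavaScript model of the old and new referrer conditions. It is only a sketch: it assumes the request is already for an image (so the file-type checks are left out), uses JavaScript's RegExp as a stand-in for Apache's PCRE (they behave the same for these simple patterns), and the `hotlinker.test` referrer is a made-up example.

```javascript
// Old exception list: unanchored patterns tested against the whole referrer,
// copied from the original .htaccess (duplicate lines left out).
const OLD_ALLOW = [
  /www.example\.us/i,
  /example\.us/i,
  /google\./i,
  /search\?q=cache/i,
  /images.search.yahoo.com\/.*$/i,
  /images.search.yahoo.com$/i,
  /yahoo\./i,
  /live\./i,
  /search.live.com\/.*$/i,
  /search.live.com$/i,
];

// New exception list: anchored patterns tested against %1, i.e. the
// referrer with "http://" or "https://" removed.
const NEW_ALLOW = [
  /^(www\.)?example\.us/i,
  /^([^./]+\.)*google\./i,
  /^([^./]+\.)*yahoo\.com/i,
  /^([^./]+\.)*bing\.com\/images\/search/i,
  /^([^./]+\.)*live\.com/i,
  /^([^/]+\/)+search\?q=cache/i,
];

// Old rules: any non-blank referrer that matches none of the exceptions gets a 403.
function blockedByOldRules(referrer) {
  if (referrer === "") return false;
  return !OLD_ALLOW.some((re) => re.test(referrer));
}

// New rules: only http(s) referrers are checked; blank ones pass untouched.
function blockedByNewRules(referrer) {
  const m = /^https?:\/\/(.+)$/.exec(referrer);
  if (!m) return false;
  return !NEW_ALLOW.some((re) => re.test(m[1]));
}

const examples = [
  "http://www.bing.com/images/search?q=wolf",     // old: blocked, new: allowed
  "http://hotlinker.test/gallery?ref=google.com", // old: allowed, new: blocked
  "http://images.google.com/imgres?imgurl=x",     // allowed by both
  "",                                             // blank: allowed by both
];
for (const r of examples) {
  console.log(JSON.stringify(r), "old:", blockedByOldRules(r), "new:", blockedByNewRules(r));
}
```

Anchoring on the hostname is what closes the loophole: the old `google\.` test succeeds anywhere in the referrer string, including in a hotlinker's own query string.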

It also improves the hostname canonicalization redirect rule to cover *all* non-canonical hostname requests.
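The canonicalization condition can be modeled the same way. This is a hypothetical JavaScript helper, not part of Apache: it returns the redirect target, or null when the rule would not fire, and it assumes the Host header value is passed in exactly as the server received it. Note that the condition has no [NC] flag, so an all-uppercase hostname is also redirected, while a missing Host header (blank %{HTTP_HOST}) is deliberately left alone. When the substitution contains no query string, mod_rewrite passes the original one through, which the optional `query` argument mimics.

```javascript
// Mirrors (hypothetically, outside Apache):
//   RewriteCond %{HTTP_HOST} !^(www\.example\.us)?$
//   RewriteRule ^(.*)$ http://www.example.us/$1 [R=301,L]
const CANONICAL_OR_BLANK = /^(www\.example\.us)?$/; // no [NC], so case-sensitive

// Returns the 301 target URL, or null when the rule would not fire.
function canonicalRedirect(host, path, query = "") {
  if (CANONICAL_OR_BLANK.test(host)) return null; // canonical host, or no Host header
  const target = "http://www.example.us" + path;  // path includes the leading "/"
  return query ? target + "?" + query : target;   // original query string passed through
}

console.log(canonicalRedirect("example.us", "/gallery/wolf.html"));     // http://www.example.us/gallery/wolf.html
console.log(canonicalRedirect("www.example.us", "/gallery/wolf.html")); // null
console.log(canonicalRedirect("", "/gallery/wolf.html"));               // null
```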

The comments reflect exactly what each rule does. Very useful when you return to modify/enhance this code after a few months/years.

Jim

Lame_Wolf

4:30 pm on Aug 24, 2010 (gmt 0)




Thanks for that Jim.

I have tested it, and it is now consistent, but not in the way I had hoped.

I would have preferred it if it bypassed the thumbnail/URL page altogether. Prior to your changes, whether it broke out of the frames or not depended on which thumbnail image you clicked.

I did notice that in the source of the main thumbnails page, the ones that were bypassing the thumbnail/URL page [which is what I wanted] had this...

<img class="img_pt_u" onload="_li(this);" src="http://ts4.mm.bing.net/images/thumbnail.aspx

And the ones that failed had this in the code...
<img class="img_ls_u" onload="_li(this);" src="http://ts2.mm.bing.net/images/thumbnail.aspx

I have no idea whether that has any bearing on things at their end.

Thanks for your help, and the .htaccess file is a lot smaller.

jdMorgan

5:59 pm on Aug 24, 2010 (gmt 0)




> I would have preferred it if it bypassed the thumbnail/URL page altogether.

If you will dig into your raw server access logs and get the HTTP referrers for the "thumbnail/URL page" case and the "no-thumbnail/URL page" case, and if these referrers are different, then perhaps something can be done on your end (in your server). If not, then you would have to have config access to Bing's servers to do anything about it.
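Once you have the two referrers from the log, a quick way to see whether (and how) they differ is to split each into host, path and query parameters and compare them piece by piece. A minimal sketch in Node.js, assuming full `http://...` referrer strings as logged (a "-" entry would need to be skipped first); the function names and both example referrers are hypothetical placeholders, not real Bing URLs.

```javascript
// Split a referrer into the parts worth comparing.
function describeReferrer(referrer) {
  const u = new URL(referrer); // throws on "-" or other non-URL values
  return { host: u.hostname, path: u.pathname, params: new Map(u.searchParams) };
}

// List every way two referrers differ: host, path, or any query parameter.
function referrerDifferences(a, b) {
  const ra = describeReferrer(a);
  const rb = describeReferrer(b);
  const diffs = [];
  if (ra.host !== rb.host) diffs.push(`host: ${ra.host} vs ${rb.host}`);
  if (ra.path !== rb.path) diffs.push(`path: ${ra.path} vs ${rb.path}`);
  const keys = new Set([...ra.params.keys(), ...rb.params.keys()]);
  for (const k of keys) {
    const va = ra.params.get(k) ?? "(absent)";
    const vb = rb.params.get(k) ?? "(absent)";
    if (va !== vb) diffs.push(`param ${k}: ${va} vs ${vb}`);
  }
  return diffs;
}

// Hypothetical referrers standing in for the two cases found in the log.
console.log(referrerDifferences(
  "http://www.bing.com/images/search?q=wolf",
  "http://www.bing.com/images/search?q=wolf&mode=detail"
));
```

An empty list means the two referrers look identical to the server, in which case nothing in .htaccess can tell the two cases apart.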

To be clear, I allow no-one to cache any content of any kind from my sites, so I do not know anything about Bing's image search functions -- or anyone else's image search functions for that matter. All my stuff is copyrighted (to me) and so I cannot allow caching/copying unless I want to spend my entire life writing DMCA notices and talking to lawyers... So just like mod_rewrite on your server, I must rely on you to provide the "data" that is needed to take action: the URLs and HTTP header values (such as HTTP_REFERER) from your server logs... All the data needed to figure this out is there. And while the conclusion may be that "There's nothing you can do about Bing's interstitial thumbnail page," at least you would know...

Jim