Forum Moderators: Robert Charlton & goodroi

How to block Google for everything but images?

         

HuskyPup

11:23 am on Jun 2, 2011 (gmt 0)



Hypothetically, if I wanted to remove everything from the normal SERPs but keep images in those results, what would be the best method? As I understand it, the following applies:

Robots META Tags - Googlebot

1. <meta name="googlebot" content="noindex">

Do Not index / Do follow links

2. <meta name="googlebot" content="nofollow">

Do index / Do Not follow links

3. <meta name="googlebot" content="noindex, nofollow">

Do Not index / Do Not follow links

4. <meta name="googlebot" content="noarchive">

Do Not cache

5. <meta name="googlebot" content="nosnippet">

Do Not display snippet / Do Not cache

6. <meta name="googlebot" content="noodp">

Do Not show ODP description

Am I also right in assuming that if I wanted to block Google completely it would be:

<meta name="googlebot" content="noindex, nofollow, noarchive">

Apologies if this seems dumb, but I've never had to do this before.

aristotle

4:59 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't know much about this either. But I sometimes see a special Google Image bot in my logs. It looks like this: Googlebot-Image/1.0

But the regular Googlebot also sometimes fetches images.

You can use robots.txt to block Googlebot from the entire site, if that's what you intend. But I don't know whether that would also block Googlebot-Image/1.0.
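For reference, a site-wide block in robots.txt would be just this (though, again, I can't say whether Googlebot-Image/1.0 follows the same group or crawls under its own rules):

User-agent: Googlebot
Disallow: /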

indyank

5:07 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, no, no. Don't use robots.txt, as it isn't going to help in removing content from their publicly visible index (though they might still store it in the backend).

You will have to use noindex. You may consider noindex, nofollow, since you don't want any page in their index and there isn't any use in having the links followed. That way, you may even save some bandwidth :)

aristotle

5:09 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But if you use noindex on a page, doesn't that also noindex all the images on the page?

indyank

5:12 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think noindex is for the search index and not for the image index. They are two different bots, as you stated, and I doubt whether the Google image bot will obey the noindex meta tag. Interesting question, though.
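One aside that may matter here: for non-HTML files such as images, Google also supports the X-Robots-Tag HTTP response header, which works like the robots meta tag but travels with the file itself. In Apache, something along these lines should mark image files noindex (an untested sketch, and of course the opposite of what Husky wants for his images; it just shows that images can carry robots directives too):

# requires mod_headers to be enabled
<FilesMatch "\.(png|jpe?g|gif)$">
Header set X-Robots-Tag "noindex"
</FilesMatch>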

tedster

5:34 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is an unusual situation - can't say I ever ran into it, either. Here are some of the bits I think I know:

1. You do want to allow Googlebot-image
2. You don't want to allow googlebot
3. The two bots do share a cache, in the case where regular googlebot downloads an image.

As far as I know, there is no dedicated Googlebot-image meta tag. Adding even more confusion, in robots.txt where a separate directive is possible for Googlebot-image, my understanding is that disallowing googlebot in robots.txt disallows every bot from Google.

So - I may have only added to the "confusion" with these tidbits, but I hope it at least moves the discussion forward.

tedster

5:42 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's another approach that "might" help. Use Google's new "immediate temporary URL removal" tool from within WebmasterTools - see this thread [webmasterworld.com] for more details.

Then you can use robots.txt to disallow future crawling and make the URL removal "permanent" instead of 90-days temporary.

However, this still leaves the question of Googlebot-image. In the best of all worlds, I would like to see this set of robots.txt rules be effective, but I can't guarantee it at all - no experience with it:

# an empty Disallow permits everything
User-agent: Googlebot-Image
Disallow:

# a lone slash blocks the whole site
User-agent: Googlebot
Disallow: /

indyank

5:50 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is something I can add to this. I did a test where I allowed Googlebot-Image everything but blocked googlebot from the folder in which images are stored. This prevented images from appearing in the Google Images index. So it does look like googlebot plays a role in the way images are indexed.

There is yet another thing which confirms this. Currently, Google Images hotlinks images on your site, and they do have complete pages in their index. If you add a frame-buster script to your pages, browsing images via Google Images will take you directly to the site, and the script does help in breaking the frame. This means the entire page is indexed.
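For anyone unfamiliar, a typical frame buster is just a small script in the page head, something like this minimal sketch:

<script type="text/javascript">
// if this page has been loaded inside someone else's frame,
// break out by sending the top window to our own URL
if (top !== self) {
top.location.replace(self.location.href);
}
</script>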

So it does look like you cannot block one (googlebot) while allowing the other (Googlebot-Image). Google ties several things together, and you either use them all or don't use them at all.

tedster

6:07 pm on Jun 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for sharing your experience indyank - just as I feared, then.

indyank

2:01 am on Jun 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tedster and Husky, sorry for the confusion. I failed to mention that hotlink protection was on, without any exceptions for Google, and that could have been the reason for images not appearing in the Google Images index.

So, this should work. Add noindex to all pages, and after they have all been removed you may choose to block googlebot using robots.txt while allowing the image bot.

But the Google image bot does keep entire pages in its index, not just the images. If you are comfortable with that, you may try this approach.
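To spell the two steps out (a sketch; I have not tested this end to end):

Step 1 - on every page, until the pages have dropped out of the web index:

<meta name="googlebot" content="noindex">

Step 2 - once they are gone, switch to robots.txt so googlebot stops crawling while the image bot carries on:

User-agent: Googlebot-Image
Disallow:

User-agent: Googlebot
Disallow: /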

HuskyPup

9:25 am on Jun 3, 2011 (gmt 0)



Thanks for the responses. As I wrote, this is purely hypothetical at the moment; I just wondered whether it was possible.

I may have to try it out with a small site to see if it can be done successfully.