Forum Moderators: Robert Charlton & goodroi


Did Google hit my images for Panda?


Johan007

8:58 am on Jun 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I think I have discovered the real reason for the next Panda update. In May, all 50 MB of my images were hit by Google. This is highly unusual, as they are excluded via robots.txt from Google Image search. I can only speculate that Google will use this data to look for unique images to form part of a quality score. IMHO Panda is purely about quality through uniqueness.

Since most of my images come from press associations, I am legally obliged to use the images as provided. Thankfully the same goes for the rest of my niche.

aristotle

1:01 pm on Jun 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In May, all 50 MB of my images were hit by Google. This is highly unusual, as they are excluded via robots.txt from Google Image search.

I don't understand this. Are you saying that Google fetched your images even though your robots.txt prohibits this?
If not, what does it mean?

Johan007

2:14 pm on Jun 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Are you saying that Google fetched your images even though your robots.txt prohibits this?
Yes.

netmeg

4:03 pm on Jun 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What's the statement in robots.txt?

Johan007

8:53 pm on Jun 9, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



User-agent: Googlebot-Image
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.svg$

The above is meant to exclude my images from image search. I presume Google still needs to read the images: if not for Panda, as I suspect, then at least for mobile UX scoring such as page speed.

A page with a unique image should in theory stand a better chance of being judged quality (Panda). The timing could just be coincidence. The site in question is a Panda-affected site.

nikhilrajr

5:13 am on Jun 11, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I am seeing crawl stats on all 6 of my image subdomains going higher than usual since May 22. 3 of the image subdomains are used only when the request is made from the UK, and guess what, Googlebot is crawling those 3 as well! Those 3 averaged 350 requests/day and are now at 939/day.

[edited by: Robert_Charlton at 8:43 pm (utc) on Jun 11, 2015]
[edit reason] fixed date, per poster correction [/edit]

fathom

10:38 am on Jun 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A page with a unique image should in theory stand a better chance of being judged quality (Panda). The timing could just be coincidence. The site in question is a Panda-affected site.


2 + 2 = 475,959,069 eh?

I would first resolve your malformed robots.txt file before speculating on what Google is up to.

User-agent: Googlebot-Image
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.png$
Disallow: /*.svg$


Reference: [robotstxt.org...]
Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve.


Also, under those rules your image references are all malformed by the $.

Disallowing the folders where your images live is all you need to do.
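The difference between the two standards is easy to demonstrate with Python's standard-library parser, which implements the old robotstxt.org-style prefix matching with no wildcard support (a quick sketch; `example.com` and the image path are placeholders):

```python
from urllib import robotparser

# The OP's directives, which rely on Google's wildcard extension.
rules = """\
User-agent: Googlebot-Image
Disallow: /*.gif$
Disallow: /*.jpg$
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A robotstxt.org-style parser does plain prefix matching, so
# "/*.gif$" is read as a literal path and does NOT block ordinary images:
print(rp.can_fetch("Googlebot-Image", "http://example.com/pics/photo.gif"))  # True
```

So under the original rules the directives match nothing, while under Google's extended rules they block every image; both readings are "correct" depending on which parser you ask.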

Johan007

12:22 pm on Jun 11, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



fathom, how rude, mate, using some website built in 1995 and updated in 2007 as some sort of evidence ;-)

Please read what all the major search engines, including Google, Bing, Yahoo, and Ask, actually support, at this link:
[developers.google.com...]
Look for:
$ designates the end of the URL
* designates 0 or more instances of any valid character.


...let's get back on topic (speculation that images are being read for the next Panda update).
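The two extended rules Google documents ($ anchors the end of the URL, * matches any run of characters) can be sketched as a regex translation. This is an illustration of the documented semantics only, not Google's actual implementation, and `google_rule_matches` is a made-up helper name:

```python
import re

def google_rule_matches(pattern: str, path: str) -> bool:
    """Sketch of Google's documented robots.txt pattern semantics:
    '*' matches any run of characters, '$' anchors the end of the URL.
    Illustrative only; not Google's implementation."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"       # 0 or more of any character
        elif ch == "$":
            regex += "$"        # end-of-URL anchor
        else:
            regex += re.escape(ch)
    # Rules are prefix matches unless anchored with a trailing '$'.
    return re.match(regex, path) is not None

print(google_rule_matches("/*.jpg$", "/images/photo.jpg"))    # True
print(google_rule_matches("/*.jpg$", "/images/photo.jpg?x"))  # False
print(google_rule_matches("/images/", "/images/photo.jpg"))   # True (prefix)
```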

not2easy

12:43 pm on Jun 11, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The robots.txt as shown follows Google's recommendations as shown here: [support.google.com...]

aristotle

12:52 pm on Jun 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Apparently you're only blocking Googlebot-Image, but Google has other bots that could fetch your images too.

Johan007

1:28 pm on Jun 11, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



aristotle, that is exactly right. Googlebot-Image is for Google Image search only. So the purpose of all my images being spidered is, as I speculate, the next Panda update.

fathom

1:51 pm on Jun 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Right or wrong, it might be worth testing your directives: [support.google.com...]

Not sure how this theory works... Panda itself devalues poor-quality content that was manipulating its quality controls, thus allowing the main algorithm to score and rank higher-quality content.

Ignoring all robots.txt directives to do this in reverse sounds odd.

Frankly, my bet is still on a bad robots.txt configuration, but I'm not an Ajax app expert.

Did you switch to https:// recently? If your robots.txt file is still referenced via http://, that's the likely problem.

fathom

2:36 pm on Jun 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As pointed out by not2easy

Files of a specific file type (for example, .gif):

User-agent: Googlebot
Disallow: /*.gif$


So I stand corrected. Just include the same directives for Googlebot as well. Then the speculation about PANDA can end.

seoskunk

10:30 pm on Jun 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Then the speculation about PANDA can end.

Ermm, Google has the technology to read metadata in images and other files. Could they use this as a quality signal? Yes.
They have already been using the technology on YouTube for copyright infringement.

fathom

10:56 pm on Jun 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That would be the main algorithm that ranks pages, not the Panda one that devalues them.

If Panda detects anything, rankings decline; if Panda detects nothing, rankings might improve. But that's not because your images forced Panda to ignore your ranking violations.

lucy24

11:25 pm on Jun 11, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This (from [support.google.com...] is fine:
User-agent: Googlebot
Disallow: /*.xls$

This is arrogant and irresponsible:
User-agent: *
Allow: /*?$
Disallow: /*?


some website built in 1995 and updated in 2007

Uhm, that "some website" is THE definitive reference for robots.txt matters until someone says otherwise. That "someone" is not Google ... except when one is specifically talking only about directives meant to be understood by the assorted Googlebots.

fathom

12:01 am on Jun 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Uhm, that "some website" is THE definitive reference for robots.txt


While I agree, it isn't worth arguing the point. Google even references it several times.

Helping to fix flaws is more than enough.

tangor

2:03 am on Jun 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since most of my images come from press associations I am legally obliged to use the images provided.


Pretty sure that G already knows this and does not count it for or against quality... probably discounts/ignores it altogether.

I do know that "googlebot" and "googlebot-image" are two different critters, and I wonder why one would be okay for pictures and the other not (in the mind of a webmaster). On sites where images are denied, ALL of them are denied, just to keep it simple. Am I missing something?

JS_Harris

8:27 am on Jun 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



tangor, you can block Googlebot-Image and still get images ranked. If you upload an image to your server and paste its URL onto a page without rendering the image, Googlebot finds it and ranks it just fine, even if you have Googlebot-Image blocked.

About the subject, images can indeed cause Panda issues; I know of a site experiencing this specific issue. The reason is that the images being displayed are all copies from around the internet, and the EXIF data was stuffed by an image-sharing site with copyright, property-of, created-by, authored-by, etc. fields naming that social sharing site. In other words, this website is showing images protected by copyright that strictly say they are not free to share.

Check Google Image search: you now have an option to search by usage rights, and the rights information is embedded in the images themselves.

fathom

10:04 am on Jun 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



About the subject, images can indeed cause Panda issues; I know of a site experiencing this specific issue. The reason is that the images being displayed are all copies from around the internet, and the EXIF data was stuffed by an image-sharing site with copyright, property-of, created-by, authored-by, etc. fields naming that social sharing site. In other words, this website is showing images protected by copyright that strictly say they are not free to share.


First, correlation does not imply causation.

There are so many ways to get yourself into hot water that Google specifically documents, and I find it odd that a site owner would intentionally copy images merely to prove Panda would devalue a domain simply because copyrighted images were uploaded. Where did they come from? If the original owner gave permission in an email, Google doesn't have access to that. So unless a DMCA claim was filed manually (in which Panda has no likely involvement), meaning Google did something manually, this still doesn't support the theory here.

Also, I can't believe that if you intentionally prevent Google from crawling an image, it would intentionally crawl it anyway so as to devalue you for taking steps to avoid this.

Science fiction isn't science fact, IMHO.

Johan007

12:49 pm on Jun 12, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Uhm, that "some website" is THE definitive reference for robots.txt matters until someone says otherwise.
We can all safely say otherwise, since no major engine is using the outdated rules.

Also, I can't believe that if you intentionally prevent Google from crawling an image, it would intentionally crawl it anyway so as to devalue you for taking steps to avoid this.

I agree; I think the only way of doing this is to add the /images/ folder under User-agent: *.

User-agent: Googlebot-Image is purely for image search.

fathom

8:46 pm on Jun 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, Johan007, the more explicit your directives the better. Of course, if you had an explicit user-agent like:

user-agent: Googlebot
disallow: /images/

That curbs all Panda interactions, since Google is the only concern.
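A folder-level disallow like the one above works even under the original prefix-matching rules. A quick check with Python's standard-library parser (a robotstxt.org-style implementation; the hostnames and agent names are placeholders):

```python
from urllib import robotparser

# The suggested directives, as a plain prefix rule.
rules = """\
User-agent: Googlebot
Disallow: /images/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Prefix matching blocks everything under /images/ for Googlebot...
print(rp.can_fetch("Googlebot", "http://example.com/images/photo.jpg"))      # False
# ...while other agents are unaffected (there is no User-agent: * section).
print(rp.can_fetch("SomeOtherBot", "http://example.com/images/photo.jpg"))   # True
```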

tangor

3:25 am on Jun 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



tangor, you can block Googlebot-Image and still get images ranked. If you upload an image to your server and paste its URL onto a page without rendering the image, Googlebot finds it and ranks it just fine, even if you have Googlebot-Image blocked.


@JS_Harris

Valid points (your post), but I still remain confused (as an educational aspect; I know what I AM DOING AND WHY) as to how a Panda PICTURE penalty might even exist, under the specifics described by the OP.

On sites where I control all images and transmissions (think Outer Limits, the old TV show), Panda could never even have a clue, much less access! :)

Pretty sure that news sites, Getty users, et al. are (perhaps) displayed in googlebot-image searches (Google Image, Bing, too) and discounted for any kind of ranking value, as THAT SITE is not the originator, much less the copyright holder, of those ACROSS-THE-WEB images.

I think we can all agree that it is highly unlikely that IMAGES are part of the Panda process.

Johan007

5:05 pm on Jun 13, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I think we can all agree that it is highly unlikely that IMAGES are part of the Panda process.

As the one who started this topic, I have changed my mind and come to the same conclusion: I agree. Google reads images for other reasons, perhaps to give credit to unique images, and as part of page speed and mobile optimisation scoring.