Forum Moderators: Robert Charlton & goodroi

Setting canonical for images to avoid duplicates in IIS server

     
5:15 pm on Apr 22, 2015 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:Sept 26, 2012
posts: 52
votes: 0


The majority of my traffic comes through image search. Recently due to some coding changes I have two types of internal links pointing to the same image. One with example.com/image.jpg and the other example.com/image.jpg&value=true. How can I keep the duplicates out of the index? I was looking at setting a canonical via an HTTP header for images, but how can I do that on an IIS server? I also noticed from here [googlewebmastercentral.blogspot.in...] that Google supports canonical via header for web search only.
1:04 pm on Apr 23, 2015 (gmt 0)

Preferred Member from GB 

10+ Year Member Top Contributors Of The Month

joined:July 25, 2005
posts:404
votes: 16


You can't add canonical to non-html content. To solve this issue you should add a new rule to your robots.txt file telling the Image bot to ignore all images that end with this parameter:
User-agent: Googlebot-Image
Disallow: /*&value=true

Double-check that it works by testing it via GWT -> Crawl -> Robots tester
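As a rough way to see which URLs a rule like this would catch, here is a minimal Python sketch of wildcard matching. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match from the start of the path). The function names are made up for illustration and this is not Google's actual matcher, so the robots tester remains the authority.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' is a wildcard, a trailing '$' anchors the end of the
    URL, and all other characters are matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, pattern: str) -> bool:
    """Return True if the path (including any query string) matches the rule."""
    return robots_pattern_to_regex(pattern).match(path) is not None


rule = "/*&value=true"
for path in ["/image.jpg", "/image.jpg&value=true", "/image.jpg?value=true"]:
    print(path, is_disallowed(path, rule))
```

Under these assumptions the rule only catches the `&` form. If the live URLs actually use `?value=true`, the pattern would need to be `/*?value=true` instead.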
1:25 pm on Apr 23, 2015 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:Sept 26, 2012
posts: 52
votes: 0


Thanks. But the article from Google I posted above says, "Google web search now supports link rel="canonical" relationships specified in HTTP headers", and it uses PDF files as an example. The robots.txt is a good option. But for almost 50% of the web pages in my website the image urls contain the parameter &value=true. So I don't think robots.txt is the best option. Of course the best option is to get the parameters removed from the image URLs, but based on the discussion with the dev team the parameter is needed.
2:15 pm on Apr 23, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:13007
votes: 222


But for almost 50% of the web pages in my website the image urls contain the parameter &value=true. So I don't think robots.txt is the best option.


Why not? (That's a & and not a ? ?)
4:17 pm on Apr 23, 2015 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:Sept 26, 2012
posts: 52
votes: 0


I just double-checked. Sorry, it's a parameter with ? and not &. Will that make a difference, anyway? Curious to know!
For ecommerce landing pages that can be reached in a lot of ways we are using canonical tags, so I thought about setting the same for images after reading this article from maxcdn [maxcdn.com...]. Am I on the wrong track here?
1:39 am on Apr 24, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11466
votes: 174


You can't add canonical to non-html content.

this is not accurate.
you can include a Link header in the HTTP response specifying the canonical url of the non-html resource with a rel="canonical" attribute.
for example:
Link: <http://www.example.com/white-paper.pdf>; rel="canonical"
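To illustrate the header format above, here is a small Python sketch that builds the header name and value for a given canonical URL. The helper name `canonical_link_header` is hypothetical, and the sketch only shows the string format. It says nothing about how IIS would be configured to emit it, and the header applies to web search resources such as PDFs rather than to Image Search.

```python
from typing import Tuple


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Return the (name, value) pair for a rel="canonical" Link header."""
    # The target URL goes inside angle brackets, followed by the rel parameter.
    return ("Link", '<{}>; rel="canonical"'.format(canonical_url))


name, value = canonical_link_header("http://www.example.com/white-paper.pdf")
print("{}: {}".format(name, value))
```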


https://support.google.com/webmasters/answer/139066 [support.google.com]:
Google currently supports these link header elements for Web Search only.

link rel canonical is not a useful option for Image Search.

the proper solution is to refer internally only to canonical urls.
assuming you have reasons for not solving this problem, another solution is to implement an external redirect to the canonical url.
4:03 am on Apr 24, 2015 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:Sept 26, 2012
posts: 52
votes: 0


implement an external redirect to the canonical url

I didn't get that. Please help me understand it.
According to Google [googlewebmastercentral.blogspot.in]:
If you’re duplicating your images across multiple hostnames, our algorithms may pick one copy as the canonical copy of the image, which may not be your preferred version. This can also lead to slower crawling and indexing of your images.

Slower crawling and indexing - This is what worries me. I will again check with the dev team about removing the parameter.
8:04 am on Apr 24, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15313
votes: 707


How 'bout the Parameters area in Google Webmaster Tools? Tell them not to crawl any URL that even contains the parameter "value"
8:56 am on Apr 24, 2015 (gmt 0)

Preferred Member from GB 

10+ Year Member Top Contributors Of The Month

joined:July 25, 2005
posts:404
votes: 16


But for almost 50% of the web pages in my website the image urls contain the parameter &value=true.

So you mean these images can't be accessed by the Imagebot in any other way? I thought you said there were two types of internal links pointing to the same image?
1:41 pm on Apr 24, 2015 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:Sept 26, 2012
posts: 52
votes: 0


lucy24, correct me if I am wrong, but I think a disallow in robots.txt is similar to the do-not-crawl setting in Google Webmaster Tools.

This is what I think I should do: allow Google's image bot to crawl the images with the parameter, but have it index only the correct (canonical) version. I am allowing it to crawl the images with the parameter so that all the authority or link juice (or whatever it is) they have acquired can be passed to the correct version. Would you agree, and why?

Question: Can images acquire PageRank?

adder, the same image can be accessed via two types of internal links.
6:20 pm on Apr 24, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15313
votes: 707


Did you ever explain why a 301 redirect isn't an option? If any given image URL can exist either with or without a query string, redirecting to the query-less form should be trivial. And it will buy you some time to fix whatever is causing the spurious ?blahblah in the filenames, which obviously is the real problem.
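For what it's worth, the redirect decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not IIS configuration: the parameter name `value` comes from this thread, while `DUPLICATE_PARAM` and `redirect_for` are hypothetical names, and on IIS a rule like this would more likely live in the server's rewrite configuration than in application code.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the query parameter that creates the duplicate image URLs.
DUPLICATE_PARAM = "value"


def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, canonical_url) if the URL carries the duplicate
    parameter, or None if the URL is already canonical."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != DUPLICATE_PARAM]
    if len(kept) == len(pairs):
        return None  # parameter not present, nothing to redirect
    target = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
    return 301, target


print(redirect_for("http://example.com/image.jpg?value=true"))
print(redirect_for("http://example.com/image.jpg"))
```

Any other query parameters are kept, so only the duplicate-causing one is dropped from the redirect target.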

:: idle query: what is the point of attaching parameters to a static file such as an image or stylesheet, other than to make sure the search engine knows you're using a CMS? ::
2:17 pm on Apr 27, 2015 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:Sept 26, 2012
posts: 52
votes: 0


We are not using any CMS! The website is built on the ASP.NET framework. The parameter is needed to pull the images in website search results pages. So a 301 will not be possible. I am having another discussion with the dev team to find out whether this parameter can be removed.
6:47 pm on Apr 27, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


The parameter is needed to pull the images in website search results pages.

Perhaps it is serving a different size of the image depending on the parameter? Since you do have a parameter in the image URL, I am pretty certain there is some pre-processing of that image request, perhaps resizing on the fly or something along those lines, rather than just grabbing the image from its location on the server. In that case I suspect that removing the parameter would require the image to exist in different resolutions/sizes on the server, which in essence does not save you anything: instead of two URLs, one with the parameter and one without, you would have two different image URLs.


Recently due to some coding changes I have two types of internal links pointing to the same image. One with example.com/image.jpg and the other example.com/image.jpg&value=true

What are the coding changes you were referring to in your opening post?
Also, what happens if you request your image with example.com/image.jpg?value=true and without this parameter? Do you get exactly the same image or perhaps something slightly different (resolution, size, borders, anything)?

If you find out exactly why your developers need this parameter, i.e. what they do when a request with this parameter arrives, then we can advise you better.
9:07 am on May 13, 2015 (gmt 0)

Junior Member

5+ Year Member Top Contributors Of The Month

joined:Sept 26, 2012
posts: 52
votes: 0


I am so sorry for the late reply. I didn't get an email about the reply from aakk9999.

Yes, aakk9999. What you are saying is true. It doesn't affect the size or anything else, only the metadata associated with the image. More about IPTC here [iptc.org...]

Anyway, the decision has been made to remove the parameters from the URL. :) This is part of a major migration to a new CDN.

So the remaining question is how we should handle requests from Googlebot for the parameter URLs. We will be dealing with that using a 301. Okay?

Thanks everyone for the help and advice.
 
