Joined: May 11, 2012
I really struggled to find the best title for this post, so feel free to update it. This problem has been bugging me for almost a year. I've asked many people for help, but most are at a loss for ideas.
I've noticed that when I submit URLs from my eCommerce store to my Google Plus page (and Facebook too), whether they're pages, categories, or products, the scraper pulls content from the ENTIRE page, including the header, footer, and sidebar, and not just the content within the main body (the content that is unique to each specific URL).
For instance, if I submit a URL to a product page, it pulls up a list of suggested images it found on that page, which I can include with the link. The problem is that those images are often grabbed from the sidebar, header, and footer and don't relate to the product itself. I'm talking about images such as "we ship using UPS" and "we accept Visa" banners. The images are generic, so why do Google Plus and Facebook zero in on them? When I submit product URLs from competitor sites (just to test things), only one image comes back in the list: the product image (the only one that should), while the shipping and payment banners are overlooked, as they should be.
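From what I understand (and I may be wrong about the internals), when a scraper finds no explicit og:image tag, it falls back to collecting every <img> on the page as a candidate thumbnail, generic banners included. Here's a rough Python sketch of that fallback behavior; the markup is a made-up stand-in for my page, not my actual theme:

```python
# Sketch of a scraper's image-selection fallback: prefer an explicit
# og:image meta tag, otherwise collect every <img> on the page,
# including generic sidebar/footer banners. Filenames are hypothetical.
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.og_image = None   # explicit og:image, if declared
        self.images = []       # fallback: every <img> src on the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:image":
            self.og_image = attrs.get("content")
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

page = """
<html><body>
  <div id="sidebar"><img src="ups-banner.gif"><img src="visa.gif"></div>
  <div id="main"><img src="product-123.jpg"></div>
</body></html>
"""
c = ImageCollector()
c.feed(page)
# No og:image declared, so every image becomes a suggested thumbnail.
print(c.og_image, c.images)
```

If that's roughly how the scrapers work, it would explain why my banners show up as suggestions while a competitor page that declares its product image gets only the one result.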
I'm guessing this isn't normal and something is definitely wrong, because if it weren't, my competitors would be dealing with the same headache. To me, if Google (well, especially Google) is seeing every page as an undifferentiated whole, doesn't that mean page A looks no different from page B, and therefore duplicate content is a potential hazard?
I have used Fetch as Google in Google Webmaster Tools, and Google seems to crawl my pages with no errors. My question is: should sidebars and headers be formatted a certain way so that bots can focus on the main body or main content of a specific URL? If so, where can I learn more about this formatting, or test my theme to make sure it is structured properly?
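One thing I've been reading about is Open Graph meta tags. Supposedly, putting something like this in the <head> of each product template tells Facebook (and, I gather, Google Plus) exactly which image and title to use, instead of letting it guess from the sidebar. The URLs and titles below are just placeholders, not my real ones:

```html
<!-- Hypothetical example: explicit Open Graph tags for a product page,
     so scrapers pick the product image instead of generic banners.
     All values are placeholders. -->
<meta property="og:title" content="Example Product Name" />
<meta property="og:type" content="product" />
<meta property="og:url" content="https://www.example.com/products/example-product" />
<meta property="og:image" content="https://www.example.com/images/example-product.jpg" />
```

Can anyone confirm whether adding these per-page would fix the suggested-image list, or whether something else in my theme's structure is the real culprit?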
I look forward to hearing from you all.
UPDATE: I did notice that when Facebook crawls my site, it returns a 206 response code. I'm trying to fix this problem now, but I believe it relates to a scripting issue that causes no IPv6 record to be present for my domain. I tried enabling and disabling hotlink protection, but it did not correct the problem. I'm hoping that once I figure out how to return a 200 response code, I may see a difference in the Facebook results at least.
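For anyone else chasing the same symptom: as I understand it, 206 Partial Content shows up when the server honors a Range header on the request, rather than anything to do with the page markup. Here's a small self-contained Python sketch I put together to convince myself of the mechanics; the server here is a made-up stand-in for mine, not my actual stack:

```python
# Demonstrates how a 206 arises: a server returns 206 Partial Content
# when it honors a Range request, and 200 for a plain request.
# The handler and page body below are hypothetical stand-ins.
import http.server
import threading
import urllib.request

class RangeHandler(http.server.BaseHTTPRequestHandler):
    BODY = b"<html><head><title>Product</title></head><body>...</body></html>"

    def do_GET(self):
        rng = self.headers.get("Range")
        if rng and rng.startswith("bytes="):
            # Honor the byte range: respond 206 with only the slice asked for.
            start, _, end = rng[len("bytes="):].partition("-")
            start = int(start or 0)
            end = int(end) if end else len(self.BODY) - 1
            chunk = self.BODY[start:end + 1]
            self.send_response(206)
            self.send_header("Content-Range", f"bytes {start}-{end}/{len(self.BODY)}")
            self.send_header("Content-Length", str(len(chunk)))
            self.end_headers()
            self.wfile.write(chunk)
        else:
            # No Range header: ordinary 200 with the full body.
            self.send_response(200)
            self.send_header("Content-Length", str(len(self.BODY)))
            self.end_headers()
            self.wfile.write(self.BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), RangeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# A plain request gets 200; a ranged request (as some scrapers send) gets 206.
plain = urllib.request.urlopen(f"http://127.0.0.1:{port}/")
req = urllib.request.Request(f"http://127.0.0.1:{port}/",
                             headers={"Range": "bytes=0-9"})
ranged = urllib.request.urlopen(req)
print(plain.status, ranged.status)  # prints: 200 206
server.shutdown()
```

If that's right, then a 206 from Facebook's crawler may just mean it sent a Range request and my server obliged, which is why I'm focusing on the server configuration rather than the theme for this part.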