Forum Moderators: buckworks

Message Too Old, No Replies

Links coming from hreflang urls?

The links from hreflang urls are backlinks

         

onlinesource

11:28 pm on Aug 20, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



This may be a problem for a lot of webmasters, not just those who operate eCommerce sites. It affects me, and since I run an eCommerce site, I am starting a thread here.

We currently use Magento 1.9 to manage multiple store views: our default .com site along with international alternatives such as .ca and .co.uk. Right now we have four in total.

I added hreflang tags to each page, pointing to the same pages/posts/categories on the OTHER domains.

When I go into Google Webmaster Tools > Search Traffic > Links To Your Site, it shows the top three domains that have links to pages on my default .com site, which are in order: my .co.uk domain, my .ca domain and my .in domain.

1. Is it normal that links coming from hreflang tags are counted as backlinks? Isn't "links to your site" supposed to be a list of backlinks to my pages? None of these particular pages have actual hyperlinks sending people from site A to site B; all of the links come from hreflang tag code. Is this acceptable?
2. What is strange is that every site shares the same content, minus five or so URLs here and there. Yet according to Google Webmaster Tools, the .co.uk site accounts for 10K links to my .com domain, the .ca site accounts for 5K links and the .in site accounts for less than 5K. If they all share roughly the same content and the same hreflang tags, why would one domain account for nearly twice as many backlinks?
3. I've been told that too many backlinks coming from one particular domain can hurt you because it appears unnatural. If that is true, doesn't having 10K links from your top listed site, and less than half that figure from the second listed site, hurt me? I see 10K links as looking abusive. Third down is a domain with 509 links to my pages, although I believe those come from a social media sharing module which generates URLs for sharing content across Facebook, Twitter, etc.

I know of a few sites that use hreflang tags because they share content across alternative domains. Are links from hreflang tag code normally crawled and counted like this? Do I need to set up some sort of noindex,follow rule for these alternative domains?

I didn't know if there was some sort of code I could add, maybe through robots.txt, telling Google to follow these links but not give me backlink credit. Or am I overthinking this?
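For what it's worth, I assume the "noindex,follow" rule I mentioned would have to be a meta robots tag in each page's head rather than anything in robots.txt (robots.txt only blocks crawling). Hypothetically, something like:

```html
<!-- Hypothetical: placed in the <head> of pages on the alternate domains -->
<!-- "noindex" asks Google to keep the page out of its index; -->
<!-- "follow" still lets it crawl the links on the page -->
<meta name="robots" content="noindex,follow">
```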

[s27.postimg.org...]

onlinesource

3:24 am on Aug 22, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



This is really bugging me. I just checked the source code of my .co.uk domain and there are no backlinks from that domain, on any of its pages, going to my .com site, yet Google Webmaster Tools says that my .co.uk site accounts for nearly 10,000 of the "links to my .com site".

At one point, I did have all of my domains listed in the footer of each store, kind of like Amazon does by suggesting people visit their "other stores". That would explain what GWT reports, but those links were removed at least six months ago! It's strange, and the only way I can see Google finding backlinks from these domains is if it counts hreflang tags and alternate links as backlinks. Does anybody know if it is normal for Google to treat such links as backlinks?

lucy24

4:46 am on Aug 22, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does the phrase "via this intermediate link" occur anywhere in GWT? I've learned from experience that this sometimes means "via this intermediate figment of the imagination". It was especially noticeable when I moved sites in 2013; it took them forever-- say, maybe as much as a year-- to get the old and new URLs sorted out. In the meantime, they reported any number of Page A linking to Page B via some Page C which had never, in fact, existed at the same time as Pages A and B.

.co.uk accounts for 10K links to my .com domain, the .ca site accounts for 5K links and the .in site accounts for less than 5K

I suspect this has to do with how often they crawl .uk vs .ca vs .in, and how much weight they attach to each ... putting it in the Dog Bites Man category.

onlinesource

2:39 pm on Aug 22, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Well, I feel like a complete fool for not considering this before... but according to "Fetch as Google", the last time the international domains were crawled (or fetched) was anywhere from four to six months ago! Checking the Wayback Machine, I can see that around that time we still had links to our family of websites in the footer of each site. :) Those links are no longer there, but it's clear to me that Google is going off an old cache.

I just asked Google to fetch each domain again, hopefully clearing up the problem. I don't know yet whether this is the fix, but it likely is.

As far as "via this intermediate link", I'm not sure where I would see that reference, unless you were referring to Crawl > Crawl Errors > Errors > Linked From.

onlinesource

2:06 am on Aug 24, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Update: Within one day, the number of "links to your site" from one of my international domains dropped by 1,000! That's good news. I'd imagine with so many pages on each site, it will take some time for Google to index every single new URL and adjust the link total.

Is there any way to speed up the indexing process with Googlebot? Does changing a site or adding content help? Would something as simple as a notice bar pique Google's interest and encourage it to visit my site more often?

not2easy

2:28 am on Aug 24, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Sometimes it helps to draw their attention if you submit a new sitemap. I have seen cases where they show the correct sitemap name but it says something like "Fetched Sept. 21, 2014, processed Aug. 19th, 2015", as if they are using cached copies rather than fetching new sitemaps. Things change. If you click on the sitemap it goes to a new page where you can see the date and submit a new sitemap.

onlinesource

3:09 am on Aug 24, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Usually I just go to my website and update my sitemap, which updates the links listed at /sitemap.xml. I then tell Google Webmaster Tools to resubmit that sitemap URL.

I'd never heard of what you are suggesting.

Right now my sitemap says:

<url>
<loc>http://www.domain.tld/</loc>
<lastmod>2015-08-22</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
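(That entry sits inside the standard wrapper from the sitemaps.org protocol, so I believe the full file looks roughly like this:)

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.domain.tld/</loc>
    <lastmod>2015-08-22</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```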

So you suggest that I change the name of my sitemap from sitemap.xml to something like sitemap1.xml and then submit sitemap1.xml to Google?

not2easy

3:52 am on Aug 24, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



No, not at all. In GWT go to the sitemaps section for your domain. Click on the name of the sitemap listed there that you want to have Google check for new URLs or page changes. There you can see the date they last fetched that sitemap and see if you think it is time to resubmit that sitemap. If they have not recently fetched that sitemap, you can resubmit it there. It will show status "Pending" until they update their information.

I'm not suggesting you change your sitemaps, just check to see that Google is using the current version(s) of your sitemap(s) to crawl.

onlinesource

2:45 pm on Dec 7, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



It's been a long time since I addressed this issue. Going back to Google Webmaster Tools over the weekend, I noticed that Google was indexing all of the URLs in the hreflang tags that were inside my body. I don't understand why this was happening. I see other sites that use hreflang tags, but obviously Google had an issue with how mine were set up, because I was getting errors like "no return tag" under "International Targeting" in GWT. That made me believe Google didn't recognize them as legit hreflang tags and instead saw them as backlinks, which hurt more than helped. I'm not sure how important hreflang tags are. For the time being, what I did was install a GEO IP redirect in my shopping cart, sending visitors to their appropriate store. I figured this would help more.

Right now we have four domains, or four stores, that share the same shopping cart: a .ca store, a .co.uk store, an .in store and a .com store for all other countries. In Google Webmaster Tools I set up international targeting for each store, and it defaults the .ca domain to Canada, .co.uk to the United Kingdom, etc. I can't really say that the .com store is for EVERYTHING ELSE, so it defaults to ALL, and "all" includes Canada, the UK and India, which I believe has potentially caused duplicate content.

What I've done now is leave everything the way it is in GWT. As I said before, International Targeting still shows .ca for Canada, .co.uk for the UK and .in for India, and .com defaults to ALL. But I've installed a GEO IP redirect, and now all Canadian visitors go to .ca, all Indian visitors go to .in, all UK visitors go to .co.uk and everything left over goes to .com. I'm hoping this way the Canadian Googlebot (assuming that exists?) never sees the other sites and therefore won't even worry about what they do.

That being said, I have no need for hreflang tags, so I removed them completely.

I just re-fetched my entire list of sites with Google Webmaster Tools, but strangely enough, I am still seeing a ton of "Links to Your Site" coming from my international domains. In other words, the .com site still says that 12,000 links are coming from the .co.uk domain. When I pull up the list of links Google has cached for my .com site, they are all generic links like /cgi-bin/ search results or sitemap pages, links that I have since marked "nocache" because I figured Google was caching too many unnecessary pages and devaluing the important ones.

The problem now is, since Google can't cache those pages again because of the "nocache" restrictions, the last cache with the hreflang tags is sitting in limbo! What can I do? Do I have to remove the noarchive header tag temporarily? I would like the Links to Your Site section to get back to more respectable numbers, with the majority of incoming links actually coming from legit sites and not my own. I think it looks bad when I have 12,000 links coming from my own international domain and the next site that isn't mine has 200 links.
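For clarity, the header tag I'm talking about is the noarchive one, which as I understand it only suppresses the cached-copy link in search results; it shouldn't stop Google from crawling or indexing the page:

```html
<!-- noarchive: hide the "Cached" link in search results; -->
<!-- the page can still be crawled and indexed -->
<meta name="robots" content="noarchive">
```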

not2easy

3:16 pm on Dec 7, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



the hreflang tags, that were inside my body
Those are link elements for robots, not people; they do not belong in the body of your pages and should be found before the </head> tag. Proper usage includes rel="alternate". Read Google's how-to: [support.google.com...]
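A complete set on each page would look something like this (hypothetical URLs), including a self-referencing entry. Every domain you list has to carry the matching set pointing back, otherwise you get the "no return tag" error:

```html
<head>
  <!-- each page lists every language/region alternate, including itself -->
  <link rel="alternate" hreflang="en" href="http://www.example.com/widgets" />
  <link rel="alternate" hreflang="en-gb" href="http://www.example.co.uk/widgets" />
  <link rel="alternate" hreflang="en-ca" href="http://www.example.ca/widgets" />
  <!-- the same three tags must also appear on the .co.uk and .ca pages ("return tags") -->
</head>
```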

onlinesource

3:28 pm on Dec 7, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I apologize, they were in the header. I had them set up as below:

<link rel="alternate" href="http://www.mystore.ca/" hreflang="en-ca"/>
<link rel="alternate" href="http://www.mystore.in/" hreflang="en-in"/>
<link rel="alternate" href="http://www.mystore.com/" hreflang="en"/>
<link rel="alternate" href="http://www.mystore.co.uk/" hreflang="en-gb"/>

The way they were formatted looked good to me. I'm not sure why Google was indexing these URLs on every page, but it was. It was causing a HUGE headache! I would go into "Links to Your Site" in GWT under the .com domain and it would say, "Who links the most?" 12,000 links coming from the .in store!?! Why? Unless it was reading the hreflang tags as actual links, because there were no other links from the international domains anywhere on the indexed pages, other than the alternate links found in the hreflang tags. Clearly Google was reading them as actual links, even if it shouldn't have. So I had to do something. I got rid of the hreflang tags and now I'm relying on the International Targeting settings and the GEO IP redirect. Hoping this works.

In the meantime, I would like Google to go back and recache some of the old pages where it once spotted the hreflang tags, but since those pages are now set to nocache, I don't see how it would ever return. I'm trying to figure out the best way to fix this.