lucy24 - 11:14 pm on Dec 20, 2011 (gmt 0)
I've been poring over posts that talk about the "via this intermediate link" line in gwt, trying to make sense of something that showed up in my own list recently. So far I haven't found any better explanation than "gwt has gone bonkers".
It's going to be a little tricky to give the necessary information without giving the wrong information, so bear with me.
Deep in the bowels of my site I've got a group of pages that are best described as an unauthorized mirror. My own personal Wayback Machine. The directory isn't disallowed in the original site's robots.txt, but each individual page has the same "noindex, nofollow" meta tag. So the real pages are listed in neither google nor the (real) Wayback Machine.*
Recently a new batch of "pages that link to your site" showed up in gwt, all linking to one specific mirrored page. Close study suggests they are all the same page, and the owner needs to spend some time with the URL parameters section of gwt. But I won't complain, because one of them was a "printer-friendly" version that didn't require a login. It's a plausible link, except that all of them are listed as "via this intermediate link".
This is where I start getting suspicious about google's compos mentisness, because the "intermediate link" is, you guessed it, a page on the original mirrored site.
This site has also shown up in gwt. A cluster of different pages, all ostensibly linking to my page, again "via this intermediate link". Of course they don't do anything of the sort; what they do link to is their own original version of the page. Which g### doesn't know about. (I checked with a site: search.)
Everyone follow that?
According to google, a page on www.example.org links to a page on www.example.gov which in turn redirects to me.
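For anyone who hasn't met the label: it seems to mean that a linking page points at some URL which then redirects to your site. Here is a toy sketch of that idea, assuming gwt simply follows a single redirect hop from a link graph plus a redirect table. The function name `inbound_links` and the www.example.com mirror URL are hypothetical; this is not Google's actual algorithm.

```python
# Toy model (hypothetical): derive "via this intermediate link" style
# reports from a link graph and a redirect table. NOT Google's real
# algorithm; it only follows a single redirect hop.

def inbound_links(links, redirects, target):
    """Return (source, intermediate) pairs for links that reach `target`.

    links:     dict mapping a source URL to the list of URLs it links to
    redirects: dict mapping a URL to the URL it redirects to
    intermediate is None for a direct link, else the redirecting URL.
    """
    found = []
    for source, dests in links.items():
        for dest in dests:
            if dest == target:
                # Direct link: no intermediate URL involved.
                found.append((source, None))
            elif redirects.get(dest) == target:
                # Link to a URL that redirects to target: report the
                # redirecting URL as the "intermediate link".
                found.append((source, dest))
    return found


# Hypothetical URLs modeled on the situation described above.
mirror = "http://www.example.com/mirror/page.html"
original = "http://www.example.gov/page.html"
links = {
    "http://www.example.org/printer-friendly": [original],
    "http://www.example.net/direct": [mirror],
}
redirects = {original: mirror}

for source, via in inbound_links(links, redirects, mirror):
    print(source, "via", via)
```

In this model a report only says "via" some URL if the redirect table contains an entry for it. In the case above, the original page doesn't redirect anywhere, which is exactly what makes gwt's claim so puzzling.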
Oh, yes, those not-really-linking pages. I don't have mirrors of them. (They're boring.) So my links lead to the real thing. Like all external links in this group of pages, they are flagged as "nofollow". The explanation is hidden somewhere in this detail, but trying to figure it out is giving me a headache :(
* This may be premature. Looking up my own site in the Wayback Machine turns up nothing more recent than February, though they've been crawling regularly and have archived versions dating back to 2007. I know there is a wide range of opinions on the Wayback Machine. I, personally, like the idea. "Hm, is that really what I thought **** *** ** *** ****** meant back in August 2010?"