joined:Sept 13, 2004
First, I'll say that on this particular site every page links to the home page three times: two links with the text "Home" and one link using a keyword-free, generic brand name for the site. Why? Obviously for the convenience of the visitor. (Is linking to Home considered bad practice these days?)
Webmaster Tools (WMT) and a Google quirk.
In Webmaster Tools, go to "Search Traffic", then "Internal Links", and click on the report for the home page of the site. This report lists more pages as internally linking to the home page than actually exist on the site. How many more? Very close to the number of removed pages that 301 redirect to a page on the site (whether that target page exists or not).
I know Webmaster Tools has many quirks, but since the tool is apparently pulling its data directly from Google's databases, I have to believe the data reflects what Google actually holds. Everything in the story below is validated by actual log content from the site, so when I say a 301 was returned, that is what the logs reported. Logs have been kept for this site since mid-2004.
A story of a couple of 301 redirects.
Once upon a time (sometime before 2009) there was a page named a-b-c.htm. It was renamed A-B-c.htm (for reasons forgotten), and the original was redirected with a 301 in htaccess. This 301 redirect is still in the htaccess today. There have been no links to a-b-c.htm on the site since 2009. At the end of June 2013 the target page for the redirect, A-B-c.htm, was removed from the site and set to return 410 Gone. Googlebot crawled a-b-c.htm two more times, was redirected to A-B-c.htm each time, and received a 410 Gone there. Other bots still crawl a-b-c.htm. (I know some bots seem to have trouble with case: they ignore it!)
Googlebot has not crawled a-b-c.htm since Jul 7th 2013.
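For anyone who wants to check a "last crawled" date like this against their own logs, here is a minimal Python sketch. It assumes the logs are in Apache "combined" format, and the sample lines are made up for illustration (they are not real log entries). The function name and regex are my own, not from any tool. Keep in mind that a user-agent string can be spoofed, so a match on "Googlebot" is only a first pass.

```python
import re
from datetime import datetime

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def last_googlebot_hit(lines, path):
    """Return (datetime, status) of the latest Googlebot request for path, or None."""
    latest = None
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        # Paths are compared case-sensitively, so /a-b-c.htm != /A-B-c.htm.
        if m["path"] != path or "Googlebot" not in m["agent"]:
            continue
        when = datetime.strptime(m["when"], "%d/%b/%Y:%H:%M:%S %z")
        if latest is None or when > latest[0]:
            latest = (when, int(m["status"]))
    return latest

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical sample data, invented for this example.
SAMPLE_LINES = [
    f'192.0.2.10 - - [02/Jul/2013:08:00:00 +0000] "GET /a-b-c.htm HTTP/1.1" 301 240 "-" "{GOOGLEBOT}"',
    f'192.0.2.10 - - [02/Jul/2013:08:00:01 +0000] "GET /A-B-c.htm HTTP/1.1" 410 300 "-" "{GOOGLEBOT}"',
    f'192.0.2.10 - - [07/Jul/2013:09:30:00 +0000] "GET /a-b-c.htm HTTP/1.1" 301 240 "-" "{GOOGLEBOT}"',
    '203.0.113.5 - - [01/Aug/2013:12:00:00 +0000] "GET /a-b-c.htm HTTP/1.1" 301 240 "-" "SomeOtherBot/1.0"',
]

if __name__ == "__main__":
    print(last_googlebot_hit(SAMPLE_LINES, "/a-b-c.htm"))
```

The later request from the other bot is ignored, so the result reflects only the last Googlebot visit to that exact (case-sensitive) path.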
YET to this day the page a-b-c.htm (gone since 2008) is still reported as an internal page linking to the home page. Oddly, the page A-B-c.htm is not reported as an internally linking page today. Other pages that no longer exist, but have proper 301 redirects, are also listed in the WMT internally linking pages report and even show an appropriate preview of the page redirected to. In fact, the number of pages the internal links report indicates works out to the actual number of pages on the site plus all the pages that are now 301 redirected to other pages. Google probably does keep track of these old (non-existent) pages to make sure the redirects aren't abused in some way. I suppose the person who designed the WMT "Internal Links" report may not have realized that this database contained basically outdated information. But then one also has to ask: is Google actually considering these non-existent pages and their links? It's certainly likely Google has archived these old pages.
The Google site: command intermittently corroborates this incorrect internal links page count from Webmaster Tools. When the site: command is used on this site, the number of pages reported is typically fairly accurate, but now and then, seemingly at random, it approximates the number of pages indicated by the internal links report. It's not something I can reproduce on demand, but I have seen it.
My fix for this will be to return 410 Gone for all of these pages (they are GONE). Then, to make sure Google eliminates them, I will link internally to these non-existent pages until Google has attempted to crawl each of them at least 3 times. 410 Gone does seem to reliably stop Google from crawling a page, but my real goal is to have these pages truly disappear from the Webmaster Tools report.
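To know when each removed page has had its three Googlebot attempts, the logs can be tallied. This is only a rough Python sketch under the same assumption of Apache "combined" log format. The function names, the paths, and the sample lines are hypothetical, and the threshold of 3 is just the number I picked above, not anything Google documents.

```python
import re

# Assumption: Apache "combined" log format.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def gone_crawl_counts(lines, removed_paths):
    """Count Googlebot requests that received a 410 for each removed path."""
    counts = {path: 0 for path in removed_paths}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        if (m["path"] in counts and m["status"] == "410"
                and "Googlebot" in m["agent"]):
            counts[m["path"]] += 1
    return counts

def safe_to_unlink(lines, removed_paths, required=3):
    """Return the removed paths Googlebot has hit with a 410 at least `required` times."""
    counts = gone_crawl_counts(lines, removed_paths)
    return sorted(p for p, n in counts.items() if n >= required)

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def sample_line(day, path, agent):
    # Builds one invented log entry; not real data.
    return (f'192.0.2.10 - - [{day:02d}/Aug/2013:10:00:00 +0000] '
            f'"GET {path} HTTP/1.1" 410 300 "-" "{agent}"')

SAMPLE_LINES = [
    sample_line(1, "/old-1.htm", GOOGLEBOT),
    sample_line(5, "/old-1.htm", GOOGLEBOT),
    sample_line(9, "/old-1.htm", GOOGLEBOT),
    sample_line(2, "/old-2.htm", GOOGLEBOT),
    sample_line(3, "/old-2.htm", "SomeOtherBot/1.0"),
    sample_line(4, "/old-2.htm", "SomeOtherBot/1.0"),
]

if __name__ == "__main__":
    print(safe_to_unlink(SAMPLE_LINES, ["/old-1.htm", "/old-2.htm"]))
```

Once a path shows up in that list, the temporary internal link to it can be dropped. Hits from other bots are deliberately not counted.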
Also, I just wanted to pass this observation about 301'd pages on.
P.S. I'm practicing run-on sentences with BIG words and acronyms; I hear Google considers these at least "Intermediate"? Hey, hey.... It's also astonishing how many pages on the web with virtually no content are "Advanced". But I digress.