
Google SEO News and Discussion Forum

    
301'd pages last forever as WMT internal linking pages
bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4638368 posted 7:28 pm on Jan 18, 2014 (gmt 0)

First I'll say that, for this particular site, every page links to the home page three times: two links with the text "Home" and one link with a keyword-free, generic brand name for the site. Why? Obviously for the convenience of the visitor. (Is linking to Home considered bad practice these days?)

Webmaster tools (WMT) and a Google quirk.

In Webmaster Tools, go to "Search Traffic", then "Internal Links", and click the report for the site's home page. This report lists more pages linking internally to the home page than exist on the site. How many more? Very close to the number of removed pages that 301 redirect to another page on the site (whether or not the target page still exists).

I know Webmaster Tools has many quirks, but when a report is apparently pulling its data directly from Google's databases, one has to assume the data reflects what Google has stored. Everything in the story below is validated by actual log content from the site. So when I say a 301 was returned, that is what the logs reported. Logs have been kept for this site since mid 2004.

A story of a couple of 301 redirects.

Once upon a time (sometime before 2009) there was a page named a-b-c.htm. It was renamed A-B-c.htm (for reasons forgotten), and the original was redirected with a 301 code in htaccess. This 301 redirect is still in the htaccess today. There have been no links to a-b-c.htm on the site since 2009. The target page for the redirect, A-B-c.htm, was removed from the site at the end of June 2013 and has returned a 410 GONE since then. Googlebot crawled a-b-c.htm two more times; each time it was redirected to A-B-c.htm, where a 410 GONE was returned. Other bots still crawl a-b-c.htm. (I know some bots seem to have trouble with case; they ignore it!)
Googlebot has not crawled a-b-c.htm since Jul 7th 2013.
YET to this day the page a-b-c.htm (gone since 2008) is still reported as an internal page linking to the home page. Oddly, the page A-B-c.htm is not reported as an internally linking page today.

Other pages which no longer exist, but have proper 301 redirects, are also listed in the WMT internally linking pages report and even show an appropriate preview of the page they redirect to. In fact, the number of pages the internal links report indicates is the actual number of pages on the site plus all the pages that are now 301 redirected to other pages.

Google probably does keep track of these old (non-existent) pages to make sure the redirects aren't abused in some way. I suppose the person who designed the WMT "Internal Links" report may not have realized that this database contained this basically outdated information. But then one also has to ask: is Google actually considering these non-existent pages and links? It's certainly likely Google has archived these old pages.
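
The "live pages plus 301'd pages" arithmetic above can be checked with a simple set comparison. This is only a sketch: the function name and the sample paths are hypothetical, and it assumes you have exported the report's list of linking pages and know which URLs are live and which are redirect sources.

```python
def phantom_linking_pages(reported, live_pages, redirect_sources):
    """Split the extra pages in a report into those explained by
    301 redirects and those that are not.

    All three arguments are iterables of URL paths (hypothetical inputs).
    """
    extras = set(reported) - set(live_pages)      # listed but not on the site
    explained = extras & set(redirect_sources)    # extras that are 301 sources
    unexplained = extras - explained              # extras with no known cause
    return sorted(explained), sorted(unexplained)


# Made-up example data, not taken from the site discussed here.
reported = ["/index.htm", "/contact.htm", "/a-b-c.htm", "/privacy.htm"]
live = ["/index.htm", "/contact.htm"]
redirects = ["/a-b-c.htm", "/privacy.htm", "/terms.htm"]

explained, unexplained = phantom_linking_pages(reported, live, redirects)
print("explained by 301s:", explained)
print("unexplained:", unexplained)
```

If the "unexplained" list comes back empty, the extra pages in the report are fully accounted for by redirected URLs, which is the pattern described above.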

The Google site: command intermittently corroborates this incorrect internal links page count from Webmaster Tools. When the site: command is used on this site, the number of pages reported is typically fairly accurate, but now and then it approximates the number of pages indicated by the internal links report instead. It's not something I can reproduce, but I have seen it.

The FIX

My fix for this will be to set up a 410 Gone return for all these pages (they are GONE), and then, to make sure Google eliminates them, I will link internally to these non-existent pages until Google attempts to crawl them at least 3 times. 410 GONE does seem to reliably stop Google from crawling a page. But my goal is having these pages truly disappear from the WebMasterTools report.
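
To tell when Googlebot has hit each 410'd page at least three times, the access log can be tallied. The sketch below assumes Apache "combined" log format lines and matches on the user-agent string only (which can be spoofed); the function name and the sample path are illustrative, not taken from the actual site.

```python
import re
from collections import Counter

# Assumes combined log format: "METHOD path PROTO" status size "referer" "agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"\s*$'
)


def googlebot_gone_hits(lines, paths):
    """Count 410 responses served to a Googlebot user agent, per path."""
    wanted = set(paths)
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        if (m["path"] in wanted and m["status"] == "410"
                and "Googlebot" in m["agent"]):
            counts[m["path"]] += 1
    return counts


def crawled_enough(counts, paths, minimum=3):
    """Return the paths Googlebot has received a 410 for at least `minimum` times."""
    return [p for p in paths if counts[p] >= minimum]
```

Once `crawled_enough` returns every removed path, the temporary internal links to those pages could be taken down again.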

Background

One of the things that precipitated this effort is Panda (what a surprise). This site used to have separate pages for various policies: privacy, affiliations, contact, terms of service, terms of use, abuse reports, links to site, disclaimer, etc. This seemed like the best structure for the visitor, but given Panda's reported dislike of pages that seem to exist just to present more keywords to the search engine deities, I chose to consolidate all these pages into one big page with a little Table of Contents at the top. All the various policy pages were removed and 301'd to this one new contact/policies page. Actually it will probably be easier to maintain. Better for the visitor? Maybe? Maybe not? BUT Google's WMT still lists all these removed pages as pages internally linking to the home page, even though they have been gone 5 months. The 301s are still in place, and the WMT linked previews for the non-existent pages properly show the new consolidated page. One of the things I've noticed with Panda is that some sites appear to be on the "hairy edge". One new page with just the wrong content and boom, traffic drops 70%. So I'm working on insurance.
Also, I just wanted to pass on this observation about 301'd pages.

P.S. I'm practicing run-on sentences with BIG words and acronyms; I hear Google considers these at least intermediate? Hey, hey.... It's also astonishing how many pages on the web with virtually no content are "Advanced". But I digress.

 

tangor

WebmasterWorld Senior Member Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4638368 posted 9:04 pm on Jan 18, 2014 (gmt 0)

It has been my experience (since 1996) that no search engine (and I mean ALL OF THEM) ever forgets a URL it has crawled.

Whether it is an accident that these old urls get back into the crawl, or deliberate, that I can't say, but they are definitely still in their index. I can say that pages GONE 410'd "way back when" are still, from time to time, (re)appearing in my logs.

I don't think there is any way to make them go away.

Conversely, on the side of the search engine (any of them), I wouldn't take a website's word that the page was REALLY gone... it might come back and then what?

These days I just ignore it, keep the site(s) clean and move on.

bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4638368 posted 12:11 am on Jan 19, 2014 (gmt 0)

I have no problem with Google remembering all these pages, but what is important is these non-existent pages are still considered by webmaster tools, and Google, as pages that "internally" link to the home page of the site; this is not the case!
In one case, this has not been true since 2008, in the others, since at least 5 months ago.
Is this a bug?
Of course there are many bugs, but the only way to eradicate them is to point them out.
Publishing them (bugs) in WebMasterWorld is more effective than contacting Google directly.

lucy24

WebmasterWorld Senior Member Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4638368 posted 1:26 am on Jan 19, 2014 (gmt 0)

there was a page named a-b-c.htm, it was renamed A-B-c.htm (for reasons forgotten) and the original was redirected with a 301 code in htaccess. This 301 redirect is still in the htaccess today. There have been no links to a-b-c.htm on the site since 2009. The target page for the redirect, A-B-c.htm was removed from the site and a 410 GONE was reported for the page

Now, wait a minute, that's another version of the redirect chain.

a-b-c >> A-B-c
A-B-c >> 410

should be replaced by

a-b-c >> 410
A-B-c >> 410

in parallel. Or, if you prefer, (a-b|A-B)-c >> 410
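
As a quick check of that combined pattern, here is a minimal sketch that tests which request paths it would catch. The leading slash, the `.htm` suffix, and the `status_for` helper are assumptions added for illustration; the pattern deliberately covers only the two spellings named above, not every mix of upper and lower case.

```python
import re

# The two known spellings of the removed page; anchored so nothing else matches.
GONE_PATTERN = re.compile(r"^/(a-b|A-B)-c\.htm$")


def status_for(path):
    """Return 410 for paths matching the pattern, else None (no rule applies)."""
    return 410 if GONE_PATTERN.match(path) else None


for path in ("/a-b-c.htm", "/A-B-c.htm", "/a-B-c.htm", "/a-b-c.html"):
    print(path, status_for(path))
```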

"via this intermediate link" is weird. I've talked about this elsewhere in the context of moving sites. They will say both
onetwothree "via this intermediate link" fourfivesix
and
fourfivesix "via this intermediate link" onetwothree

even though fourfivesix has never redirected to onetwothree. Only in the other direction.

it might come back and then what?

Then there would be newly discovered links to it, wouldn't you think? But I do realize that a search engine's mind does not work like yours and mine.

aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4638368 posted 10:52 am on Jan 19, 2014 (gmt 0)

If they are being redirected to pages that no longer exist, then the redirect can't work.

lucy24

WebmasterWorld Senior Member Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4638368 posted 11:00 am on Jan 19, 2014 (gmt 0)

A redirect doesn't mean you go there. It only means the browser is instructed to make a new request. There's no prior information about whether the new request will be any more successful than the old one.

But if A redirects to B, and then later B is removed, then the 410 should be served at both A and B.
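
A rough way to catch this kind of leftover chain is to flatten the rule set so every source points straight at its final outcome. The sketch below assumes the rules have been copied by hand into a dictionary; the `flatten` function, the rule format, and the `/policies.htm` and `/privacy.htm` paths are hypothetical.

```python
def flatten(rules):
    """Collapse redirect chains so each source maps to its final outcome.

    `rules` maps a path to ("redirect", target) or ("gone", None).
    A redirect whose chain ends at a "gone" rule becomes "gone" itself.
    """
    resolved = {}
    for source, rule in rules.items():
        action, target = rule
        seen = {source}
        # Follow the chain while the target has a rule of its own.
        while action == "redirect" and target in rules and target not in seen:
            seen.add(target)
            action, target = rules[target]
        if action == "redirect" and target in seen:
            # Redirect loop detected: leave the original rule untouched.
            resolved[source] = rule
        else:
            resolved[source] = (action, target)
    return resolved


rules = {
    "/a-b-c.htm": ("redirect", "/A-B-c.htm"),
    "/A-B-c.htm": ("gone", None),
    "/privacy.htm": ("redirect", "/policies.htm"),
}
for path, outcome in flatten(rules).items():
    print(path, outcome)
```

With these sample rules, both spellings of the removed page end up as "gone", while the redirect to the live policies page is left alone.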

bumpski

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4638368 posted 11:42 am on Jan 19, 2014 (gmt 0)

But if A redirects to B, and then later B is removed, then the 410 should be served at both A and B.

I agree entirely. This one case was simply an oversight, one that I imagine happens to webmasters frequently. But should Google report this page, a page that has been gone for 5 years, as a page that links internally to the site?

In addition to this one unusual case, correctly 301 redirected pages have been in this report for 5 months now. (I never really looked this deeply into this report before.)
I'd be interested to know if anyone else sees pages that have not existed for a long time listed in this report as pages linking internally to another page.

Looking at this report:
Webmaster tools, "Search Traffic", "Internal Links", then clicking on the report for the Home Page of the site;

Should Google be listing non-existent pages as internally linking to other pages?
I'll certainly be fixing all of these cases with 410's, but most are just properly 301 redirected pages.

aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4638368 posted 2:40 pm on Jan 19, 2014 (gmt 0)

A redirect doesn't mean you go there

Well it means that the browser will try to go there. But in this case there's nowhere to go.

But this is a question of semantics. I'm sure that we both understand how it works.
