| This 34 message thread spans 2 pages |
|"Not Selected" URLs In GWT Too Large -> How Do I Fix Them?|
| 11:24 am on Nov 25, 2012 (gmt 0)|
On September 4-5, our website's (blog + forums) traffic went down by about 50%. Since then I've been investigating the cause, and over the last two months I've reverted most of the things that could have gone wrong.
Recently I discovered that in GWT, the 'Index Status' report shows we have a HUGE number of 'Not Selected' URLs. I've been told that this could be one of the main reasons our traffic went down. I'm attaching a screenshot of how it looks in GWT.
A few more points that might be useful:
2. I had a tag auto-link plugin that would find tag keywords on our WordPress blog and link them to relevant tag pages. I disabled it 3 days ago.
3. Our website URL is example.com, and the forums reside at example.com/community/. Our community traffic seems to be affected the most.
I'd really appreciate your input on fixing the 'Not Selected' URLs.
[edited by: goodroi at 11:32 am (utc) on Nov 25, 2012]
[edit reason] Please no URLs [/edit]
| 5:23 pm on Dec 4, 2012 (gmt 0)|
Well, hopefully yours is one of the unusual cases where it does. Because if they don't discount the quality of your site, but instead allow the 'thinning' of PageRank passed, visitor behavior, and other secondary/indirect signals to indicate your site is 'not a good choice' because of the broken links you have, it could definitely be more difficult to overcome.
In other words: people with broken links, especially ones that frustrate visitors when they find and click them, may have a much bigger battle getting their site back where it was.
Google 'not discounting' a site with broken links on it is not necessarily a good thing. In fact, it could range from 'bad' to 'really, really bad' in many cases, because 'just fixing the links' may not solve the problems created by other signals already sent and used directly for ranking purposes.
There are actually a bunch of ways broken links could influence rankings that he doesn't specifically address, and I wouldn't expect him or anyone else to sit and explain all of them to you: one, because there are a bunch, and two, because someone from Google could easily 'give away too much' by listing everything that could be influenced and how.
I'm really not sure I understand the point of all this...
We're talking to you from years of experience and knowledge, not 'just making things up' or trying to 'lead you down the wrong road'. But if you really believe broken links will not harm your rankings, whether directly or indirectly through other 'signals' and 'factors', then leave them broken.
There's no way I'm going to go into exactly how and all the different ways broken links can influence your rankings, because, quite frankly, you're not paying me anywhere near enough and I have other things to do.
| 7:17 pm on Dec 4, 2012 (gmt 0)|
I guess the shorter version of my thoughts after reading John's reply are:
WOW! Having broken links on your site could be way worse than I initially thought...
| 4:08 am on Dec 7, 2012 (gmt 0)|
Way, way, way worse, TMS.
Ultimate Solution: GWT says "hey, here's a complete list of all urls we know about on your site that do not return 404/410 codes, yo" combined with "hey again, here are pages on your site with internal links that seem to be broken". Unfortunately that's not likely to happen; some sites are way too big.
Cutts Confusion: Matt, and several others, suggest that if someone is linking to a url on your site but they get it wrong, miss a character or whatever, it's ok to redirect the broken url to the good one. Matt says this is the easiest way to capture incoming "juice" and plugs GWT as a great tool for finding such broken incoming links (in a YouTube video from a couple of years ago; I don't know if his position has changed since).
On the flip side, spammy webmasters rather enjoy redirecting tons of pages to a url in order to make it rank more highly, and Google takes action against them. GWT now also reports these as "not selected", which they describe as "URLs from your site that redirect to other pages...".
So which is it? Personally, I feel that a redirect now tells Google "hey, our site really does have both of these urls", and that it's not as benign as it once was if you have too many of them.
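One hedged way to split the difference is to redirect only when a mangled incoming URL maps unambiguously to exactly one real page, and 404 everything else. A minimal Python sketch of that decision, where `salvage_target` and every path shown are purely hypothetical illustrations, not anything GWT or WordPress provides:

```python
# Sketch: decide whether a mangled incoming URL deserves a 301 to a real
# page, or should just 404. All names and paths here are hypothetical.

def salvage_target(bad_path, real_paths):
    """Return the single real path a mangled URL clearly maps to, else None."""
    # Strip common mangling: trailing dots/dashes and doubled slashes.
    cleaned = bad_path.rstrip(".-").replace("//", "/")
    if cleaned in real_paths:
        return cleaned
    # Case-insensitive match, but only redirect if it's unambiguous.
    matches = [p for p in real_paths if p.lower() == cleaned.lower()]
    return matches[0] if len(matches) == 1 else None

real = {"/blog/fixing-not-selected/", "/community/welcome/"}
print(salvage_target("/blog/fixing-not-selected/..", real))  # -> /blog/fixing-not-selected/
print(salvage_target("/blog/some-deleted-post/", real))      # -> None (send a 404)
```

The point of the ambiguity check is exactly the concern above: a blanket "redirect everything somewhere" policy is what the spammy sites do, so a redirect should only fire when there is one obvious target.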
Example of a problem: many webmasters running WordPress don't know that if you add ".." or "--" to the end of a url, the page will render just fine at that url unless you block this in .htaccess. WordPress also redirects partially formed urls to the correct page on a best-guess basis. While this is standard behavior, you can expect your friendly neighborhood spammer to know it too and kindly create a bazillion duplicate urls for you intentionally, simply by linking to those pages from his spam-city network.
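To audit whether your own install serves these duplicates, one could generate the suffixed variants described above and check what status each returns. A rough Python sketch, where the suffix list and example URL are assumptions and the HEAD helper is only meant to be pointed at a site you own:

```python
import urllib.error
import urllib.request

def malformed_variants(url, suffixes=("..", "--", ".", "-")):
    """Duplicate-URL candidates a spammer could link to (assumed suffixes)."""
    return [url + s for s in suffixes]

def head_status(url):
    """Fetch only the response headers -- roughly what a crawler pinging
    for a header response would see -- and return the status code."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

# Usage (needs network, so commented out; run it against your own site):
# for u in malformed_variants("http://example.com/blog/my-post/"):
#     print(u, head_status(u))  # anything other than 404/410 is a duplicate risk
```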
Canonical is great, but it doesn't solve the problem that Google now thinks you really do have a lot of extra urls; they aren't returning 404 or 410. Is that a big deal? Probably not; in fact, almost certainly not, unless you're the incredibly unlucky webmaster among us who gets devalued by this type of thing, which *some* undoubtedly will; the algo is a machine, after all.
My experience: I had a small 300-page blog that Google thought had over 8000 pages, 7700 of those "not selected". I made a 100% static copy of the site, ditched the CMS, ditched the affiliate redirects I had on 5-6 pages, made a sitemap detailing which 300 pages actually exist, and reduced my .htaccess file to near nothing so that very little could be redirected.
My result: thousands of error messages in GWT for pages that I didn't want in existence anyway but, eventually, no "not selected" either.
Note: I don't think Google visits all of the urls in question, so on-page elements like canonical may not work; they seem to just ping for a header response without requesting the page. Fix those headers too.
Panda Concern: several top SEO sites recommend removing low-quality and duplicate content in order to minimize your exposure to Panda. Your CMS is likely not helping you, creating pages you don't even know about and that Google doesn't report unless they disappear. Without seeing them in a 404 report or by watching your server logs closely, you can't possibly minimize how many you have. I now avoid redirects if at all possible: if a page isn't meant to exist I want a 404, and if Google has found it and I want it gone I prefer a 410.
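That 404-vs-410 policy can be sketched as a tiny status-decision table. In this Python sketch, `LIVE` and `REMOVED` are hypothetical inventories of a site's URLs, stand-ins for whatever your server actually knows about:

```python
# Minimal sketch of the policy above: 200 for real pages, 410 for pages
# deliberately removed, 404 for urls that were never meant to exist.
# LIVE and REMOVED are hypothetical inventories, not a real CMS API.

LIVE = {"/": "homepage", "/blog/good-post/": "post body"}
REMOVED = {"/blog/thin-affiliate-page/"}  # pages we want Google to drop

def status_for(path):
    if path in LIVE:
        return 200  # real page, serve it
    if path in REMOVED:
        return 410  # Gone: tell crawlers to drop it for good
    return 404      # never meant to exist; notably, no redirect

print(status_for("/blog/good-post/"))            # -> 200
print(status_for("/blog/thin-affiliate-page/"))  # -> 410
print(status_for("/blog/good-post/.."))          # -> 404, not a 301
```

The last line is the key design choice: the mangled spammer-style variant gets a hard 404 rather than a best-guess redirect, so it never becomes a "not selected" url in the first place.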
I won't even get into Google's testing of fictitious urls at random just to see what they get.
| 5:04 am on Dec 7, 2012 (gmt 0)|
I'm going to write a case study if my traffic recovers after those 404s come down significantly. I'm currently at 34k errors, and thankfully Google's giving me a few surprise drops in the count. The day before yesterday, Google dropped the error count by about 8k.