

Google SEO News and Discussion Forum

    
What does huge growth of "Not Selected" in WMT Index Status mean?
1script




msg:4497476
 7:04 pm on Sep 19, 2012 (gmt 0)

I've got a site that shows explosive growth in "Not Selected" URLs (the green graph on the Index Status->Advanced page). Last year the count started at about half the number of indexed URLs, then tracked the indexed count almost exactly for nearly 6 months. Then the graph took off: it now stands at twice the number of indexed URLs, and last week showed the biggest weekly jump ever.

What do you guys think can be inferred from this? Is this "just" a huge waste of crawl budget, or is there something seriously wrong with the site that causes Google to ignore 2/3rds of the URLs it "thinks" it has? I wish they would indicate why the URLs were ignored.
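For what it's worth, the 2/3rds figure is just that ratio worked through. A quick sketch, using made-up counts (the real numbers are not given here) and a hypothetical helper name:

```python
def not_selected_share(indexed, not_selected):
    """Fraction of known URLs (indexed + not selected) that were not selected."""
    total = indexed + not_selected
    if total == 0:
        raise ValueError("no URLs counted")
    return not_selected / total

# Hypothetical counts: "Not Selected" at twice the indexed total.
share = not_selected_share(indexed=10_000, not_selected=20_000)
print(f"{share:.0%} of known URLs not selected")
```

With "Not Selected" at twice the indexed count, the share comes out at two thirds regardless of the absolute numbers.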

Google's definition of Not Selected:

Not selected: Pages that are not indexed because they are substantially similar to other pages, or that have been redirected to another URL


There haven't been any URL structure changes during the period reported (although there were some before that), so there should not be an influx of 301s. Does Google just recall old URLs from time to time and simply add them to the tally? Or invent their own?

Has anyone ever used the info to troubleshoot a site? I would appreciate any insight on this. Thanks!

 

lucy24




msg:4497511
 9:28 pm on Sep 19, 2012 (gmt 0)

Is your number of indexed pages about the same? Is there a corresponding jump in the number of pages "ever crawled"?
Does Google just recall old URLs from time to time and simply add them to the tally? Or invent their own?

Invented pages can't count here, because if Google can't find the page it can't crawl it. Old URLs, yes. This is a pretty new WMT category-- only goes back, what, a month or two?-- so they're still pulling in information. The "ever crawled" category also includes pages you've moved and/or renamed. So anything with the pattern

oldname > 301 > newname

will eventually settle down to
pages ever crawled: 2
pages indexed: 1
pages not selected: 1
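
The tally above can be sketched in a few lines. This is only a toy model of how the three counters might relate, assuming every crawled URL ends up either indexed or not selected. The `tally` function and the status labels are made up for illustration and are not anything WMT exposes.

```python
from collections import Counter

def tally(crawled):
    """Count toy Index Status buckets from {url: status} records.

    Statuses are hypothetical labels: "indexed", "redirect", "duplicate".
    """
    counts = Counter()
    for url, status in crawled.items():
        counts["ever_crawled"] += 1
        if status == "indexed":
            counts["indexed"] += 1
        else:
            # Redirected or substantially similar pages land in "not selected".
            counts["not_selected"] += 1
    return dict(counts)

# oldname > 301 > newname
site = {"/oldname": "redirect", "/newname": "indexed"}
print(tally(site))
```

Each renamed page adds one URL to "ever crawled" and one to "not selected" without changing the indexed count.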

1script




msg:4497519
 9:44 pm on Sep 19, 2012 (gmt 0)

Thanks, Lucy24. The "ever crawled" figure also goes up, but not at the same rate. "Ever crawled" used to be 5 times more than the sum of "indexed" and "not selected". Then, over time, it came closer to the sum, although it is still not quite there (about 10% higher than the sum of all other types).

I think I should just chalk it up to the data not being fully in.

It is unfortunate that they do not differentiate between internal duplicates and 301 redirects which could be counted as "internal duplicates, rectified". In other words, it would be very useful to see how many internal duplicates exist and how many of those were fixed, in case you're working on eliminating duplicates.
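
Until the report offers that breakdown, one could approximate it from one's own crawl data. A rough sketch, assuming you already have (URL, HTTP status) pairs from your own crawler or server logs. The function name and sample URLs are hypothetical, and treating a permanent redirect as "rectified" is my assumption, not Google's definition.

```python
def split_not_selected(records):
    """Split (url, http_status) pairs into redirected and other URLs.

    `records` is assumed to come from your own crawl or server logs;
    WMT itself does not provide this breakdown.
    """
    redirected, other = [], []
    for url, status in records:
        if status in (301, 308):  # permanent redirects
            redirected.append(url)
        else:
            other.append(url)
    return redirected, other

# Hypothetical sample: one duplicate already redirected, one still live.
records = [("/page?sort=asc", 301), ("/page?sessionid=1", 200)]
fixed, remaining = split_not_selected(records)
print(len(fixed), "rectified,", len(remaining), "still to review")
```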

Let's hope this tool will evolve into something more useful!

g1smd




msg:4497556
 11:18 pm on Sep 19, 2012 (gmt 0)

I've got "ever crawled" figures at 80 to 100 times the "indexed" number on sites with major technical duplicate content issues that have been cleaned up.

Ethansocial56




msg:4497566
 12:31 am on Sep 20, 2012 (gmt 0)

My site is having a problem now. It has been difficult to get it crawled by Google. I have only minimal backlinks.

Sgt_Kickaxe




msg:4497585
 3:05 am on Sep 20, 2012 (gmt 0)

I had an issue with this on a WordPress site, and a change I made reduced it by nearly 100%. I am not sure whether you are experiencing the same problem, but have a read - [webmasterworld.com...] I go into a bit more detail in the 4th post and posted the unexpected GWT-related results in the 10th.

-

Mod's note: URL for copying into browser, as WebmasterWorld redirect script breaks the hash character in the above link...

http://www.webmasterworld.com/google/4494520.htm#msg4497583
.

[edited by: Robert_Charlton at 4:08 am (utc) on Sep 20, 2012]
[edit reason] added url [/edit]

juiker




msg:4500388
 9:21 pm on Sep 26, 2012 (gmt 0)

I have this with at least two sites where the number of indexed pages has dropped significantly. I think it's a Panda issue. Does anyone else? When GWT tells you that pages are "substantially similar to other pages", that sounds like Panda to me. I could be wrong. I'm still investigating, though.

1script




msg:4500709
 4:09 pm on Sep 27, 2012 (gmt 0)

Found another peculiarity about these Index reports: one of my sites has Ever Crawled *lower* than the sum of Indexed and Not Selected. How's that even possible?

If I add "Blocked by robots[.txt]", "Ever Crawled" is now 1.5 times less than the sum of the other types. And I thought "Ever crawled" should by definition be more than the sum of the other types.

So, is this evidence of Google truly inventing non-existent URLs (perhaps based on query strings), or is this just a technical counting error?

Is it possible that pages crawled by other means (image bot, adsense bot) are added to the index but don't make it into the "Ever crawled" figure, which could be based on traditional Googlebot visits only (just a conjecture)? It would be a silly omission of course but Google engineers are also people, have to keep that in mind :)
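
The expectation being tested here can be written down as a simple check. This is a sketch only: it assumes, as this post does, that "Ever crawled" should be at least the sum of the other buckets, which Google has not confirmed anywhere in this thread. The function name and sample numbers are hypothetical.

```python
def crawl_gap(ever_crawled, indexed, not_selected, blocked=0):
    """Return ever_crawled minus the sum of the other buckets.

    A negative result means the report breaks the assumed rule that
    "Ever crawled" is at least the sum of the other types.
    """
    return ever_crawled - (indexed + not_selected + blocked)

# Hypothetical numbers where the other buckets sum to 1.5x "Ever crawled".
gap = crawl_gap(ever_crawled=20_000, indexed=8_000, not_selected=16_000, blocked=6_000)
print("consistent" if gap >= 0 else f"short by {-gap} URLs")
```

A negative gap only shows that the counters disagree. It cannot tell an invented URL apart from a counting lag or a page fetched by a different bot.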

Pjman




msg:4500720
 4:34 pm on Sep 27, 2012 (gmt 0)

To me, this looks like Panda data. Both of my Panda-eaten sites have 3 to 4 times more "Not Selected" than indexed pages.

All my sites that have minimal "Not Selected" pages were untouched by Panda.

aakk9999




msg:4500785
 6:29 pm on Sep 27, 2012 (gmt 0)

From what I could gather from the sites I am looking after, the pages you noindex contribute to the "Not Selected" total.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved