Is your number of indexed pages about the same? Is there a corresponding jump in the number of pages "ever crawled"?
|Does Google just recall old URLs from time to time and simply add them to the tally? Or invent their own? |
Invented pages can't count here, because if it can't find the page it can't crawl it. Old URLs, yes. This is a pretty new WMT category (it only goes back, what, a month or two?), so they're still pulling in information. The "ever crawled" category also includes pages you've moved and/or renamed. So anything with the pattern
oldname > 301 > newname
will eventually settle down to
pages ever crawled: 2
pages indexed: 1
pages not selected: 1
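To make the arithmetic above concrete, here is a minimal sketch of how that tally would work out. It only models the description in this thread, not Google's actual counting; the function name `tally` and the simplification that every non-redirecting URL ends up indexed are assumptions for illustration.

```python
def tally(urls):
    """Count crawl categories for a set of crawled URLs.

    `urls` maps each crawled URL to its 301 redirect target,
    or to None if the URL serves content directly.
    Assumption: every non-redirecting URL ends up indexed.
    """
    ever_crawled = len(urls)
    # A redirecting URL was crawled but is not itself selected for the index.
    not_selected = sum(1 for target in urls.values() if target is not None)
    indexed = ever_crawled - not_selected
    return {
        "ever_crawled": ever_crawled,
        "indexed": indexed,
        "not_selected": not_selected,
    }


# oldname > 301 > newname
counts = tally({"/oldname": "/newname", "/newname": None})
print(counts)  # {'ever_crawled': 2, 'indexed': 1, 'not_selected': 1}
```

Every extra rename in the chain adds one more to both "ever crawled" and "not selected" while "indexed" stays at 1.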
Thanks, Lucy24. The "ever crawled" also goes up but not at the same rate. "Ever crawled" used to be 5 times more than the sum of "indexed" and "not selected". Then, over time, it came closer to the sum, although still not quite there yet (still about 10% higher than the sum of all other types).
I think I should just chalk it up to the data not being fully in.
It is unfortunate that they do not differentiate between internal duplicates and 301 redirects, which could be counted as "internal duplicates, rectified". In other words, it would be very useful to see how many internal duplicates exist and how many of those were fixed, in case you're working on eliminating duplicates.
Let's hope this tool will evolve into something more useful!
I've got "ever crawled" figures at 80 to 100 times the "indexed" number on sites with major technical duplicate content issues that have been cleaned up.
My site is having a problem now. It has been difficult to get crawled by Google. I have only minimal backlinks.
I had an issue with this on a WordPress site and a change I made reduced it by nearly 100%. I am not sure whether you are experiencing the same problems or not, but have a read - [webmasterworld.com...] I go into a bit more detail in the 4th post and posted the unexpected GWT related results in the 10th.
Mod's note: URL for copying into browser, as WebmasterWorld redirect script breaks the hash character in the above link...
[edited by: Robert_Charlton at 4:08 am (utc) on Sep 20, 2012]
[edit reason] added url [/edit]
I have this with at least two sites where the number of indexed pages has dropped significantly. I think it's a Panda issue. Doesn't anyone else? When WMT tells you that pages are "substantially similar to other pages", that sounds like Panda to me. I could be wrong. I'm still investigating, though.
Found another peculiarity about these Index reports: one of my sites has Ever Crawled *lower* than the sum of Indexed and Not Selected. How's that even possible?
If I add "Blocked by robots[.txt]", "Ever Crawled" is now 1.5 times lower than the sum of the other types. And I thought "Ever crawled" should by definition be more than the sum of the other types.
So, is this evidence of Google truly inventing nonexistent URLs (perhaps based on query strings), or is this just a technical counting error?
Is it possible that pages crawled by other means (image bot, adsense bot) are added to the index but don't make it into the "Ever crawled" figure, which could be based on traditional Googlebot visits only (just a conjecture)? It would be a silly omission of course but Google engineers are also people, have to keep that in mind :)
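For anyone who wants to run the same sanity check on their own Index Status numbers, a rough sketch follows. The function name `check_index_counts` and the sample figures are made up for illustration, and whether "Blocked by robots" really belongs in the sum is an assumption carried over from the post above.

```python
def check_index_counts(ever_crawled, indexed, not_selected, blocked_by_robots=0):
    """Compare "Ever crawled" against the sum of the other categories.

    Assumption: "Ever crawled" should be at least the sum of
    indexed + not selected + blocked by robots.
    """
    accounted = indexed + not_selected + blocked_by_robots
    # Ratio above 1 means more URLs crawled than accounted for;
    # below 1 is the peculiar case described in this thread.
    ratio = ever_crawled / accounted if accounted else float("inf")
    return {
        "accounted": accounted,
        "ratio": ratio,
        "consistent": ever_crawled >= accounted,
    }


# Hypothetical figures: "Ever crawled" is 1.5 times lower than the sum.
result = check_index_counts(
    ever_crawled=1000, indexed=600, not_selected=500, blocked_by_robots=400
)
print(result["accounted"], result["consistent"])  # 1500 False
```

Tracking the ratio week by week would show whether the gap is closing as the data fills in or whether it is a persistent counting quirk.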
I would have to say that, to me, this is Panda data. Both of my Panda-eaten sites have 3 to 4 times more "Not Selected" than indexed pages.
All my sites that have minimal "Not Selected" pages were untouched by Panda.
From what I could gather from the sites I am looking after, the pages you noindex contribute to the "Not Selected" total.