Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Webmaster Tools - unrealistically high "Ever crawled" number
cmendla




msg:4478634
 3:14 am on Jul 25, 2012 (gmt 0)

I am exploring Webmaster Tools a bit. Under Index Status, on the Advanced tab, I noticed a very high number for 'ever crawled'.

IOW,

Site size approx 60 pages
approx 47 pages indexed
not selected 68
blocked by robots 0
ever crawled 590,000 +
crawl error approx 8 per day


The site was built in FrontPage (I know, I'm migrating my sites to Joomla). The site had AdSense and other affiliates, and had a forum at one time.

My questions are

1. Am I correct in assuming that the 592,000+ 'ever crawled' could mean that that number of pages was crawled at one time in the past?
2. Is this a problem? If so, where else do I look?


I'd really appreciate any thoughts...

thanks

chris

 

tedster




msg:4478645
 4:18 am on Jul 25, 2012 (gmt 0)

Crawled seems to mean "requested" - so that would include even 404 responses. Unfortunately, clicking on the "learn more" link results in a Page Not Found message, so we can't really tell, officially at least.

I can tell you that almost every site I check also has impossible numbers here unless 404 responses are part of the picture. However, your number is a lot more impossible than the ones that I see.

grippo




msg:4478646
 4:20 am on Jul 25, 2012 (gmt 0)

  1. 592,000+ pages were crawled over the whole history of your site.
  2. Maybe the problem is that you don't have a clue about where they came from. How many URLs does the site:example.com command return?

g1smd




msg:4478676
 7:00 am on Jul 25, 2012 (gmt 0)

It appears to be every URL crawled whether valid or not. It includes all status codes returned.

I see a huge 'ever crawled' number on a site that previously had 'infinite duplicate content', but now that site shows just a few thousand URLs indexed. Interestingly that figure is approx 10x the number that appears for the site: search.
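
For anyone who wants to see how a crawled count can outgrow the real page count, here is a minimal sketch that tallies the distinct URLs Googlebot requested, grouped by status code, from a server access log. It assumes the Apache combined log format; the function name and the sample lines are hypothetical, made up for illustration. It only shows what your own server saw, not how Google computes "ever crawled".

```python
import re
from collections import defaultdict

# Assumed layout (Apache combined log format):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_googlebot_requests(lines):
    """Return (distinct_urls, urls_by_status) for requests whose
    user-agent string mentions Googlebot. Hypothetical helper."""
    urls_by_status = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and other visitors
        urls_by_status[int(match.group("status"))].add(match.group("path"))
    # Every URL counts once, whatever status code it returned.
    distinct_urls = set().union(*urls_by_status.values())
    return distinct_urls, dict(urls_by_status)


# Made-up sample lines, not taken from any real log.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'66.249.66.1 - - [25/Jul/2012:03:14:00 +0000] "GET /index.htm HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [25/Jul/2012:03:15:00 +0000] "GET /forum/thread?id=1 HTTP/1.1" 404 312 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [25/Jul/2012:03:16:00 +0000] "GET /forum/thread?id=2 HTTP/1.1" 404 312 "-" "{GOOGLEBOT_UA}"',
    '10.0.0.5 - - [25/Jul/2012:03:17:00 +0000] "GET /index.htm HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

if __name__ == "__main__":
    distinct, by_status = summarize_googlebot_requests(SAMPLE)
    print(len(distinct), "distinct URLs requested")
    for status in sorted(by_status):
        print(status, len(by_status[status]))
```

Keep in mind that a user-agent string can be spoofed, so matching on "Googlebot" alone may overcount. A removed forum with parameterised URLs, as in the sample, is one way the 404 bucket could dwarf the live pages.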

spunkle




msg:4478678
 7:26 am on Jul 25, 2012 (gmt 0)

"ever crawled" data is cumulative over the life of the site which is why the numbers are so high.

deadsea




msg:4478712
 9:55 am on Jul 25, 2012 (gmt 0)

Another possible gotcha with the data in this graph: the "not selected" number appears to include pages that used to exist but which now 301 redirect to another page.

I know this because one of my sites with 40,000 pages had 2 pages about each topic. I combined them into one page per topic and 301 redirected the old URLs, so that the site now has 20,000 pages. As Googlebot found these redirects over the course of a few months, the "indexed" line fell from 40,000 to 20,000. However, the "not selected" line grew from 1,000 to 20,000 over the same period, a mirror image of the "indexed" line.

I wouldn't expect 301 redirects to be considered pages that were "not selected" for the index. I would expect "not selected" to be actual pages.

lucy24




msg:4478858
 7:31 pm on Jul 25, 2012 (gmt 0)

clicking on the "learn more" link results in a Page Not Found message

Which one? The 'learn more' link from the question mark next to "total indexed" on Index Status, or in the grey text under the graph (same link) currently takes me to

https://support.google.com/webmasters/bin/answer.py?hl=en&answer=2642366
(emphasis mine)

The Basic tab displays the following data:

* Ever crawled: The cumulative total of URLs on your site that Google has ever crawled. Not all crawled URLs get indexed, and Google may discover some URLs by other means such as inbound links from other sites. This number should increase over time as new pages are added to your site.
* Total indexed: The total number of URLs currently in Google's index. These URLs are available to appear in search results, along with other URLs Google may discover by other means. This number will change over time, as new pages are added and indexed, and old pages are removed. The number of indexed URLs is almost always significantly smaller than the number of crawled URLs, because it does not include URLs that have been identified as duplicates or non-canonical, or less useful, or that contain a meta noindex tag.


Apparently I don't rate an "ever crawled" because all I get is "total indexed" ::sob::

Hm. Wonder where those 29 pages went? (Difference between highest point on graph, and current number.) Maybe they got stolen by That Other Search Engine; their Total Indexed has been going back up.

"More information" for "not selected" takes you in turn to

https://support.google.com/webmasters/bin/answer.py?hl=en&answer=139066

"Less useful" is an infuriatingly fuzzy phrase isn't it?

realmaverick




msg:4478875
 8:54 pm on Jul 25, 2012 (gmt 0)

Mine is 40 million. I need to take a closer look and get my head around the figures.

serpsup




msg:4478885
 9:33 pm on Jul 25, 2012 (gmt 0)

@lucy24: The "Ever Crawled" number is under the "Advanced" tab. Not sure if you have that or not, but it should be right above the Total Indexed graph on the Index Status section of WMT.

lucy24




msg:4478906
 12:39 am on Jul 26, 2012 (gmt 0)

Aha. Just when I'd got used to the idea that, in googlespeak, "advanced" means "show percentage change". Wrong page.

Does every single one of the "learn more" links lead to the same page?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved