|Sitemaps says low urls indexed (1420) - site: says 2000 |
| 4:31 pm on Jul 18, 2008 (gmt 0)|
> Sitemap stats
> Total URLs: 3608
> Indexed URLs: 1420
Why is it so slow? It indexes like 100 new URLs every few weeks. It starts feeling pointless to add new content at this point when it is this far behind. And yes, all those pages are actual original content and not "fluff" pages.
What am I supposed to do?
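For anyone who wants to put numbers on the backlog, here is a rough back-of-the-envelope sketch in Python. The 3608 and 1420 figures come from the stats above; "every few weeks" is taken to mean about 3 weeks, which is purely an assumption, and `indexing_backlog` is a made-up helper name.

```python
import math

def indexing_backlog(total_urls, indexed_urls, new_per_batch, weeks_per_batch):
    """Return (coverage fraction, estimated weeks to index the remainder).

    Assumes the indexing rate stays constant, which it rarely does in practice.
    """
    coverage = indexed_urls / total_urls
    remaining = total_urls - indexed_urls
    batches = math.ceil(remaining / new_per_batch)
    return coverage, batches * weeks_per_batch

# 100 URLs per batch is from the post; 3 weeks per batch is an assumption.
coverage, weeks = indexing_backlog(3608, 1420, new_per_batch=100, weeks_per_batch=3)
print(f"coverage: {coverage:.1%}, weeks to catch up: {weeks}")
```

At that pace the sitemap would take well over a year to be fully indexed, and that is before any new content is added.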
| 7:28 pm on Jul 18, 2008 (gmt 0)|
Keep building a good site. Only by attracting more links and more visitors will you see better and faster indexing, so if you stop - then what you see is all you get.
With regard to the discrepancy, if site: shows 2000+ URLs, then most likely that's at least the total Google has indexed, and it may be more. These reporting functions are often flaky.
| 7:38 pm on Jul 18, 2008 (gmt 0)|
I would recommend against relying on a sitemap alone to get URLs indexed (i.e. without enough incoming links to support that volume of content). If Google only discovers content via a sitemap, then I would expect slow indexing rates and poor performance. Personally, I often prefer not to have a sitemap at all on a site where I'm monitoring and/or working on improving performance.
| 9:10 am on Jul 19, 2008 (gmt 0)|
Google finds a new site quite quickly, and then slowly builds the number of pages appearing in the index, a few pages at a time.
Yahoo seems to take a little longer to find a site, but once found gets all of it indexed a lot quicker, in just a few steps.
Live takes forever to find a new site, but then grabs all of it within a matter of just a few days.
That's my experience with sites from a few dozen to many hundreds of pages.
| 7:14 pm on Jul 25, 2008 (gmt 0)|
I'm not sure your question was answered. Tedster says these reports are often flaky, which I agree is the case. However, I am seeing a similar disconnect (actually, a much larger one) between what Webmaster Tools says and what I know to be the case.
At the moment, I have a customer that is doing a massive URL change on the site. We've managed this very carefully, with all the appropriate 301 redirects.
Google Webmaster tools is reporting that 140 pages have been indexed. A site: query reports that 20,000+ pages have been indexed. This number includes both "old" and "new" URLs. Google has crawled the site extensively, and is gradually digesting the new URL structure, and we can see that they've clearly indexed over 5,000 of the new URLs.
Since this is such a massive URL change, we're trying to keep an eye on all metrics to make sure it's progressing normally. The 140 number reported in Webmaster Tools, clearly at odds with what we're seeing in the index, is quite disconcerting. While site: queries may not be reliable, one would assume that Google would try to provide reasonably accurate numbers in Webmaster Tools. 140 is not even close to reasonably accurate. And it has not changed in days, so it's not a matter of a small lag.
I would love to hear of anyone else that is seeing similar disconnects between what Webmaster Tools is reporting, and what is, in fact, happening in the Google index.
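One part of a URL migration like this that can be scripted is spot-checking that each old URL answers with a single 301 pointing at its intended new URL. What follows is only a sketch under assumptions, not the process described above: the function names (`fetch_head`, `classify_redirect`, `audit`) and the example.com URLs are hypothetical, and it uses nothing beyond Python's standard library.

```python
import http.client
from urllib.parse import urlsplit, urljoin

def fetch_head(url, timeout=10):
    """Issue one HEAD request without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

def classify_redirect(old_url, status, location, expected_url):
    """Compare one response for old_url with the URL it should 301 to."""
    if status != 301:
        return f"unexpected status {status}"
    if location is None:
        return "301 without Location header"
    target = urljoin(old_url, location)  # Location may be relative
    return "ok" if target == expected_url else f"wrong target {target}"

def audit(mapping, fetch=fetch_head):
    """mapping is {old_url: expected_new_url}; fetch can be swapped for testing."""
    return {old: classify_redirect(old, *fetch(old), new)
            for old, new in mapping.items()}

# Offline example with a stand-in fetcher (hypothetical URLs and responses).
fake_responses = {
    "http://example.com/old-a": (301, "/new-a"),
    "http://example.com/old-b": (302, "/new-b"),
}
report = audit(
    {"http://example.com/old-a": "http://example.com/new-a",
     "http://example.com/old-b": "http://example.com/new-b"},
    fetch=lambda url: fake_responses[url],
)
for old_url, verdict in report.items():
    print(old_url, "->", verdict)
```

Calling `audit(mapping)` without the `fetch` argument would hit the live site. Keeping the fetcher pluggable means the comparison logic can be checked without touching the network.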
| 7:42 pm on Jul 25, 2008 (gmt 0)|
How is your internal link structure? Are there good ways for visitors to browse to this content?
Have you done your link building? Do you have links to both your home page, and to individual content pages, where appropriate?
| 8:34 pm on Jul 25, 2008 (gmt 0)|
Timster, thanks for your reply.
I do not want to hijack ATWeb's thread, though it seems to have been dormant for a while, and I thought my case might shed further light on the question of how reliable the "Total URLS:" vs. "Indexed URLs:" stats reported by Webmaster Tools might be.
Again, I would love to hear other people's experiences with these numbers. Is the 140 number (vs. 5600+ verified by querying the Google index) a signal that something is wrong? Or do people just think it's an unreliable metric?
Note that Webmaster tools is not reporting any problems with accessing or reading the sitemap.
Timster, to answer your questions:
1. The site is very spider friendly. Plain text URLs, reasonably optimized link hierarchy, average of far less than 100 links per page. Well cross-linked within the site. Good navigation, breadcrumbs, etc. Good googlebot activity, no unusual crawl errors in Webmaster Tools.
2. We have just been engaged by the client, so have done no real link building ourselves (yet). They have some links, but a grossly inadequate number overall. Still, there are certainly enough links, both to the home page and deep pages, to give them some external pagerank and make them visible to crawlers from off-site pages.
Thanks for any other insights into possible explanations for the strange Webmaster Tools numbers.
| 8:43 pm on Jul 25, 2008 (gmt 0)|
Just to keep a running record on this thread (since I haven't found any others on this topic), here's Google's help information on the topic (source: [tinyurl.com...]):
"The stats on my Sitemap Details page don't look accurate. Why not?
The stats on the Sitemap Details page are a close approximation of the status of your URLs. However, this figure might not be 100% accurate. Our internal systems are always changing, and the web itself is an ever-shifting ecosystem. In addition, there may be a lag between when the numbers are calculated and when they are visible to webmasters.
We don't guarantee that our system will index all the URLs in a Sitemap. In addition, we don't index images directly (instead, we index the page that contains the image). As a result, direct image URLs in your Sitemap won't be indexed."
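Since direct image URLs in a Sitemap won't count as indexed, one quick sanity check is to see how many entries in the Sitemap are image URLs before comparing "Total URLs" with "Indexed URLs". Here is a minimal sketch in Python, assuming a standard sitemaps.org 0.9 Sitemap; the extension list, the function name `split_sitemap_urls`, and the sample URLs are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumption: these extensions are a good enough proxy for "direct image URL".
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

def split_sitemap_urls(sitemap_xml):
    """Return (page_urls, image_urls) found in a Sitemap document string."""
    root = ET.fromstring(sitemap_xml)
    pages, images = [], []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        path = urlsplit(url).path.lower()
        if path.endswith(IMAGE_EXTENSIONS):
            images.append(url)
        else:
            pages.append(url)
    return pages, images

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/widgets.html</loc></url>
  <url><loc>http://example.com/img/widget.JPG</loc></url>
</urlset>"""

pages, images = split_sitemap_urls(SAMPLE)
print(f"{len(pages)} page URLs, {len(images)} image URLs")
```

If the image count is large, that alone could explain part of a gap between the two numbers, though it would not account for a difference as big as 140 vs. 5,000+.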
| 8:52 pm on Jul 25, 2008 (gmt 0)|
More from Google (source: [tinyurl.com...]) that makes it sound like the numbers reported in Webmaster Tools *should* be reasonably reliable:
Q: If it doesn't get me automatically crawled and indexed, what does a Sitemap do?
A: Sitemaps give information to Google to help us better understand your site. This can include making sure we know about all your URLs, how often and when they're updated, and what their relative importance is. Also, if you submit your Sitemap via Webmaster Tools, we'll show you stats such as how many of your Sitemap's URLs are indexed.
| 11:26 pm on Jul 25, 2008 (gmt 0)|
Greyhound4334, I just have a quick question. When you say "a grossly inadequate" number of links, what do you mean? Do they have low-quality links, under 100 links, what? I'm kind of new to this world, so (while I realize this is not the topic of this post) I was wondering if you could give me a bit of insight. Thanks.
| 12:23 am on Jul 26, 2008 (gmt 0)|
Well, it's all relative. The bottom line is that while the site is well structured and has good "on site optimization", it's not yet ranking well, and the key to solving that is getting more links. More good quality links, in particular. To be even more precise, links from "trusted", "authoritative" sites within their domain. An ideal link is from a high quality, relevant page on a "trusted"*, relevant site with anchor text that matches your desired keywords. Sometimes you can't get the ideal link, but you should strive for something as close to that as possible.
*A site that has, itself, established credibility with Google by having good content and good links pointing at it.
So in a nutshell, they don't have enough of those to rank well for our desired keywords. Our job is to find a way to help them get those (organically, not purchasing them).
Hope that helps,
| 3:06 pm on Aug 29, 2008 (gmt 0)|
Could external link building improve the site's indexing?
| 4:09 pm on Aug 29, 2008 (gmt 0)|
With the site: command, have you tried going all the way to the last page of results? When you get there, you will sometimes notice that the total number of URLs has changed, especially for a very large site.
I sometimes feel the total shown on page 1 of Google is an estimate that gets more accurate as you go down the pages.
So at first it can look like a lot of pages, but by the time you reach the end there are not as many as you thought.
Aside from that, pages that are too similar in content are sometimes collapsed on the last page, and if you expand them they show up as indented results.
| 2:29 am on Aug 30, 2008 (gmt 0)|
You're right about that, Benj. In fact, the early numbers you see usually say "of about nnnn pages" but by the end of the line, the word "about" sometimes goes away.