

Sitemaps, Meta Data, and robots.txt Forum

    
Penalty for updating/adding sitemaps?
Google is losing track of sitemap entries
rstidx



 
Msg#: 4563809 posted 1:23 pm on Apr 11, 2013 (gmt 0)

In an attempt to track the number of pages on my website versus the pages that have been indexed, I've been recording the stats found
on the Sitemaps page (Webmaster Tools, under Optimization). When Index Status was added, I started plotting those numbers as well.

I've been in the process of refreshing my website and adding new content. With the new pages uploaded, I wanted to update just a
couple of sitemaps to see the effect on my graph. I got quite a surprise.

On March 25th, I recorded the following:

From the WMT >> Optimization >> Sitemaps

Sitemap Entries: 1,927,042
Indexed: 2,435,647

Note: I don't have sitemaps for all my pages

From the WMT >> Health >> Index Status

Crawled: 3,894,297
Indexed: 1,385,951

On March 31st, I updated 2 existing sitemaps (sm_ut.xml, sm_ut01.xml) and added 2 new sitemaps (sm_ut02.xml & sm_ut03.xml). I used
an online validator to verify the sitemaps - no errors. The size of the sitemaps and their location:

Sitemap location: number of entries
http://www.example.com/sm_ut.xml: 7,072 entries
http://www.example.com/sm_ut01.xml: 6,936 entries
http://www.example.com/sm_ut02.xml: 6,959 entries
http://www.example.com/sm_ut03.xml: 6,760 entries

Total entries: 27,727
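The per-file counts can also be checked locally, independently of an online validator, by counting the `<url>` elements in each file. The following is only a minimal sketch, assuming the files follow the standard sitemaps.org 0.9 format; the `count_entries` helper and the two-entry sample document are illustrative and are not taken from the actual sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_entries(xml_text: str) -> int:
    """Count the <url> entries in one sitemap document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{SITEMAP_NS}url"))


# Tiny stand-in document; a real run would read sm_ut.xml etc. from disk.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/page1.html</loc></url>
  <url><loc>http://www.example.com/page2.html</loc></url>
</urlset>"""

# Per-file counts as reported in the post, summed for comparison.
reported = {
    "sm_ut.xml": 7072,
    "sm_ut01.xml": 6936,
    "sm_ut02.xml": 6959,
    "sm_ut03.xml": 6760,
}

print(count_entries(SAMPLE))
print(sum(reported.values()))
```

Summing the four reported counts this way reproduces the 27,727 total given above.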

On April 1st (and there's nothing funny about it), I found the following:

From the WMT >> Optimization >> Sitemaps

Sitemap Entries: 1,026,611
Indexed: 1,326,025

From the WMT >> Health >> Index Status

Crawled: 3,892,171
Indexed: 1,385,951

Updating/adding the 27,727 entries in the four sitemaps caused Google to lose track of 900,431 sitemap entries and the total number
of indexed pages dropped by 1,109,622 pages. Also notice that on the Index Status page, the number of pages Ever Crawled also dropped
by 2,126 pages.
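The differences quoted above can be reproduced directly from the two sets of readings. This is just a small arithmetic sketch; the dictionary keys are made-up labels for the four numbers shown in Webmaster Tools.

```python
# Figures copied from the March 25th and April 1st readings in the post.
march_25 = {
    "sitemap_entries": 1_927_042,
    "sitemap_indexed": 2_435_647,
    "crawled": 3_894_297,
    "index_status_indexed": 1_385_951,
}
april_1 = {
    "sitemap_entries": 1_026_611,
    "sitemap_indexed": 1_326_025,
    "crawled": 3_892_171,
    "index_status_indexed": 1_385_951,
}

# Drop per metric: a positive value means the number went down.
drops = {name: march_25[name] - april_1[name] for name in march_25}

for name, drop in drops.items():
    print(f"{name}: {drop:,}")
```

Note that the Index Status "Indexed" figure is identical in both readings, so the large drop appears only in the Sitemaps report.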

Today (April 11th), it hasn't improved much:

From the WMT >> Optimization >> Sitemaps

Sitemap Entries: 1,026,611
Indexed: 1,330,904

From the WMT >> Health >> Index Status

Crawled: 3,894,297
Indexed: 1,452,801

I've been trying to understand what happened, but without any success. At this point, I assume that it's some kind of bug with
Google's code.

Although I'm not certain of it, I believe that I've seen this behavior before. If you look at the graph between March & May 2012,
you'll see that the green line (Sitemap Indexed) has a slight incline. During this time, I was trying to keep my sitemaps up-to-date.
In mid-June, I had an idea and stopped updating my sitemaps - you'll notice that the rate of indexing seemed to improve
significantly.

This was the first update to any of my sitemaps in a year (the four mentioned above), and I definitely feel like I'm
being punished for it. Does anybody have an idea? I have more sitemaps to upload, but I think it would be a real mistake to do so
without understanding what happened. If I update a few more sitemaps, my site will probably disappear altogether.

While I'm on the subject: I haven't been able to understand the difference between the numbers shown on the Sitemaps page and the
Advanced Index Status page. Looking at my chart, it seems certain that Pages Indexed from the Sitemaps page and the Total Pages
Indexed from the Index Status page must come from different points in Google's process (not only are the counts different, but
their behavior over time is also different).

Has anybody found documentation or other posts on the difference between the two? I haven't had any luck finding anything.

Any ideas? Is there any way to fix or work around this situation?

[edited by: engine at 8:43 am (utc) on Apr 17, 2013]
[edit reason] no specifics, thanks, examplified [/edit]

 

rstidx



 
Msg#: 4563809 posted 1:32 pm on Apr 11, 2013 (gmt 0)

I just noticed that clicking any of the four links to the sitemap XML files causes a 404 on my website. I don't know the cause, but it must have something to do with the way the post has wrapped the links.

I checked and Google has no problem accessing the sitemaps. If I open a new browser window and use the full URL for the sitemap, it works fine.

Kendo

5+ Year Member



 
Msg#: 4563809 posted 2:37 pm on Apr 11, 2013 (gmt 0)

Having a sitemap won't limit spidering to the pages that are listed in that map. Likewise, not all pages in a sitemap will be indexed. So it's quite understandable for the number of pages actually indexed to be much lower or a lot higher than the number listed in a sitemap.

jimbeetle

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4563809 posted 4:09 pm on Apr 11, 2013 (gmt 0)

Don't obsess over Webmaster Tools numbers too much; they appear to fluctuate willy-nilly at times.

And with all the numbers you pulled out you didn't mention the important one: Traffic.

rstidx



 
Msg#: 4563809 posted 6:08 pm on Apr 11, 2013 (gmt 0)

My traffic is fine for now; the issue is whether this anomaly will affect future traffic. Six months from now, I don't want to discover that I have a drop in traffic because of a mystery I blew off in April.

Consider my point of view: The number of Sitemap Entries should be a simple count of the entries in all the sitemaps. They've had over a week to get the right answer and they haven't. I understand system latencies and the limitations of sampling, but I don't think that they are an issue for something so simple. There's nothing that should be hard or complex about it - there's just one correct answer. If they can't get the right answer for something so simple, then why should we believe that they get the right answer for anything more complex - like Analytics or AdSense earnings?

Understand that I'm not throwing stones or just being critical. I really want to believe that the problem lies in how I've done things and not with Google. If this anomaly is of my own making, then it's something I should be able to fix - once I understand it.

As I said, I really don't want to discover down the road that it's something that I should have fixed today. I don't understand what I'm seeing and I'm hoping somebody can shed some light on it.

jimbeetle

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4563809 posted 7:18 pm on Apr 11, 2013 (gmt 0)

"My traffic is fine for now"

Good, that means that there's no penalty.

"I really want to believe that there's a problem about how I've done things and that the problem is not with Google."

If you've double- and triple-checked everything on your end -- and have had somebody else put eyeballs on it -- then the problem probably does lie with Google. It might have an awesome team of PhDs, but sometimes they can't get the simplest things right.

For example, just last month the Webmaster Tools folks couldn't deliver pages in the correct language:

Bugs in WMT.... Ooops [webmasterworld.com]

And in February they couldn't report the correct number of backlinks:

Anybody lost a ton of link data in Webmaster Tools ? [webmasterworld.com]

Usually a month doesn't go by without at least one major glitch. Again, if you feel comfortable that you have everything correct, the most I would do at this point is monitor the situation.
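For anyone who wants to monitor the situation in a slightly more systematic way, recorded readings can be scanned for unusually large drops. The sketch below is only an illustration: the `flag_drops` function, the metric names, and the 10% threshold are all assumptions, and the numbers are the Sitemaps-page readings posted earlier in this thread.

```python
from typing import Dict, List, Tuple

Snapshot = Tuple[str, Dict[str, int]]

# Readings as posted in the thread; the metric names are made up for this sketch.
SNAPSHOTS: List[Snapshot] = [
    ("2013-03-25", {"sitemap_entries": 1_927_042, "sitemap_indexed": 2_435_647}),
    ("2013-04-01", {"sitemap_entries": 1_026_611, "sitemap_indexed": 1_326_025}),
    ("2013-04-11", {"sitemap_entries": 1_026_611, "sitemap_indexed": 1_330_904}),
]


def flag_drops(snapshots: List[Snapshot], threshold: float = 0.10):
    """Return (date, metric, fractional_drop) for drops larger than threshold.

    The 10% default is an arbitrary, assumed cut-off.
    """
    flagged = []
    # Compare each reading with the one recorded immediately before it.
    for (_, prev), (date, curr) in zip(snapshots, snapshots[1:]):
        for metric, old in prev.items():
            new = curr.get(metric, old)
            if old > 0 and (old - new) / old > threshold:
                flagged.append((date, metric, (old - new) / old))
    return flagged


for date, metric, frac in flag_drops(SNAPSHOTS):
    print(f"{date} {metric} fell by {frac:.1%}")
```

With these readings, only the April 1st figures are flagged; the April 11th numbers are flat or slightly higher than the week before.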


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved