What I've noticed now is this:
- Let's say a new article was posted 8 hours ago.
- I search for it in all Google datacenters, but it does not appear to be indexed in any of them until one hour ago.
- When it finally appears in the search results, it says it was indexed 7 hours ago, although for the first six hours it did not appear in any datacenter.
Even when the new article shows in the results, it takes ages for all the datacenters to update.
I checked to see if I am breaking any of G's webmaster guidelines and I am NOT. The bot comes and downloads the new page but it does not index it. Can anyone explain this situation please? It's driving me insane!
I can't understand why suddenly Mr.G stopped indexing the new articles.
It's indexing old pages that have a link to the new post, but it does not want to index the new post.
I wish I could post my website URL here to get an opinion on what's wrong.
Please help. Is this some kind of penalty?
I'm beginning to lose my mind over this, literally.
Spidering and indexing are two separate steps, and they have to be in a data set as large as Google's. So we just can't think of Google the way we would think of a MySQL, Access or Oracle database, where once a record is added it's immediately findable.
Your situation sounds to me like one of these:
1. An infrastructure change on Google's back end, possibly a temporary re-allocation of resources.
2. A different classification of your blog, so its "freshness" in the search results is now a second-tier priority, not top tier.
From your report, your new URLs show up in less than a day, even though they may not migrate to all data centers for a while. Many people would envy that situation! And even a close inspection of your website is not likely to add any further insight.
So I'm not sure you've got a problem here. Do your server logs show that Googlebot still comes by an hour or so after the Feedburner ping?
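For anyone who wants to check that in their own logs, here is a minimal sketch. It assumes the server writes the common Apache/Nginx "combined" log format; the sample lines, the IP addresses and the function name `googlebot_hits` are made up for illustration. Note that it only matches on the user-agent string, which anyone can spoof, so treat the output as a first look rather than proof.

```python
import re
from datetime import datetime

# Assumption: access log is in "combined" format:
# ip ident user [timestamp] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Return (timestamp, path, status) for requests whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "googlebot" in m.group("agent").lower():
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            hits.append((ts, m.group("path"), int(m.group("status"))))
    return hits

# Hypothetical sample lines; in practice, pass an open log file instead.
sample = [
    '66.249.66.1 - - [10/Oct/2009:13:55:36 +0000] "GET /new-post.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2009:13:56:01 +0000] "GET /new-post.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

for ts, path, status in googlebot_hits(sample):
    print(ts.isoformat(), path, status)
```

Comparing the first Googlebot timestamp for a new URL against the time of the Feedburner ping shows how long the crawl itself takes, separately from when the page becomes visible in the results.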
I'm not so worried about ranking because I always post original content. I know this because before I post anything I do a search and it returns no results. So, theoretically, my post should be the only result, or at least on the first page.
It is a problem because I post original content. Scraper sites copy everything and they get indexed faster and appear in the search results and get all the traffic. So practically I'm working in vain.
Another thing I don't understand:
When a new URL eventually turns up in the results, it says it was indexed 7 hours ago, although it started appearing in the results just 5 minutes ago. Could this be a geolocation issue? (The new page gets indexed in a far datacenter and needs more time to get into the main index.)
That's when the spider recorded the page. As I said before, spidering and actually showing up in the index are different stages, but the timestamp is for when Googlebot got the source code from your server.
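To make that explanation concrete, here is a toy model. It is purely illustrative and not a description of how Google actually works; the class `ToySearchEngine`, its methods and the hour values are all invented for this sketch. It assumes only what is claimed above: the crawl and the searchable index are separate stages, and the displayed age is counted from the crawl.

```python
from dataclasses import dataclass, field

@dataclass
class ToySearchEngine:
    """Toy model: pages are crawled first and become searchable only after a later batch."""
    crawled: dict = field(default_factory=dict)  # url -> hour the page was fetched
    index: dict = field(default_factory=dict)    # url -> crawl hour, once searchable

    def crawl(self, url, hour):
        # Stage 1: the spider fetches the page. Nothing is searchable yet.
        self.crawled[url] = hour

    def run_index_batch(self):
        # Stage 2: a later batch copies crawled pages into the searchable index,
        # keeping the original crawl hour as the page's timestamp.
        self.index.update(self.crawled)

    def displayed_age(self, url, now):
        # Age shown in results, or None if the page is not searchable yet.
        if url not in self.index:
            return None
        return now - self.index[url]

engine = ToySearchEngine()
engine.crawl("/new-post.html", hour=1)            # fetched one hour after the ping
print(engine.displayed_age("/new-post.html", 6))  # None: crawled but not searchable
engine.run_index_batch()                          # the index catches up later
print(engine.displayed_age("/new-post.html", 8))  # 7: "indexed 7 hours ago"
```

In this model a page that only just became visible can still show an age of several hours, because the clock started at the crawl and not at the moment the page entered the index.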
Yes, there's a change in your pattern, and I can sympathize with your concern about scraper sites - although I doubt that there's much you can do to change it. Do you have Webmaster Tools set up - and do you watch it for feedback from Google?
As for the timestamp... Googlebot gets the source code much earlier than the timestamp says, so I have some doubts that the timestamp shows when the bot got the source code. (It usually downloads a new post around an hour after the Feedburner ping.)
How come I never see a timestamp that's less than 7 hours?
Until this situation I was able to see any timestamp from a few seconds to 22 hours (so, going by what you're saying, a new post was indexed as soon as it was spidered).
This is the third time this situation has happened. I'm beginning to believe that there's a time penalty of some sort, but I can't figure out the reason (I can't figure out how it got solved the first two times either) because I'm playing by the rules.
"time penalty of some sort"
I actually don't think you have a "problem" here, but this is just a slight modification to G's behavior.
If there's no penalty, some settings are certainly changing, and I think those settings determine which datacenter the bot assigns my domain to. I say this because I observed that the datacenters update far slower than usual.
"I know this because before I post anything I do a search and it returns no results. So, theoretically, my post should be the only result, or at least on the first page."
Just a quick question: are you blogging for Google or for the visitors of your site?
It seems to me you are over-obsessed with your Google rankings... just do whatever you do (write original content) and Google will follow, eventually.