Googlebot visits, but it takes hours before page is in results

     
2:21 pm on Jun 18, 2008 (gmt 0)

5+ Year Member



The website in question is a blog. Up until last week Mr.G indexed a new article within an hour or so of it being posted (and after I pinged Feedburner). This pattern stopped abruptly on Thursday.

What I've noticed now is this:

- Let's say a new article was posted 8 hours ago.
- I search for it in all Google datacenters, but it does not appear to be indexed in any of them until one hour ago.
- When it finally appears in the search results, it says it was indexed 7 hours ago, although for the first six of those hours it did not appear in any datacenter.

Even when the new article shows in the results, it takes ages for all the datacenters to update.

I checked to see if I am breaking any of G's webmaster guidelines and I am NOT. The bot comes and downloads the new page but it does not index it. Can anyone explain this situation please? It's driving me insane!

I can't understand why suddenly Mr.G stopped indexing the new articles.

4:49 pm on Jun 18, 2008 (gmt 0)

5+ Year Member



It's the third time this year that this has happened and I don't know how it got fixed the first two times.

It's indexing old pages that have a link to the new post, but it does not want to index the new post.

I wish I could post my website URL here to get an opinion on what's wrong.

Please help. Is this some kind of penalty?

I'm beginning to lose my mind over this, literally.

5:06 pm on Jun 18, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



A penalty that delays indexing for part of a day? That's hardly likely, IMO. Penalties drive your urls down the rankings or remove them altogether.

Spidering and indexing are two separate steps - they must be in a data set as large as Google's. So we just can't think of Google the way we would think of a MySQL, Access or Oracle database, where once a record is added it's immediately findable.

Your situation sounds to me like one of these:

1. An infrastructure change on Google's back end, possibly a temporary re-allocation of resources.

2. A different classification of your blog, so its "freshness" in the search results is now a second-tier priority, not top tier.

From your report, your new urls show up in less than a day, even though they may not migrate to all data centers for a while. Many people would envy that situation! And even a close inspection of your website is not likely to add any further insight.

So I'm not sure you've got a problem here. Do your server logs show that googlebot still comes by an hour or so after the Feedburner ping?
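One way to answer that question from the raw access log is sketched below. It is only a rough illustration, assuming the server writes the common "combined" log format; the function name `googlebot_hits`, the sample log lines, the IP addresses and the ping time are all made up for the example. Note that it matches on the user-agent string only, which is a claim the client makes and not proof that the request really came from Google.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern for the "combined" access-log format (assumed; adjust to your server).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines, path, ping_time, window=timedelta(hours=24)):
    """List (timestamp, delay_after_ping, status) for requests to `path`
    whose user-agent claims to be Googlebot, within `window` of the ping."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["path"] != path or "Googlebot" not in m["agent"]:
            continue
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        delay = ts - ping_time
        if timedelta(0) <= delay <= window:
            hits.append((ts, delay, int(m["status"])))
    return hits

# Hypothetical log lines, for illustration only.
sample = [
    '192.0.2.10 - - [18/Jun/2008:07:05:12 +0000] "GET /new-post/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [18/Jun/2008:07:06:00 +0000] "GET /new-post/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows; U)"',
    '192.0.2.10 - - [18/Jun/2008:07:40:30 +0000] "GET /new-post/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
ping = datetime(2008, 6, 18, 6, 30, tzinfo=timezone.utc)  # assumed ping time

for ts, delay, status in googlebot_hits(sample, "/new-post/", ping):
    print(ts.isoformat(), "delay after ping:", delay, "status:", status)
```

If the delays it reports stay around an hour while the page still takes most of a day to show up, that would point at the indexing stage and not at the crawl.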

5:27 pm on Jun 18, 2008 (gmt 0)

5+ Year Member



Yes it does. Sometimes the bot fetches the new URL 2 or 3 times in the hour following the Feedburner ping. The page just does not appear in the results.

I'm not so worried about ranking because I always post original content. I know this because before I post anything I do a search and it returns no results. So, theoretically, my post should be the only result, or at least on the first page.

It is a problem because I post original content. Scraper sites copy everything and they get indexed faster and appear in the search results and get all the traffic. So practically I'm working in vain.

Another thing I don't understand:

When a new URL eventually turns up in the results, it says it was indexed 7 hours ago, although it started appearing in the results just 5 minutes ago. Could this be a geolocation issue? (The new page gets indexed in a faraway datacenter and needs more time to get into the main index.)

5:51 pm on Jun 18, 2008 (gmt 0)

5+ Year Member



Something I forgot to mention, which started a couple of weeks ago: on any given day it indexed a number of posts as usual, and a few of them did not get indexed until the next day.

Starting from last Thursday none of the new posts get indexed until the next day.

6:44 pm on Jun 18, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



[quote]it says it was indexed 7 hours ago[/quote]

That's when the spider recorded the page. As I said before, spidering and actually showing up in the index are different stages, but the timestamp is for when googlebot got the source code from your server.

Yes, there's a change in your pattern, and I can sympathize with your concern about scraper sites - although I doubt that there's much you can do to change it. Do you have Webmaster Tools set up - and do you watch it for feedback from Google?

9:54 pm on Jun 18, 2008 (gmt 0)

5+ Year Member



Yes, I have Webmaster Tools set up. No feedback from Google.

As for the timestamp... Googlebot gets the source code much earlier than the timestamp says, so I have some doubts that the timestamp shows when the bot gets the source code. (It usually downloads a new post around an hour after the Feedburner ping.)

How come I never see a timestamp that's less than 7 hours?

Until this situation I was able to see any timestamp from a few seconds to 22 hours. (From what you're saying, a new post was indexed as soon as it was spidered.)

This is the third time this situation has happened. I'm beginning to believe that there's a time penalty of some sort, but I can't figure out the reason (I can't figure out how it got solved the first two times either) because I'm playing by the rules.

2:12 pm on Jun 19, 2008 (gmt 0)

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member



[quote]time penalty of some sort[/quote]

I don't think so. Google's spidering and indexing behavior is (as far as we know/think) algorithmically driven, based largely on PageRank. The settings tend to slip and slide a notch or two from time to time. When they do, it's logical that some sites will see changes in frequency of spidering and speediness of indexing.

I actually don't think you have a "problem" here; this is just a slight modification to G's behavior.

2:50 pm on Jun 19, 2008 (gmt 0)

5+ Year Member



I've seen blogs with less PageRank that get indexed with no problems, so I don't think PageRank is a factor in this matter.

If there's no penalty, then some settings are certainly changing, and I think those settings concern which datacenter the bot assigns my domain to. I say this because I observed that the datacenters update far slower than usual.

3:10 pm on Jun 19, 2008 (gmt 0)

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member



[quote]I've seen blogs with less PageRank that get indexed with no problems, so I don't think PageRank is a factor in this matter.[/quote]

I didn't say it was *all* PageRank. Obviously there are other factors; it can be as simple or as complicated as the folks at Google would like it to be.

3:29 pm on Jun 20, 2008 (gmt 0)

5+ Year Member



[quote]I know this because before I post anything I do a search and it returns no results. So, theoretically, my post should be the only result, or at least on the first page.[/quote]

Just a quick question: are you blogging for Google or for the visitors of your site?

It seems to me you are over-obsessed with your Google rankings... just do whatever you do (write original content) and Google will follow (eventually).

7:10 pm on Jun 22, 2008 (gmt 0)

5+ Year Member



[quote]It seems to me you are over-obsessed with your Google Rankings[/quote]

Just my point. I don't care about Ranking because I usually am the first to post on a particular subject. But if scraper sites get indexed faster than me, there's no point in posting at all.

12:36 pm on Jun 23, 2008 (gmt 0)

WebmasterWorld Senior Member wheel is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Here's just a wild thought, assuming that speed of getting indexed is what actually makes the difference when fighting scraper sites. How about a small script that checks who's asking for a page and gives a 404 or a blank page until it's Googlebot requesting the page - after which you shut the script down and publish the page.
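A very rough sketch of what such a script's core logic might look like follows. Everything here is an assumption made for illustration: the names `is_verified_googlebot` and `HoldUntilCrawled` are hypothetical, and the reverse-then-forward DNS check is an addition that is not part of the idea as posted. It is included because a user-agent string alone is trivial for a scraper to fake. The demo uses stand-in resolver functions so it runs without touching the network.

```python
import socket

# Hostname suffixes accepted as Google's crawler hosts (assumption for this sketch).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, user_agent,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """True if the UA claims Googlebot AND reverse DNS of the IP gives a Google
    hostname AND forward DNS of that hostname maps back to the same IP."""
    if "Googlebot" not in user_agent:
        return False
    try:
        host = reverse(ip)[0]
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(host)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False

class HoldUntilCrawled:
    """Answer 404 for a path until a verified Googlebot has fetched it once."""

    def __init__(self, reverse=socket.gethostbyaddr,
                 forward=socket.gethostbyname_ex):
        self.released = set()  # paths already fetched by the crawler
        self.reverse = reverse
        self.forward = forward

    def status_for(self, path, ip, user_agent):
        if path in self.released:
            return 200
        if is_verified_googlebot(ip, user_agent, self.reverse, self.forward):
            self.released.add(path)  # publish to everyone from now on
            return 200
        return 404

# Stand-in resolvers with made-up data, so the demo needs no network access.
def fake_reverse(ip):
    return ("crawl-192-0-2-10.googlebot.com", [], [ip])

def fake_forward(host):
    return (host, [], ["192.0.2.10"])

gate = HoldUntilCrawled(reverse=fake_reverse, forward=fake_forward)
bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(gate.status_for("/new-post/", "203.0.113.9", "SomeScraper/1.0"))
print(gate.status_for("/new-post/", "192.0.2.10", bot_ua))
print(gate.status_for("/new-post/", "203.0.113.9", "SomeScraper/1.0"))
```

In a real setup the `released` set would have to live somewhere persistent, and `status_for` would be called from whatever serves the blog's pages; both are left out here.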
 
