
Forum Moderators: Robert Charlton & goodroi


Why would indexing (not spidering) slow down?

Not exactly a penalty, but...

     
10:01 pm on Feb 15, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 6, 2004
posts:84
votes: 0


I was one of those that used <a certain> ad network and got smacked down hard into the supp index during the BigDaddy rollout around Feb 2006. That was rectified and my site was added back into the index with the same rankings for my keywords that I had previously. One thing that I did notice though, was that ever since then it takes much longer for my content to be added to the index.

In the past when I would post new content, and it was read by GoogleBot according to my web logs, it would take about 1-3 days to get into the index. Now it takes anywhere from 5-8 days. Page ranking for this content does not seem to be affected, it's just that it takes much longer to get in the index.
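For anyone wanting to track this on their own site, here's a rough sketch of how you could pull Googlebot fetch times out of a server log so you can compare them against when a page finally shows up in the index. This assumes the Apache "combined" log format and made-up sample paths; adjust the pattern for your own server.

```python
import re
from datetime import datetime

# Matches an Apache "combined" log line; the format is an assumption --
# adapt the pattern if your server logs differently.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<when>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_fetches(log_lines):
    """Return (path, datetime) pairs for every Googlebot request found."""
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and "Googlebot" in m.group("agent"):
            when = datetime.strptime(m.group("when"), "%d/%b/%Y:%H:%M:%S %z")
            hits.append((m.group("path"), when))
    return hits

# Hypothetical log lines for illustration only.
sample = [
    '66.249.66.1 - - [15/Feb/2007:09:12:44 +0000] "GET /news/new-widget.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [15/Feb/2007:09:13:02 +0000] "GET /news/new-widget.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows; U)"',
]
for path, when in googlebot_fetches(sample):
    print(path, when.isoformat())
```

Noting the fetch time this way, then checking each day for the page with a site: search, gives you a hard number for the crawl-to-index lag instead of an impression.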

This is obviously a problem, because when I break a new security story, tons of sites link back to it, but their re-reports of my news make it into the index way before my page. This causes them to grab the lion's share of the traffic for this story rather than it going to me who broke the news in the first place.

This is not a one-off event either. This happens every single time I post something new. I post some news. My logs show GoogleBot indexes it. Other people start blogging or creating articles about the news on their sites (after googlebot already accessed my page). The next day these people appear in the index, while it takes another 5-7 days for my page to appear. When my page eventually makes it into G, I have the #1 ranking or very close, but by that point it's old news so does not provide a huge benefit.

Also, the sites making it into the index are not necessarily higher-trafficked sites than mine. Many of them have much lower PageRank. This just happened to me and it's driving me nuts because I have no idea how to go about fixing the problem.

So that's why I think if I am being penalized, it is a strange one. The number of indexed pages is fine, and when the content does make it into the index it ranks great. It just takes a long time to get into the index in the first place.

Any thoughts, comments, suggestions, money, or anything else that comes to mind

<Specifics not needed>

[edited by: tedster at 11:01 pm (utc) on Feb. 15, 2007]

11:29 pm on Feb 15, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 5, 2006
posts:2095
votes: 2


That time seems to be the norm now. The nice thing is that when a page updates, it's typically across 80% of the data centers. I would not worry about it too much if Googlebot is visiting those pages soon after you create them.
4:32 am on Feb 16, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 6, 2004
posts: 84
votes: 0


That time seems to be the norm now.

Is it? If that is the case, then all of these other sites should not be making it into the index before me when their content was created after mine, and after GBot visited me.

There must be something at play here that is allowing smaller sites with lower PR etc to get into the index before me. The question is, is it me, or is it them?

Something tells me it's me unfortunately.

1:25 pm on Feb 16, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 21, 2006
posts:569
votes: 0


That's the remnant of the penalty.
Which - in my opinion - is as manual as it gets.

I've been running into it on two sites I'm working on. Yet to find a cure.

All in all, there's a penalty on your site, which cannot take effect because of the trust that flowed to you with the probably thousands (or tens of thousands) of backlinks. Did I get the numbers right?

Your site has a dark shadow it can't cast away.

Google's schizophrenic behaviour originates from the fact it both trusts the pages and views them with suspicion.

You know, in the case of these sites I'm overseeing this penalty might very well be manual. There was some minor keyword stuffing on one, with some subdomains containing duplicate content linked from nowhere on the net, but that just can't be it, as most of this has been removed.

But on the other site there's nothing wrong, except it touches a subject that Google is really sensitive to. Nothing that it could be penalized or banned for, nothing unethical either. It has half a million uniques a year so I really can't complain, but it's interesting to see such a behemoth of popularity NOT being able to introduce a new page.

Here're some symptoms you should look out for:

- Are the pages being cached within a day or two... then dropped from the cache?
- Does Google crawl older pages at a pace of about two months?
- Does the cache revert from a much newer date, back to a date where a given change was not yet visible?
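If you want to check the third symptom systematically rather than from memory, you can keep a simple diary of the cache date you see each time you check, and flag any check where the date moves backwards. A minimal sketch (the dates here are hypothetical):

```python
from datetime import date

def cache_rollbacks(observations):
    """observations: (checked_on, cache_date) pairs in the order you
    checked them. Returns the checks where the cache date moved
    *backwards* -- the rollback symptom described above."""
    rollbacks = []
    last = None
    for checked_on, cache_date in observations:
        if last is not None and cache_date < last:
            # record when you saw it, what the date was, what it became
            rollbacks.append((checked_on, last, cache_date))
        last = cache_date
    return rollbacks

# Hypothetical diary for one URL: cache refreshed on Feb 12, held for
# two days, then rolled back to a January copy.
diary = [
    (date(2007, 2, 12), date(2007, 2, 12)),
    (date(2007, 2, 13), date(2007, 2, 12)),
    (date(2007, 2, 14), date(2007, 1, 15)),
]
print(cache_rollbacks(diary))
```

A few weeks of entries like this makes it obvious whether the rollback is a pattern or a fluke.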

On the sites I'm talking about this is a deliberate act, no doubt about it. I've seen the cache go backwards when they found new data on a page.

The site is crawled daily, and if I looked hard enough I could catch cache dates from today or yesterday, just before the filters took effect and wiped them out, replacing them with a cache a month old.

Yet the sites are doing great with the keywords they are targeting.
Interesting.

I was thinking about a reinclusion request detailing that we know we had problems, but have now corrected them. And see if there's going to be any change to this... pattern. Although while I know what to say for one of the sites ( and I know that wasn't the cause, but, oh well ), the other one I can't even come up with a fake.

4:06 pm on Feb 16, 2007 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
posts:4847
votes: 0


It used to be the case that spider activity related to PR and new incoming links. I've noticed more recently that the bots are much more intelligent, even to the point of visiting at times when content is normally uploaded. In addition, they seem to use Adsense to inform them of new URLs through the general caching feature. I'd be surprised if Analytics didn't do the same...
4:52 pm on Feb 16, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 6, 2004
posts:84
votes: 0



All in all, there's a penalty on your site, which can not take effect because of the trust that flowed to you with the probably thousands ( or tens of thousands ) of backlinks. Did I get the numbers right?

Yes... according to the links tool in Google Webmaster Tools I have about 59k+ inbound links. After I downloaded the list and imported it into Excel I scanned through them all. They all look like legit, relevant links. There do not appear to be any non-relevant inbound links left over from the ad network / BigDaddy days. So those seem to have been flushed or disregarded... which is a good thing.
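Eyeballing 59k rows in Excel is rough going. One shortcut is to tally the linking domains first, so any leftover ad-network domains stand out by count instead of being buried in the list. A sketch, with made-up example URLs:

```python
from collections import Counter
from urllib.parse import urlsplit

def domains_by_count(backlink_urls):
    """Tally linking domains, most common first, so oddball domains
    (e.g. ad-network leftovers) are easy to spot."""
    counts = Counter(urlsplit(u).hostname or "" for u in backlink_urls)
    return counts.most_common()

# Hypothetical backlink export.
links = [
    "http://blog.example.org/widget-story",
    "http://blog.example.org/another-post",
    "http://news.example.net/widget-roundup",
]
print(domains_by_count(links))
```

A handful of domains usually accounts for most of the links, and anything suspicious tends to cluster at one domain, so this cuts the manual review down to a few hundred rows.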


Google's schizophrenic behaviour originates from the fact it both trusts the pages and views them with suspicion.

You know, in the case of these sites I'm overseeing this penalty might very well be manual. There was some minor keyword stuffing on one, with some subdomains containing duplicate content linked from nowhere on the net, but that just can't be it, as most of this has been removed.

What I do not understand is that I never practiced anything shady on my site. I understand, since BigDaddy, that Google frowned upon the ad networks, but I think it is negligent on their part to have put a penalty on my site when they had never officially said it was wrong to use them. Regardless, I have stopped using it since word started coming out that it "was possibly" frowned upon. That was the only activity on my site that can possibly be construed as shady and it was stopped over a year ago.

Has half a million uniques a year so I really can't complain, but it's interesting to see such a behemoth of popularity NOT being able to introduce a new page.

I agree entirely. This bug is not hurting my site per se, but it is frustrating seeing other sites benefit from my work and research while my site gets the leftover traffic. It really should be the other way around. We get close to a million uniques and Google referrals a month. You would think with that type of traffic my content might be of some quality that is worth indexing a little quicker.

Are the pages being cached within a day or two... then dropped from the cache?

Nope... my pages remain in the cache.

Does Google crawl older pages at a pace of about two months?

No, Google crawls my pages quite often... old or otherwise. GoogleBot lives on my site.

Does the cache revert from a much newer date, back to a date where a given change was not yet visible?

Not that I can see.

It used to be the case that spider activity related to PR and new incoming links. I've noticed more recently that the bots are much more intelligent, even to the point of visiting at times when content is normally uploaded. In addition, they seem to use Adsense to inform them of new URLs through the general caching feature. I'd be surprised if Analytics didn't do the same..

Well, this particular page in question has been viewed 6,500 times since Monday. Not a huge amount, but that's enough AdSense views from that page to suggest there is some activity there.

This whole situation is very frustrating and as usual with Google it is impossible to get an answer. I sent a request to Google support asking if we are penalized a couple of days ago, but I am sure they will say no.

5:10 pm on Feb 16, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 21, 2006
posts:569
votes: 0


The page I'm testing this with currently had 5000+ views since Monday.

Furthermore... it also has AdSense on it, which seems to be quite up to date with what the content might be. Yet the cache has been removed after two days. Can't even make Google cough it up with a site: command right now.

The site in question is several years old, outranks everything in the SERPs, has adsense, analytics, webmaster tools, and is crawled, cached regularly...

...unless a new page shows up, or additional words appear that were not on the page before, which perhaps are aiming for additional relevancy.

In that case the cache first gets refreshed, stays that way for two days, then is rolled back to a date before the change was made / page was added.

This has been going on for a while.

Eventually the pages get accepted and the changes get cached, but the process is slow. Not because of "acceptance", but because the cached data the rollback is made with slowly advances to a date where the change was already present.

<edit reason: didn't post in time> ... might not be the same problem as yours but is definitely similar. Perhaps the "penalties" in question are different, but in both cases they seem to be holding back the site, while trust seems to be giving them a boost. And all this at the same time.

[edited by: Miamacs at 5:31 pm (utc) on Feb. 16, 2007]

9:46 pm on Feb 16, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


"This is obviously a problem"

No, it's the norm. It's happening all over the place so there's no point in personalizing it.

10:43 pm on Feb 16, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 6, 2004
posts: 84
votes: 0


No, it's the norm. It's happening all over the place so there's no point in personalizing it.

How is this the norm when it happens every time to my site, yet many other sites are easily beating me to the index with my own information?

The logic just doesn't follow for me. If this were the norm, then we would all be in the same boat, and I would get indexed first because GoogleBot visited me first.

This is not the case, though.

If you are saying it's the norm for small no-name MFA/affiliate sites to beat out larger and more authoritative sites, then Google has much bigger problems than I knew about.

11:00 pm on Feb 16, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 5, 2006
posts:2095
votes: 2


Google assigns crawling priorities. If you're Fox News, you will get crawled and cached daily. If not, then it might be a bit longer. If you have backlinks to your site on the story, Google will know you wrote the article.
11:34 pm on Feb 16, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


"If this was the norm, then we would all be in the same boat"

No we wouldn't. That doesn't make sense.

The bottom line though is this is the normal behavior for the vast majority of sites, so just looking at it in isolation doesn't accomplish anything.

Some new pages I added in the past eight days are now on their third fresh date addition to the index, while others are still on their second, others aren't in at all, and a couple stuck immediately... and they all have exactly the same linking. That's just the way it works now, with Google's lousy crawl, lousy indexing, and also semi-instant supplemental-ing in some cases.

12:16 am on Feb 17, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 21, 2006
posts:569
votes: 0


Sure. But going lethargic and choosing the ever-convenient answer, "Google is lousy and that's that"... has never helped me understand anything, and so far there have been very few things about Google I didn't comprehend.

Back on topic, I think I have an alternative idea about how and why this would be happening. It is tied to page-by-page trust, internal links passing this trust, and relevancy. No need for external links, if the page added is relevant to the homepage, and the broad theme of the inbounds to the homepage or any of its neighbours within the site.

In other words, while the homepage is trusted for X, it is not trusted for Y. And if I add a page linked from the homepage about Y, then that page will have no trust at all, for not even the homepage is trusted for that word, thus the internal links won't be able to pass the trust that makes the site instant top 3 for everything that's related to X.

To use the ever popular disinforming term, this could be a tiny "sandbox effect", making that single page fall below the thresholds to appear ( this fast ) in the primary index.

Basically this all could be connected to the same overall trend of Google trying to filter out irrelevant and spam information. Perhaps the sites were/are not relevant for the content on these pages. I'll need to look into this a little further.

I'll need to get these pages indexed faster.
And since I've seen it work on all the other sites, I know there's a difference. Perhaps this was it.

2:25 am on Feb 17, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 6, 2004
posts:84
votes: 0


Google assigns crawling priorities. If you're Fox News, you will get crawled and cached daily. If not, then it might be a bit longer. If you have backlinks to your site on the story, Google will know you wrote the article.

Agreed... then I should be in the index, as I am the source and GoogleBot crawled me immediately. Dozens of sites have been linking back to me since the news story was released. All of those stories linking back to mine are in the index... mine is not.

How does this make sense? According to your logic, Google should know I am the source and thus get me into the index and ranking highly on it.

Basically this all could be connected to the same overall trend of Google trying to filter out irrelevant and spam information. Perhaps the sites were/are not relevant for the content on these pages. I'll need to look into this a little further.

This is not my case. The linking page is totally relevant to the linked page.

The bottom line though is this is the normal behavior for the vast majority of sites, so just looking at it in isolation doesn't accomplish anything.

If that's the case, then why are all these other sites posting basically reprints of my story and getting in the index within a day? As I said previously, this is not a one-off. Since these sites always reprint my news, there is a history here that I have witnessed. They reprint my news and are always in the index way before me. Therefore there is something about these sites that is getting them indexed faster. Either they are whitelisted, or I have a penalty or am on a blacklist. With the history and past experiences, I just do not see any other explanation.

Unfortunately, it sounds like a penalty of some sort to me. Even more so when this all started from the supp/big daddy issue that some of us had. Steve, from what I remember you were in it as well. Maybe you have the same penalty and it's not quirky google behaviour like you believe.

Now is it even remotely possible to speak to someone knowledgeable at Google to help rectify this?

3:48 am on Feb 17, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


Dealing with duplicate content is totally different. That's always the primary issue. In your case it appears your domain is primarily duplicate content, so Google gives it not much respect.
4:49 am on Feb 17, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 6, 2004
posts:84
votes: 0


How is my content duplicate content? Feel free to sticky me with how you feel I have DC issues if there are particulars. I personally do not see it.

Also, from what I heard, duplicate content on your own domain will only cause certain pages not to be indexed. For example, if I have two pages detailing the same filename, it will only add one of those pages to the index. It won't penalize you further.

so Google gives it not much respect.

And if that is the case why is it when my content gets into the index it ranks well?

[edited by: Grinler at 4:51 am (utc) on Feb. 17, 2007]

11:27 am on Feb 17, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


"then why are all these other sites posting basically reprints of my story"

You said other people use the same content.

12:23 pm on Feb 17, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 6, 2004
posts: 84
votes: 0


What I meant was that I write a story about Widgets that no one else has: "New type of widget has been released with unbelievable widget technology".

Then dozens of other sites write about it using their own words "Example.com reported yesterday of a new widget with extraordinary technology. For the full story go here: link to example.com"

That is what I meant about reusing my content. Not that they were stealing it word for word.

 
