
My Google woes....

If I had hair I would be ripping it out!


Equiano

10:38 am on Aug 29, 2003 (gmt 0)

10+ Year Member



I have a site (Site A) which has the home page of PR6. I created a new site (Site B) which I linked to the home page of my PR6 site. I did this in April this year.

In the days of the monthly Google update, most of Site B's pages were successfully added to the index. Since the "continuous update" started, I have witnessed a gradual removal of Site B's pages from the index.

I can't work out what is going on. I am not using any spamming techniques as site B has been built using the same techniques I have used on a lot of other sites which are all sitting happily in Google's index.

Any ideas how I go about sorting this out? At its peak Site B had about 80 pages in the index, now it's down to about 5!

Mark_A

9:17 pm on Aug 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Equiano .. a question ..

You say "I created a new site (Site B) which I linked to the home page of my PR6 site. I did this in April this year. "

did you link from the PR6 page to the new page or from the new page to the PR6 page?

I assume you mean you linked from the PR6 page to the new page...

is that the only reference link into the new site or are there also other credible links into the new site?

Jakpot

12:04 am on Aug 31, 2003 (gmt 0)

10+ Year Member



That's Google, folks!

claus

12:42 am on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jakpot, Equiano, this seems to be something that's happening a lot lately, judging by the number of posts on this subject.

It seems like Google is favoring ..well, whatever.. perhaps it really is a new emphasis on fresh content, as in recently updated pages. And the oldies&goodies get pushed out or put in the background for a while.

It could have something to do with the reconfiguration of the Gbot - now that it's deepfreshbot it has to do more things, and there are also more of them out spidering all the time. With all this fresh content constantly added, it seems that other (older, but no less valuable) content is sliding out the backdoor due to limited capacity.

I've seen some posts mention that there are scalability issues being worked on, and i personally hope this is something they will be looking at, as this is not really good for the quality of their searches - some of the sites dropping are valuable sources that have been around for ages. Personally i keep finding (updated but irrelevant) email lists, discussion boards and blogs when i'm looking for quality (static) reference material, and it's simply getting harder to find the good stuff.

Don't think this comment is valuable to you, but it's a thing i've been thinking a bit about lately. I admit, though, that it sounds like the usual "wait and see, it'll get better eventually" that has been aired quite a few times, sorry about that. Fact is that i haven't got a clue if it's going to be better, i just hope so.

/claus

Jakpot

1:59 pm on Aug 31, 2003 (gmt 0)

10+ Year Member



There is a lot of negative energy swirling around about Google right now. Not just from webmasters but end users as well.

vitaplease

2:08 pm on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Equiano,

any chance of duplicate content on site B?

also, did you try adding "&filter=0" to the end of the search query you use to check Site B's pages?

see also: [webmasterworld.com...] msg4
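The &filter=0 tip above can be sketched in code. This is a minimal illustration of building such a check URL; the helper name and the placeholder domain are made up for the example, not taken from the thread.

```python
from urllib.parse import urlencode

# Hypothetical helper: build the Google query URL used to check how many
# of a site's pages are indexed, with the duplicate-content filter disabled.
# "example-site-b.com" is a placeholder, not the poster's actual domain.
def site_check_url(domain: str, disable_filter: bool = True) -> str:
    params = {"q": f"site:{domain}"}
    if disable_filter:
        # filter=0 asks Google to also show results it would otherwise
        # suppress as "very similar" (the duplicate-content filter)
        params["filter"] = "0"
    return "https://www.google.com/search?" + urlencode(params)

print(site_check_url("example-site-b.com"))
# https://www.google.com/search?q=site%3Aexample-site-b.com&filter=0
```

Comparing the result counts with and without filter=0 shows whether pages are merely filtered as duplicates or actually gone from the index.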

In any case - getting extra external deep links to Site B's inner pages should also help.

Did you check if googlebot is chewing on your pages?
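Checking whether googlebot is "chewing on" your pages means looking for its user-agent in the server access log. A minimal sketch, assuming a common Apache log format; the sample lines and IPs are invented for illustration.

```python
import re
from collections import Counter

# Matches a combined-format Apache access log line and captures the
# request path and the user-agent string. The log format is an assumption.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count Googlebot requests per path in an access log."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and "Googlebot" in m.group("agent"):
            hits[m.group("path")] += 1
    return hits

# Invented sample log lines, era-appropriate format
sample = [
    '66.249.66.1 - - [01/Sep/2003:04:12:01 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '66.249.66.1 - - [01/Sep/2003:04:12:05 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '10.0.0.5 - - [01/Sep/2003:04:13:00 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
]
for path, n in googlebot_hits(sample).items():
    print(path, n)
```

If only robots.txt and the home page show up, the bot is visiting but not crawling deeper - which is exactly the symptom Josefu describes below in the thread's timeline.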

Josefu

11:41 am on Sep 1, 2003 (gmt 0)

10+ Year Member



Since we speak of woes : ) - Google has visited a total of twelve times over the past four months and taken only my robots.txt and index page. I can't for the life of me tell why... could someone help by having a look at my site? (profile) - sorry to ask this but I'm going nuts - and I have lots of hair to pull out. Please pull this if I'm doing a no-no...

claus

11:49 am on Sep 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Josefu, your problem is also shared by a lot of other posters. It's all in the behavior of the new Gbot, i think. Vitaplease's advice on getting external incoming links to deep pages may help (if you're able to get them) - otherwise it seems that the site must pass some levels (of PR / incoming links / PR of incoming links /?) before the spidering gets deeper these days...

/claus

Josefu

12:02 pm on Sep 1, 2003 (gmt 0)

10+ Year Member



...thank you claus. Getting links to internal pages is a GREAT idea : )

( DOH! )

valeyard

12:55 pm on Sep 1, 2003 (gmt 0)

10+ Year Member



Some while back there was speculation that Google might be running out of index space. AFAIR it was laughed off at the time.

Now I'm beginning to wonder.

The observations people are reporting sound very similar to the old freshbot behaviour, but applied to the entire index over a longer timescale.

If the index were reaching capacity then a slow churn of pages would be one short-term workaround. My guess is that this isn't a bug but a deliberately applied piece of sticking plaster.

James_Dale

1:42 pm on Sep 1, 2003 (gmt 0)

10+ Year Member



is site A on the same or similar IP address as site B?

claus

1:49 pm on Sep 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



well, perhaps capacity is not the right word for it after all. Afaik they have a very distributed architecture, so increasing capacity could be done by adding more boxes if that was it. Otoh, their setup is a very large version of distributed Linux computing, so if any civilian entity should meet capacity constraints, my guess would be google.

They have recently increased the published index size too, although the increase itself does not have to be recent just because the announcement is. This is at least a signal that they have increased capacity.

The "index" afaik is just one huge file that is already distributed across several machines, and the querying consists of identifying pointers to the relevant part(s) of this file. In principle just like when you make a zipfile that spans a couple of floppies, although somewhat more complicated and much larger. It's a very unconventional setup, but it has proved very efficient and scalable so far.
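The "zipfile across floppies" analogy can be sketched as a toy: one logical index file split into fixed-size shards, with a pointer table mapping docIDs to byte offsets in the logical file. The shard size, the pointer table, and all values here are illustrative assumptions, not Google's actual layout.

```python
# Toy model: a lookup maps an offset in the one logical index file
# to (shard number, offset within that shard) - i.e. which "floppy"
# (machine) holds the data and where on it. SHARD_SIZE is arbitrary.
SHARD_SIZE = 100  # bytes per shard in this toy example

def locate(offset: int, shard_size: int = SHARD_SIZE):
    """Map a logical-file offset to (shard, local offset)."""
    return divmod(offset, shard_size)

# Invented pointer table: docID -> byte offset of that doc's data
pointers = {1: 40, 2: 150, 3: 230}

for doc_id, off in pointers.items():
    shard, local = locate(off)
    print(f"docID {doc_id}: shard {shard}, offset {local}")
# docID 1: shard 0, offset 40
# docID 2: shard 1, offset 50
# docID 3: shard 2, offset 30
```

The point of the analogy: a query never scans the whole logical file, it just follows pointers to the relevant shard and offset, so the file can grow by adding shards (machines).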

So, it's probably not a capacity problem in "the index" as such. It's more likely a decision made somewhere - an altered or new "weight" to certain pages of the SERPS imho.

/claus

Napoleon

2:05 pm on Sep 1, 2003 (gmt 0)



>> There is a lot of negative energy swirling around about Google right now. Not just from webmasters but end users as well. <<

It's not helped by some of the daft things they've done in recent months... the Spamazon debacle probably being the most noticeable and the most heavily criticized. I would have expected them to clean it up by now, but they don't seem to see it as a problem.

Perhaps some of the proper content pages lost above were replaced by this stuff. If there is a finite lid on the index, the logic is that something has to go to make room for it. Who knows.

claus

2:19 pm on Sep 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It might also be as simple as a preference for recent values of "document last modified".

That would give all the dynamically generated pages an edge over static pages. Problem is, you just can't trust that a page is fresh just because it's dynamically generated the minute you request it.
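The freshness signal being discussed can be sketched as follows. This is a minimal illustration of a crawler policy that trusts the Last-Modified response header; the 30-day window is an arbitrary assumption, and the point is that a dynamic page stamping "now" on every request always passes the check.

```python
from email.utils import parsedate_to_datetime
from datetime import datetime, timedelta, timezone

# Hypothetical freshness test: trust the Last-Modified header.
# window_days is an invented threshold, not a known crawl parameter.
def looks_fresh(last_modified_header, now=None, window_days=30):
    if last_modified_header is None:
        return False  # no header: cannot claim freshness
    modified = parsedate_to_datetime(last_modified_header)
    now = now or datetime.now(timezone.utc)
    return (now - modified) <= timedelta(days=window_days)

now = datetime(2003, 9, 1, tzinfo=timezone.utc)
# A static reference page untouched for a year fails the check
print(looks_fresh("Mon, 02 Sep 2002 10:00:00 GMT", now))  # False
# A dynamic page that stamps "now" on every request always passes
print(looks_fresh("Mon, 01 Sep 2003 00:00:00 GMT", now))  # True
```

Which is exactly the problem: the header measures when the bytes were generated, not when the content actually changed.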

That's not a bug, it's just an attempt to solve programmatically an issue that cannot be solved programmatically. A regular error (40), that is.

/claus

Kackle

4:06 pm on Sep 1, 2003 (gmt 0)



>> The "index" afaik is just one huge file that is already distributed across several machines, and the querying consists of identifying pointers to the relevant part(s) of this file. In principle just like when you make a zipfile that spans a couple of floppies, although somewhat more complicated and much larger. It's a very unconventional setup, but it has proved very efficient and scalable so far. <<

Yes, the architecture is distributed in terms of hardware, but this does not preclude the need for every unique URL to receive a unique docID. This is done on the front end of the architecture, during the crawl.

From The Anatomy of a Large-Scale Hypertextual Web Search Engine [www-db.stanford.edu]:

"Every web page has an associated ID number called a docID which is assigned whenever a new URL is parsed out of a web page."
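The docID assignment described in the Anatomy paper can be sketched as a toy lookup table: every unique URL parsed out of a page gets the next sequential ID, and re-encountering a known URL returns its existing ID. The class name and URLs are invented for illustration.

```python
# Toy version of docID assignment: unique URLs get sequential IDs
# as they are first parsed; known URLs keep their original ID.
class DocIdTable:
    def __init__(self):
        self._ids = {}

    def doc_id(self, url: str) -> int:
        if url not in self._ids:
            self._ids[url] = len(self._ids)  # assign next sequential ID
        return self._ids[url]

table = DocIdTable()
print(table.doc_id("http://site-b.example/page1.html"))  # 0
print(table.doc_id("http://site-b.example/page2.html"))  # 1
print(table.doc_id("http://site-b.example/page1.html"))  # 0 (already known)
```

This is why the front end matters even in a distributed system: the ID space is global, so every machine must agree on which docID a URL maps to.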

Napoleon

5:12 pm on Sep 1, 2003 (gmt 0)



>> the Spamazon debacle probably being the most noticeable <<

Actually.... I may have to retreat a little... they seem to be scaling back on this thank goodness!

Looking across the datacenters, most of them seem to have pushed it back somewhat from where it was a week or so ago. Much better quality, of course, if they keep it that way - or better still, push it further. Anyone else notice the change?