
Google News Archive Forum

This 249-message thread spans 9 pages; this is page 5.
Google June 2003 : Update Esmeralda Part 2
GoogleGuy

Msg#: 14342 posted 3:17 pm on Jun 16, 2003 (gmt 0)

Continued from: [webmasterworld.com...]


MurphyDog/johnser/bokesch, it sounds like all your sites will benefit from those extra links over time. If we didn't get a site into this index, it sounds like we'll get it soon. It's fun to watch expectations change: MurphyDog launched his site a week or so ago and is chomping at the bit for it to show up. Give it just a little bit of time; we should find the site soon. :)

<added>
P.S. I won't be posting as often (gotta work, ya know :), but I will be checking this post and chiming in when there's something I can add.
</added>

 

Kackle



 
Msg#: 14342 posted 2:30 am on Jun 17, 2003 (gmt 0)

While it's true that the 20-character identifier would add to storage space, at this time (3 billion pages) it would "only" add 16*3 billion bytes, or around 40-odd gigs of space. If you spread this out over, say, 40 machines, that's about a gig apiece: not a lot. And the index would be spread out over more than 50 machines.

Excuse me, but read the paper by Brin and Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," which describes the Google architecture. They use two inverted indexes, the "fancy index" and the "plain index." Between these two indexes, plus the other places in the system where the docID is used, the total space requirement amounts to two docIDs per word per document indexed.

Yes: that's not one docID per document; it's two docIDs per word per document.

You can use a 20-byte hash if you like, but I think a four-byte or five-byte docID would make just a little more sense.
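To make the disagreement concrete, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical average of 300 distinct indexed words per page (not a figure from the thread) and takes Kackle's reading of the Brin and Page paper, two docID copies per word per document, at face value; it illustrates the arithmetic only, not Google's actual storage layout.

```python
# Back-of-the-envelope docID storage estimate (illustrative, not Google's numbers).
PAGES = 3_000_000_000              # ~3 billion pages, the figure used in the thread
AVG_DISTINCT_WORDS_PER_DOC = 300   # HYPOTHETICAL average, chosen for illustration
DOCID_COPIES_PER_WORD = 2          # per Kackle's reading: "fancy" + "plain" index


def once_per_document(id_bytes: int) -> int:
    """Bytes needed if each document's ID were stored exactly once."""
    return PAGES * id_bytes


def in_inverted_indexes(id_bytes: int) -> int:
    """Bytes needed if the ID is stored twice for every (word, document) pair."""
    return PAGES * AVG_DISTINCT_WORDS_PER_DOC * DOCID_COPIES_PER_WORD * id_bytes


if __name__ == "__main__":
    for size in (4, 5, 20):
        print(f"{size:>2}-byte ID: "
              f"{once_per_document(size) / 1e9:5.0f} GB stored once per document, "
              f"{in_inverted_indexes(size) / 1e12:5.1f} TB across both indexes")
```

Under those assumptions, the 16 extra bytes per ID cost about 48 GB if each ID is stored once per page, but the indexes grow from roughly 7.2 TB with 4-byte IDs to roughly 36 TB with 20-byte hashes. The gap between those two ways of counting is the point Kackle and Dolemite are making.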

Brett_Tabke

Msg#: 14342 posted 2:31 am on Jun 17, 2003 (gmt 0)

> It looks like Google is having difficulty telling
> that www.mydomain.com is the same as mydomain.com

This is nothing new and, honestly, they are handling it correctly. www.domain.tld is not necessarily the same site as domain.tld.

Recap

- Update will take several days to finalize. Sites will flux in/out until then.
- PR on the toolbar is not reliable.
- PR from -fi is not reliable.
- Directory has not been updated or is glitchy at times.
- Sit back, relax and enjoy the flight.

> Brett, any chance of a trophy or citation or something
> for GG on this one?

He'd never take a gratuity. (I've even tried to help with algo on many occasions ;-) no go.

Dolemite

Msg#: 14342 posted 2:32 am on Jun 17, 2003 (gmt 0)

Hey Peter,

OK, so the actual hashed IDs might only be 40GB, but you need to associate each of those IDs with every word on the page in order to have a searchable index, probably by associating the words themselves with the pages on which they occur. That's where the big-time storage comes in: you effectively have to multiply that ID by every word, but it's necessary to index this way for computational efficiency. Gigabytes become terabytes pretty quickly... and you'll need to keep most (if not all) of it in RAM to return 10-100 results in under a second. Keeping the IDs as short as possible would gain further computational and storage efficiency, and in that sense I can see some credibility in the theories about running out of address space.
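Dolemite's point, that a searchable index repeats the docID once for every word it is filed under, can be seen in a toy inverted index. The sketch below is a simplification in Python (whitespace tokenization, no hit lists or word positions), and the function names are hypothetical; it only shows why docID size multiplies with vocabulary.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each word to the sorted list of docIDs whose text contains it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)   # one posting per (word, document) pair
    return {word: sorted(ids) for word, ids in postings.items()}


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """docIDs containing every query word (a simple AND query)."""
    lists = [set(index.get(w, [])) for w in query.lower().split()]
    if not lists:
        return []
    return sorted(set.intersection(*lists))


def docid_bytes(index: dict[str, list[int]], id_bytes: int) -> int:
    """Bytes spent on docIDs: one copy of the ID per posting."""
    return sum(len(ids) for ids in index.values()) * id_bytes


if __name__ == "__main__":
    docs = {1: "blue widgets for sale", 2: "red widgets", 3: "blue sky"}
    index = build_inverted_index(docs)
    print(index["widgets"], search_all(index, "blue widgets"))
    print(docid_bytes(index, 4), docid_bytes(index, 20))
```

With these three tiny documents there are 8 (word, document) postings, so 4-byte IDs take 32 bytes and 20-byte hashes take 160; the same 5x ratio carries over to any index size.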

Traveler

Msg#: 14342 posted 2:50 am on Jun 17, 2003 (gmt 0)

> Brett, any chance of a trophy or citation or something
> for GG on this one?

He'd never take a gratuity. (I've even tried to help with algo on many occasions ;-) no go.

I've learned SO MUCH on this forum!

The most astonishing thing so far that I've learned on this go-round:

BRETT actually has a sense of humor, and a GOOD, DRY ONE at that!

Just when I thought that the mods were nuns with rulers poised to slap the hands of anyone trying to be creative: thanks for the comic relief, Brett (and for the fantastic forum that you have created)!

BTW, if this post is deleted, please disregard the 'sense of humor' reference.

rts5678

Msg#: 14342 posted 2:52 am on Jun 17, 2003 (gmt 0)

Checked the 'new' PR for my sites using -fi's IP (hosts file method).

IFFF this PR sticks (and my hunch is that it will for most sites), it seems Google is now awarding PR more conservatively than it has in the past. In the past, it's been my observation that getting links from a few high-PR sites resulted in a greater boost in PR. Comparatively, it now seems that getting links from a greater number of high-PR sites results in a relatively smaller boost (or no change) in PR. This change seems to be across the board on all of my sites.

I always felt that Google's (pre-Dominic) algo was rather too easy to 'work with' in order to achieve a high PR. The simple trick seemed to be to have 1 or 2 PR7 sites link to you to get a PR of 5 or 6 (all other on- and off-page factors constant).

If that is actually the case, then this change is welcome, since it's in line with some recent observations that Google's algo seems to be placing less emphasis on PR now (as opposed to the past) when ranking pages in search results.

dazzlindonna

Msg#: 14342 posted 3:02 am on Jun 17, 2003 (gmt 0)

rts5678,

But if incoming links from high-PR sites are less of a factor now, how does that fit with Google's own description of their service? Here are some excerpts from one of their help pages.
-------
The heart of our software is PageRank...
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
-------
So, is Google changing the *heart of their software*? Seems like this would be a strange thing to do. Almost like Microsoft deciding to dump Windows for Linux. OK, maybe that's a weird analogy, but it just feels off to me. Yet this dance hasn't helped my PR in -fi (yet), despite the highly ranked links I've gotten from related sites. So I am definitely confused. I'm still hoping the rest of the dance will change this.
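The help-page description quoted above (links as votes, weighted by the voter's own importance) corresponds to the PageRank iteration described in the original Brin and Page paper. Below is a minimal power-iteration sketch in Python. The damping factor of 0.85, the normalized (1 - d)/N base term, and spreading rank evenly from pages with no outlinks are assumptions chosen for illustration; this is not Google's production code.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank; the returned ranks sum to 1.0.

    links maps each page to the pages it links to (its outbound "votes").
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # uniform starting point
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}   # base share every page gets
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outbound
                # links, so a vote from a high-rank page is worth more.
                share = d * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                for t in pages:
                    new[t] += d * rank[page] / n
        rank = new
    return rank


if __name__ == "__main__":
    example = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(example).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In the four-page example, C collects votes from A, B and D, while A outranks B and D only because its single vote comes from C, the page that is itself "important."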

my3cents

Msg#: 14342 posted 3:16 am on Jun 17, 2003 (gmt 0)

Brett, you said:
This is nothing new and, honestly, they are handling it correctly. www.domain.tld is not necessarily the same site as domain.tld

and I am wondering, since it's new to me and it's having a bad effect...

OK, it's true that www.domain.tld is not necessarily the same site as domain.tld

but shouldn't google be able to tell if they are the same thing?

shouldn't they know that index.shtml is the same in both instances too?

I understand your point that this is nothing new, but I can tell you for a fact that it's new to a lot of sites that have multiple listings of different paths to the same page, and it only makes sense that this has something to do with the fact that those pages dropped significantly.

I'm not trying to start an argument with anyone, but it is probably not a coincidence that having the same page indexed 4 or more times would dilute PR and ranking relevancy.

If Google has to determine which page of a themed site should show up, and has ALWAYS chosen the main page, splitting this main page into 4 duplicate listings can't make this choice any easier.

I understand that there is a lot of update left, and I'm not rushing out to change my .htaccess file until I see if this is something google will fix or not. I'm not going to ignore the problem and say there's nothing wrong though.
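One way to see the duplicate-listing problem described above is to normalize URL variants and group the ones that collapse together. The Python sketch below assumes that a leading "www." and a trailing default document (index.html, index.htm, index.shtml) can be dropped, which is exactly the assumption Brett warns is not always true. The names are hypothetical; it flags candidate duplicates on your own site and says nothing about how Google actually decides.

```python
from urllib.parse import urlsplit

# HYPOTHETICAL list of default-document names treated as the directory itself.
DEFAULT_DOCUMENTS = {"index.html", "index.htm", "index.shtml"}


def canonical_key(url: str) -> str:
    """Collapse common variants of the same URL into one comparison key.

    Assumes www.example.com and example.com serve the same site, which is
    not guaranteed.
    """
    if "://" not in url:
        url = "http://" + url                 # allow bare "example.com/path"
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment.lower() in DEFAULT_DOCUMENTS:
        path = path[: -len(last_segment)]     # "/dir/index.shtml" -> "/dir/"
    key = host + path
    if parts.query:
        key += "?" + parts.query              # keep queries: they may be distinct pages
    return key


def group_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs sharing a canonical key; groups of 2+ are likely duplicates."""
    groups: dict[str, list[str]] = {}
    for u in urls:
        groups.setdefault(canonical_key(u), []).append(u)
    return groups


if __name__ == "__main__":
    seen = ["http://www.mydomain.com/", "http://mydomain.com/",
            "http://www.mydomain.com/index.shtml", "mydomain.com/index.shtml",
            "http://mydomain.com/about.html"]
    for key, variants in group_duplicates(seen).items():
        print(key, "<-", variants)
```

All four variants of the home page collapse to the same key, mydomain.com/, while an ordinary interior page keeps its own.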

GoogleGuy

Msg#: 14342 posted 3:31 am on Jun 17, 2003 (gmt 0)

I just wanted to echo what Brett was saying about PR in flux. I've seen several searches, including a few that Napoleon was kind enough to pass on (thanks Napoleon!), and many of those are still affected by pending PR computation. I wouldn't worry about PR until things settle down a little more; indeed, you might want to wait a few days after people call the index switchover complete before you draw conclusions about what your PR is. Just wanted to chime in with that so that people don't worry too much. Hope it helps people to know that some PR is still stewing in the back. In particular, rfgdxm1: it looks like the search you told me about isn't done brewing, and Napoleon: the first search you mentioned to me also looks like it will settle more (both for the better, in my opinion).

Maybe it's pointless to advise people to bear in mind that results probably will change some (based on the digging that I've done today), but hopefully it will ease a few minds, too. :)

[edited by: GoogleGuy at 3:32 am (utc) on June 17, 2003]

rts5678

Msg#: 14342 posted 3:32 am on Jun 17, 2003 (gmt 0)

dazzlindonna,

There's absolutely no doubt that PageRank is at the heart of Google's algo(s); however, PageRank is only one of the variables that Google uses (in addition to other on-page and off-page factors: anchor text, keyword density, etc.) to rank pages in search results. I would like to think that the importance of PageRank (PR) in determining SERPs, relative to the other on-page and off-page factors, is not a constant, and is controlled by the Google engineers who write that algo.

Let's not forget that PageRank has been bought and sold in the past (and probably still is); therefore, to deliver high-quality results, the authors of an algo need to be able to fine-tune the importance of PR in determining the search engine result positions of web sites.

With content (and not PR) being king, there are plenty of other reasons why PR alone must not be the dominant factor in determining the overall position of web sites in search results.

Regardless, best of luck (PR and otherwise) with this dance.

[edited by: rts5678 at 3:51 am (utc) on June 17, 2003]

przero2



 
Msg#: 14342 posted 3:51 am on Jun 17, 2003 (gmt 0)

Maybe it's pointless to advise people to bear in mind that results probably will change some (based on the digging that I've done today), but hopefully it will ease a few minds, too. :)

Here's hoping and praying that you had a chance to dig into the internal page vs. index page ranking issue, and that things will change for the better :). Thanks

GoogleGuy

Msg#: 14342 posted 4:12 am on Jun 17, 2003 (gmt 0)

"Incredible; GG actually had stickymail on. There must be some special WebmasterWorld medal of honour for that."

Hey, I wanted to get a chance to ask about what searches were good or bad. :) It snowed me under enough that I'll probably leave stickies off in the future except to duck in, ask for specifics from someone, and duck back out.

What, you thought I had some outside life or something? Note to self: gotta work on getting an outside life. :)

As far as stickymail goes, I was heartened to hear a lot of really nice messages. Even the reports of searches people disliked weren't as bad as I'd expected from reading the boards here; that was another interesting surprise.

drewls

Msg#: 14342 posted 4:20 am on Jun 17, 2003 (gmt 0)

GG, it shouldn't be a surprise. Personally, I've been pretty happy with this whole thing so far. It treated us pretty darned well. But even if it hadn't, your work to keep everyone's mind at ease during all this is most definitely noteworthy, to say the least, no matter what one thinks of the index. IMHO, only a real troll would think otherwise, and they're too chicken to sticky ya :D

dazzlindonna

Msg#: 14342 posted 4:21 am on Jun 17, 2003 (gmt 0)

rts5678,

I do understand that PR isn't the only factor in determining search position, and really, I guess you and I might have been talking about different things. I was referring to backlinks increasing PR, rather than PR affecting SERP position. But in any case, GG says PR is still stewing, so I will just be patient (or become a patient in a mental hospital) and see what happens...

Dolemite

Msg#: 14342 posted 4:22 am on Jun 17, 2003 (gmt 0)

There's absolutely no doubt that PageRank is at the heart of Google's algo(s)...

I'm not so sure about that. I would say language/semantics and HTML/web context are the heart of any web search engine. For many, many searches, PageRank has little or no influence on the results returned.

dazzlindonna

Msg#: 14342 posted 4:24 am on Jun 17, 2003 (gmt 0)

Dolemite,

I was just quoting straight from Google's own website, which says...

The heart of our software is PageRank

GoogleGuy

Msg#: 14342 posted 4:29 am on Jun 17, 2003 (gmt 0)

You bet, mrguy. I'll try to work through the backlog over the next week or so. Toward the last bit, it got to be a little much. I could look up and see the snowball of stickies getting bigger and bigger. :)

Dolemite

Msg#: 14342 posted 4:33 am on Jun 17, 2003 (gmt 0)

Dolemite,
I was just quoting straight from Google's own website, which says...

The heart of our software is PageRank

Well, I guess I disagree with them. ;)

Chicago

Msg#: 14342 posted 4:52 am on Jun 17, 2003 (gmt 0)

I guess I disagree

With G? How dare you Dolemite. You think this is a company that makes mistakes?

A long day for me, thx to G. Must go... but thought I'd give a parting shot to the almighty before I dream about her, even though I'll be erased; kinda makes me feel better in a way.

jeremymgp

Msg#: 14342 posted 4:55 am on Jun 17, 2003 (gmt 0)

Hi everyone,

For me the update is good news, nice to see fresh backlinks in at last, especially as 2 months ago I had virtually no external links at all.

My index page is still not there for my main keyword in Google, but in Yahoo it's jumped a whole pile of places, from around 150 to 41, all thanks to WW and a systematic two-part campaign of content and link building. The jump is no doubt because of the links, but the question is: does the better position in Yahoo bode well for Google too? As Yahoo uses Google, it seems likely that I should see a similar jump, but on the other hand it's strange that my index page does well in Yahoo first.

What do people think my chances are of a similar jump in the Google SERPs?

All the best everyone,

Jeremy

Chicago

Msg#: 14342 posted 5:01 am on Jun 17, 2003 (gmt 0)

Jeremy, Y! is G

The difference is that Y! shows 20 results per page by default and shows no interior pages, whilst G shows 10 per page by default, with interior pages included in the results. Otherwise, you are witnessing flux in the index.
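Chicago's explanation is easy to check with a little arithmetic. This Python sketch models Yahoo's presentation, as described above, simply as "20 per page, interior pages filtered out" and Google's as "10 per page, everything shown." That model and the sample ranking are hypothetical, but they show how the same underlying ranking can land on very different result pages.

```python
import math
from typing import Optional
from urllib.parse import urlsplit


def is_home_page(url: str) -> bool:
    """True if the URL points at a site root rather than an interior page."""
    return urlsplit(url).path in ("", "/")


def results_page(ranked_urls: list[str], target: str, per_page: int = 10,
                 home_pages_only: bool = False) -> Optional[int]:
    """1-based results page on which target appears, or None if it isn't shown."""
    if home_pages_only:
        shown = [u for u in ranked_urls if is_home_page(u)]
    else:
        shown = ranked_urls
    if target not in shown:
        return None
    position = shown.index(target) + 1
    return math.ceil(position / per_page)


if __name__ == "__main__":
    # Hypothetical ranking: 20 interior pages, then 24 home pages, then ours.
    ranked = ([f"http://site{i}.example/page.html" for i in range(20)]
              + [f"http://site{i}.example/" for i in range(20, 44)]
              + ["http://mine.example/"])
    print("Google-style page:", results_page(ranked, "http://mine.example/"))
    print("Yahoo-style page:", results_page(ranked, "http://mine.example/",
                                            per_page=20, home_pages_only=True))
```

With 20 interior pages ahead of a home page ranked 45th, the home page sits on page 5 under the Google-style defaults but on page 2 under the Yahoo-style ones, the same kind of three-page jump dazzlindonna reports in the next post.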

dazzlindonna

Msg#: 14342 posted 5:06 am on Jun 17, 2003 (gmt 0)

Chicago, thanks for sharing that info. I hadn't noticed that before. For my search term, I move up 3 pages in Yahoo (to the second page), due to the 20-results / no-interior-pages defaults. Wow, I sure wish Google's defaults were the same. I would be a much happier camper. Granted, I'm basically still in the same position, but it *feels* better. :)

twilight47

Msg#: 14342 posted 5:28 am on Jun 17, 2003 (gmt 0)

3 Down 6 to go

fi sj dc

r3ved

Msg#: 14342 posted 5:46 am on Jun 17, 2003 (gmt 0)

I just observed another interesting phenomenon. There is a site that sits in the number one spot for one of my minor keywords. It is an internal page on the site, and I have never looked hard at it. I just visited it and noted that the page has a PageRank of 6. I then visited the home page, and it has a PR of 4. I then ran a backlink check on the PR6 page: its backlinks are all internal (about 60 of them) plus one link from DMOZ (no idea why DMOZ linked to an internal page; I didn't think they did that).

Obviously there are a bunch of other variables involved, but it seems strange for an internal page with almost entirely internal backlinks to have a PR 2 higher than the home page. Did this guy crack the PR code and figure out how to make PageRank consolidate on an internal page through an interlinking scheme?

Napoleon



 
Msg#: 14342 posted 6:24 am on Jun 17, 2003 (gmt 0)

>> I've seen several searches, including a few that Napoleon was kind enough to pass on (thanks Napoleon!), and many of those are still affected by pending PR computation. I wouldn't worry about PR until things settle down a little more <<

A nights sleep.... and I feel much better.... but still sad enough to get out of bed and head straight to the PC.

The above is absolutely correct. Many of those I received yesterday are back to more or less where they should be, on their original index pages. A relief really, because there are some situations that just don't add up.

I'll study (and monitor) more today, including those that came in overnight. But it is clear that we are still in update flux (you'd think I should know better than to think otherwise on day 1 of the dance... let's just call it therapy!)

Thanks GoogleGuy by the way for looking over those. They were very typical.

At least today has started more positively. I lost yesterday completely!

WebGuerrilla

Msg#: 14342 posted 6:41 am on Jun 17, 2003 (gmt 0)


The new index is now showing up on SJ and DC. It's quite refreshing to see that we seem to be back to a timeframe measured in days rather than weeks.

After last month, I was expecting GG to announce that it had finally moved to the second datacenter sometime around July 4th. :)

Powdork

Msg#: 14342 posted 6:52 am on Jun 17, 2003 (gmt 0)

At least today has started more positively. I lost yesterday completely!

In my case I think I have about 18 hours left in my penalty. The first of the June 14 fresh tags have disappeared. The second set should be dropping off soon. After the June 15 tags go I should be back on top.

As an aside. All interior pages (35) are still in the top 5 for their key phrases and the new content is also right on top. Losing the home page's key phrases accounts for only about 10% of the traffic, if that. It is the yardstick by which potential clients judge the site, however. I can't really say "Join my site, look how it ranks on MSN when you search for 'company name', now can I?"
I'm not whining here, just giving an example of why diversification (within a site) is not always enough.
OTOH, the directory is contractually limited in size and scope, and almost full. All I really have to do is keep bringing in qualified traffic, regardless of where I'm ranked. Almost.

[edited by: Powdork at 6:54 am (utc) on June 17, 2003]

a1call

Msg#: 14342 posted 7:25 am on Jun 17, 2003 (gmt 0)

Hi,
I have a theory. Google is involved in 80% of the searches performed on the net. The new index contains many times more web pages than the older indexes because of the extra subpages that were added. This, plus the new algos, is returning drastically different results for most keywords. As a result, people find what they are looking for faster, because the returned results are more relevant. All this reduces internet traffic, as people do not need to keep looking.
I say all this because, although I have unbelievable positions for many keywords, I am actually getting fewer hits than last month.
I know that www does not always return the new index, but the new index has danced onto www regularly.

JayC

Msg#: 14342 posted 7:25 am on Jun 17, 2003 (gmt 0)

dazzlindonna:
But if incoming links from high PR sites is less of a factor now, how does that fit into Google's own description of their service.

The heart of our software is PageRank...

I think we have to be careful not to confuse marketing prose with engineering realities.

PageRank likely will always be an important part of the process, even while the approach to calculating it might change. But its importance to the people developing the algorithms is one thing, and its importance to the people writing "why you should choose Google" copy is another. :)

Powdork

Msg#: 14342 posted 8:02 am on Jun 17, 2003 (gmt 0)

In my case I think I have about 18 hours left in my penalty. The first of the June 14 fresh tags have disappeared. The second set should be dropping off soon. After the June 15 tags go I should be back on top.

The rest of the 6/14 tags are gone for me. I'm hoping for only eight more hours in the box. Only then will I know if this is something that's in the new index.

mipapage

Msg#: 14342 posted 8:04 am on Jun 17, 2003 (gmt 0)

The new index is now showing up on SJ and DC.

Funny, for one of my search terms I get different results on -sj than on any of the other datacenters. It's been this way for the last two updates, but I can't figure it out. I've never seen the -sj result turn up on www.google.com outside of the 'dance' period.

Has anyone else noticed this?



As luck would have it, the result on -sj is much more favorable...

tiappon

Msg#: 14342 posted 9:01 am on Jun 17, 2003 (gmt 0)

Is PR now Site gu-Estimated?

I have a site with a front door (widgets-wadgets.com/index.html), that then directs to different main language index pages (e.g. widgets-wadgets.com/english/index.html, widgets-wadgets.com/french/index.html etc.).
The top index door is a PR4, (with 17 links coming in shown in Google) the inner language pages are PR3.

But now here's the odd thing: I also have about 30 links pointing directly into the *English* page showing. I got these to boost that English page, but they don't seem to have any effect.

I think those thirty on their own would be enough to give that English index page a PR4 even without any other links. Yet it's a PR3, just like the other language index pages.

It looks like PR is now estimated for a site based on its connectivity to the root index. Anyone else noticed this?

[added] I don't think these are real PRs, I think they're starting points in a continuous PR calculation [/added]

[edited by: tiappon at 11:15 am (utc) on June 17, 2003]
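tiappon's hunch that the displayed values are "starting points in a continuous PR calculation" (and GoogleGuy's earlier remark that some PR is still stewing) can be illustrated with a standard PageRank power iteration. The Python sketch below is hypothetical: it models tiappon's site as a root page linking to English and French index pages, plus 30 outside pages linking to the English page, and records the ranks after every iteration. The damping factor of 0.85 is an assumption, and nothing here shows how Google schedules or publishes its computation.

```python
def pagerank_snapshots(links: dict[str, list[str]], d: float = 0.85,
                       iterations: int = 50) -> list[dict[str, float]]:
    """Rank vector after every power-iteration step (index 0 = uniform start)."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    snapshots = [dict(rank)]
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                for t in targets:              # split rank across outbound links
                    new[t] += d * rank[p] / len(targets)
            else:                              # dangling page: spread evenly
                for t in pages:
                    new[t] += d * rank[p] / n
        rank = new
        snapshots.append(dict(rank))
    return snapshots


def max_change(a: dict[str, float], b: dict[str, float]) -> float:
    """Largest per-page difference between two snapshots."""
    return max(abs(a[p] - b[p]) for p in a)


# HYPOTHETICAL model of tiappon's site plus 30 outside pages.
SITE = {"/": ["/english/", "/french/"], "/english/": ["/"], "/french/": ["/"]}
for i in range(30):
    SITE[f"http://ext{i}.example/"] = ["/english/"]

if __name__ == "__main__":
    snaps = pagerank_snapshots(SITE)
    for step in (0, 1, 5, 50):
        s = snaps[step]
        print(f"iteration {step:>2}: english={s['/english/']:.3f} "
              f"french={s['/french/']:.3f} root={s['/']:.3f}")
    print("change in last step:", max_change(snaps[-1], snaps[-2]))
```

At the uniform starting point the English and French pages are tied; as the iterations proceed, the 30 extra links pull the English page clearly ahead and the changes between rounds shrink toward zero. A value read off before the calculation settles would not reflect those links.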


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved