
Google June 2003 : Update Esmeralda Part 2

     
3:17 pm on June 16, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Continued from: [webmasterworld.com...]


MurphyDog/johnser/bokesch, sounds like all your sites will benefit from those extra links over time. If we didn't get the site into this index, sounds like we'll get it soon. It's fun to watch expectations change. MurphyDog launched his site a week or so ago and is chomping at the bit for it to show up. Give it just a little bit of time--we should find the site soon. :)

<added>
P.S. I won't be posting as often (gotta work, ya know :), but I will be checking this post and chiming in when there's something I can add.
</added>

11:26 pm on June 16, 2003 (gmt 0)

Full Member

joined:Dec 13, 2002
posts:314
votes: 0


Just because those other search engines show 80 backlinks doesn't mean Google will, because Google doesn't show anything less than a PR4. Google is saying you only have 7 backlinks that are PR4 or higher.
11:47 pm on June 16, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 16, 2003
posts:111
votes: 0


"Just because those other search engines show 80 backlinks doesn't mean Google will because Google doesn't show anything less then a PR4. Google is saying you only have 7 backlinks that are PR4 or higher."

Thank you for that info, I did not know that. However, I have several backlinks that are PR6 or higher and they are not showing at all...

11:55 pm on June 16, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Hi vbjaeger, welcome to WebmasterWorld. Google probably saw all those backlinks to your site--it's just that we don't report all backlinks that we see. But don't worry; we find and process many more links than we report.

If you jumped 27 notches and there's only 100 people ahead of you, then you sound like a natural SEO--you got 1/5th of the way there in one try. :)

I'd recommend reading around here more. Start with the FAQ and Brett's guide to building traffic in 26 steps. Feel free to keep an eye on what I post via my profile page. Think about terms that users would actually type to find your site, and make sure you've got them on your pages. Look for other sites that your users might like and link to them, and look for sites that would be a good match for your domain and nicely ask if they'd link to you. Definitely read through our section at www.google.com/webmasters/ for more advice about how to make your site more crawlable.

One cautionary word of advice: take everything with a grain of salt, and make choices that are common sense to you and work well for your users. For example, there was recently a thread that suggested Google was running out of "address space" to label our documents. I was talking to another engineer here and he said he almost fell out of his chair laughing when he read that. So there are always a lot of theories floating around about why something is this way or that. My advice is to assume that Google wants the most useful, relevant pages to come up first for searchers. Try to build those useful, relevant pages as well as you can, and we'll do our best to find them and rank them accurately for searches.

Welcome to WebmasterWorld!
GoogleGuy

[edited by: GoogleGuy at 11:56 pm (utc) on June 16, 2003]

11:56 pm on June 16, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 13, 2002
posts:3347
votes: 0


vbjaeger, are you checking your backlinks on www.google.com, or www-fi.google.com? The latter will have the most recent index.
11:59 pm on June 16, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 16, 2003
posts:111
votes: 0


I checked on -fi and found 3 more, but it seems we actually lost a couple and gained more from within my own site.
12:06 am on June 17, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:May 20, 2003
posts:44
votes: 0


"My advice is to assume that Google wants the most useful, relevant pages to come up first for searchers. Try to build those useful, relevant pages as well as you can..."

I think that for the vast majority of site owners, this is what we are trying to do, and we do try our best to stay within Google's guidelines. I can say that, for me, building a user-friendly site has always helped with Google. I may have had a few bumps with the last update (and maybe this one as well, but I am hoping things will change). Generally, good content and a relevant site will be rewarded by Google and by users.

12:18 am on June 17, 2003 (gmt 0)

Full Member

10+ Year Member

joined:July 10, 2002
posts:232
votes: 0


When I do an allinurl search for my site, the indexed pages on mydomain.org come up, but duplicate content also comes up on URLs like:

[secure.mywebhost.net...]

I've never seen this before.

Googleguy, is this likely to cause me problems? Does anyone have any ideas about how to stop this from happening, if it is a problem?

12:20 am on June 17, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 28, 2003
posts:43
votes: 0


I still think there's a lot of merit to the "address space" theory... for one thing, if you set up a sim in RAM it actually terminates in a flow error... something the boys at Microsoft have been working with, and showed off here in DC a few weeks back.
12:27 am on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 13, 2002
posts:3347
votes: 0


"Start with the FAQ and Brett's guide to building traffic in 26 steps."

It would be interesting to see how many of those who have missing index pages have followed most of those steps. I did... whoosh... gone.

"My advice is to assume that Google wants the most useful, relevant pages to come up first for searchers. Try to build those useful, relevant pages as well as you can, and we'll do our best to find them and rank them accurately for searches."

That's why I'm not about to jump out the first floor window.
12:55 am on June 17, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Feb 27, 2003
posts:223
votes: 3


I'm still confused as to why I see 45 backlinks on all datacenters except -fi, where I only see 8.

Why would I lose 37 backlinks?

The links are still there. I checked.

Will this kill my PR, and slam me?

Oh, and another site of mine is showing a Grey Bar now. I'm really confused by all this because the site shows up in numerous searches on -fi. While I understand that there will be fluctuation of PR on the toolbar right now, should that include fluctuating into the Grey Bar range, which generally means the site has been penalized, doesn't it?

1:18 am on June 17, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


The address space theory is bunk, and this is why:

Google is going to identify pages by the result garnered through some hash algorithm or message digest, the inputs of which may be domain name, path, title, who knows? This allows them to separate index bits by prefixes of this message digest/hash result.

Given a hash "signature" that is 20 characters long, and using both upper and lower case letters as legal characters, we have a total address space of 20^^52, or 4.503599627370496e+67 (4.5 with 67 zeros after it).

If we think that Google engineers are using a four-byte unsigned integer value to identify pages we may be taking an infantile view of their structure. :)
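
For illustration, here's a rough Python sketch of the kind of scheme I mean; the SHA-1 choice, the URL-only input, and the 20-character length are all just guesses on my part:

import hashlib
import string

LETTERS = string.ascii_letters  # the 52 "legal characters" above

def doc_id(url, length=20):
    # Hash the URL, then map each digest byte onto the 52-letter
    # alphabet (the modulo introduces a slight bias; fine for a sketch).
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return "".join(LETTERS[b % 52] for b in digest[:length])

print(doc_id("http://www.example.com/widgets.html"))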

Peter

1:35 am on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Hey, I'm going to have to turn off my stickymail (I'm slowly collapsing under the weight of stickies to read), but I'll still be reading this thread off and on. I'm checking on subpages showing. Some searches I've collected seem like the right level of detail to me. In another case I don't think the final pageranks have settled down yet. But if you have specific feedback on good or bad searches, you can always send it to webmaster at google.com.
1:36 am on June 17, 2003 (gmt 0)

New User

10+ Year Member

joined:June 11, 2003
posts:5
votes: 0


Hi,

I hope this is the place for my posting.

I have a site which appears in www-fi.google.com in the number one spot for my chosen keywords. It also appeared as number one in the last index used. Now with this new update I can't find it anywhere. The site is about a month old, perhaps a little more.

Am I right in thinking that www-fi.google.com is the latest index, and that this is a snapshot of what is going to hit the streets?

Can I sit back and wait for my site to hit the high street, as it appears in the -fi index?

Is it possible that an old index is being used while the update completes?

David

1:40 am on June 17, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 20, 2003
posts:390
votes: 0


<<Google is going to identify pages by the result garnered through some hash algorithm or message digest, the inputs of which may be domain name, path, title, who knows? This allows them to separate index bits by prefixes of this message digest/hash result.>>

You sound pretty sure of that.

Given a hash "signature" that is 20 characters long, and using both upper and lower case letters as legal characters, we have a total address space of 20^^52, or 4.503599627370496e+67 (4.5 with 67 zeros after it).

That would be 52^20, 2.0896e+34.

Of course that would also require significantly more storage space than the proposed 4-byte system.

I'm not one to guess at the mechanics of the system, but I can see how just a little forethought would have avoided any sort of address space issues that have been speculated about. When you realize that just an extra byte can really save your butt in these situations, you aren't likely to be cheap about it.
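
Putting actual numbers on it, a couple of lines of Python show the gap between that corrected hash space and a four-byte scheme:

hash_space = 52 ** 20  # 20 characters drawn from a 52-letter alphabet
four_byte = 2 ** 32    # 4-byte unsigned integer docIDs

print(f"{hash_space:.4e}")  # 2.0896e+34
print(four_byte)            # 4294967296 -- room for about 4.3 billion docs

Even a single extra byte takes the four-byte ceiling from about 4.3 billion to about 1.1 trillion IDs, which is exactly the "not cheap about it" point.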

1:53 am on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 13, 2003
posts:672
votes: 0


hi all,
To the guy who has had his site copied, I hope this might help. 8 months ago a company copied one of my sites (300 pages) word for word, even to the point of leaving my name in email addresses to their domain. I was gutted and thought about evil, devilish plans to sabotage them, but then decided that if there was any justice on the net, I would just keep working away at my site and not spend all my time worrying about them. Lo and behold, 6 months later I was back to the same stats I had before, and up somewhat. The company in question uses every dubious technique to obtain visitors on umpteen domains and still appears in the top 3 places on loads of keywords with no substance to their site. But it is easy to forget that the surfers out there will only be conned once by search engine results, so my moral is: SERPs and PR are only king for a day; what we provide to our visitors will hopefully decide whether we succeed long term.
1:57 am on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


"I'm checking on subpages showing. Some searches I've collected seem like the right level of detail to me. In another case I don't think the final pageranks have settled down yet."

I certainly hope that pagerank will soon kick in and obvious errors will be corrected. Also, a complete lack of any pagerank calculations could conceivably explain why situations like mine are occurring (but it would have to be a "complete lack"; with any calculation of PR, the current ranking of the minor pages would just be silly). Therefore I will attempt to have faith for a bit longer...

My users may be goofy but I think even they might wonder why they are starting on page three of a three page article instead of page one.

Also, this straitjacket the white coats have me in makes it hard to type....

2:03 am on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 4, 2002
posts:1068
votes: 0


Steve40,

<<it is easy to forget that the surfers out there will only be conned once by search engine results, so my moral is: SERPs and PR are only king for a day; what we provide to our visitors will hopefully decide whether we succeed long term>>

Well stated. Some of my business interests are in areas especially subject to spam noise. Often quality does rise to the top. But man oh man, you can really get buried in the noise until Google implements filters to stop it. But while buried in spam, cloaked sites, and thousands of identical doorway pages, it's good to remember words like yours. Quality will be remembered. The trust of the public is easy to lose.

2:08 am on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 4, 2002
posts:1687
votes: 0


<<Hey, I'm going to have to turn off my stickymail (I'm slowly collapsing under the weight of stickies to read)>>

Incredible; GG actually had stickymail on. There must be some special WebmasterWorld medal of honour for that. Brett, any chance of a trophy or citation or something for GG on this one?

To stay on topic; Esmeralda was the girlfriend of the Hunchback of Notre Dame, and the update came on Father's Day.... coincidence?

2:09 am on June 17, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 31, 2003
posts:386
votes: 0


Dolemite:

Whoops! Thanks for the correction. (It's late--that's my story and I'm sticking to it).

While it's true that the 20-character identity would add to storage space, at this time (3 billion pages) it would "only" be 16*3 billion, or around 40-ish gigs of space. If you spread this out over, say, 40 machines, that's a gig apiece: not a lot. And the index would be spread out over more than 50 machines.

Peter

2:19 am on June 17, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 19, 2002
posts:2139
votes: 0


<<My users may be goofy but I think even they might wonder why they are starting on page three of a three page article instead of page one.

Also, this straitjacket the white coats have me in makes it hard to type.... >>

That made my night :)

What benefit does landing 2 clicks away from the content the user is searching for have?

Also, I'm not quite convinced that freshbot has done the job as well as deepbot used to, as it just didn't seem to pick up all the pages it should have :(

2:30 am on June 17, 2003 (gmt 0)

Junior Member

joined:Mar 2, 2003
posts:104
votes: 0


While it's true that the 20 character identity would add to storage space, at this time (3 billion pages) it would "only" be 16*3 billion, or around 40-ish gigs of space. If you spread this out over, say, 40 machines that's a gig apiece: not a lot. And the index would be spread out over more than 50 machines.

Excuse me, but read the paper by Brin and Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," about the Google architecture. They use two inverted indexes, the "fancy index" and the "plain index." Between these two indexes, plus the other places in the system where the docID is used, it amounts to a total space requirement of two docIDs per word per document indexed.

Yes, that's not one docID per document, it's two docIDs per word per document.

You can use a 20-byte hash if you like, but I think a four-byte or five-byte docID would make just a little more sense.
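
As a back-of-envelope check in Python (the 3 billion page count comes from this thread; the 500-words-per-page average is purely an assumed figure):

pages = 3_000_000_000  # index size cited earlier in the thread
avg_words = 500        # assumed average words per page

# "Two docIDs per word per document indexed", as read above:
for docid_bytes in (4, 20):
    total_bytes = pages * avg_words * 2 * docid_bytes
    print(docid_bytes, "byte IDs:", total_bytes / 1e12, "TB")

Under those assumptions a 4-byte docID already costs about 12 TB, and a 20-byte hash about 60 TB, which is why the width of the ID matters so much.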

2:31 am on June 17, 2003 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38070
votes: 16


> It looks like Google is having difficulty telling
> that www.mydomain.com is the same as mydomain.com

This is nothing new and, honestly, they are handling it correctly. www.domain.tld is not necessarily domain.tld.
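
If you want to see for yourself that they're independent hostnames, a quick illustrative Python check works (example.com here is just a stand-in):

import socket

# The two names can resolve to different servers and serve different
# content, so a crawler has to treat them as distinct sites.
for host in ("example.com", "www.example.com"):
    print(host, socket.gethostbyname(host))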

Recap

- Update will take several days to finalize. Sites will flux in/out until then.
- PR on the toolbar is not reliable.
- PR from -fi is not reliable.
- Directory has not been updated or is glitchy at times.
- Sit back, relax, and enjoy the flight.

> Brett, any chance of a trophy or citation or something
> for GG on this one?

He'd never take a gratuity. (I've even tried to help with algo on many occasions ;-) no go.

2:32 am on June 17, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 20, 2003
posts:390
votes: 0


Hey Peter,

OK, so the actual hashed IDs might only be 40GB, but you need to associate each of those IDs with every word on the page in order to have a searchable index, probably by associating the words themselves with the pages on which they occur. That's where the big-time storage comes in, since you effectively have to multiply that ID by every word, but it's necessary to index in this manner for computational efficiency. Gigabytes become terabytes pretty quickly... and you'll need to keep most (if not all) of it in RAM to return 10-100 results in under a second. Further computational and storage efficiency would be gained by keeping the IDs as short as possible, and in that sense I can see some credibility in the theories about running out of address space.
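
A toy inverted index in Python makes that multiplication concrete; every word carries its own list of page IDs:

from collections import defaultdict

index = defaultdict(set)  # word -> set of docIDs containing it

def add_page(doc_id, text):
    for word in text.lower().split():
        index[word].add(doc_id)

add_page(1, "green widgets for sale")
add_page(2, "blue widgets shipped fast")
print(sorted(index["widgets"]))  # [1, 2]

Multiply those per-word postings by billions of pages and the width of each docID dominates the index size.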

2:50 am on June 17, 2003 (gmt 0)

New User

10+ Year Member

joined:Apr 15, 2003
posts:37
votes: 0


> Brett, any chance of a trophy or citation or something
> for GG on this one?

He'd never take a gratuity. (I've even tried to help with algo on many occasions ;-) no go.

I've learned SO MUCH on this forum!

The most astonishing thing so far that I've learned on this go-round:

BRETT actually has a sense of humor, and a GOOD, DRY one at that!

Just when I thought that the mods were nuns with rulers poised to slap the hands of anyone trying to be creative. Thanks for the comic relief, Brett (and for the fantastic forum that you have created)!

BTW, if this post is deleted, please disregard the 'sense of humor' reference.

2:52 am on June 17, 2003 (gmt 0)

New User

10+ Year Member

joined:Mar 11, 2003
posts:10
votes: 0


Checked the 'new' PR for my sites using -fi's IP (hosts file method).

IF this PR sticks (and my hunch is that it will for most sites), it seems Google is now awarding PR more conservatively than it has in the past. In the past, it was my observation that getting links from even a few high-PR sites resulted in a greater boost in PR. Comparatively, it now seems that getting links from a greater number of high-PR sites results in a relatively smaller boost (or no change) in PR. This change seems to be across the board on all of my sites.

I always felt that Google's (pre-Dominic) algo was rather too easy to 'work with' in order to achieve a high PR. The simple trick seemed to be to have 1 or 2 PR7 sites link to you to get a PR of 5 or 6 (all other on- and off-page factors constant).

If that is actually the case, then this change is welcome, since it's in line with some recent observations that Google's algo seems to be placing less emphasis on PR now (as opposed to the past) when ranking pages in search results.
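
For anyone curious how a link's contribution works on paper, here's a minimal power iteration of the PageRank formula from the original Brin and Page paper; this is the published simplified model, not a claim about Google's live system:

d = 0.85  # damping factor from the paper
links = {  # tiny illustrative graph: page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
pr = {page: 1.0 for page in links}

# PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)) over pages q linking to p
for _ in range(50):
    pr = {p: (1 - d) + d * sum(pr[q] / len(links[q])
                               for q in links if p in links[q])
          for p in links}

print({p: round(v, 3) for p, v in pr.items()})

The PR(q)/outlinks(q) term is why a single link from a high-PR page with few outbound links can be worth more than many links from low-PR pages, which fits the observation above.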
