
Google News Archive Forum

This 60 message thread spans 2 pages (this is page 2 of 2).
Searching 4,285,199,774 web pages
panos

10+ Year Member



 
Msg#: 22051 posted 12:16 pm on Feb 17, 2004 (gmt 0)

Have you noticed that on google.com?

"Searching 4,285,199,774 web pages"

 

rise2it

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 1:53 am on Feb 18, 2004 (gmt 0)

Keep in mind Dealtime and Kelkoo account for about a bazillion of these!

Krapulator

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 2:23 am on Feb 18, 2004 (gmt 0)

There was an interesting quote from Sergey Brin in the Sydney Morning Herald that I didn't notice in any other report:

[smh.com.au...]

"Google has made five significant changes to its algorithmic formulas in the last two weeks, Brin said."

skipfactor

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 2:38 am on Feb 18, 2004 (gmt 0)

Still the biggest & the best...size does matter, and logs too.

Happy Billion Day, Cheers.

Stefan

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 3:31 am on Feb 18, 2004 (gmt 0)

Personally, my favorite is the 880M images. The bump from ~400M plus freshening the data makes it much more useful.

You wouldn't say that if it was your site getting a lot of useless traffic because it happens to have some popular search term pics on it.... :-)

The jpg's on our site suck back a lot more bandwidth than the text. Anyway, I shan't ban the imagebot... we still have our necks above water bandwidth-wise. Some of them click on through.

sidyadav

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 3:57 am on Feb 18, 2004 (gmt 0)

Try this query [google.com]. Gives you a total for 5 billion results.

Sid

allanp73

10+ Year Member



 
Msg#: 22051 posted 3:58 am on Feb 18, 2004 (gmt 0)

This really affected my ranking. My site went down over 880,000 spots to 4,285,199,774th ;)

webgp

10+ Year Member



 
Msg#: 22051 posted 4:04 am on Feb 18, 2004 (gmt 0)

Does anyone have an estimate of how many pages are publicly available on the Internet?

Maybe Google is indexing just 10% of all Internet pages or less, I have no idea.

MedCenter

10+ Year Member



 
Msg#: 22051 posted 5:58 am on Feb 18, 2004 (gmt 0)

IITian, amazon has 4 million pages on Google, Yahoo has 15 million... (my site has twice that, but yeah)

claus

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 6:37 am on Feb 18, 2004 (gmt 0)

Krapulator, from that article you posted:

As Google covers more online turf, it is also digging deeper into Web pages. Roughly 40 per cent of the Web pages scanned by Google weren't fully indexed until the latest improvements, Brin said. Now all but about 20 per cent of the Web pages that Google covers are fully indexed.
(emphasis added)

Now, is it common in Australian media to use the term "Web pages" meaning "web sites" or are "pages" to be understood as individual pages?

If it's not common in Australian media, then one might wonder if it's common inside Google instead? It's one of two:

(1) the term is used meaning "sites" in which case Google is now indexing deeper, ie. more pages from same site

(2) the term is used meaning "pages" in which case Google is indexing wider, ie. more code than previously is now indexed (eg. invisible page elements, such as scripts, css, and markup tags)

(sidenote: if it's (1) then the 4 bio. figure is much larger when translated to page count)

Powdork

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 22051 posted 7:03 am on Feb 18, 2004 (gmt 0)

Perhaps the emphasis should have been only on fully indexed. Perhaps part of the recent 'anomalies' are the result of a page being only partially indexed. Perhaps previously 40% of the pages were only partially indexed and now that is down to 20%. Perhaps partially indexed implies that the page (or site) could have a score but be missing a local score (to speak in very general terms), or vice versa.
I personally don't believe this, but it shows how things can change just by moving the [b] over a few words.
Claus, you seen my queries? The cross linking is not ruling the roost any more. Two days before Brandy I received a request to join their 'relationship'. The email included references to servers on different class C IP blocks and listed the pages I would be getting links from, in order of PageRank.

markus007

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 7:13 am on Feb 18, 2004 (gmt 0)

That's good news, I'm tired of seeing only URLs indexed, instead of the page content.

zgb999

10+ Year Member



 
Msg#: 22051 posted 10:49 am on Feb 18, 2004 (gmt 0)

IMO this means that Google scanned about 5.3 Billion pages. Before they only had about 60 % in the index (3.3 Billion) and now they have about 80 % in the index (4.3 Billion).
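The estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 60% and 80% "fully indexed" figures apply to one fixed total of scanned pages, and it uses the two homepage counts quoted in this thread (3,307,998,701 and 4,285,199,774). The function name is made up for illustration.

```python
def implied_scanned_total(indexed_pages: int, indexed_fraction: float) -> float:
    """Total pages scanned if `indexed_pages` is `indexed_fraction` of them."""
    if not 0 < indexed_fraction <= 1:
        raise ValueError("indexed_fraction must be in (0, 1]")
    return indexed_pages / indexed_fraction


OLD_COUNT = 3_307_998_701  # older homepage count quoted in this thread
NEW_COUNT = 4_285_199_774  # homepage count from the thread title

# Assumption: 60% of scanned pages were fully indexed before, 80% now.
old_total = implied_scanned_total(OLD_COUNT, 0.60)
new_total = implied_scanned_total(NEW_COUNT, 0.80)

print(f"implied total from old count: {old_total / 1e9:.2f} billion")
print(f"implied total from new count: {new_total / 1e9:.2f} billion")
```

The two implied totals come out at roughly 5.5 and 5.4 billion, so the "about 5.3 billion" reading is in the right range, though the percentages are clearly rounded.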

258cib

10+ Year Member



 
Msg#: 22051 posted 2:27 pm on Feb 18, 2004 (gmt 0)

Here is the US version of an AP article with the same quote from Brin:


[customwire.ap.org...]

Even with its expanded reach, Google still isn't close to capturing the constantly expanding constellation of online content. By some estimates, there are 10 billion pages on the Web....

Google has made five significant changes to its algorithmic formulas in the last two weeks, Brin said.

Google has been regularly upgrading its search engine since its late 1998 debut with a Web index of 25 million pages, but the potential threats from Yahoo and Microsoft have added more urgency.

"We have decided to put even more energy into our improvements and have turned up the notch on innovation a bit," Brin said.


Jakpot

10+ Year Member



 
Msg#: 22051 posted 11:24 pm on Feb 18, 2004 (gmt 0)

And does it include all those pages in Google that consist only of a URL but no cached page? (and there are one heck of a lot of those)

75% of my pages show a URL but no cached page.
What's up?

sun818

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 5:42 am on Feb 19, 2004 (gmt 0)

> Personally, my favorite is the 880M images.
> The bump from ~400M plus freshening the data
> makes it much more useful.

I'm not too happy about it. I've had to upgrade my web hosting account twice since one of the requirements for Froogle inclusion is allowing the Google ImageBot. My bandwidth requirements have more than doubled.

Froogle has been good to me, but Google Images is sucking up hordes of unnecessary bandwidth.

sidyadav

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 9:34 am on Feb 19, 2004 (gmt 0)

75% of my pages show a URL but no cached page.
What's up?

Jakpot, check your pages to see whether you accidentally or purposely put the <META NAME="ROBOTS" CONTENT="NOARCHIVE"> tag at the top of those 75% of your pages.

If you didn't, does your page cache show up in other search engines that have a cache feature, like Gigablast?
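For anyone wanting to run that check across a lot of pages, here is a minimal sketch using only Python's standard library. It assumes you already have each page's HTML as a string (fetching is left out), and it only looks at the generic ROBOTS meta tag. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html: str) -> bool:
    """Return True if the page carries a robots NOARCHIVE directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
    print(has_noarchive(sample))
```

If this returns False for the pages that show no cache, the tag is not the cause and something else is going on.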

Sid

Leosghost

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 22051 posted 11:06 am on Feb 19, 2004 (gmt 0)

Indexing over 4bn pages?
If what I see will be the "new" #1 and #2 sites in my category are anything to go by, it's also gone back to indexing and giving best slots to "cloaked" and "auto java-ed redirects" together with "look at all this semantically rich text right here in my 'no-frames'" ...'course the pages in question have all the frame sizes set to "0" except the "body" frame..."spider food" ..uh ..uh "spider fooled" ..oh yeah!
The rest of the page one results in question are spam directories.
However they do all run "google ads" ;-)
BTW ..maybe I missed something during years of higher education, but we were always told that someone who couldn't describe what they were talking about without using more than one synonym was a "woolly thinker" or just using "purple prose" ...having read the L.S.I. paper cited elsewhere here, I still believe that to be true.
When I search on Google I know what I'm looking for ..I don't need or want Google or some theoretician with a vested interest in a "sorting program" trying to second guess me as to what they think I really wanted to find ...not to mention that if this takes hold we're all gonna be writing 400kb index pages just to get all those synonyms into the new LSI "enhanced" Google.

Jakpot

10+ Year Member



 
Msg#: 22051 posted 3:58 pm on Feb 19, 2004 (gmt 0)

Of the 4.28 billion "web pages" how many are urls only
with no cache?

sidyadav

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 6:11 pm on Feb 19, 2004 (gmt 0)

Of the 4.28 billion "web pages" how many are urls only
with no cache?

That totally depends on us webmasters. If we don't want Google to cache a webpage, and we enter the noarchive META tag, Google won't cache it.

If Google visits a page where there's no noarchive META tag, it will of course cache it.

Google will not purposely forget to cache a page. It all depends on us, whether or not we want our cache showing in Google.

Sid

percentages

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 6:15 pm on Feb 19, 2004 (gmt 0)

The number of pages indexed is irrelevant for all but the very obscure search terms.

Personally I believe ATW, Inktomi and Google had the most significant parts of the web indexed several months ago; now we are into a meaningless battle over who has the biggest index, for largely irrelevant reasons.

Quality is what counts.....not quantity. Right now Google has the quantity, but seriously lacks the quality!

MedCenter

10+ Year Member



 
Msg#: 22051 posted 6:18 pm on Feb 19, 2004 (gmt 0)

Why do I see so many anti-G posts? I'm seeing very relevant results here, and even if they don't favor me at times I have to admit they are doing a genius job.

Or do you just not like them because you aren't #1? lol

One good thing is they have eliminated all spam for my sector since the Brandy update. [thanks to LSI I think]

percentages

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 6:53 pm on Feb 19, 2004 (gmt 0)

MedCenter, Google sends me over 5,000 visitors per day......I don't like the Google results simply because they are often irrelevant.

I can only convert a relevant visitor; a visitor that found one of my sites due to Google's irrelevance costs me money for no return.

I can live with the cost of bandwidth for irrelevant results, but I hate the fact that the Google algo fails to show the most relevant results when used.

A decent percentage of my traffic is now complete junk. Combine this with some of those looking for what I have to sell (who can't find me) and the situation becomes ridiculous!

Search engines are supposed to deliver targeted, relevant results......not junk traffic. Google is now delivering more junk traffic than good relevant traffic......so yes, I'm whining!

SyntheticUpper

10+ Year Member



 
Msg#: 22051 posted 7:15 pm on Feb 19, 2004 (gmt 0)

Why do I see so many anti-G posts? I'm seeing very relevant results here, and even if they don't favor me at times I have to admit they are doing a genius job. Or do you just not like them because you aren't #1? lol

No - if only it were so. What has happened, spacehopper, is that many sites that almost entirely disappeared from Google in November 2003, are now back to the top ranking they probably deserved.

Us folks are just trying to figure out whether we've been stuffed big-time, or whether something clever was happening :)

The simple argument is that: if we ranked well then, and then disappeared, and now rank well again, what has happened in the interim?

BTW: You will do very well on these boards if you continue to mildly criticise webmasters, and admire Google.

You'll be a moderator one day!

MedCenter

10+ Year Member



 
Msg#: 22051 posted 7:17 pm on Feb 19, 2004 (gmt 0)

I have to admit I'm getting lots of weird searches too...

For example, all our products have a chemical description/keyword. We are coming up quite high for these chemical words.

We are still ranked high for the primary keywords but I can definitely see the irrelevance kicking in.

I think they are still fiddling with the algorithms; I've been watching my position slide up almost one slot on the SERPs per day for the last week now.

I also think they are finding the balance between their new LSI technology, which I'm sure they have, and their more traditional ranking methods.

I had a good laugh 2 days ago when I went to google.com and for a few minutes it was pulling from a rare datacenter. We were like #2 for a highly competitive keyword, but it was a very deep page on our site with no PR and that word only occurred once in the page (title).

I'm not saying that they don't have stuff to work through, I just still find it a lot better than other places.

I'm anxious to see what Yahoo will look like once it completely drops Google.

Leosghost

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 22051 posted 12:17 pm on Feb 20, 2004 (gmt 0)

It really is a question of relevance ..I lost out on a number one to trick sites and went up on 6 more keyphrases over the top of sites which honestly are better put together than mine and have more of the content that was searched for ...So overall I won ...not counting the bandwidth issue ....But these new results are still in the main rubbish (don't wanna be a moderator someday ;-)

alex_h

10+ Year Member



 
Msg#: 22051 posted 5:25 am on Feb 25, 2004 (gmt 0)

We're mad at Google because my wife and I are no longer able to find good stuff. We've been searching for a Mango Kiwi sorbet recipe for a while. Mango Kiwi Sorbet Recipe used to return 8 recipes on the first page; now it returns junk...

Because Google cares so much about outbound links, I can find Pages featuring Mango Sorbet and Kiwi Smoothie recipes, but not a Mango Kiwi Sorbet recipe.

That's why I'm annoyed.

Note: apologies to any moderators for dropping a specific search phrase, but I really wanted to make sorbet tonight.

neverhood

10+ Year Member



 
Msg#: 22051 posted 5:25 am on Feb 25, 2004 (gmt 0)

Sorry for going a little off-topic.
I'm from Serbia & Montenegro (ex Yugoslavia), and I would also like to follow this great conference. My question is: do you plan to stream it live or upload a recorded video file, so that people around the world can feel part of this event?

Thanks and best regards from Serbia.

Chndru

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 7:12 pm on Feb 27, 2004 (gmt 0)

[216.239.37.104...]

shows an old google page with "Searching 3,307,998,701 web pages"?

Powdork

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 22051 posted 4:34 am on Feb 28, 2004 (gmt 0)

Google is not affiliated with the authors of this page nor responsible for its content.
;)
Chndru

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 22051 posted 12:26 am on Feb 29, 2004 (gmt 0)

lol, Powdork. Won't they at least refresh their cache?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved