Searching 4,285,199,774 web pages

   
12:16 pm on Feb 17, 2004 (gmt 0)

10+ Year Member



Have you noticed that on google.com?

"Searching 4,285,199,774 web pages"

1:53 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Keep in mind Dealtime and Kelkoo account for about a bazillion of these!
2:23 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There was an interesting quote from Sergey Brin in the Sydney Morning Herald that I didn't notice in any other report:

[smh.com.au...]

"Google has made five significant changes to its algorithmic formulas in the last two weeks, Brin said."

2:38 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Still the biggest & the best...size does matter, and logs too.

Happy Billion Day, Cheers.

3:31 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Personally, my favorite is the 880M images. The bump from ~400M plus freshening the data makes it much more useful.

You wouldn't say that if it was your site getting a lot of useless traffic because it happens to have some popular search term pics on it.... :-)

The JPGs on our site suck back a lot more bandwidth than the text. Anyway, I shan't ban the imagebot... we still have our necks above water bandwidth-wise. Some of them click on through.

3:57 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try this query [google.com]. It gives you a total of 5 billion results.

Sid

3:58 am on Feb 18, 2004 (gmt 0)

10+ Year Member



This really affected my ranking. My site went down over 880,000 spots to 4,285,199,774th ;)
4:04 am on Feb 18, 2004 (gmt 0)

10+ Year Member



Does anyone have an estimate of how many pages are publicly available on the Internet?

Maybe Google is indexing just 10% of all Internet pages or less; I have no idea.

5:58 am on Feb 18, 2004 (gmt 0)

10+ Year Member



IITian, Amazon has 4 million pages on Google, Yahoo has 15 million... (my site has twice that, but yeah)
6:37 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Krapulator, from that article you posted:

As Google covers more online turf, it is also digging deeper into Web pages. Roughly 40 per cent of the Web pages scanned by Google weren't fully indexed until the latest improvements, Brin said. Now all but about 20 per cent of the Web pages that Google covers are fully indexed.
(emphasis added)

Now, is it common in Australian media to use the term "Web pages" meaning "web sites" or are "pages" to be understood as individual pages?

If it's not common in Australian media, then one might wonder if it's common inside Google instead? It's one of two:

(1) the term is used meaning "sites", in which case Google is now indexing deeper, i.e. more pages from the same site

(2) the term is used meaning "pages", in which case Google is indexing wider, i.e. more of each page's code than previously is now indexed (e.g. invisible page elements, such as scripts, CSS, and markup tags)

(sidenote: if it's (1) then the 4 billion figure is much larger when translated to a page count)

7:03 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Perhaps the emphasis should have been only on fully indexed. Perhaps part of the recent 'anomalies' are the result of a page being only partially indexed. Perhaps previously 40% of the pages were only partially indexed and now that is down to 20%. Perhaps 'partially indexed' implies that the page (or site) could have a score but be missing a local score (to speak in very general terms), or vice versa.
I personally don't believe this, but it shows how things can change just by moving the [b] over a few words.
Claus, have you seen my queries? The cross linking is not ruling the roost any more. Two days before Brandy I received a request to join their 'relationship'. The email included references to servers on different class C IP blocks and listed the pages I would be getting links from, in order of PageRank.
7:13 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's good news. I'm tired of seeing only URLs indexed, instead of the page content.
10:49 am on Feb 18, 2004 (gmt 0)

10+ Year Member



IMO this means that Google scanned about 5.3 billion pages. Before, they only had about 60% in the index (3.3 billion), and now they have about 80% in the index (4.3 billion).
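
As a rough check on that estimate, the sketch below divides the two home page counters quoted in this thread by the 60% and 80% shares. Treating those percentages as shares of one scanned total is only one reading of the Brin quote, not a confirmed figure, and the constant and function names are made up for illustration.

```python
# Index sizes as shown on Google's home page, quoted in this thread.
OLD_INDEX = 3_307_998_701  # "Searching 3,307,998,701 web pages"
NEW_INDEX = 4_285_199_774  # "Searching 4,285,199,774 web pages"

# ASSUMPTION: the 60% / 80% "fully indexed" shares from the article
# apply to the total number of pages Google scanned.
OLD_SHARE = 0.60
NEW_SHARE = 0.80


def implied_total(indexed: int, share: float) -> float:
    """Return the total page count if `indexed` pages are `share` of it."""
    return indexed / share


old_total = implied_total(OLD_INDEX, OLD_SHARE)
new_total = implied_total(NEW_INDEX, NEW_SHARE)

print(f"Implied total before: {old_total / 1e9:.2f} billion")  # 5.51 billion
print(f"Implied total now:    {new_total / 1e9:.2f} billion")  # 5.36 billion
```

The two implied totals land within a few percent of each other, so the "about 5.3 billion scanned" guess is at least consistent with the published counters under this reading.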
2:27 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



Here is the US version of an AP article with the same quote from Brin:


[customwire.ap.org...]

Even with its expanded reach, Google still isn't close to capturing the constantly expanding constellation of online content. By some estimates, there are 10 billion pages on the Web....

Google has made five significant changes to its algorithmic formulas in the last two weeks, Brin said.

Google has been regularly upgrading its search engine since its late 1998 debut with a Web index of 25 million pages, but the potential threats from Yahoo and Microsoft have added more urgency.

"We have decided to put even more energy into our improvements and have turned up the notch on innovation a bit," Brin said.

11:24 pm on Feb 18, 2004 (gmt 0)

10+ Year Member



And does it include all those pages in Google that consist only of a URL but no cached page? (and there are one heck of a lot of those)

75% of my pages show a URL but no cached page.
What's up?

5:42 am on Feb 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Personally, my favorite is the 880M images.
> The bump from ~400M plus freshening the data
> makes it much more useful.

I'm not too happy about it. I've had to upgrade my web hosting account twice since one of the requirements for Froogle inclusion is allowing the Google ImageBot. My bandwidth requirements have more than doubled.

Froogle has been good to me, but Google Images is sucking up hordes of unnecessary bandwidth.

9:34 am on Feb 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> 75% of my pages show a URL but no cached page.
> What's up?

Jakpot, check whether you accidentally (or purposely) put the "<META NAME="ROBOTS" CONTENT="NOARCHIVE">" tag at the top of that 75% of your pages.

If you didn't, does your page cache show up in other search engines that have a cache feature, like Gigablast?

Sid
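
For anyone who wants to run that check over many pages rather than eyeballing the source, here is a minimal Python sketch using only the standard library. It looks for a robots META tag whose content includes noarchive. The class and function names are hypothetical, and it only inspects HTML you pass in as a string; fetching the pages is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content value is a comma-separated list, e.g. "noindex, noarchive".
        for part in attr.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html: str) -> bool:
    """Return True if the page asks search engines not to show a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
print(has_noarchive(page))  # True
```

A page without the tag returns False, which would point to some cause other than the webmaster's own markup.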

11:06 am on Feb 19, 2004 (gmt 0)

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Indexing over 4bn pages?
If what look like the "new" #1 and #2 sites in my category are anything to go by, it's also gone back to indexing and giving the best slots to "cloaked" pages and "auto java-ed redirects", together with "look at all this semantically rich text right here in my no-frames" ... 'course the pages in question have all the frame sizes set to "0" except the "body" frame ... "spider food" .. uh .. uh "spider fooled" .. oh yeah!
The rest of the page one results in question are spam directories.
However they do all run "google ads" ;-)
BTW .. maybe I missed something during years of higher education, but we were always told that someone who couldn't describe what they were talking about without using more than one synonym was a "woolly thinker" or just using "purple prose" ... having read the LSI paper cited elsewhere here, I still believe that to be true.
When I search on Google I know what I'm looking for .. I don't need or want Google or some theoretician with a vested interest in a "sorting program" trying to second-guess me as to what they think I really wanted to find ... not to mention that if this takes hold we're all gonna be writing 400kb index pages just to get all those synonyms into the new LSI "enhanced" Google.
3:58 pm on Feb 19, 2004 (gmt 0)

10+ Year Member



Of the 4.28 billion "web pages" how many are urls only
with no cache?
6:11 pm on Feb 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Of the 4.28 billion "web pages" how many are urls only
> with no cache?

That totally depends on us webmasters. If we don't want Google to cache a webpage, and we enter the noarchive META tag, Google won't cache it.

If Google visits a page where there's no noarchive META tag, it will of course cache it.

Google will not purposely forget to cache a page. It all depends on us, whether or not we want our cache showing in Google.

Sid

6:15 pm on Feb 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The number of pages indexed is irrelevant for all but the most obscure search terms.

Personally I believe ATW, Inktomi and Google had the most significant parts of the web indexed several months ago. Now we are into a meaningless battle over who has the biggest index, for largely irrelevant reasons.

Quality is what counts.....not quantity. Right now Google has the quantity, but seriously lacks the quality!

6:18 pm on Feb 19, 2004 (gmt 0)

10+ Year Member



Why do I see so many anti-G posts? I'm seeing very relevant results here, and even if they don't favor me at times I have to admit they are doing a genius job.

Or do you just not like them because you aren't #1? lol

One good thing is they have eliminated all spam for my sector since the Brandy update. [thanks to LSI, I think]

6:53 pm on Feb 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



MedCenter, Google sends me over 5,000 visitors per day......I don't like the Google results simply because they are often irrelevant.

I can only convert a relevant visitor. A visitor who found one of my sites due to Google's irrelevance costs me money for no return.

I can live with the cost of bandwidth for irrelevant results, but I hate the fact that the Google algo fails to show the most relevant results when used.

A decent percentage of my traffic is now complete junk. Combine this with some of those looking for what I have to sell (who can't find me) and the situation becomes ridiculous!

Search engines are supposed to deliver targeted, relevant results......not junk traffic. Google is now delivering more junk traffic than good relevant traffic......so yes, I'm whining!

7:15 pm on Feb 19, 2004 (gmt 0)

10+ Year Member



> Why do I see so many anti-G posts? I'm seeing very relevant results here, and even if they don't favor me at times I have to admit they are doing a genius job. Or do you just not like them because you aren't #1? lol

No - if only it were so. What has happened, spacehopper, is that many sites that almost entirely disappeared from Google in November 2003, are now back to the top ranking they probably deserved.

Us folks are just trying to figure out whether we've been stuffed big-time, or whether something clever was happening :)

The simple argument is that: if we ranked well then, and then disappeared, and now rank well again, what has happened in the interim?

BTW: You will do very well on these boards if you continue to mildly criticise webmasters, and admire Google.

You'll be a moderator one day!

7:17 pm on Feb 19, 2004 (gmt 0)

10+ Year Member



I have to admit I'm getting lots of weird searches too...

For example, all our products have a chemical description/keyword. We are coming up quite high for these chemical words.

We are still ranked high for the primary keywords, but I can definitely see the irrelevance kicking in.

I think they are still fiddling with the algorithms; I've been watching my position slide up almost one slot on the SERPs per day for the last week now.

I also think they are finding the balance between their new LSI technology, which I'm sure they have, and their more traditional ranking methods.

I had a good laugh 2 days ago when I went to google.com and for a few minutes it was pulling from a rare datacenter. We were like #2 for a highly competitive keyword, but it was a very deep page on our site with no PR, and that word only occurred once in the page (title).

I'm not saying that they don't have stuff to work through; I just still find it a lot better than other places.

I'm anxious to see what Yahoo will look like once it completely drops Google.

12:17 pm on Feb 20, 2004 (gmt 0)

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It really is a question of relevance .. I lost out on a number one to trick sites and went up on 6 more keyphrases over the top of sites which honestly are better put together than mine and have more of the content that was searched for ... So overall I won ... not counting the bandwidth issue .... But these new results are still in the main rubbish (don't wanna be a moderator someday ;-)
5:25 am on Feb 25, 2004 (gmt 0)

10+ Year Member



We're mad at Google because my wife and I are no longer able to find good stuff. We've been searching for a Mango Kiwi sorbet recipe for a while. Mango Kiwi Sorbet Recipe used to return 8 recipes on the first page; now it returns junk...

Because Google cares so much about outbound links, I can find pages featuring Mango Sorbet and Kiwi Smoothie recipes, but not a Mango Kiwi Sorbet recipe.

That's why I'm annoyed.

Note: apologies to any moderators for dropping a specific search phrase, but I really wanted to make sorbet tonight.

5:25 am on Feb 25, 2004 (gmt 0)

10+ Year Member



Sorry for going a little off-topic.
I'm from Serbia & Montenegro (ex-Yugoslavia), and I would also like to follow this great conference. My question is: do you plan to stream it directly or upload a recorded video file, so that people around the world can feel part of this event?

Thanks and best regards from Serbia.

7:12 pm on Feb 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



[216.239.37.104...]

shows an old google page with "Searching 3,307,998,701 web pages"?

4:34 am on Feb 28, 2004 (gmt 0)

WebmasterWorld Senior Member powdork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Google is not affiliated with the authors of this page nor responsible for its content.
;)
12:26 am on Feb 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



lol, Powdork. Won't they at least refresh their cache?