New index seems to be at least 33% bigger

         

Allergic

1:40 pm on May 14, 2003 (gmt 0)

10+ Year Member



How do you judge the size of the index? An old trick was to search for the word "the". But with all the new filters, this time the new index (www2) comes out smaller: 3,280,000,000 vs 3,460,000,000 on www. So I got the idea to check file types that I think Google never applies filtering to. Surprise:
PDF: 8,760,000 OLD 5,870,000
TXT: 3,200,000 OLD 2,510,000
DOC: 1,190,000 OLD 452,000
(E)PS: 1,174,000 OLD 734,000
XLS: 850,000 OLD 424,000
PPT: 692,000 OLD 414,000
RTF: 538,000 OLD 263,000

OLD=WWW

There seems to be heavy filtering at Google now!
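For anyone who wants to redo the arithmetic, here is a minimal sketch in Python that turns the counts above into percentage changes. The numbers are copied from this post; the names `counts` and `growth_pct` are just made up for the illustration, and the result counts Google reports are estimates, so treat the output as a rough indication only.

```python
# Result counts copied from the post: (new index on www2, old index on www).
counts = {
    "PDF":   (8_760_000, 5_870_000),
    "TXT":   (3_200_000, 2_510_000),
    "DOC":   (1_190_000, 452_000),
    "(E)PS": (1_174_000, 734_000),
    "XLS":   (850_000, 424_000),
    "PPT":   (692_000, 414_000),
    "RTF":   (538_000, 263_000),
}

# Counts for the plain search on the word "the", same order (new, old).
the_new, the_old = 3_280_000_000, 3_460_000_000


def growth_pct(new, old):
    """Percentage change from old to new; positive means the new index is bigger."""
    return (new - old) / old * 100.0


# Per-file-type change.
for filetype, (new, old) in counts.items():
    print(f"{filetype:6s} {growth_pct(new, old):+7.1f}%")

# Change over all listed file types taken together.
total_new = sum(new for new, _ in counts.values())
total_old = sum(old for _, old in counts.values())
print(f"{'all':6s} {growth_pct(total_new, total_old):+7.1f}%")

# Change for the word "the", which moves in the opposite direction.
print(f"{'the':6s} {growth_pct(the_new, the_old):+7.1f}%")
```

The file-type lines come out positive and the "the" line negative, which is the contrast this post is about.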

mrbrad

1:45 pm on May 14, 2003 (gmt 0)

10+ Year Member



No spam filters have been applied yet.
I'm surprised Google would do this.

Lots of spam and expired domains are back in the index!

Allergic

3:12 pm on May 14, 2003 (gmt 0)

10+ Year Member



mrbrad: How do you explain the smaller count for the "the" search versus the higher counts in the filetype: and inurl: results?

GoogleGuy

3:18 pm on May 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are lots of different ways to measure the size/usefulness of an index. Nice job to Allergic for noticing something that most people usually don't. :)

Allergic

3:29 pm on May 14, 2003 (gmt 0)

10+ Year Member



Thanks GG ;-)
I have a little question:
Did you put a small filter on the database file types (they now return fewer results), or have webmasters finally cleaned up their servers?

PS: Sorry for my bad English.

mcavic

3:51 pm on May 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



But, if I search for "html", I get 467,000,000 on the old (current) index, and 320,000,000 on the sj index.

A search for "http" also yields fewer pages in the sj index.

Allergic

4:07 pm on May 14, 2003 (gmt 0)

10+ Year Member



mcavic: That is exactly what I was pointing out. The raw index is getting bigger and bigger while the final results are getting a bit smaller.

Google seems to be dealing with more spam, expired domains, mirror sites, link farms, etc. Hope this will end one day and people will start to give nice, original content!

mcavic

4:39 pm on May 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hope this will end one day and people will start to give nice, original content!

Yes, I hope so. I'm not holding my breath, though. :(

EliteWeb

4:51 pm on May 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Truthfully, I love finding PPT files and PDFs in the index. I find them useful and full of information because they are usually put together by companies for presentations and such (the PPTs, that is :D), and the PDFs are just nice.

danHood

5:26 pm on May 14, 2003 (gmt 0)

10+ Year Member



I have a silly question, but I am confused.

Has Google just updated in the last few days?

I looked at Brett's table
[webmasterworld.com...]

but I am new to this site and wonder: if there had been an update, would this table be updated as well, and how soon?

I am reading lots of talk about "sizes of the new index" and geographical scout reports of sightings of various Google bots, but have we actually had an update, and if so, when was it? I am afraid much of this discussion is beyond me!

Many thanks in advance.
And I apologise if I posted this question in the wrong place.
DanHood

Allergic

5:51 pm on May 14, 2003 (gmt 0)

10+ Year Member



Not yet, DanHood, and welcome to WebmasterWorld. The www-sj.google.com index has been transferred to www2 and www3, but it's still in a testing phase.

danHood

8:55 pm on May 14, 2003 (gmt 0)

10+ Year Member



Thanks Allergic. Now I am thirsty for more answers, hope you are not allergic to novices!

Question 1:
Do you have any idea when the index being tested will go live? I guess that is one of those "it's anyone's guess" kinda questions, but it would be really helpful to me to know whether it is days or weeks away.

Also Question 2:
I wonder, does having a link to the W3C HTML 4.0 validator have an impact on Google ranking?
(the pages are HTML 4.0 compliant).

Once again, many thanks to anyone who can spare the time to shed some more light on this fascinating subject.
DanHood

Allergic

10:08 pm on May 14, 2003 (gmt 0)

10+ Year Member



danHood
Q1 - Only God and a few people at the Googleplex know this one. But it should be soon (this afternoon I saw a referral from Google in the log file of a brand-new site that is not listed yet).

Q2 - I don't think so. And especially for a site of that type, it should help rather than harm.