Fast and the 2/3rds of servers are blocked quote

         

mr_dredd2

9:45 pm on Apr 6, 2002 (gmt 0)

10+ Year Member



Has anyone read Brent Winters' recent newsletter? In it he quotes from an interview with FAST, in which they say the majority of the web's servers have been blacklisted as spam. Possibly a lot of these dropouts are caused by some kind of spam filter?

Brett_Tabke

11:40 pm on Apr 6, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Someone got their interview stats/info mixed up. 2/3rds of the webservers are NOT banned by Fast.

Brett_Tabke

4:19 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Well, I asked for a clarification, and I sit corrected. The quote is accurate. (humble apologies)

2/3rds of the servers? It's so absurdly high, I just couldn't believe it when I read it. I still don't really.

If you calculated it out based on an estimated 3 billion indexable open documents on the web and 23 million servers (according to other search engines), that would put Fast's index at about 400 to 450 million pages, absolute maximum.
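A rough sketch of this kind of back-of-the-envelope estimate, in Python. It assumes pages are spread evenly across servers, which is a simplification not stated in the thread, and the function names are made up for illustration. Under that even-spread assumption, blocking 2/3 of the servers would leave roughly a third of the 3 billion documents, so a 400 to 450 million figure presumably rests on further assumptions that are not spelled out here.

```python
# Back-of-the-envelope estimate of index size when a share of servers is blocked.
# Assumption (not from the thread): pages are spread evenly across servers.

def remaining_pages(total_pages, blocked_fraction):
    """Pages left after dropping a fraction of servers, assuming an even spread."""
    return total_pages * (1 - blocked_fraction)


def implied_blocked_fraction(total_pages, remaining):
    """Share of pages that must be excluded to end up with `remaining` pages."""
    return 1 - remaining / total_pages


TOTAL_PAGES = 3_000_000_000   # estimated indexable documents, from the post
TOTAL_SERVERS = 23_000_000    # estimated web servers, from the post

pages_per_server = TOTAL_PAGES / TOTAL_SERVERS
left = remaining_pages(TOTAL_PAGES, 2 / 3)
implied = implied_blocked_fraction(TOTAL_PAGES, 450_000_000)

print(f"average pages per server: {pages_per_server:.0f}")
print(f"pages left with 2/3 of servers blocked: {left:,.0f}")
print(f"excluded share implied by a 450 million page index: {implied:.0%}")
```

Changing the even-spread assumption (for example, if blocked servers host more or fewer pages than average) moves the result a lot, which is why this is only a sanity check.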

heini

5:12 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Astounding quote indeed. I didn't believe it when I first heard it either.
How does this fit in with the long-announced 2 billion page index?

chiyo

5:38 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can believe it. The fact is we never see most of these sites because they are not indexed or positioned highly anyway. Of course a lot depends on your definition of spam.

If you (not me!) include purely affiliate sites with little added value, link farm linked sites, auto redirections, near duplicates, third level domain duplicates, as well as the many sites that are obvious spam, you may get fairly close.

DrCool

6:01 pm on Apr 8, 2002 (gmt 0)

10+ Year Member



I would guess the 2 billion index they talk about could just mean they have found 2 billion pages, "indexed" them, and just happened to "index" 2/3rds of them into the trash. If I have a stack of 2 billion note cards for a research paper and decide to index them, who is to say I can't just throw most of them away and keep only the ones I find useful? I would have "indexed" them all but only used a few of them.

lazerzubb

6:11 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Stephen Baker at FAST says:

There are no banned servers. In our experience, we have seen spam on 2/3's of all web servers, however, we do not blacklist 2/3's of all servers. It is just an interesting statistic but doesn't really have any consequences.

I think this clears things up. (I received this 5 minutes ago in my mailbox.)

Brett, maybe another "humble apologies"? :)

EliteWeb

6:18 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I couldn't believe it, that they would ban these 2/3 of the sites since so much of the results are based on spamming techniques ;)

heini

8:18 pm on Apr 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If I Were A Search Engine....

Being the target of more or less sophisticated spamming, I would basically have two choices:
- build a better algo
- try to scare people to death

Building a better algo is the more difficult solution. People tend to go the easy way.
I'd perhaps make a statement somewhere saying many many servers were completely blacklisted and banned, and then, when the message is out, make another statement saying, oh well, it's just stats at this time...

It's not just us web promoters who are experts in PR - search engines are in the same industry.

Seeing - not hearing - is believing.

stephen baker

2:26 pm on Apr 10, 2002 (gmt 0)



Hi all, this is Stephen from FAST. Lazerzubb/Heini, thanks for clarifying the quote. That is accurate: we have found spam on 2/3 of the servers that we crawl. However, FAST identifies spam at the lowest common denominator (i.e. the URL), and only in extreme circumstances do we block larger groups of sites.

To provide some more clarification regarding the 2 billion index size: many of you have noticed that we have been crawling aggressively over the past 3 weeks. We have been adding about 100 million URLs to the index (high-quality URLs, no dupes, full-text index). We will continue to aggressively expand the size of the database; however, if we determine that there are not 2 billion interesting documents on the web, we will not sacrifice quality for quantity.

I hope this clarifies.

lazerzubb

2:32 pm on Apr 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the information, Stephen.

heini

2:34 pm on Apr 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello Stephen - thanks for reopening the line of communication.

Sooo - we shall see what happens, right? ;)