Gigablast spellchecker

         

heini

9:18 am on Nov 19, 2003 (gmt 0)




I believe we haven't covered it yet: Gigablast has introduced a spellchecker [gigablast.com] feature for English-language queries, and it seems to be working fairly well.

richardb

7:21 pm on Nov 19, 2003 (gmt 0)




Yep

+

cache facility

"Gigablast uses its cached web pages to generate its dictionary instead of the query logs. When a word or phrase is not found in the dictionary, Gigablast replaces it with the closest match in the dictionary. If multiple words or phrases are equally close, then Gigablast resorts to a popularity ranking."

And

additional file type indexing

"... PostScript (.ps), PowerPoint (.ppt), Excel SpreadSheet (.xls) and Microsoft Word (.doc) support in addition to the PDF support. Woo-hoo."

Building up quite a list

Rich
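The spellchecker behaviour quoted in the post above (dictionary built from cached pages, closest match, popularity as tie-breaker) can be sketched roughly as follows. This is only an illustration, not Gigablast's actual code: the function names are made up, "closest" is assumed to mean plain Levenshtein edit distance, and "popularity" is assumed to mean how often a word appears in the cached pages.

```python
import re
from collections import Counter


def build_dictionary(pages):
    """Count words across page texts; the counts double as popularity (assumption)."""
    counts = Counter()
    for text in pages:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def edit_distance(a, b):
    """Plain Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def correct(word, dictionary):
    """Return word if known, else the closest entry; ties go to the more popular one."""
    word = word.lower()
    if not dictionary or word in dictionary:
        return word
    return min(dictionary,
               key=lambda w: (edit_distance(word, w), -dictionary[w]))


# Hypothetical "cached pages" standing in for a real index.
pages = ["search engine search index", "search engines spider the web"]
dictionary = build_dictionary(pages)
print(correct("serch", dictionary))  # search
```

Scanning the whole dictionary for every unknown word is fine for a toy example; a real engine would need some kind of candidate index to keep this fast.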

jeremy goodrich

8:40 pm on Nov 19, 2003 (gmt 0)




If there is any engine in the "second tier" that is gearing up for serious prime time, it's Gigablast :)

Nice new feature!

takagi

4:50 am on Nov 22, 2003 (gmt 0)




I agree with that, jeremy_goodrich.

I am planning on purchasing the hardware required for achieving a 5 billion document index within the next 12 months.
source: Matt Wells [gigablast.com]

In March last year he wrote:

my current setup only goes to about 200-250 million
Message 39 in the thread GigaBlast Part 3 [webmasterworld.com]

brotherhood of LAN

10:54 am on Nov 22, 2003 (gmt 0)




I was reading a PDF the other night about typo detection/correction. I'm struggling to find the URL, but it was very useful; will try to re-find it. ;)

It mapped each letter to its most likely typo, and recorded whether the typo was a substitution, deletion or insertion of letters, i.e. some letters turn up in misspelled words more often than others, and the errors usually follow a pattern. Will have to look that one up again...
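A rough sketch of how such a letter-to-typo table could be tallied, for anyone who wants to play with the idea. This is not the method from that PDF (which isn't identified here); the function names and the sample pairs are invented, only single-edit typos are handled, and insertions are keyed by the extra letter, which is an arbitrary choice.

```python
from collections import Counter, defaultdict


def classify_typo(intended, typed):
    """Return (kind, intended_letter, typed_letter) for a one-edit typo, else None."""
    if intended == typed:
        return None
    # Find the first position where the two words differ.
    i = 0
    while i < min(len(intended), len(typed)) and intended[i] == typed[i]:
        i += 1
    if len(intended) == len(typed):
        if intended[i + 1:] == typed[i + 1:]:
            return ("substitution", intended[i], typed[i])
    elif len(intended) == len(typed) + 1:
        if intended[i + 1:] == typed[i:]:
            return ("deletion", intended[i], "")
    elif len(typed) == len(intended) + 1:
        if typed[i + 1:] == intended[i:]:
            return ("insertion", "", typed[i])
    return None  # transpositions and multi-edit typos are ignored here


def most_likely_errors(pairs):
    """Map each letter to its most frequent (kind, typed_letter) error."""
    table = defaultdict(Counter)
    for intended, typed in pairs:
        result = classify_typo(intended, typed)
        if result is None:
            continue
        kind, wanted, got = result
        key = wanted or got  # insertions have no intended letter, so key by the extra one
        table[key][(kind, got)] += 1
    return {letter: errors.most_common(1)[0][0] for letter, errors in table.items()}


# Invented (intended, typed) pairs, purely for illustration.
pairs = [("spell", "spwll"), ("engine", "wngine"), ("bed", "bad"),
         ("letter", "leter"), ("web", "webb")]
print(most_likely_errors(pairs))
```

With real data the pairs would come from logged misspellings matched to their corrections, and the resulting counts could be turned into per-letter costs for a weighted edit distance.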

I suppose "cracking" a spelling algo is just as worthy as cracking an SE algo, à la typo domains, misspelled words, etc.

wonders if reverse engineering a spelling algo is worth it...