You are absolutely right, rubble88!
We knew it would come one of these days, and what a pleasure that it finally did. I can't find anything on ATW's or FAST's sites, so, from Freepint:
|Here’s a search for the terms librarian AND database [alltheweb.com] that was constructed using the advanced search page and filtering the term .pdf in the url. You can also limit by using the syntax, url.all:pdf, in the any search box. |
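For anyone who wants to script this, here is a rough sketch of composing the query text described in the quote. The helper name `pdf_query` is made up for illustration, and placing the `url.all:pdf` filter in the same box as the other terms, separated by a space, is an assumption; the quote only says the syntax goes in a search box.

```python
def pdf_query(*terms: str) -> str:
    """Build query text that ANDs the terms and appends the url.all:pdf filter.

    Hypothetical helper: only the AND operator and the url.all:pdf syntax come
    from the Freepint quote. Combining them in one string is an assumption.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    # Join the terms with the AND operator used in the quoted example search.
    joined = " AND ".join(terms)
    # Append the URL filter that limits results to URLs containing "pdf".
    return joined + " url.all:pdf"


query = pdf_query("librarian", "database")
print(query)
```

The string would still need to be pasted into the search box by hand; nothing here talks to AllTheWeb itself.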
Nice move, FAST. Let's have more of that :)
They have been doing it for quite a while, if I am not wrong; it's just that they used Scirus for viewing it.
I talked to Stephen Baker from FAST some weeks ago, and he said that they would start to include the spidered .PDF files in the regular search results too.
AllTheWeb will be indexing more formats in Q3 and Q4; for Q2 they are focusing on building a bigger index.
I wonder whether FAST is able to spider PDFs, or is it a submissions-only type of thing? I tried locating some "random" PDF files that were in Google's results, and they didn't show in FAST (using the "must have .PDF in the URL" setting).
FAST can spider .pdf files. They can spider most file types; they just don't include them in their index.
If many people who used Lycos wanted to be able to search Word documents, and Lycos asked FAST to include Word documents in their index, they would probably include them pretty quickly.
It's all about what the customer wants.
I think I can add another to the list of improvements to FAST's index: it seems that AllTheWeb is now capable of recognising that two URLs point to the same site. For some time we had the same site listed under two different URLs (.co.nz and .com), but this seems to have been resolved. (Maybe it's not new?)
It seems this happened in the last month or two (Mar 2002)?
(edited by: Jaze at 5:10 am (utc) on May 22, 2002)
Nice to see another engine adding PDF support. Does it have the option to view as HTML, though? Sometimes I'd rather have that than load a huge program and crash my computer (poor thing).
Jaze, I've seen similar glitches where ATW showed both a .com and a .com.br when checking with url.host:
It seems to have been fixed to some extent now.