ATW Begins Roll Out of PDF Crawl and Access

     

rubble88

2:18 am on May 17, 2002 (gmt 0)

10+ Year Member



The Virtual Acquisition Shelf and News Desk Weblog is reporting that ATW has begun to crawl and provide access to .pdf material.

[resourceshelf.freepint.com...] (5/16/02 Posting)

Rumbas

7:23 pm on May 19, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



You are absolutely right rubble88!
We knew it would come one of these days, and what a pleasure that it finally did. I can't find anything on the ATW or FAST sites, so, from Freepint:

Here’s a search for the terms librarian AND database [alltheweb.com] that was constructed using the advanced search page and filtering on the term .pdf in the URL. You can also limit results by using the syntax url.all:pdf in the "any" search box.

Nice move, FAST. Let's have more of that :)

lazerzubb

12:26 pm on May 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They have been doing it for quite a while, if I am not wrong; it's just that they used Scirus for viewing it.
I talked to Stephen Baker from FAST some weeks ago, and he said that they would start to include the spidered .PDF files in the regular search results too.
AllTheWeb will be indexing more formats in Q3 and Q4; for Q2 they are focusing on building a bigger index.

Winooski

5:13 pm on May 20, 2002 (gmt 0)

10+ Year Member



I wonder whether FAST is able to spider PDFs, or is it a submissions-only type of thing? I tried locating some "random" PDF files that were in Google's results, and they didn't show in FAST (using the "must have .PDF in the URL" setting).

lazerzubb

5:16 pm on May 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



FAST can spider .pdf files; they can spider most file types, they just don't include them in their index.
If many Lycos users wanted to be able to search Word documents, and Lycos asked FAST to include Word documents in their index, they would probably include them pretty quickly.
It's all about what the customer wants.

Jaze

4:52 am on May 22, 2002 (gmt 0)

10+ Year Member



I think I can add another to the list of improvements to FAST's index: it seems that AllTheWeb is now capable of recognising that two URLs point to the same site. For some time we had the same site listed under two different URLs (.co.nz and .com), but this seems to have been resolved. (Maybe it's not new?)

It seems this happened in the last month or two (Mar 2002)?

(edited by: Jaze at 5:10 am (utc) on May 22, 2002)

EliteWeb

4:54 am on May 22, 2002 (gmt 0)

WebmasterWorld Senior Member eliteweb is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Nice to see another engine adding PDF support. Does it have the option to view as HTML though? Sometimes I'd rather have that than loading a huge program and crashing my computer (poor thing)

Rumbas

9:26 am on May 22, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Jaze, I've seen similar glitches where ATW showed both a .com and a .com.br when checking with url.host:

Seems to have been fixed to some extent now.

 
