FAST-WebCrawler ... why are you so dumb?

Maybe that's harsh ... but this is pretty weird.

     

ArtSEPI

5:31 pm on Jul 19, 2001 (gmt 0)

10+ Year Member



I have a site set up in the following way, which should be pretty standard:

www.domain.com/index.html
www.domain.com/products/index.html
www.domain.com/products/typeofproduct/index.html
www.domain.com/products/typeofproduct/specificproduct/index.html

The site is themed so there is cross-linking across levels and linking down the tree to pages below that are related. I thought this would be great for spiders like FAST but there's a big problem right now. FAST is coming through and spidering my pages that have links like:
<a href="/products/typeofproduct/specificproduct/">Specific Product</a>
But instead of following the link he requests:
/products/specificproduct/
Why GOD??? Is anyone else seeing this? He's relentless in doing this even though none of the links are relative in the sense of using ../blah or ./blah ... all are served up relative to the root directory with their full path (but not [domain.com...]). I guess my beautiful pages won't be listed in FAST's index on the next update ... but what can I do to avoid this happening in the future?
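
For reference, here is a minimal sketch of how a root-relative href is normally resolved against the page it appears on, using Python's standard `urllib.parse.urljoin`. The base page URL is an assumption built from the example structure above, and the `../specificproduct/` href is purely hypothetical: it is included only to show one kind of link that would resolve to the path being requested, not as a claim about what FAST actually does internally.

```python
from urllib.parse import urljoin

# Assumed base page, taken from the example site layout in the post.
page = "http://www.domain.com/products/typeofproduct/index.html"

# The root-relative link as it appears in the page source.
href = "/products/typeofproduct/specificproduct/"

# Standard resolution: a leading "/" replaces the whole path of the base URL.
resolved = urljoin(page, href)
print(resolved)

# Hypothetical comparison only: a relative href with "../" resolves one
# directory up from the page, which gives the path seen in the bad requests.
hypothetical = urljoin(page, "../specificproduct/")
print(hypothetical)
```

If a crawler resolves the link the first way, it should request the full four-level path; the second result is only what a different, relative link would produce.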

Brett_Tabke

6:26 pm on Jul 19, 2001 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Hmm. I've not seen that on a clean page. I have only seen it on a page where I had some other HTML errors.

roscoepico

6:34 pm on Jul 19, 2001 (gmt 0)

10+ Year Member



[neartexpress.com...]

Is the type of link you're referring to from your site in the profile? If so, I am a little confused as to what the problem is. Can you explain a little further?

ArtSEPI

7:13 pm on Jul 19, 2001 (gmt 0)

10+ Year Member



Well, FAST seems to be crawling those sorts of links OK (i.e. the artist pages). However, when it comes to the pages below those such as:
[neartexpress.com ]
[which as you can see is linked right off of that page .. no funny stuff]
FAST will not crawl them but instead requests /fine_art/time_well_spent/ which gives a 404 because there is no such page. I have no idea how WebCrawler gets the idea in its head to do that .. but maybe there are HTML errors that I didn't notice, as Brett suggests (thanks for the hint).
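
One rough way to look for the kind of HTML errors Brett mentions is to scan a page for anchor tags that are nested or never closed, since a broken `<a>` could plausibly confuse a crawler's link extraction. The sketch below uses Python's standard `html.parser`; the class and function names (`AnchorChecker`, `check_anchors`) are made up for illustration, and it only checks `<a>` balance, so it is no substitute for a full validator.

```python
from html.parser import HTMLParser

class AnchorChecker(HTMLParser):
    """Records <a> tags that are nested, unclosed, or closed without opening."""

    def __init__(self):
        super().__init__()
        self.open_anchor = None  # (line, column) of the currently open <a>
        self.problems = []       # list of (description, (line, column))

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.open_anchor is not None:
                # A new <a> started before the previous one was closed.
                self.problems.append(("nested or unclosed <a>", self.open_anchor))
            self.open_anchor = self.getpos()

    def handle_endtag(self, tag):
        if tag == "a":
            if self.open_anchor is None:
                self.problems.append(("stray </a>", self.getpos()))
            self.open_anchor = None

def check_anchors(html):
    """Return a list of anchor-balance problems found in the HTML string."""
    checker = AnchorChecker()
    checker.feed(html)
    checker.close()
    if checker.open_anchor is not None:
        checker.problems.append(("unclosed <a> at end", checker.open_anchor))
    return checker.problems

# Example: the second link starts before the first one is closed.
sample = '<a href="/fine_art/">Fine Art <a href="/fine_art/artist/">Artist</a>'
for description, position in check_anchors(sample):
    print(description, "at line/column", position)
```

An empty result only means the anchors balance; other markup problems would still need a proper HTML validator.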

ArtSEPI

9:09 pm on Jul 19, 2001 (gmt 0)

10+ Year Member



FAST has been crawling today and hitting a few pages. Now it seems to be hitting some of the pages I discussed before OK! Hopefully all will be well :)

(BTW, no offense meant in the title FAST, it's probably my fault!)

 
