Testing spiders for a price-comparison site in ASP, but too slow!

Visitors have to wait up to 2 minutes to get the results. How do I optimize?


ziggystardust

7:54 pm on Mar 23, 2003 (gmt 0)

10+ Year Member



I'm planning to develop a price-comparison site, using an HTTP-parsing component with ASP to obtain the prices from different e-commerce sites.

So far, I've written a couple of test spiders, which work great (the results come back fine), but they take too long to execute.

Now, sure, if there were just one or two spiders, a 5-second wait would be okay, but since there could be as many as 20-30, I need to cut it down somehow.

Is there a way to optimize them or at least run them all at the same time? Should I switch from ASP to another programming language?
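
For reference, each of my test spiders boils down to something like this (the URL and the price markers are just placeholders):

<%
' Minimal single fetch: grab a page, scrape out the price.
Dim http, html, p1, p2
Set http = Server.CreateObject("MSXML2.ServerXMLHTTP")
http.open "GET", "http://www.example-shop.com/widget.html", False   ' synchronous
http.send
html = http.responseText
Set http = Nothing

' Crude scrape: take whatever sits between two known markers.
p1 = InStr(html, "<span class=""price"">")
If p1 > 0 Then
    p1 = p1 + Len("<span class=""price"">")
    p2 = InStr(p1, html, "</span>")
    Response.Write "Price: " & Mid(html, p1, p2 - p1)
End If
%>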

//ZS

jpjones

8:18 pm on Mar 23, 2003 (gmt 0)

10+ Year Member



The delay most likely comes from the time it takes to make the web request.

What is actually happening is that:
1) You request the page from your server.

2) This in turn makes several requests to various other web pages, presumably sequentially.

3) Your server then processes and combines the fetched pages into a single web page.

4) The final page is then returned to your browser.

I would say it's not that ASP is slow (it isn't that slow). The delay comes from stage 2: each remote page takes at least a second or so to fetch, and doing that for several sites in sequence multiplies the total time.

What I would suggest is that you cache the data you retrieve and only update that cache once per hour, for instance. Visitors to your site will only experience a delay the first time the data is retrieved; after that, the page should be delivered almost instantly.
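
In classic ASP the natural place for such a cache is the Application object. Very roughly, something like this, where FetchAllPrices() stands in for whatever routine runs your spiders:

<%
' Rough sketch of an hourly cache in the Application object.
Dim needRefresh, results
needRefresh = IsEmpty(Application("priceCacheTime"))
If Not needRefresh Then
    needRefresh = (DateDiff("n", Application("priceCacheTime"), Now()) > 60)
End If

If needRefresh Then
    results = FetchAllPrices()           ' the slow part: hits every remote site
    Application.Lock
    Application("priceCache") = results
    Application("priceCacheTime") = Now()
    Application.Unlock
End If

Response.Write Application("priceCache")
%>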

HTH,
JP

ziggystardust

9:00 pm on Mar 23, 2003 (gmt 0)

10+ Year Member



Ah, thanks, but it'll still be very slow at displaying the uncached results. I mean, if nobody waits 2 minutes, no results will ever get cached :) Is there no way of speeding that process up?

Still, I'll start with the caching. Thanks again.

RichD

9:29 pm on Mar 23, 2003 (gmt 0)

10+ Year Member



You could also look at ways to fetch more than one page at a time. If the bottleneck is the speed of the suppliers' sites rather than your available bandwidth, you could get data from 30 sites in the same 5-second window instead of 5 seconds each.

I'm currently building a comparison site, but most of the data is provided by the suppliers as CSV files, or I use a crawler that fetches all the prices each night. If the prices in the market you are targeting don't change during the day, this may be a way to improve access times.

jpjones

10:15 pm on Mar 23, 2003 (gmt 0)

10+ Year Member



Ah, thanks, but it'll still be very slow at displaying the uncached results. I mean, if nobody waits 2 minutes, no results will ever get cached :)

Well, what I would do is:

1) Hold the results in a cache.
2) Every time the page is accessed:
a) Display the results from the cache.
b) Check whether the cache is still "current", e.g. the last update was less than 60 minutes ago.
c) If not, update the cache time to the current time.
d) Run the procedure to fetch results from the external sites and update the cache.

Doing it like this means that subsequent accesses during the 2-minute update would be served the old results, but would not fire off the fetch-and-update procedure again. Once the procedure has finished getting the results, the cache contents are updated. Just remember to set the script timeout to a high value!
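
A rough sketch of the above, again with a hypothetical FetchAllPrices(); note that the timestamp is bumped before the slow fetch, so requests arriving mid-update don't all fire it again:

<%
' a) Serve the old results immediately (the very first visitor
'    gets an empty page while the cache is being primed).
Response.Write Application("priceCache")
Response.Flush

' b) Is the cache still current?
Dim stale, fresh
stale = IsEmpty(Application("priceCacheTime"))
If Not stale Then
    stale = (DateDiff("n", Application("priceCacheTime"), Now()) > 60)
End If

If stale Then
    ' c) Claim the update first, so concurrent requests skip it.
    Application.Lock
    Application("priceCacheTime") = Now()
    Application.Unlock

    ' d) The slow, multi-site fetch, then update the cache.
    Server.ScriptTimeout = 300
    fresh = FetchAllPrices()
    Application.Lock
    Application("priceCache") = fresh
    Application.Unlock
End If
%>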

JP

ziggystardust

11:05 am on Mar 24, 2003 (gmt 0)

10+ Year Member



RichD, what kind of options do I have for fetching more pages at the same time? Can I do that in ASP? I'll definitely go for the caching, but I think fetching more pages at once will speed up the searches more (since a lot of them will be "new searches", i.e. not previously cached).

Thanks for the help tho, this got me thinking :)

//ZS

brotherhood of LAN

11:22 am on Mar 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not sure if anyone here works with cURL alongside ASP, but you might want to have a look at it...

It's made for this sort of thing. You could probably use a counter to keep track of how frequently you access certain pages, and once that count passes a threshold, tell cURL to save the page into a file, or maybe put it into a database.

One handy thing about it is that you can tell it to retrieve bytes X to Y of a page, so if you wanted, say, the footer of this page, you would tell it to grab the last 1000 bytes or so.

It might speed up the fetching process a little; it's a good tool to have anyway :)
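
If you end up staying in ASP, the same byte-range trick can be tried with a plain Range header, though only where the remote server honors range requests (the URL is a placeholder):

<%
' Ask for just the last 1000 bytes of the page.
Dim http
Set http = Server.CreateObject("MSXML2.ServerXMLHTTP")
http.open "GET", "http://www.example-shop.com/widget.html", False
http.setRequestHeader "Range", "bytes=-1000"
http.send
If http.status = 206 Then   ' 206 Partial Content: the range was honored
    Response.Write http.responseText
End If
Set http = Nothing
%>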

RichD

11:24 am on Mar 24, 2003 (gmt 0)

10+ Year Member



I'm still working on ways to get multiple pages at once! I use PHP, which, like ASP, tends to do one thing after another rather than several things at the same time.

As a start, think about calling one central script which then calls an individual script for each site. The central script monitors these child scripts, sees when they complete, and then returns the combined results.

There may (hopefully) be someone else on the board that can give more ideas.

jpjones

11:38 am on Mar 24, 2003 (gmt 0)

10+ Year Member



ASP running Parallel processes?

As far as I'm aware it's impossible, though I'd love for someone to tell me I'm wrong!

ASP runs things strictly sequentially: it executes line by line, waiting for each command to finish before starting the next one.

The only thing that might side-step this is an extra server component that allows multi-threaded calls. I've not seen one yet, though :( .NET supports multithreading as standard, from what I can gather.

The alternative (if you are running your own server, or your hosting company is willing to install things for you) is to roll your own component, which should be able to work multi-threaded.

JP

aspdaddy

12:14 pm on Mar 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you need two separate services here: one to crawl the suppliers and update a database, and one end-user service to display results to searchers from that database.

You could also use a COM server that updates an XML file, and have your client app parse the XML.
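
The display side then only has to read a local file, which is fast. A rough sketch, with the file name and element names as examples only:

<%
' Read prices from a locally stored XML file.
Dim xml, node
Set xml = Server.CreateObject("MSXML2.DOMDocument")
xml.async = False
If xml.load(Server.MapPath("prices.xml")) Then
    For Each node In xml.selectNodes("//product")
        Response.Write node.selectSingleNode("name").text & ": " _
                     & node.selectSingleNode("price").text & "<br>"
    Next
End If
Set xml = Nothing
%>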

RichD

1:49 pm on Mar 24, 2003 (gmt 0)

10+ Year Member



This is what I was thinking of:

First off, you request the central script, which then calls the sub-scripts by making an HTTP request as if it were a client. This is where the parallel part comes in: at the web-server level rather than the script level. With PHP you could use fopen() for this, which I believe returns a file pointer as soon as the GET request is sent. I've not used ASP, but there should be a similar way of requesting a URL as a file stream.

Once you've sent requests to each child script, which in turn have sent requests to the remote sites, you can read the output of each request in turn with much less delay between them.

This is just theory, I haven't tried it out myself, but it sounds like it should be a way of absorbing the delay while the remote sites send their data.
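
From what I've read, the nearest ASP equivalent might be MSXML2.ServerXMLHTTP opened asynchronously: send all the requests first, then wait on each one in turn. Untested sketch, with placeholder URLs:

<%
' Fire several requests asynchronously, then collect them.
' waitForResponse blocks until that request finishes (or the
' timeout in seconds expires), so the fetches overlap.
Dim urls, i
urls = Array("http://www.shop-one.example/", _
             "http://www.shop-two.example/", _
             "http://www.shop-three.example/")
ReDim reqs(UBound(urls))

For i = 0 To UBound(urls)                  ' send everything first
    Set reqs(i) = Server.CreateObject("MSXML2.ServerXMLHTTP")
    reqs(i).open "GET", urls(i), True      ' True = asynchronous
    reqs(i).send
Next

For i = 0 To UBound(urls)                  ' then gather the responses
    If reqs(i).waitForResponse(10) Then    ' wait up to 10 seconds each
        Response.Write "Got " & Len(reqs(i).responseText) & _
                       " bytes from " & urls(i) & "<br>"
    End If
    Set reqs(i) = Nothing
Next
%>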

ziggystardust

11:37 am on Mar 26, 2003 (gmt 0)

10+ Year Member



Been looking hard for a multithreading component over the last few days, but without finding anything concrete. PowerTCP has one, but I can't get it to run multiple threads in ASP.

I'll look into ASP.NET; maybe I'll go ahead and learn that one. Thanks for the answers, guys.

//ZS