Forum Moderators: buckworks

Shopping Bot Development

need something kind of Froogle-like

         

directoryczar

3:16 pm on Aug 18, 2004 (gmt 0)

10+ Year Member



I don't want to spider the internet, but I do want to spider my link partners' shopping carts to display their product info. I have too many customers to ask them all for datafeeds. The website that would display this info is CF and SQL driven. I guess this amounts to integrating a shopping portal within the existing website.

#1 Are there off-the-shelf scripts that can parse all the different shopping cart interfaces from a range of domains I specify and index them? If not off the shelf, are there any I could license? We are a narrow B2B industry that does not compete with B2C sites.

#2 If not, should I be looking for a Verity developer or for something else?

I can't find any good resources for bot or portal development, mainly because G is LOADED with spam on these topics...

TallTroll

2:21 am on Aug 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Shopping bots are big projects if they are tackled correctly. From the details you have given, you need a custom-built system, IMO.

You are bound to have some churn in suppliers and/or supplier interfaces, which means constant attention and updates in any case. Working with the larger guys to provide you with a datafeed will benefit both of you.

Airportibo

9:39 am on Aug 27, 2004 (gmt 0)

10+ Year Member



If you look at Froogle, you see that they distinguish between "confirmed" and "total" results. The reason is that when you crawl a website for product information, it's impossible to make sure you match the right product name with the correct price and the corresponding picture, unless you "personalize" the crawler to each website. I have seen tools you can teach to crawl shopping sites properly, but you need to go through a configuration process for each customer, and if a shop decides to change its design, you have to do it all over again.
So in my opinion working with data feeds is the only way to do it right. Another possibility is to use full service providers like pangora.com
Best,
Airportibo
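To make the "personalize the crawler to each website" point concrete, here is a minimal sketch of what per-site configuration can look like. The site names, HTML patterns, and field layout below are all invented for illustration; a real cart would need its own patterns, and a redesign would break them, exactly as described above.

```python
import re

# Hypothetical per-site "personalization": each shop gets its own set of
# extraction patterns, since no single rule matches every cart's HTML.
SITE_CONFIGS = {
    "widgetshop.example": {
        "name":  re.compile(r'<h2 class="product">(.*?)</h2>'),
        "price": re.compile(r'<span class="price">\$([\d.]+)</span>'),
    },
    "gadgetmart.example": {
        "name":  re.compile(r'<td id="pname">(.*?)</td>'),
        "price": re.compile(r'Price:\s*\$([\d.]+)'),
    },
}

def extract_product(site, html):
    """Apply the site's patterns; leave fields None when they can't be confirmed."""
    cfg = SITE_CONFIGS.get(site)
    if cfg is None:
        return None  # unconfigured site -- would need a new config entry
    name = cfg["name"].search(html)
    price = cfg["price"].search(html)
    return {
        "name": name.group(1) if name else None,
        "price": float(price.group(1)) if price else None,
    }
```

The `None` results are the moral equivalent of Froogle's "unconfirmed" matches: the crawler found a page but could not pair every field with confidence.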

killroy

10:13 am on Aug 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've done something similar for a limited number of merchants. I created a tool set for the general parsing task, but then added a small snippet of custom code for each merchant. In effect I created my own datafeeds with the least amount of effort. Now I can single-button update all feeds.

Whether this is viable for you depends entirely on the number of merchants.

SN
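A hedged sketch of the kind of setup described above: a shared tool set, one small registered snippet per merchant, and a single "button" that regenerates every feed. The merchant name, the `merchant` decorator, and the `name|price` input format are all assumptions made up for this example.

```python
import csv
import io

# Registry mapping merchant name -> its custom parsing snippet.
MERCHANT_PARSERS = {}

def merchant(name):
    """Decorator that registers a per-merchant parsing snippet."""
    def register(fn):
        MERCHANT_PARSERS[name] = fn
        return fn
    return register

@merchant("acme")
def parse_acme(raw):
    # Hypothetical cart output: one "name|price" pair per line.
    for line in raw.splitlines():
        name, price = line.split("|")
        yield name, float(price)

def update_all_feeds(raw_pages):
    """The 'single button': run every merchant's parser, emit one CSV feed each."""
    feeds = {}
    for name, parse in MERCHANT_PARSERS.items():
        out = io.StringIO()
        writer = csv.writer(out)
        for product, price in parse(raw_pages[name]):
            writer.writerow([product, price])
        feeds[name] = out.getvalue()
    return feeds
```

The point of the design is that the shared machinery (`update_all_feeds`) never changes; when a merchant's cart changes, only that merchant's few-line snippet is rewritten.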

Lord Majestic

10:20 am on Aug 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know I am picking at words here, but AFAIK Froogle is not based on crawling non-structured HTML data but on accepting structured (XML?) feeds from participants in that program. If my understanding is correct, then creating something similar is easy from a programming point of view (just parse data in a pre-agreed format and stuff it into a database), and the hardest bit is actually getting people to sign up for these feeds.
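The "easy" path really is just a few lines. A minimal sketch, assuming a tab-delimited feed (the format mentioned later in the thread) with invented column names, loaded into SQLite:

```python
import sqlite3

# Hypothetical pre-agreed feed: tab-delimited, first row names the columns.
FEED = "sku\tname\tprice\nW-1\tBlue Widget\t9.99\nW-2\tRed Widget\t12.50"

def load_feed(feed_text, conn):
    """Parse a pre-agreed tab-delimited feed and stuff it into a database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (sku TEXT PRIMARY KEY, name TEXT, price REAL)"
    )
    lines = feed_text.splitlines()
    header = lines[0].split("\t")
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        conn.execute(
            "INSERT OR REPLACE INTO products VALUES (?, ?, ?)",
            (row["sku"], row["name"], float(row["price"])),
        )
    conn.commit()
```

Compared with scraping arbitrary HTML, everything hard has been moved out of the code and into the agreement about the format, which is why recruiting participants becomes the real work.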

Airportibo

10:28 am on Aug 27, 2004 (gmt 0)

10+ Year Member



Correct me if I'm wrong, but I thought Froogle does both: crawling and dedicated feeds (tab-delimited text files).

Lord Majestic

10:35 am on Aug 27, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I stand corrected - it appears that Froogle analyses pages already crawled by Googlebot, or even crawls on its own. This was certainly not an easy undertaking: the data is so unstructured that they must have had one hell of a time developing a generic algorithm to extract this information (correctly!) from unstructured HTML pages.

directoryczar

1:41 pm on Aug 27, 2004 (gmt 0)

10+ Year Member



Thanks for the great advice and comments. OK, I'm backing off the idea of spidering all my link partners. We pioneered this category, and a good chunk of the merchants host their shopping carts with us (of course, as they get bigger and want more functionality, they grow toward more robust carts, which is FINE). We never wanted to be an ecommerce host; it was just the best inducement we saw to move this market over to the web. The thought that sparked this thread was that if we could develop a method to spider shopping carts, we could discontinue our hosting service but still provide the same promotional benefit to our existing customers. So now I'm leaning toward vastly improving the existing shopping search interface for the carts we do host, to induce datafeeds, with similar treatment for the carts we don't host.

Off topic

[edited by: DaveAtIFG at 1:57 pm (utc) on Aug. 27, 2004]