Forum Moderators: bakedjake
In particular, crawlers are stymied by dynamic Web pages, which are customized as users choose various options, such as car color at Cars.com. To counter that, Chicago-based Dipsie Inc. is developing software that promises to fill out Cars.com's simple online forms, which are based on multiple choice, though not the complex ones for the government's patent and trademark databases, which require typing in keywords. A public test version is expected by summer.
Mercury News [mercurynews.com]
Even if a smaller start-up doesn't send us lots and lots of traffic, it will be fun to see small innovators do their thing. I'll bet we're FAR from maturity in the search space. I'll bet we have lots of surprises ahead of us. At least I hope so, because the current search landscape feels pretty dull to me. I mean, I didn't get involved with the web to spend my time planning out link tricks, you know? That's not what really turns me on.
Anyway, earlier press for Dipsie talked about launching with 10 billion pages - if this is their approach to finding deep content, it will be interesting to see if that makes a big difference or if it's basically a yawn.
Be interesting to see if a press release has been posted anywhere.
Tick Tick Tick....
It does look quite nice though.
Sid
The same kind of thing can be achieved by just reading the O'Reilly Spidering Hacks book. Customising a spider so that it analyses a form, fills out all the possibilities and submits them is trivial. However, the damage that it can do to the website being spidered is considerable and could, in some cases, result in something approaching a Denial of Service attack. Of course, most data on websites with form-based searching is constantly updating. This means that the site has to be respidered frequently. Doing it wrong can mean an immediate ban.
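To make the "fills out all possibilities" point concrete, here is a minimal sketch, assuming a form built only from multiple-choice select fields. The sample form, the class name and the function name are all made up for illustration, and nothing is actually submitted to a site.

```python
from html.parser import HTMLParser
from itertools import product


class SelectCollector(HTMLParser):
    """Collect the option values of every <select> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # select name -> list of option values
        self._current = None  # name of the <select> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and "name" in attrs:
            self._current = attrs["name"]
            self.fields[self._current] = []
        elif tag == "option" and self._current and "value" in attrs:
            self.fields[self._current].append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def form_combinations(html):
    """Return every combination of option values as a list of dicts."""
    parser = SelectCollector()
    parser.feed(html)
    names = list(parser.fields)
    return [dict(zip(names, combo))
            for combo in product(*(parser.fields[n] for n in names))]


# Hypothetical form, not taken from any real site.
SAMPLE_FORM = """
<form action="/search" method="get">
  <select name="make"><option value="ford"><option value="honda"></select>
  <select name="color"><option value="red"><option value="blue"><option value="black"></select>
</form>
"""

combos = form_combinations(SAMPLE_FORM)
print(len(combos))  # 2 makes x 3 colors = 6 requests for one small form
```

The count is the product of the option counts, so a form with a handful of realistic dropdowns quickly turns into thousands of requests. That multiplication is where the load problem comes from.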
The Dipsie quote is interesting because what it is describing already exists. It exists in the form of shopping comparison websites. It is nothing new or revolutionary. As for the claim of ten billion pages spidered - just where is Dipsie's spider? Has anyone ever seen it? Perhaps Dipsie is buying in data or will end up as just another Overture or Espotting SERPswamp.
The majority of the web is largely static, changing on a yearly basis. As such it can be spidered aperiodically since it is not being regularly updated. The key to preparing a good search engine index is in identifying the chronological types of sites being spidered before wasting time on spidering. Perhaps working on writing spiders and building search engines has made me a bit too cynical :) , but I think that Dipsie has nothing new and certainly nothing that GYM [1] could not squash.
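One way to read "chronological types" is as a revisit interval derived from how often a site has been seen to change. The following is only a sketch under that assumption: the halving rule and the one-hour and one-year limits are arbitrary illustrative choices, not anything a real engine is known to use.

```python
from datetime import timedelta

# Assumed bounds for illustration only.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=365)


def recrawl_interval(change_gaps):
    """Suggest a revisit interval from observed gaps between content changes.

    change_gaps is a list of timedelta values, one per observed change.
    """
    if not change_gaps:
        # Never seen to change: treat the site as static.
        return MAX_INTERVAL
    mean_gap = sum(change_gaps, timedelta()) / len(change_gaps)
    # Revisit at half the mean gap, clamped to the assumed bounds.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, mean_gap / 2))


# A site that changed after 2 days and then after 4 days.
print(recrawl_interval([timedelta(days=2), timedelta(days=4)]))
```

A site with no recorded changes gets the longest interval, while a fare or price site that changes every few minutes is pinned at the minimum rather than being hammered continuously.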
Regards...jmcc
[1] The Google Yahoo Microsoft troika.
I mean would one be able to send it out to say just one engine?
It would then scan all sites 10'000'000 etc then return the results to dipsie, without us knowing about it.
Maybe that's why we haven't seen Dipsie's spider in our stats logs.
A crawler has no way of knowing what a form is used for and what the result may be of posting the form.
I don't want them filling in my forms, as this would result in entries in the database and who knows what.
Problems could be huge.
They expect to find more content when submitting forms?
A well-constructed website will allow crawlers to reach all content via normal links. So this would not be necessary.
These Spiders/crawlers etc. How controllable are they exactly? I mean would one be able to send it out to say just one engine?
I heard about one airline fare comparison site that was banned from an airline's site for putting a high load on that site's servers. The data on that type of site is time sensitive and thus has to be frequently respidered.
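The usual defence against that kind of ban is a minimum delay between requests to the same host. A minimal sketch, assuming times are plain seconds passed in by the caller; the class name, method name and the 10 second delay are hypothetical.

```python
class PoliteScheduler:
    """Space out requests so each host sees at most one per min_delay seconds."""

    def __init__(self, min_delay_seconds):
        self.min_delay = min_delay_seconds
        self._next_free = {}  # host -> earliest time the next request may go out

    def reserve(self, host, now):
        """Return the earliest time (in seconds) a request to host may be sent."""
        start = max(now, self._next_free.get(host, now))
        self._next_free[host] = start + self.min_delay
        return start


scheduler = PoliteScheduler(min_delay_seconds=10)
first = scheduler.reserve("fares.example", now=0)   # sent immediately
second = scheduler.reserve("fares.example", now=1)  # pushed back to honour the delay
other = scheduler.reserve("other.example", now=1)   # different host, no wait
```

The delay is tracked per host, so a spider can stay busy across many sites while still being slow on each one.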
It would then scan all sites 10'000'000 etc then return the results to dipsie, without us knowing about it. Maybe that's why we haven't seen Dipsie's spider in our stats logs.
I'd be very surprised if anyone has seen Dipsie's spider. It has all the appearance of vaporware - loads of buzzwords, piles of public relations/press releases and no results. :)
Regards...jmcc
It would then scan all sites 10'000'000 etc then return the results to dipsie, without us knowing about it.
It promises to provide a HUGE index of 11 billion pages when it launches - and guess when its launch date is set for? 2004, which, if I'm not wrong, is this year. So for sure, if a search engine is promising 11 billion webpages, it's gotta be seen somewhere!
Sid
If you're going to set up a scam, there are easier, less expensive ways of doing it.
But I hear ya.
It don't sound too clever to me. I reckon it's genuine, just they messed up on promises maybe? - probably due to delivery date pressures or something.
I think it's a scam, but a more complex one than it first appears. If it were as basic a scam as you suggest, people wouldn't fall for it and it wouldn't even get started. So why attempt a non-starter that's doomed to failure?
It's just like the rest, idea being to make a name for themselves, get as big as the net public will allow, then hope to be bought out by someone.
That's why most of the engines are created. You can usually tell if a new one will be successful by its actual design work and services.
Most of these multiple engine searchers will fail - cos there's nothing special about them.
One of the interviews with the main mover in Dipsie was so full of venture capital guff and search buzzwords that I nearly spilled my coffee from laughing. To anyone who does not work with search engines or indexing large datasets over networks for a living, it was very convincing. When you read between the lines and then start thinking of the bandwidth, storage and processing requirements for doing what Dipsie is supposed to be doing, it does not make sense.
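For a rough sense of scale, here is a back-of-envelope sketch. Only the 10 billion page figure comes from the press coverage mentioned in this thread; the 25 KB average page size and the 30 day crawl window are assumptions picked purely for illustration, so treat the results as orders of magnitude only.

```python
def crawl_requirements(pages, avg_page_bytes, crawl_days):
    """Estimate raw storage and sustained throughput for one full crawl."""
    total_bytes = pages * avg_page_bytes
    seconds = crawl_days * 24 * 60 * 60
    return {
        "storage_tb": total_bytes / 1e12,                  # decimal terabytes
        "pages_per_second": pages / seconds,
        "bandwidth_mbit_s": total_bytes * 8 / seconds / 1e6,
    }


# 10 billion pages is the claimed index size; the other two inputs are assumed.
estimate = crawl_requirements(pages=10_000_000_000,
                              avg_page_bytes=25_000,
                              crawl_days=30)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

Under those assumptions a single pass works out to roughly 250 TB of raw pages, fetched at several thousand pages per second, before any indexing or respidering is counted.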
Perhaps Dipsie will appear, but as another recycler of Overture or some other search fodder provider. It may even overlay its own search algorithms over these results to provide a more streamlined approach. Using another engine's data would be more efficient. But would this mean that Dipsie is just another meta search engine rather than a genuine player?
Regards...jmcc
[edited by: jmccormac at 9:50 am (utc) on April 8, 2004]
I'm occasionally in contact with one of Dipsie's reps, and they said they're gonna launch an SEO service next week - I'll have to ask them if Dipsie is a scam or not.
Regards...jmcc
A Search Engine Optimisation service Sidyadav?
Does this mean that they are getting out of the search engine business or just getting into it in a completely different way to actually running a search engine? ;)
Sid
Reps don't own or hold managerial posts in companies - otherwise they would be directors and not reps!
So I doubt very much that what you've been told by some rep bloke, is the whole truth anyway.
And that's based on common sense.
People are way too quick to believe others on the web.
Someone sets themselves up as an expert - and people bow down to these charlatans.
Nothing more than enthusiastic speculators I'm afraid.
I'll judge this engine based on what I actually see with my own eyes, and not on secondhand info from some sales bloke, passed on to someone that I don't even know from Adam.
Sorry, but he and you could be saying anything - really.
So why would Dipsie tell its sales reps about the launch - then NOT post this launch date on their site or send press releases out to all media?
Na- it just don't work like that. This info you've been getting just sounds strange and without procedure or authority.
But if you are right, it should be interesting to see it in action - next week you say?
(another fishy joke)
Will it sink or float?
I don't know about anyone else, but I'm confused already!
Yikes.
From years of reading press releases recycled as news (in what masquerades as the "technology press"), let's run the CNET article through the cynical editor's gobsh1te filter:
'competing consumer search engine' - It searches for consumers to click on its PPC listings.
'uses natural language algorithms to assess the content of a Web page and render a slew of synonyms and antonyms likely to crop up in a Web search' - Somebody browsing the page with dictionary.com and thesaurus.com open in other browser tabs.
'It then feeds that page to search engines to help the site's position in results' - Whoa, I thought this software was meant to improve the ranking of web pages in Google et al, not submit them. I guess improving the PR from not indexed to 1 counts as an improvement.
"It uses our crawling technologies to get past barriers that have been around for last five to 10 years in search robot technologies," Wiener said. - Yeah right! Nobody has spotted any spider anywhere. Claims of indexing all possibilities on database-backed websites, claims of world domination through superior software, missed deadlines, a vapourware website, and nothing more complex than the "hacks" in the O'Reilly Spidering Hacks book.
A classic example of search engine development by press release.
Regards...jmcc
Are they a big company? I mean I don't see any evidence that they are backed by anyone.