Alexa opens up crawler to the public

Forum Moderators: bakedjake

For a fee....

grelmar

3:05 pm on Dec 13, 2005 (gmt 0)

From Wired [wired.com]:

In a move with potentially far-reaching implications for the search market, Alexa Internet is opening up its huge web crawler to any programmer who wants paid access to its rich trove of internet data.

From Alexa Web Search Platform [websearch.alexa.com]

The Alexa Web Search Platform provides public access to the vast web crawl collected by Alexa Internet. Users can search and process billions of documents -- even create their own search engines -- using Alexa's search and publication tools.

The pricing scheme is confusing, but it looks like it would be fairly cheap for what it's offering.

Anyone in here know what this is really going to accomplish?

vincevincevince

3:10 pm on Dec 13, 2005 (gmt 0)

Now this is very interesting....

Lord Majestic

3:43 pm on Dec 13, 2005 (gmt 0)

$1 per 50 GB processed

I assume here they mean 50 GB of raw uncompressed data - it therefore follows that it's $1 per about 2.5 mln web pages (roughly 20 KB per page), or $2000 per 5 bln pages processed (what appears to be about two months' worth of their crawling).

Getting your data out will cost more - $1 per GB is not cheap if you are building a full-text index: at 2 KB per page (10 times less than the raw size), 5 bln pages will take 10 TB, or $10,000. It will cost the same if you keep the data on their system for a year.

All in all it is not THAT expensive - spammers will certainly like the idea of paying just $2K to process 5 bln pages for email addresses...
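The estimates above can be re-run as a quick back-of-the-envelope sketch. Every input below is an assumption taken from this post (page sizes, page count, decimal GB), not a figure published by Alexa, and the names are made up for illustration.

```python
# Back-of-the-envelope check of the cost estimates in the post above.
# All inputs are the post's own assumptions, not Alexa's published figures.

PAGES = 5_000_000_000        # ~5 bln pages, roughly two months of crawling
RAW_KB_PER_PAGE = 20         # implied by 50 GB being about 2.5 mln pages
INDEX_KB_PER_PAGE = 2        # full-text index, 10 times less than raw size
KB_PER_GB = 1_000_000        # decimal units, as the post appears to use

PROCESS_USD_PER_GB = 1 / 50  # $1 per 50 GB processed
TRANSFER_USD_PER_GB = 1      # $1 per GB taken out of the system


def cost_usd(pages, kb_per_page, usd_per_gb):
    """Cost of handling `pages` pages of `kb_per_page` KB each."""
    return pages * kb_per_page / KB_PER_GB * usd_per_gb


processing = cost_usd(PAGES, RAW_KB_PER_PAGE, PROCESS_USD_PER_GB)
transfer = cost_usd(PAGES, INDEX_KB_PER_PAGE, TRANSFER_USD_PER_GB)

print(f"processing:     ${processing:,.0f}")
print(f"index transfer: ${transfer:,.0f}")
```

Changing the per-page sizes is the quickest way to see how sensitive both totals are to the 20 KB and 2 KB guesses.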

vincevincevince

3:57 pm on Dec 13, 2005 (gmt 0)

When you compare it to the costs of building your own index, it's peanuts. I have the perfect application for this - just got to find the perfect time to code it!

Lord Majestic

3:59 pm on Dec 13, 2005 (gmt 0)

It's still more expensive - however, the main issue for me would have been the fact that it's Alexa that controls all the data: anybody building anything serious on top of any platform should be 100% sure of the long-term conditions of using that platform.

PaulPA

5:10 pm on Dec 13, 2005 (gmt 0)

Posted inside earlier this morning

[webmasterworld.com]

Rosalind

5:20 pm on Dec 13, 2005 (gmt 0)

I wonder whether this will have any impact whatsoever on overall bot activity?

surfin2u

5:24 pm on Dec 13, 2005 (gmt 0)

I've noticed Alexa crawling my site more recently. Now I know why. I wonder if there's more to this for Alexa than just collecting fees. Will the type of requests for their data be a source of valuable information to Alexa, and even more importantly to their parent, Amazon?

Jon_King

6:44 pm on Dec 13, 2005 (gmt 0)

Huh. I don't get it. I am obviously slow but why are they doing this?

Kirby

6:48 pm on Dec 13, 2005 (gmt 0)

>why

Money.

Lord Majestic

6:56 pm on Dec 13, 2005 (gmt 0)

Money.

I doubt it - prices are so low and the product is so exotic that they can't possibly make loads of dosh from it: they probably just have spare capacity available, and it makes sense to sell it even if it's only worth $1.

oddsod

7:00 pm on Dec 13, 2005 (gmt 0)

Spare capacity? It wasn't that long ago they were short of the darn thing.

caspita

7:08 pm on Dec 13, 2005 (gmt 0)

Allowing crawlers to collect our pages for ranking, SERPs, etc. is one thing. But collecting billions of pages and then selling them is a different thing. I mean, is it even legal? Will they also deliver access to pages that are forbidden to other crawlers but open to Alexa, for example? What about the "no cache" option? Now spammers will be able to copy all your work because Alexa will give away the raw data. They won't even need to find a way into your websites, they just pay Alexa, period.

internets

7:09 pm on Dec 13, 2005 (gmt 0)

How exactly does Alexa work? They still say "powered by Google," yet they do their own crawl and are now making that raw data available? What part does Google have in this?

Does Alexa consider itself a competitor in the search market (MSN, Yahoo, Google)? I always thought they were just using Google's data for search, and their own data from the Alexa bar to "rank" sites.

Now that I talk it out, I really don't get Alexa. What are they?

Lord Majestic

7:20 pm on Dec 13, 2005 (gmt 0)

Spare capacity? It wasn't that long ago they were short of the darn thing.

It was not long ago that I paid $200 for a 500 MB hard disk and was happy, and today I am ordering about 2,000 times more storage for just twice as much dosh :)

physics

9:16 pm on Dec 13, 2005 (gmt 0)

I have the same question as internets ... why are they "POWERED BY GOOGLE" if they do their own crawls and can process the data?

Ocean10000

10:52 pm on Dec 13, 2005 (gmt 0)

I have the same question as internets ... why are they "POWERED BY GOOGLE" if they do their own crawls and can process the data?

One reason that I can think of off the top of my head is simply that they know they cannot compete with the results from Google or the other search engines at this point. But they can make some $$ on the raw data and bank it for possible future projects, be it a search engine or whatever they decide to spend it on at the time.

claus

11:35 pm on Dec 13, 2005 (gmt 0)

>> why

Well, frankly because any kid with some hard drives, some bandwidth, and a free script can crawl the web. Ranking results is what Google does really well. At least it's pretty dang hard to do it just as well as them.

It's two different things, that's all.

---
And, unlike John Battelle, I'm pretty sure you can find an aged post by me somewhere that mentions this exact thing. I'm probably not even the first to mention it, as I recall discussing an open source crawler with other members here - most likely more than a year ago. But never mind, as I haven't got a blog.

ionchannels

11:53 pm on Dec 13, 2005 (gmt 0)

I just tried to set up an account - seems to only accept US addresses... another pointless restriction on the WORLD wide web - I ... am ... Canadian

carguy84

1:34 am on Dec 14, 2005 (gmt 0)

Umm, so let me get this straight....they're going to be selling MY content? Ya, I don't think so....

Someone didn't think this all the way through.

internets

3:22 am on Dec 14, 2005 (gmt 0)

good point, carguy...what are they really selling? saved copies of everyone's webpages!

Jack_Hughes

12:33 pm on Dec 14, 2005 (gmt 0)

this has got to be the ultimate button presser's dream tool. i can see a whole load of sites banning alexa's bot.
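For anyone who does want to ban the bot, a minimal sketch of what that could look like and how to check it locally. It assumes the crawler identifies itself with the user-agent token "ia_archiver"; the robots.txt contents and the example URL are hypothetical, and this only keeps out crawlers that actually honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler, leave everyone else alone.
# "ia_archiver" is assumed here to be the token Alexa's crawler matches on.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://www.example.com/page.html"  # placeholder URL
alexa_allowed = parser.can_fetch("ia_archiver", page)
other_allowed = parser.can_fetch("SomeOtherBot", page)

print("ia_archiver allowed:", alexa_allowed)
print("other bots allowed: ", other_allowed)
```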

howiejs

1:26 pm on Dec 14, 2005 (gmt 0)

This shows how important "vertical search" will become

Where users will go to their "engineering search engine" vs. their "summer European travel search engine"

will people scrape it? sure
but they scrape google and everyone else just the same . . .

afterburner

1:33 pm on Dec 14, 2005 (gmt 0)

I don't think this will catch on

vfilip

2:50 pm on Dec 14, 2005 (gmt 0)

Has anybody opened an account and been authorized to use this service? It seems they are not ready yet.

Jack_Hughes

2:53 pm on Dec 14, 2005 (gmt 0)

This shows how important "vertical search" will become
Where users will go to their "engineering search engine" vs. their "summer European travel search engine"

that was my first idea of what to do with it. problem is, i just can't see a vertical engine adding much value to what is already available via the horizontal engines. to work, a vertical engine would have to leverage its knowledge of its particular area of expertise to produce results that are better than google's. other than a few niches, i can't see how that would work well.

I can however see SEOs jumping on this for performing competitive analysis. want to know an accurate backlink count for sites linking to another site? well, that would be fairly painless and cheap with this tool.
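A rough sketch of the backlink-count idea, run over an in-memory list of (url, html) pairs. That list is only a stand-in for whatever a crawl dump would hand you; nothing here uses Alexa's actual tools or API, and the sample pages and host names are invented.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in one document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def count_backlinks(pages, target_host):
    """Count links pointing at target_host, grouped by linking host.

    `pages` is an iterable of (url, html) pairs (hypothetical input format).
    """
    counts = Counter()
    for url, html in pages:
        source_host = urlparse(url).netloc
        if source_host == target_host:
            continue  # internal links are not backlinks
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.hrefs:
            # Resolve relative links against the page URL before comparing.
            if urlparse(urljoin(url, href)).netloc == target_host:
                counts[source_host] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    ("http://a.example/index.html", '<a href="http://target.example/">x</a>'),
    ("http://b.example/p.html",
     '<a href="/local">y</a> <a href="http://target.example/q">z</a>'),
    ("http://target.example/", '<a href="/about">self</a>'),
]
backlinks = count_backlinks(sample, "target.example")
print(sum(backlinks.values()), dict(backlinks))
```

Grouping by linking host rather than by page is just one choice; counting distinct linking pages would only need a different key in the counter.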

physics

7:13 pm on Dec 14, 2005 (gmt 0)

One reason that I can think of off the top of my head is simply that they know they cannot compete with the results from Google or the other search engines at this point.

Even if their search wasn't "as good" at least it would be something different. Plus they could use all that traffic data they collect to help their ranking algo.
Also, I'm aware that crawling and indexing are different things and the difficulty of creating an index ... still I think they should at least give it a college try.

Clark

7:20 pm on Dec 14, 2005 (gmt 0)

I love the idea of offering this to legit software houses or legit programmers. But I hate the idea because the biggest customers will be spammers.

Dang.

We need a new Internet. Hey guys, want to get together and start Internet 3.0? New protocols. Get rid of old baggage.

*No more "www".
*No more spoofed email addresses.
*You can use any .end domain you want, not just .com and the official few.
*The .end part of the domain will be built into the protocol in such a way that you can programmatically detect the difference between a subdomain and the "end part".
*So much more.

OK, dream sequence over.

Namaste

7:56 pm on Dec 14, 2005 (gmt 0)

so search has become a commodity web service...was bound to happen.

time for me to set up that portal and not worry about technology, only getting users in. Any VCs around feel free to contact me ;)

mhhfive

5:26 pm on Dec 15, 2005 (gmt 0)

it doesn't look like Alexa lets anyone create their own spiders to run... THAT would be cool. Correct me if I'm wrong, but it looks like all you can do is write your own stuff to sift through what Alexa has already spidered and stored. It would be way cooler if Alexa let ppl alter how its crawler(s) actually worked...

Still, it's pretty neat to have access to a relatively large search index that's already created. People can test their spiders on Alexa's index.. but then you're sorta trapped in Alexa's way of doing things -- which is the point, I think.
