
Forum Moderators: bakedjake


Alexa opens up crawler to the public

For a fee....

     

grelmar

3:05 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From Wired [wired.com]:

In a move with potentially far-reaching implications for the search market, Alexa Internet is opening up its huge web crawler to any programmer who wants paid access to its rich trove of internet data.

From Alexa Web Search Platform [websearch.alexa.com]

The Alexa Web Search Platform provides public access to the vast web crawl collected by Alexa Internet. Users can search and process billions of documents -- even create their own search engines -- using Alexa's search and publication tools.

The pricing scheme is confusing, but it looks like it would be fairly cheap for what it's offering.

Anyone in here know what this is really going to accomplish?

vincevincevince

3:10 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Now this is very interesting....

Lord Majestic

3:43 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> $1 per 50 GB processed

I assume they mean 50 GB of raw, uncompressed data. At roughly 20 KB per page, it follows that it's $1 per about 2.5 mln web pages, or $2,000 per 5 bln pages processed (which appears to be about two months' worth of their crawling).

Getting your data out will cost more: $1 per GB is not cheap if you are building a full-text index. At 2 KB per page (a tenth of the raw size), 5 bln pages come to 10 TB, or $10,000. Keeping the data on their system for a year will cost about the same.

All in all it is not THAT expensive: spammers will certainly like the idea of paying just $2K to scan 5 bln pages for email addresses...
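The arithmetic in this post can be reproduced with a small cost model. This is a sketch under stated assumptions: the two prices are taken as quoted in the thread, not checked against Alexa's pricing page; units are decimal (1 TB = 1,000 GB, as "10 TB or $10,000" implies); and the 20 KB average raw page size is inferred from "50 GB per about 2.5 mln pages". Storage fees are not modelled.

```python
# Back-of-the-envelope cost model using the prices quoted in this thread
# (assumed, not verified): $1 per 50 GB processed, $1 per GB transferred out.
# Decimal units throughout: 1 GB = 1,000,000 KB.

PROCESS_USD_PER_GB = 1 / 50    # $1 per 50 GB processed
TRANSFER_USD_PER_GB = 1.0      # $1 per GB taken off the platform
KB_PER_GB = 1_000_000


def processing_cost(pages: int, raw_kb_per_page: float) -> float:
    """Dollars to run one pass over `pages` raw documents."""
    gigabytes = pages * raw_kb_per_page / KB_PER_GB
    return gigabytes * PROCESS_USD_PER_GB


def transfer_cost(pages: int, out_kb_per_page: float) -> float:
    """Dollars to download `out_kb_per_page` of derived data for each page."""
    gigabytes = pages * out_kb_per_page / KB_PER_GB
    return gigabytes * TRANSFER_USD_PER_GB


if __name__ == "__main__":
    pages = 5_000_000_000      # ~two months of crawling, per the post
    raw_kb = 20.0              # inferred: 50 GB / 2.5 mln pages
    index_kb = raw_kb / 10     # ~2 KB of full-text index per page
    print(f"processing: ${processing_cost(pages, raw_kb):,.0f}")
    print(f"transfer:   ${transfer_cost(pages, index_kb):,.0f}")
```

Run as a script, it prints $2,000 for processing and $10,000 for transfer, matching the figures above.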

vincevincevince

3:57 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member



When you compare it to the costs of building your own index, it's peanuts. I have the perfect application for this - just got to find the perfect time to code it!

Lord Majestic

3:59 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's still more expensive. However, the main issue for me would be that it's Alexa that controls all the data: anybody building anything serious on top of a platform should be 100% sure of the long-term conditions for using it.

PaulPA

5:10 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



Posted inside earlier this morning

[webmasterworld.com]

Rosalind

5:20 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wonder whether this will have any impact whatsoever on overall bot activity?

surfin2u

5:24 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



I've noticed Alexa crawling my site more recently. Now I know why. I wonder if there's more to this for Alexa than just collecting fees. Will the kinds of requests made for their data be a source of valuable information to Alexa, and even more importantly to their parent, Amazon?

Jon_King

6:44 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Huh. I don't get it. I am obviously slow, but why are they doing this?

Kirby

6:48 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>why

Money.

Lord Majestic

6:56 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Money.

I doubt it: prices are so low and the product so exotic that they can't possibly make loads of dosh from it. They probably just have spare capacity, and it makes sense to sell it even if it only brings in $1.

oddsod

7:00 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Spare capacity? It wasn't that long ago they were short of the darn thing.

caspita

7:08 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



Allowing crawlers to collect our pages for ranking, SERPs, etc. is one thing. Collecting billions of pages and then selling them is a different thing. I mean, is it even legal? Will they also give access to pages that are forbidden to other crawlers but allowed for Alexa, for example? What about the "no cache" option? Now spammers will be able to copy all your work because Alexa will hand over the raw data. They won't even need to find a way into your websites; they'll just pay Alexa, period.

internets

7:09 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



How exactly does Alexa work? They still say "powered by Google," yet they do their own crawl and are now making that raw data available? What part does Google have in this?

Does Alexa consider itself a competitor in the search market (MSN, Yahoo, Google)? I always thought they were just using Google's data for search, and their own data from the Alexa bar to "rank" sites.

Now that I talk it out, I really don't get Alexa. What are they?

Lord Majestic

7:20 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Spare capacity? It wasn't that long ago they were short of the darn thing.

It wasn't long ago that I paid $200 for a 500 MB hard disk and was happy, and today I am ordering about 2,000 times more storage for just twice as much dosh :)
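The storage quip above implies roughly a thousandfold drop in price per gigabyte. A quick check, assuming decimal units (1 GB = 1,000 MB) and taking "about 2,000 times" and "twice as much" literally:

```python
# Per-GB storage price implied by the post. Decimal units are assumed
# (1 GB = 1,000 MB); the post's ratios are taken literally.

def usd_per_gb(usd: float, mb: float) -> float:
    """Price per gigabyte for a disk costing `usd` with capacity `mb` megabytes."""
    return usd / (mb / 1_000)


then_rate = usd_per_gb(200, 500)              # $200 for a 500 MB disk
now_rate = usd_per_gb(200 * 2, 500 * 2_000)   # twice the money, ~2,000x the capacity
print(f"then ${then_rate:.2f}/GB, now ${now_rate:.2f}/GB, "
      f"about {then_rate / now_rate:,.0f}x cheaper per GB")
```

It prints "then $400.00/GB, now $0.40/GB, about 1,000x cheaper per GB".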

physics

9:16 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have the same question as internets ... why are they "POWERED BY GOOGLE" if they do their own crawls and can process the data?

Ocean10000

10:52 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> I have the same question as internets ... why are they "POWERED BY GOOGLE" if they do their own crawls and can process the data?

One reason I can think of off the top of my head is simply that they know they can't compete with the results of Google or other search engines at this point. But they can make some $$ on the raw data and bank it for possible future projects, be it a search engine or whatever they decide to spend it on at the time.

claus

11:35 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> why

Well, frankly, because any kid with some hard drives, some bandwidth, and a free script can crawl the web. Ranking results is what Google does really well; at least, it's pretty dang hard to do it as well as they do.

It's two different things, that's all.

---
And, unlike John Battelle, I'm pretty sure you can find an old post by me somewhere that mentions this exact thing. I'm probably not even the first to mention it, as I recall discussing an open source crawler with other members here, most likely more than a year ago. But never mind, as I haven't got a blog.

ionchannels

11:53 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



I just tried to set up an account, but it seems to only accept US addresses. Another pointless restriction on the WORLD wide web. I ... am ... Canadian

carguy84

1:34 am on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Umm, so let me get this straight... they're going to be selling MY content? Ya, I don't think so...

Someone didn't think this all the way through.

internets

3:22 am on Dec 14, 2005 (gmt 0)

10+ Year Member



Good point, carguy. What are they really selling? Saved copies of everyone's webpages!

Jack_Hughes

12:33 pm on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This has got to be the ultimate button presser's dream tool. I can see a whole load of sites banning Alexa's bot.

howiejs

1:26 pm on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This shows how important "vertical search" will become.

Users will go to their "engineering search engine" vs. their "summer European travel search engine".

Will people scrape it? Sure,
but they scrape Google and everyone else just the same...

afterburner

1:33 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



I don't think this will catch on.

vfilip

2:50 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



Has anybody opened an account and been authorized to use this service? It seems they are not ready yet.

Jack_Hughes

2:53 pm on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> This shows how important "vertical search" will become

> Where users will go to their "engineering search engine" vs. their "summer European travel search engine"

That was my first idea of what to do with it. The problem is, I just can't see a vertical engine adding much value to what is already available via the horizontal engines. To work, a vertical engine would have to leverage its knowledge of its particular area of expertise to produce results that are better than Google's. Other than in a few niches, I can't see how that would work well.

I can, however, see SEOs jumping on this to perform competitive analysis. Want an accurate count of the sites linking to another site? That would be fairly painless and cheap with this tool.
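A minimal sketch of the backlink count described above, assuming the crawled pages can be iterated as hypothetical `(page_url, html)` pairs (this is not the Alexa platform's actual interface). It uses only Python's standard library, counts distinct external pages linking to each host, and ignores links within the same host:

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def host_of(url: str) -> str:
    """Lowercased host name of a URL, or '' if it has none (e.g. mailto:)."""
    return (urlsplit(url).hostname or "").lower()


def backlink_counts(pages):
    """Map target host -> number of distinct external pages linking to it.

    `pages` is an iterable of (page_url, html) pairs, a hypothetical
    stand-in for whatever the crawl platform actually exposes.
    """
    linkers = defaultdict(set)
    for page_url, html in pages:
        parser = LinkExtractor()
        parser.feed(html)
        source_host = host_of(page_url)
        for href in parser.hrefs:
            # Resolve relative links against the page they appear on.
            target_host = host_of(urljoin(page_url, href))
            if target_host and target_host != source_host:
                linkers[target_host].add(page_url)
    return {host: len(urls) for host, urls in linkers.items()}


if __name__ == "__main__":
    sample = [
        ("http://a.example/1", '<a href="http://b.example/">b</a> <a href="/2">self</a>'),
        ("http://c.example/", '<a href="http://b.example/x">b</a><a href="http://a.example/">a</a>'),
    ]
    print(backlink_counts(sample))
```

Counting distinct linking hosts instead of pages would only require adding `source_host` to the set instead of `page_url`. Note that `www.example.com` and `example.com` are treated as different hosts here.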

physics

7:13 pm on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




> One reason I can think of off the top of my head is simply that they know they can't compete with the results of Google or other search engines at this point.

Even if their search wasn't "as good", at least it would be something different. Plus they could use all the traffic data they collect to help their ranking algo.
Also, I'm aware that crawling and indexing are different things, and how difficult it is to create an index... still, I think they should at least give it the old college try.

Clark

7:20 pm on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I love the idea of offering this to legit software houses or legit programmers, but I hate it because the biggest customers will be spammers.

Dang.

We need a new Internet. Hey guys, want to get together and start Internet 3.0? New protocols. Get rid of old baggage.

* No more "www".
* No more spoofed email addresses.
* You can use any .end domain you want, not just .com and the official few.
* The .end part of the domain will be built into the protocol in such a way that you can programmatically detect the difference between a subdomain and the "end part".
* So much more.

OK, dream sequence over.
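For context on the fourth wish: nothing in today's DNS marks where the registrable "end part" begins, so software relies on an external list, typically Mozilla's Public Suffix List. A minimal sketch of that longest-suffix lookup, using a tiny hardcoded subset of suffixes chosen for illustration (the real list has thousands of entries plus wildcard and exception rules):

```python
# Tiny illustrative subset of public suffixes; the real Public Suffix List
# is far larger and also has wildcard (*.) and exception (!) rules.
PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk", "jp", "co.jp"}


def split_host(host: str) -> tuple[str, str, str]:
    """Split a host name into (subdomain, registrable label, public suffix).

    Tries candidate suffixes from longest to shortest, so "co.uk" wins
    over "uk" for www.example.co.uk.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            if i == 0:
                raise ValueError(f"{host!r} is a bare public suffix")
            return ".".join(labels[:i - 1]), labels[i - 1], suffix
    raise ValueError(f"no known public suffix in {host!r}")


if __name__ == "__main__":
    for name in ["www.example.co.uk", "example.com", "a.b.example.org"]:
        print(name, "->", split_host(name))
```

The point of the sketch is that the split lives in a maintained data file, not in the protocol, which is exactly the gap the wish describes.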

Namaste

7:56 pm on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So search has become a commodity web service... It was bound to happen.

Time for me to set up that portal and not worry about technology, only about getting users in. Any VCs around, feel free to contact me ;)

mhhfive

5:26 pm on Dec 15, 2005 (gmt 0)

10+ Year Member



It doesn't look like Alexa lets anyone create their own spiders to run... THAT would be cool. Correct me if I'm wrong, but it looks like all you can do is write your own code to sift through what Alexa has already spidered and stored. It would be way cooler if Alexa let people alter how its crawler(s) actually worked...

Still, it's pretty neat to have access to a relatively large search index that's already been built. People can test their spiders on Alexa's index... but then you're sort of trapped in Alexa's way of doing things, which is the point, I think.
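To make "write your own code to sift through what Alexa has already spidered" concrete, here is a minimal inverted index with boolean AND queries over stored documents. The `(doc_id, text)` input and the crude lowercase alphanumeric tokenizer are assumptions for illustration; a real full-text index would add ranking, stemming, and on-disk storage:

```python
import re
from collections import defaultdict

# Crude tokenizer: runs of lowercase letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


def build_index(docs):
    """docs: iterable of (doc_id, text). Returns term -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query: str) -> set:
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # .get() avoids inserting empty entries into the defaultdict.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_index([
        ("u1", "Alexa opens crawler"),
        ("u2", "Google ranks pages"),
        ("u3", "Alexa crawler pricing"),
    ])
    print(sorted(search(idx, "alexa crawler")))
```

Intersecting posting sets keeps the query logic simple; at web scale the postings would be sorted lists merged on disk instead of in-memory sets.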

This 41 message thread spans 2 pages.
