Forum Moderators: bakedjake

Alexa opens up crawler to the public

For a fee....

     
3:05 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:683
votes: 0


From Wired [wired.com]:

In a move with potentially far-reaching implications for the search market, Alexa Internet is opening up its huge web crawler to any programmer who wants paid access to its rich trove of internet data.

From Alexa Web Search Platform [websearch.alexa.com]

The Alexa Web Search Platform provides public access to the vast web crawl collected by Alexa Internet. Users can search and process billions of documents -- even create their own search engines -- using Alexa's search and publication tools.

The pricing scheme is confusing, but it looks like it would be fairly cheap for what it's offering.

Anyone in here know what this is really going to accomplish?

3:10 pm on Dec 13, 2005 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
posts:4847
votes: 0


Now this is very interesting....
3:43 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


$1 per 50 GB processed

I assume here they mean 50 GB of raw uncompressed data - it therefore follows that it's $1 per about 2.5 million web pages (roughly 20 KB per page), or $2,000 per 5 billion pages processed (which appears to be about 2 months' worth of their crawling).

Getting your data out will cost more - $1 per GB is not cheap if you are building a full-text index: at 2 KB per page (10 times less than the raw size), 5 billion pages will take 10 TB, or $10,000. It will cost the same if you keep the data on their system for a year.

All in all it is not THAT expensive - spammers will certainly like the idea of paying just $2K to process 5 billion pages for email addresses...
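
For anyone who wants to rerun those numbers with different page sizes, here is a rough sketch of the arithmetic above. The function names are made up for illustration, decimal units (1 GB = 1,000,000 KB) are an assumption, and the prices are only the ones quoted in this post, not an official rate card.

```python
# Back-of-envelope estimate using the figures quoted in this post.
# Assumption: decimal units, i.e. 1 GB = 1,000,000 KB.
KB_PER_GB = 1_000_000


def processing_cost_usd(pages, raw_kb_per_page=20, usd_per_50_gb=1.0):
    """Cost of processing `pages` raw pages at $1 per 50 GB processed."""
    raw_gb = pages * raw_kb_per_page / KB_PER_GB
    return raw_gb / 50 * usd_per_50_gb


def transfer_cost_usd(pages, index_kb_per_page=2, usd_per_gb=1.0):
    """Cost of getting the extracted index data out at $1 per GB."""
    index_gb = pages * index_kb_per_page / KB_PER_GB
    return index_gb * usd_per_gb


if __name__ == "__main__":
    pages = 5_000_000_000  # roughly 2 months of crawling, per the estimate above
    print(f"processing: ${processing_cost_usd(pages):,.0f}")
    print(f"transfer:   ${transfer_cost_usd(pages):,.0f}")
```

Changing `raw_kb_per_page` or `index_kb_per_page` shows how sensitive the totals are to the assumed page sizes.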

3:57 pm on Dec 13, 2005 (gmt 0)

Senior Member from MY 

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 1, 2003
posts:4847
votes: 0


When you compare it to the costs of building your own index, it's peanuts. I have the perfect application for this - just got to find the perfect time to code it!
3:59 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


It's still more expensive - however, the main issue for me would be the fact that it's Alexa that controls all the data: anybody building anything serious on top of any platform should be 100% sure of the long-term conditions of that platform's use.
5:10 pm on Dec 13, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 24, 2004
posts:388
votes: 0


Posted inside earlier this morning

[webmasterworld.com ]

5:20 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 16, 2003
posts:992
votes: 0


I wonder whether this will have any impact whatsoever on overall bot activity?
5:24 pm on Dec 13, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 19, 2004
posts:394
votes: 0


I've noticed alexa crawling my site more recently. Now I know why. I wonder if there's more to this for alexa than just collecting fees. Will the type of requests for their data be a source of valuable information to alexa, and even more importantly to their parent amazon?
6:44 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 16, 2002
posts:2133
votes: 1


Huh. I don't get it. I am obviously slow but why are they doing this?
6:48 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 1, 2003
posts:1201
votes: 0


>why

Money.

6:56 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


> Money.

I doubt it - prices are so low and the product is so exotic that they can't possibly make loads of dosh from it: they probably just have capacity that is available, and it makes sense to sell it even if it's worth $1.

7:00 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts:2259
votes: 0


Spare capacity? It wasn't that long ago they were short of the darn thing.
7:08 pm on Dec 13, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Dec 1, 2003
posts:311
votes: 0


Allowing crawlers to collect our pages for ranking, SERPs, etc. is one thing. But collecting billions of pages and then selling them is a different thing. I mean, is it even legal? Will they also give access to pages that are forbidden to other crawlers but open to Alexa, for example? What about the "no cache" option? Now spammers will be able to copy all your work because Alexa will give away the raw data. They won't even need to find a way into your websites, they just pay Alexa, period.
7:09 pm on Dec 13, 2005 (gmt 0)

New User

10+ Year Member

joined:June 3, 2005
posts:15
votes: 0


How exactly does Alexa work? They still say "powered by Google," yet they do their own crawl and are now making that raw data available? What part does Google have in this?

Does Alexa consider itself a competitor in the search market (MSN, Yahoo, Google)? I always thought they were just using Google's data for search, and their own data from the Alexa bar to "rank" sites.

Now that I talk it out, I really don't get Alexa. What are they?

7:20 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


> Spare capacity? It wasn't that long ago they were short of the darn thing.

It was not long ago that I paid $200 for a 500 MB hard disk and was happy, and today I am ordering about 2,000 times more storage for just twice as much dosh :)

9:16 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2001
posts:2547
votes: 0


I have the same question as internets ... why are they "POWERED BY GOOGLE" if they do their own crawls and can process the data?
10:52 pm on Dec 13, 2005 (gmt 0)

Administrator

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month

joined:Jan 14, 2004
posts:852
votes: 0


> I have the same question as internets ... why are they "POWERED BY GOOGLE" if they do their own crawls and can process the data?

One reason that I can think of off the top of my head is simply that they know they cannot compete with Google's or other search engines' results at this point. But they can make some $$ on the raw data and bank it for possible future projects, be that a search engine or whatever they decide to spend it on at that moment.

11:35 pm on Dec 13, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 15, 2003
posts:2395
votes: 0


>> why

Well, frankly because any kid with some hard drives, some bandwidth, and a free script can crawl the web. Ranking results is what Google does really well. At least it's pretty dang hard to do it just as well as them.

It's two different things, that's all.

---
And, unlike John Battelle, I'm pretty sure you can find an aged post by me somewhere that mentions this exact thing. I'm probably not even the first to mention it, as I recall having the discussion of an open source crawler with other members here - most likely more than a year ago. But never mind, as I haven't got a blog.

11:53 pm on Dec 13, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 22, 2005
posts:95
votes: 0


I just tried to set up an account - seems to only accept US addresses... another pointless restriction on the WORLD wide web - I ... am ... Canadian
1:34 am on Dec 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 13, 2005
posts:1077
votes: 0


Umm, so let me get this straight....they're going to be selling MY content? Ya, I don't think so....

Someone didn't think this all the way through.

3:22 am on Dec 14, 2005 (gmt 0)

New User

10+ Year Member

joined:June 3, 2005
posts:15
votes: 0


good point, carguy...what are they really selling? saved copies of everyone's webpages!
12:33 pm on Dec 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 10, 2003
posts:654
votes: 0


this has got to be the ultimate button presser's dream tool. i can see a whole load of sites banning alexa's bot.
1:26 pm on Dec 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 3, 2003
posts:1092
votes: 0


This shows how important "vertical search" will become

Where users will go to their "engineering search engine" vs. their "summer european travel search engine"

will people scrape it? sure
but they scrape google and everyone else just the same . . .

1:33 pm on Dec 14, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Apr 5, 2002
posts:210
votes: 0


I don't think this will catch on
2:50 pm on Dec 14, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 9, 2004
posts:65
votes: 0


Has anybody opened an account and been authorized to use this service? It seems they are not ready yet.
2:53 pm on Dec 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 10, 2003
posts:654
votes: 0


> This shows how important "vertical search" will become
>
> Where users will go to their "engineering search engine" vs. their "summer european travel search engine"

that was my first idea of what to do with it. problem is, i just can't see a vertical engine adding much value to what is already available via the horizontal engines. to work, a vertical engine would have to leverage its knowledge of its particular area of expertise to produce results that are better than google. other than a few niches i can't see how that would work well.

I can however see SEOs jumping on this for performing competitive analysis. want to know an accurate backlink count for sites linking to another site? well, that would be fairly painless and cheap with this tool.

7:13 pm on Dec 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2001
posts:2547
votes: 0



> One reason that I can think of off the top of my head is simply they know they can not compete with Google or other search engines results at this current point.

Even if their search wasn't "as good" at least it would be something different. Plus they could use all that traffic data they collect to help their ranking algo.
Also, I'm aware that crawling and indexing are different things and the difficulty of creating an index ... still I think they should at least give it a college try.
7:20 pm on Dec 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 8, 2002
posts:2335
votes: 0


I love the idea of offering this to legit software houses or legit programmers. But hate the idea because the biggest customers will be spammers.

Dang.

We need a new Internet. Hey guys, want to get together and start Internet 3.0? New protocols. Get rid of old baggage.

* No more "www".
* No more spoofed email addresses.
* You can use any .end domain you want, not just .com and the official few.
* The .end part of the domain will be built into the protocol in such a way that you can programmatically detect the difference between a subdomain and the "end part".
* So much more.

OK, dream sequence over.

7:56 pm on Dec 14, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 2, 2002
posts:792
votes: 0


so search has become a commodity web service...was bound to happen.

time for me to set up that portal and not worry about technology, only getting users in. Any VCs around feel free to contact me ;)

5:26 pm on Dec 15, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 9, 2004
posts:169
votes: 0


it doesn't look like Alexa lets anyone create their own spiders to run... THAT would be cool. Correct me if I'm wrong, but it looks like all you can do is write your own stuff to sift through what Alexa has already spidered and stored. It would be way cooler if Alexa let ppl alter how its crawler(s) actually worked...

Still, it's pretty neat to have access to a relatively large search index that's already created. People can test their spiders on Alexa's index.. but then you're sorta trapped in Alexa's way of doing things -- which is the point, I think.
