vincevincevince

msg:466062 | 3:10 pm on Dec 13, 2005 (gmt 0) |
Now this is very interesting....
|
Lord Majestic

msg:466063 | 3:43 pm on Dec 13, 2005 (gmt 0) |
I assume here they mean 50 GB of raw uncompressed data - it therefore follows that it's $1 per about 2.5 mln web pages (roughly 20 KB per page), or $2,000 per 5 bln pages processed (what appears to be their 2 months' worth of crawling). Getting your data out will cost more - $1 per GB is not cheap if you are building a full-text index: at 2 KB per page (10 times less than raw size) it will take 10 TB or $10,000. It will cost the same if you keep the data on their system for a year. All in all it is not THAT expensive - spammers will certainly like the idea of paying just $2K to process 5 bln pages for email addresses...
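For anyone who wants to check the arithmetic above, here is a rough Python sketch. The prices ($1 per 50 GB processed, $1 per GB transferred out) and the page sizes are the assumptions made in this post, not confirmed Alexa figures, and the constant and function names are made up for illustration.

```python
# Back-of-envelope cost model. All figures are the poster's assumptions.
PROCESS_USD_PER_50_GB = 1.0   # assumed: $1 per 50 GB of raw data processed
TRANSFER_USD_PER_GB = 1.0     # assumed: $1 per GB transferred out
RAW_KB_PER_PAGE = 20          # 50 GB / 2.5 mln pages
INDEX_KB_PER_PAGE = 2         # assumed index size, 10x smaller than raw
KB_PER_GB = 1_000_000         # decimal units


def pages_per_dollar() -> float:
    """How many raw pages $1 of processing covers."""
    return 50 * KB_PER_GB / RAW_KB_PER_PAGE


def processing_cost(pages: int) -> float:
    """Cost in USD of processing the raw data for `pages` pages."""
    raw_gb = pages * RAW_KB_PER_PAGE / KB_PER_GB
    return raw_gb / 50 * PROCESS_USD_PER_50_GB


def transfer_cost(pages: int) -> float:
    """Cost in USD of pulling out an index built from `pages` pages."""
    index_gb = pages * INDEX_KB_PER_PAGE / KB_PER_GB
    return index_gb * TRANSFER_USD_PER_GB


if __name__ == "__main__":
    crawl = 5_000_000_000  # about 2 months of crawling, per the post
    print(f"pages per $1:    {pages_per_dollar():,.0f}")
    print(f"processing cost: ${processing_cost(crawl):,.0f}")
    print(f"transfer cost:   ${transfer_cost(crawl):,.0f}")
```

Changing `INDEX_KB_PER_PAGE` shows how quickly the transfer bill overtakes the processing bill once you keep more than a few KB per page.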
|
vincevincevince

msg:466064 | 3:57 pm on Dec 13, 2005 (gmt 0) |
When you compare it to the costs of building your own index, it's peanuts. I have the perfect application for this - just got to find the perfect time to code it!
|
Lord Majestic

msg:466065 | 3:59 pm on Dec 13, 2005 (gmt 0) |
It's still more expensive - however the main issue for me would have been the fact that it's Alexa that controls all the data: anybody building anything serious on top of any platform should be 100% sure of the long-term conditions of such platform use.
|
PaulPA

msg:466066 | 5:10 pm on Dec 13, 2005 (gmt 0) |
Posted inside earlier this morning [webmasterworld.com]
|
Rosalind

msg:466067 | 5:20 pm on Dec 13, 2005 (gmt 0) |
I wonder whether this will have any impact whatsoever on overall bot activity?
|
surfin2u

msg:466068 | 5:24 pm on Dec 13, 2005 (gmt 0) |
I've noticed alexa crawling my site more recently. Now I know why. I wonder if there's more to this for alexa than just collecting fees. Will the type of requests for their data be a source of valuable information to alexa, and even more importantly to their parent amazon?
|
Jon_King

msg:466069 | 6:44 pm on Dec 13, 2005 (gmt 0) |
Huh. I don't get it. I am obviously slow but why are they doing this?
|
Kirby

msg:466070 | 6:48 pm on Dec 13, 2005 (gmt 0) |
>why
Money.
|
Lord Majestic

msg:466071 | 6:56 pm on Dec 13, 2005 (gmt 0) |
I doubt it - prices are so low and the product is so exotic that they can't possibly make loads of dosh from it: they probably just have capacity that is available and it makes sense to sell it even if it's worth $1.
|
oddsod

msg:466072 | 7:00 pm on Dec 13, 2005 (gmt 0) |
Spare capacity? It wasn't that long ago they were short of the darn thing.
|
caspita

msg:466073 | 7:08 pm on Dec 13, 2005 (gmt 0) |
It is one thing to allow crawlers to collect our pages for ranking, SERPs, etc. But collecting billions of pages and then selling the pages is a different thing. I mean, is it even legal? Will they also deliver access to pages forbidden to other crawlers but not to Alexa, for example? What about the "no cache" option? Now spammers will be able to copy all your work because Alexa will give away the raw data; they won't even need to find a way into your websites, they just pay Alexa, period.
|
internets

msg:466074 | 7:09 pm on Dec 13, 2005 (gmt 0) |
How exactly does Alexa work? They still say "powered by Google," yet they do their own crawl and are now making that raw data available? What part does Google have in this? Does Alexa consider itself a competitor in the search market (MSN, Yahoo, Google)? I always thought they were just using Google's data for search, and their own data from the Alexa bar to "rank" sites. Now that I talk it out, I really don't get Alexa. What are they?
|
Lord Majestic

msg:466075 | 7:20 pm on Dec 13, 2005 (gmt 0) |
| Spare capacity? It wasn't that long ago they were short of the darn thing. |
| It was not long ago when I paid $200 for a 500 MB hard disk and was happy, and today I am ordering about 2,000 times more storage for just twice as much dosh :)
|
physics

msg:466076 | 9:16 pm on Dec 13, 2005 (gmt 0) |
I have the same question as internets ... why are they "POWERED BY GOOGLE" if they do their own crawls and can process the data?
|
Ocean10000

msg:466077 | 10:52 pm on Dec 13, 2005 (gmt 0) |
| I have the same question as internets ... why are they "POWERED BY GOOGLE" if they do their own crawls and can process the data? |
| One reason that I can think of off the top of my head is simply that they know they cannot compete with Google's or other search engines' results at this point. But they can make some $$ on the raw data and bank it for possible future projects, be it a search engine or whatever they decide to spend it on at the time.
|
claus

msg:466078 | 11:35 pm on Dec 13, 2005 (gmt 0) |
>> why
Well, frankly, because any kid with some hard drives, some bandwidth, and a free script can crawl the web. Ranking results is what Google does really well. At least it's pretty dang hard to do it just as well as them. It's two different things, that's all.
---
And, unlike John Battelle, I'm pretty sure you can find an aged post by me somewhere that mentions this exact thing. I'm probably not even the first to mention it, as I recall having the discussion of an open source crawler with other members here - most likely more than a year ago. But never mind, as I haven't got a blog.
|
ionchannels

msg:466079 | 11:53 pm on Dec 13, 2005 (gmt 0) |
I just tried to set up an account - seems to only accept US addresses... another pointless restriction on the WORLD wide web - I ... am ... Canadian
|
carguy84

msg:466080 | 1:34 am on Dec 14, 2005 (gmt 0) |
Umm, so let me get this straight....they're going to be selling MY content? Ya, I don't think so.... Someone didn't think this all the way through.
|
internets

msg:466081 | 3:22 am on Dec 14, 2005 (gmt 0) |
good point, carguy...what are they really selling? saved copies of everyone's webpages!
|
Jack_Hughes

msg:466082 | 12:33 pm on Dec 14, 2005 (gmt 0) |
this has got to be the ultimate button presser's dream tool. i can see a whole load of sites banning alexa's bot.
|
howiejs

msg:466083 | 1:26 pm on Dec 14, 2005 (gmt 0) |
This shows how important "vertical search" will become, where users will go to their "engineering search engine" vs. their "summer european travel search engine". Will people scrape it? Sure, but they scrape google and everyone else just the same . . .
|
afterburner

msg:466084 | 1:33 pm on Dec 14, 2005 (gmt 0) |
I don't think this will catch on
|
vfilip

msg:466085 | 2:50 pm on Dec 14, 2005 (gmt 0) |
Has anybody opened an account and been authorized to use this service? It seems they are not ready yet.
|
Jack_Hughes

msg:466086 | 2:53 pm on Dec 14, 2005 (gmt 0) |
| This shows how important "vertical search" will become, where users will go to their "engineering search engine" vs. their "summer european travel search engine" |
| that was my first idea of what to do with it. problem is, i just can't see a vertical engine adding much value to what is already available via the horizontal engines. to work, a vertical engine would have to leverage its knowledge of its particular area of expertise to produce results that are better than google. other than a few niches i can't see how that would work well. I can however see SEOs jumping on this for performing competitive analysis. want to know an accurate backlink count for sites linking to another site? well, that would be fairly painless and cheap with this tool.
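A minimal sketch of what such a backlink count might look like, in Python. It assumes the crawl data can be read as (page URL, HTML) pairs, which is only a guess at the format and not Alexa's actual interface; `LinkExtractor` and `count_backlinks` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def count_backlinks(pages, target_host):
    """Count distinct external hosts with at least one link to target_host.

    `pages` is an iterable of (source_url, html) pairs (assumed format).
    """
    linking_hosts = set()
    for source_url, html in pages:
        source_host = urlparse(source_url).hostname
        if source_host is None or source_host == target_host:
            continue  # skip malformed URLs and the site's own pages
        parser = LinkExtractor(source_url)
        parser.feed(html)
        if any(urlparse(link).hostname == target_host for link in parser.links):
            linking_hosts.add(source_host)
    return len(linking_hosts)
```

This counts linking sites rather than individual links, so a site linking a hundred times still counts once; swap the set for a counter if you want raw link totals.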
|
physics

msg:466087 | 7:13 pm on Dec 14, 2005 (gmt 0) |
| One reason that I can think of off the top of my head is simply that they know they cannot compete with Google's or other search engines' results at this point. |
| Even if their search wasn't "as good" at least it would be something different. Plus they could use all that traffic data they collect to help their ranking algo. Also, I'm aware that crawling and indexing are different things and the difficulty of creating an index ... still I think they should at least give it a college try.
|
Clark

msg:466088 | 7:20 pm on Dec 14, 2005 (gmt 0) |
I love the idea of offering this to legit software houses or legit programmers. But hate the idea because the biggest customers will be spammers. Dang. We need a new Internet. Hey guys, want to get together and start Internet 3.0. New protocols. Get rid of old baggage.
* No more "www".
* No more spoofed email addresses.
* You can use any .end domain you want, not just .com and the official few.
* The .end part of the domain will be built into the protocol in such a way that you can programmatically detect the difference between a subdomain and the "end part".
* so much more.
OK, dream sequence over.
|
Namaste

msg:466089 | 7:56 pm on Dec 14, 2005 (gmt 0) |
so search has become a commodity web service...was bound to happen. time for me to set up that portal and not worry about technology, only getting users in. Any VCs around feel free to contact me ;)
|
mhhfive

msg:466090 | 5:26 pm on Dec 15, 2005 (gmt 0) |
it doesn't look like Alexa lets anyone create their own spiders to run... THAT would be cool. Correct me if I'm wrong, but it looks like all you can do is write your own stuff to sift through what Alexa has already spidered and stored. It would be way cooler if Alexa let ppl alter how its crawler(s) actually worked... Still, it's pretty neat to have access to a relatively large search index that's already created. People can test their spiders on Alexa's index.. but then you're sorta trapped in Alexa's way of doing things -- which is the point, I think.
|
| This 41 message thread spans 2 pages. |
|
|