Forum Moderators: phranque


Webaroo - copying the internet to your hard disk

Offline Browsing


Iguana

9:07 am on Apr 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Webaroo is going to attempt to distill the Internet onto a hard drive so that people can access it offline:

[news.bbc.co.uk]

So, they are going to download your entire site, but what happens to any AdSense ads? There won't be a Google ad syndicator on your hard disk, will there? And if you have other ads, the click-throughs won't work.

I've not seen the bot yet, but its user-agent is going to be:

user-agent: WebarooBot

So get your robots.txt ready.

trillianjedi

9:28 am on Apr 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmmmm - legal minefield this one.

Didn't Alexa try this a couple of years ago? They were going to "sell the entire internet" on DVD-ROM as I recall?

On the ads point: according to that article, your ads won't display, but theirs will when the user searches the offline content:

Like many other net start-ups, Webaroo relies on adverts for its revenue stream. Those searching pages via those stored on the Webaroo browser will see a couple of relevant text ads before the list of search results.

TJ

foxtunes

5:09 pm on Apr 11, 2006 (gmt 0)

10+ Year Member



So they download your content, and profit from it.

The creators of the ripped sites make nothing... hmmmmm

I suppose you could ban them in your .htaccess file.
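The .htaccess route is Apache-specific. As a rough illustration of the same idea in application code, here is a minimal sketch assuming the site runs as a Python WSGI app rather than behind Apache rules. It answers 403 to any request whose User-Agent contains "WebarooBot". The class name `BlockUserAgents` and the blocklist are hypothetical, and like any user-agent ban it only catches bots that identify themselves honestly.

```python
from typing import Callable, Iterable

# Hypothetical blocklist; the token is the user-agent quoted in this thread.
BLOCKED_AGENTS = ("webaroobot",)


class BlockUserAgents:
    """WSGI middleware: reply 403 Forbidden when the User-Agent matches the blocklist."""

    def __init__(self, app: Callable, blocked: Iterable[str] = BLOCKED_AGENTS):
        self.app = app
        # Compare case-insensitively.
        self.blocked = tuple(s.lower() for s in blocked)

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in self.blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else gets the normal site.
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    # Stand-in for the real site.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = BlockUserAgents(hello_app)
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve it. Doing the check in middleware keeps the rule in one place instead of repeating it in every page handler.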

Beagle

10:54 pm on Apr 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Internally, Webaroo has likened this distilling of data into a portable format to Douglas Adams' iconic Hitchhiker's Guide to the Galaxy.

But the Hitchhiker's Guide to the Galaxy pays the people who write for it - even though it can't promise them a return trip.

Demaestro

11:04 pm on Apr 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Questions:

If you are offline, how do you get to this data?

Do you have to store it on your hard drive?

If so, how much disk space would that take? And if I am offline, how would it get updated?

This honestly sounds like one of those dumb client requests that you see threads for.

client: "We want to have internet in our office. Can you burn it on a disk for me?"

Anyone with a brain: "?!?!"

Iguana

11:03 am on Apr 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can see the point of it. With Wi-Fi hotspots few and far between and mobile phone data connections unreliable outside the major cities, there are plenty of mobile workers who could benefit from having up to 40 GB of internet data available on their laptop.

The next question is how they are going to achieve this. I had to ban archive.org from my site because its crawler was attempting to follow the JavaScript but failing miserably, generating thousands of 404s. There are also all the canonicalisation, redirect, session-ID and affiliate-ID problems that search engines face, and Webaroo would have to overcome them too. I don't think this idea is really going to fly.
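To make the canonicalisation point concrete: a crawler has to recognise that URLs differing only by host case, default port, parameter order, or a session/affiliate ID are the same page, or it stores the same content many times over. Below is a minimal sketch using Python's `urllib.parse`. The parameter names in `TRACKING_PARAMS` are assumptions for illustration; every site picks its own, which is a big part of why this is hard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session/affiliate parameter names; real sites vary widely.
TRACKING_PARAMS = {"sid", "phpsessid", "affid", "ref"}


def canonicalise(url: str) -> str:
    """Return a normalised form of url so duplicate pages collapse to one key."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    if port and not ((scheme == "http" and port == 80) or (scheme == "https" and port == 443)):
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Drop session/affiliate parameters, then sort the rest for a stable order.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    )
    # Fragments never reach the server, so they are discarded.
    return urlunsplit((scheme, host, path, urlencode(query), ""))
```

For example, `canonicalise("HTTP://Example.COM:80/page?b=2&sid=abc&a=1#top")` gives `"http://example.com/page?a=1&b=2"`. Note that this only handles query-string IDs; redirects and IDs embedded in the path need separate handling.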

Matt Probert

12:56 pm on Apr 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To start with, Webaroo has put together software that lets people download the bits of the web they find want so they can use them when they are mobile, and fast net connections are thin on the ground and even thinner in the air.

Not from my web site if I can stop them <g>

I sell offline versions, and the online version is free to access but advertising-supported; heck, it's over 15 years' work and ongoing. This Webaroo business would be blatant theft from a data site like mine.

Thanks for the warning.

Matt

Matt Probert

1:06 pm on Apr 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From the Webaroo web site:


To exclude your entire site:
Add the following text to your Robots.txt file:
User-Agent: WebarooBot
Disallow: /
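If you add those two lines, a quick way to confirm they parse as intended is Python's standard `urllib.robotparser`, fed the rules directly. In the sketch below, `www.example.com` stands in for your own site. It only tells you what a compliant crawler ought to do; it cannot make a crawler comply.

```python
from urllib.robotparser import RobotFileParser

# The exclusion rules quoted from the Webaroo site above.
ROBOTS_TXT = """\
User-Agent: WebarooBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# WebarooBot is shut out of every path; agents without a matching record are allowed.
print(parser.can_fetch("WebarooBot", "http://www.example.com/page.html"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.com/page.html"))   # True
```

Pass the bot's product token ("WebarooBot") rather than a full browser-style user-agent string, because `robotparser` matches on the part before the first "/".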

Matt

Matt Probert

1:29 pm on Apr 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh dear!

My own tests suggest that Webaroo ignores the robots.txt file and downloads content anyway.
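Anyone wanting to repeat that kind of test could scan their access log for requests whose User-Agent mentions WebarooBot and check each requested path against their robots.txt. Here is a minimal sketch, assuming Apache's "combined" log format. The function name `violations` is hypothetical, and a fetch of `/robots.txt` itself is treated as expected behaviour rather than a violation.

```python
import re
from urllib.robotparser import RobotFileParser

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def violations(log_lines, robots_lines, bot_token="WebarooBot"):
    """Yield (host, path) for requests by bot_token that robots.txt disallows."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    for line in log_lines:
        m = LOG_RE.match(line)
        # Skip unparseable lines and requests from other user-agents.
        if not m or bot_token.lower() not in m["agent"].lower():
            continue
        # Fetching robots.txt itself is what a polite bot should do.
        if m["path"] == "/robots.txt":
            continue
        if not rp.can_fetch(bot_token, m["path"]):
            yield m["host"], m["path"]
```

Usage would be along the lines of opening `access.log` and `robots.txt` and passing both file objects in, since iterating a file yields lines. Any `(host, path)` pairs it reports are pages the bot fetched after being told not to. The hosts also answer the "which addresses?" question for an IP-level ban.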

Matt

bcolflesh

1:41 pm on Apr 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What is the address range of their robot?
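Nobody in the thread posts the range, so the network below is only a placeholder: 192.0.2.0/24 is the reserved TEST-NET-1 documentation range, not Webaroo's. Once a real range turns up in your logs, blocking by address does not depend on the bot honouring robots.txt or sending an honest user-agent. Here is a minimal sketch with Python's standard `ipaddress` module. `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Placeholder: TEST-NET-1 documentation range, NOT Webaroo's actual addresses.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests across IPv4/IPv6 simply return False rather than raising.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Using CIDR networks rather than a list of single IPs means one entry covers a whole crawler farm, and new ranges can be appended as they are found.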

Demaestro

3:56 pm on Apr 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I still don't understand how they are going to make this data available to offline users.

I think that with more and more Wi-Fi services popping up, an offline version of the net is not as needed as it may have been even five years ago.

I just do not see this idea taking off. I know some people on here want to protect their content, but try to keep in mind there is no "Internet Law". There is nothing saying that a site-indexing bot must comply with a robots.txt entry; it is more of a courtesy than anything.

If you are publishing stuff on the internet in a publicly accessible area, you are making it available to whoever wants to come by and take a look. When they do take a look, your content is copied onto the computer they are viewing it on.

If I wanted, I could browse your entire site, then unplug my network cable to take myself offline. I would still have your entire site on my computer, and I could browse all the pages I had already seen on your website at my leisure, offline. That basically gives me an offline version of your site. This is a reality of web technology, so if a company decides it wants to do exactly that, it is hardly "blatant theft".