So, they are going to download your entire site, but what happens to any AdSense ads? There won't be a Google ad syndicator on your hard disk, will there? If you have other ads, then the click-throughs won't work.
I've not seen the bot yet but it is going to be
user-agent: WebarooBot
so get your robots.txt ready
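For anyone who wants to check their rules before the bot turns up, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the is_allowed helper and the example.com URLs are assumptions for illustration; only the WebarooBot user-agent name comes from the post above. Bear in mind that robots.txt is advisory, so this only tells you what a well-behaved crawler would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out WebarooBot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: WebarooBot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("WebarooBot", "Googlebot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "http://example.com/page.html")
        print(agent, "allowed" if allowed else "blocked")
```

Running it against your own file is just a matter of pasting the contents into ROBOTS_TXT, or calling set_url() and read() on the parser to fetch the live file.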
Didn't Alexa try this a couple of years ago? They were going to "sell the entire internet" on DVD-ROM as I recall?
On the ads point: according to that article, your ads won't display, but theirs will when the user searches the offline content:
Like many other net start-ups, Webaroo relies on adverts for its revenue stream. Those searching pages via those stored on the Webaroo browser will see a couple of relevant text ads before the list of search results.
TJ
If you are offline, then how do you get to this data?
Do you have to store it on your hard drive?
If so, how much disk space would that take, and if I am offline, how would it get updated?
This honestly sounds like one of those dumb client requests that you see threads for.
client: "We want to have internet in our office. Can you burn it on a disk for me?"
Anyone with a brain: "?!?!"
The next question is how they are going to achieve this. I had to ban archive.org from my site because they were attempting to follow the JavaScript but failing miserably and generating thousands of 404s. There are also all the canonicalisation, redirect, session ID and affiliate ID problems that search engines face, which Webaroo would have to overcome too. I don't think this idea is really going to fly.
To start with, Webaroo has put together software that lets people download the bits of the web they want so they can use them when they are mobile, and fast net connections are thin on the ground and even thinner in the air.
Not from my web site if I can stop them <g>
I sell offline versions, and the online version is free to access but advertising supported. Heck, it's over 15 years' work and ongoing. This Webaroo business would be blatant theft from a data site like mine.
Thanks for the warning.
Matt
I think that with more and more Wi-Fi services popping up, having an offline version of the net is not as necessary as it may have been even 5 years ago.
I just do not see this idea taking off. I know some people on here want to protect their content, but try to keep in mind there is no "Internet Law". There is nothing saying that a site-indexing bot must comply with a robots.txt entry; it is more of a courtesy than anything.
If you are publishing stuff on the internet in a publicly accessible area, you are making it available to whoever wants to come by and take a look. When they do take a look, your content is copied onto the computer they are viewing it on.
If I wanted, I could browse your entire site and then unplug my network cable. I would still have every page I had already viewed on my computer, and I could browse them at my leisure, offline. That is basically an offline version of your site, and it is a reality of web technology, so if a company decides they want to do exactly that, it is hardly "blatant theft".