Scraping May Be A Biological Imperative

Scraping is actually a TREME according to Stephen Hawking

incrediBILL

10:01 pm on May 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



According to Stephen Hawking, scraping is in our DNA, it's a biological imperative:
Then it took until about a million years ago for the second replicator to emerge – we call them memes. Our ancestors began to imitate each other, copying sounds and actions that varied and then were selectively copied by others.
Now the silicon based machinery that we have built is capable of copying, varying and selecting digital information. This information, I suggest, could be a third replicator. I first called these replicators the ‘temes’ – for ‘technological memes’ – but people were so confused by the spelling that I have changed the name to ‘tremes’.

[psychologytoday.com...]

While we sit here trying to stop tremes (technologically replicating memes, aka scraping) all day long, it turns out we're fighting a biological imperative that has evolved into its technological equivalent. What we're doing is akin to trying to stop salmon from swimming upstream.

Web content scraping is no different from the MP3s that were being downloaded from the notorious Napster site. The only difference is that web pages are crawled to be captured, whereas on Napster people were uploading and downloading directly. How it was done doesn't matter, as it's still a treme, and it's in our DNA to collect and disseminate information.

Some actually believe that the whole meaning of life is to collect and disseminate information. That's what DNA does, collects and passes down evolutionary information. The human brain is the ultimate information collector and to that end we've created a new life form if you will, the internet, which is the current pinnacle of this biological imperative to gather and pass on more information to make sense out of the universe.

Therefore, by blocking crawlers, bots and other types of information collectors and aggregators we're actually stopping tremes, stopping them from performing an innate biological need. We're interfering with the drive of all these individuals to hunt, gather, collect and share which has merely been enhanced by the technology.

If you follow what Hawking and some others are saying, tremes supposedly could risk life as we know it, which is exactly why we block bots: they put our financial and intellectual property well-being at risk. The truth is that by putting our information on the web we're actively creating tremes, but we don't want others to take over our tremes. That's what happened at Napster: people were taking over the tremes of others.

Stopping them from creating tremes using our tremes is basically the bottom line.

I found the whole concept fascinating and it puts the purpose of our whole online life and this particular forum in a totally different light.

Knowing the biological and psychological drive behind the treme we call scraping, the pathology behind it all, might even help us fight it.

I'm not sure what it all means yet, but it had a profound impact on my view of scraping and bot blocking thanks to some mind blowing insights from Hawking.

I'll be surprised if it doesn't change the way some view this activity as it's kind of hard not to now.

What do you think about this?

lucy24

11:21 pm on May 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: looking at datestamp and wondering how I managed to mislay 1 month plus 13 days ::

keyplyr

11:29 pm on May 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




:: looking at datestamp and wondering how I managed to mislay 1 month plus 13 days ::

On Bill's post above yours? Date displays correctly for me. Maybe it was corrected (secretly by Bill and his elves) or possibly a caching issue?

LifeinAsia

11:59 pm on May 14, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Date displays correctly for me.
I believe Lucy was suggesting that this was an April Fool's post. ("May 14" - "April 1" = "1 month plus 13 days.")

Can't exactly compare to MP3 copying: that was done to avoid paying for the songs, while still acknowledging the original content's author. A lot of scraping is done to try to make money from other people's work and/or pass that work off as belonging to the scraper.

But following on Hawking's concept, I view the content creators as the more advanced of the species and the content scrapers as the less advanced. In terms of evolution and improving the gene pool, it makes perfect sense to do everything possible to stop scraping and choke the life out of the less advanced members of the species (preferably before they reproduce) to help clean up the gene pool.

mrtonyg

12:40 am on May 15, 2015 (gmt 0)

10+ Year Member



We humans have advanced to where we are by basically copying each other's ideas.

Some of those ideas are copied and in turn improved and the process repeats.

So yes, I totally agree... humans build on prior ideas and processes, and most of those are "stolen," for lack of a better word.

[edited by: mrtonyg at 12:57 am (utc) on May 15, 2015]

keyplyr

12:55 am on May 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



this was an April Fool's post

Ahhh... with all the work on the site lately, I assumed it was another bug report :)

When confronted with the accusation of copyright infringement, quite often the offender will defend themselves with the excuse they were not profiting by the theft. I've had at least one individual tell me that the content was displayed on *his* computer in *his* living room so he had the right to do whatever he wished with it.

This suggests that society has become so comfortable with the technology to replicate almost anything that the uniqueness of a work has lost its significance, and ownership is devalued as a result.

Seems more cultural than biological.

slipkid

5:23 am on May 15, 2015 (gmt 0)

10+ Year Member



Thanks IncrediBill for distracting me from some research I was conducting!

My initial reaction is that I take issue with the rather disjointed argument advanced in the article. However, I need to read Dr. Hawking's paper to understand the basis from which the author of the link you provided has derived her proposition.

Although I acknowledge that the "internet of things" -- which is getting significant financial investment in the business community -- is a fascinating technological advancement, the author's suggestion of machines developing qualities peculiarly described as "reminiscent of a living brain" is something I utterly reject as an electrical engineer.

Hobbs

7:00 am on May 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



ok, now researching how to detect & htaccess block the greed DNA
possibly augmenting captcha with a mouth swab or urine sample
but knowing scrapers, they'll easily fake that too & use someone else's
which poses the dilemma, how can one not be greedy yet sell their DNA samples to scrapers
and would that system block advertisers and even me from accessing my own site
oh dear

incrediBILL

7:58 am on May 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"reminiscent of a living brain" is something i utterly reject as an electrical engineer.

Well, in the 80s I would have rejected that too, but the rapid evolution of electronics and software has changed my mind because the technology that didn't exist now does.

Just like a brain, all of the specialized components are now becoming available for sight, sound and even touch, taste and smell. We have all sorts of optical recognition, facial recognition and OCR, to the point that the computer is capable of visually processing many things, even concurrently. The audio side is amazing as well, with recognition of almost any music or TV show within a second or two. The whole text to speech and speech to text subsystems are phenomenally good, and as more context is taken into account, the right words are being used in the right places so often today that if I didn't understand the underlying technology I'd call it magic. Not to mention the fact that a handheld phone can currently do the equivalent of a Star Trek universal translator for almost any language on the planet in near real time, limited only by the network it's using.

So we pretty much have sound tackled with natural language processing of the content getting better by the day.

Optical capabilities are rapidly catching up and are using crowdsourcing to help identify more and more things, so before long computer vision will be phenomenal.

Then IBM came out with new synaptic processing chips that can handle live streaming visual data in real time, which was totally amazing.

Some of the personal assistants like Siri and Robin are doing so many things, responding to verbal requests and at times even doing interactive things, that it's often easy to forget it's not a person until it makes a big faux pas and the illusion breaks. But every year it's getting better and better, and before long you'll have a hard time knowing it's not human, even with the limited AI being used, which is still more impressive than some people I know. The computer assistant has the entire knowledge base of the internet behind it, unlike the poorly educated people these days, so what the PA lacks in understanding nuances it makes up in sheer depth of content.

I was shocked the other day when I asked a simple question like "Who won the fight" the day after the boxing match and the computer just responded confidently with the correct answer. Wow.

However, none of the above really matters because it takes a lot less than a Turing test to fool our websites.

To fool the best bot blockers:
- Residential IPs
- Headless browser (PhantomJS, SlimerJS, etc.)
- Pass full browser headers with no mistakes
- Send text to fields with randomized typing routines to fake human input
- Randomizing the 'hardware' characteristics to avoid hardware fingerprinting of a headless browser
- Passing captchas using blow-through techniques or possibly real optical processing, which is improving even to the point that the computer can tell if it's a picture of a tree, dog, etc.

Really, to stop others from streaming our treme, I've given up on what we call "bot blocking" and changed my philosophy to "human validation". Not exactly a Turing test, mind you, but it's a lot easier to look for things all browsers have in common than to play whack-a-mole with all the bots and scrapers. The short list above covers the ways to hide bots as humans, and likewise gives clues on how to validate humans or where to probe harder to stop technology that looks like humans.
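
A minimal sketch of the "human validation" idea, assuming the check is simply whether a request carries the headers that mainstream browsers normally send. The header list and the function names here are illustrative assumptions, not the actual rules used by the poster:

```python
# Headers that mainstream browsers normally send on a page request.
# This list is an assumption for illustration, not taken from the post.
EXPECTED_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def missing_browser_headers(headers):
    """Return the expected header names that are absent or empty."""
    present = {name.lower(): value.strip() for name, value in headers.items()}
    return [name for name in EXPECTED_HEADERS if not present.get(name)]


def looks_like_browser(headers):
    """True only when every expected browser header is present."""
    return not missing_browser_headers(headers)


browser_request = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:38.0) Gecko/20100101 Firefox/38.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}
lazy_bot_request = {"User-Agent": "Mozilla/5.0", "Accept": "*/*"}

print(looks_like_browser(browser_request))        # True
print(missing_browser_headers(lazy_bot_request))  # ['accept-language', 'accept-encoding']
```

A check like this only catches the lazy majority. A headless browser that passes full headers sails straight through, so it is a first filter rather than a complete defense.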

For the most part, it's still surprisingly simple to protect tremes from the internet of things, but many still jump through all sorts of hoops, tracking bad IP ranges, bad user agents and more. I block most bot traffic by simply proving it's not human, with no captcha, as the majority is lame and easy to tame. As a matter of fact, a filter on one missing browser field alone made my blog spam free.
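
The post doesn't say which browser field the blog filter checks, so the sketch below uses Accept-Language purely as a hypothetical placeholder. It wraps a WSGI application and refuses any POST that arrives without that header; `blog_app` and `status_for` are stand-ins invented for the demonstration:

```python
# Hypothetical choice of field: the original post does not name it.
REQUIRED_ENVIRON_KEY = "HTTP_ACCEPT_LANGUAGE"


def require_browser_field(app, key=REQUIRED_ENVIRON_KEY):
    """Wrap a WSGI app so POSTs lacking the given header get a 403."""
    def middleware(environ, start_response):
        is_post = environ.get("REQUEST_METHOD", "GET").upper() == "POST"
        if is_post and not environ.get(key, "").strip():
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)
    return middleware


def blog_app(environ, start_response):
    """Stand-in for the real blog application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Comment accepted\n"]


def status_for(app, environ):
    """Call a WSGI app once and return the status line it sent."""
    seen = []
    app(environ, lambda status, headers: seen.append(status))
    return seen[0]


filtered = require_browser_field(blog_app)
print(status_for(filtered, {"REQUEST_METHOD": "POST"}))
# 403 Forbidden
print(status_for(filtered, {"REQUEST_METHOD": "POST", "HTTP_ACCEPT_LANGUAGE": "en-US"}))
# 200 OK
```

Only POSTs are filtered here, on the assumption that comment spam arrives through form submissions. Ordinary page views are left alone.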

But I do agree with some of the conjecture from some articles and TV shows on SCI recently claiming that the technology spreading tremes, AI based or otherwise, will be the end of many of us. They are somewhat correct, but I see that as a financial end, not the end of our reason to be as we still have other purposes that have nothing to do with the information age, just ask any Amish.

For some reason I'm expecting a Dr. Seuss book about extreme memes growing from tremes down by the video streams but I dream.

PS. I went to see EX MACHINA tonight, timing is everything :)

slipkid

4:43 pm on May 15, 2015 (gmt 0)

10+ Year Member



As far as improvements in technology brought about by innovation, you're preaching to the choir, buddy.

But I do agree with some of the conjecture from some articles and TV shows on SCI recently claiming that the technology spreading tremes, AI based or otherwise, will be the end of many of us.


I imagined that mankind would be reduced to an unlucky few by an Act of God, e.g., an asteroid striking mother earth. You appear to embrace a vision of a world inhabited by a minimal number of giga-wealthy, non-person industrialists who no longer have to think, but merely live their lives as structured by and provided to them by their human analog automatons.

P.S. Your time frame reference is wrong. The march of technological enlightenment did not start in the 1980s, but in 1930s Germany.

incrediBILL

7:26 am on May 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, 1930s is wrong too. The first world wide web technically started with the first wireless telegraph, and I'd even hazard to say Gutenberg started the whole treme thing, but Edison made some major progress with music and movie tremes.