When Bots Become a Problem

What Happens When Copyright Bots and Search Bots Become a Major Pain?

     
3:21 pm on Jan 13, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts:1526
votes: 0


This is sort of a rhetorical question, and a "conversation piece," so to speak, but something hit me as I was poking through my weblogs this morning.

I have a dinky little site. It's really nothing that makes the bandwidth meter on my (lowest tier) account even twitch. I'm ecstatic when one of my pages makes GPR 4.

One of the biggest features of my site is a few galleries with about 150 various photos and images I've taken/painted/designed. One gallery is mildly popular because it is a series on a local abandoned mental hospital, and draws a few links from "ghost chaser" sites and whatnot.

Most of the other sites I've done are very specialized NPO sites that have even smaller audiences than my own site, so this isn't an issue for them.

In any case, I noticed a sudden big spike in my bandwidth yesterday, and in the page hits. They seemed to all resolve to the MSN bot.
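
One rough way to check where a spike like this comes from is to total the bytes served per user agent. This is a minimal sketch, assuming the server writes Apache-style combined logs. The sample lines, IP addresses and paths are made up for illustration, and since user-agent strings can be forged, the result is only a first pass.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
    r'(?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"$'
)

def bytes_by_agent(lines):
    """Return a Counter mapping each user-agent string to total bytes served."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in combined format
        size = match.group("bytes")
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return totals

# Made-up sample lines (documentation IP ranges, hypothetical paths).
SAMPLE = [
    '192.0.2.10 - - [12/Jan/2007:10:00:00 +0000] "GET /gallery/a.jpg HTTP/1.1" '
    '200 150000 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [12/Jan/2007:10:00:01 +0000] "GET /gallery/b.jpg HTTP/1.1" '
    '200 250000 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '198.51.100.7 - - [12/Jan/2007:10:05:00 +0000] "GET /index.html HTTP/1.1" '
    '200 100000 "-" "Mozilla/5.0"',
]

totals = bytes_by_agent(SAMPLE)
grand_total = sum(totals.values())
for agent, size in totals.most_common():
    # Print bytes, share of total traffic, and the agent string.
    print(f"{size:>8} bytes  {100 * size / grand_total:5.1f}%  {agent}")
```

On a real site you would pass an open log file to `bytes_by_agent` instead of `SAMPLE`.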

Sadly, search engines seem to be the bulk of my visitors. This was certainly the case yesterday.

I know that there are many new services like CopyScape [copyscape.com] and DigiMarc [digimarc.com] that "spider" the Web, looking for copyright infringement. I have a feeling that this MSN thing may be a service they are providing, because they have never downloaded all my images before.

Of course, they won't find it on my site, because all of my images are completely original, but that doesn't prevent them from DOWNLOADING MY ENTIRE FREAKIN' GALLERY AT ONCE.

Yesterday, the spike showed that they seem to be getting even more aggressive than usual.

Now, as an artist and a designer, I am all for copyright protection, but, as a Web designer who specializes in REAL CHEAP installations for people who can't afford much, this bothers me. In addition, real artists tend to be the "starving" variety, and usually can't afford expensive hosting. They tend to get the lowest tier service. They want people to see their work. They wouldn't have a Web site otherwise. The key word here is "people." They, I am quite sure, would not be thrilled to find that 75% of their bandwidth was occupied by robots that think they are potential crooks.

For example, I could see a search 'bot being bothersome, but it brings a reward, by ranking your page in a search index. However, a copyright 'bot doesn't give you anything other than a feeling like Butthead had after his cavity search ("Woah! Did I just score?"). They will throw out the stuff they downloaded (or use it against you), and will not give you any benefit besides an indelible stain on your bandwidth meter.

This sounds like a good topic for discussion. If an RIAA 'bot or a DigiMarc spider pushed your bandwidth into a new tier of service (or brought down your site), what would you do/think/want to do, and what recourse would you have? Can you block them without throwing the baby out with the bathwater or inviting more aggressive behavior?

8:58 pm on Jan 13, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 18, 2005
posts:817
votes: 0


This is like asking what types of searches airline security should do on passengers. Should they pick people at random and do a full search of their body and belongings? Or should they search one random item from every passenger?

Another way of looking at it: how thorough a search of the Internet would you want if you suspected someone was making money off of your art?

9:41 pm on Jan 13, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 21, 2005
posts:1526
votes: 0


Exactly. This is not a simple issue.

The one thing that does bother me is that bandwidth is money. The price differential between two adjacent tiers of an ISP can be substantial.

These services charge a LOT of money to their customers, yet pay the people whose sites they search nada for using what can be pretty large swaths of bandwidth.

If I can't get my art on the Web because the bandwidth requirements are too high, then it's a mixed blessing/curse. As long as that bandwidth is people looking at (maybe stealing) my images, then it's OK. Some artists rely on "viral marketing," and stealing of images is OK. Most artists don't believe that at all, so they like the idea of having copyright protection. And, they are probably the ones that get ripped off the most. No one is going to steal Roger Dean's images, because they are instantly recognizable. However, why not nab some schmuck's images from the Seattle area for your coffee shop in New York? They can't afford MarcSpider, so you are probably OK.

I'll lay odds that the provenance of a heck of a lot of this "free clip art" is pretty suspect, and has been ripped off from artists that have little ability to track it down.

Here's the rub, though: The artists who can afford DigiMarc MarcSpider (or whatever it's called these days) are the ones that can afford high-bandwidth sites. The ones that can get swamped by MarcSpider are the ones that can't afford it, so you quite literally have a situation of the Haves making money on the backs of the Have Nots.

Not such a simple issue.

7:01 am on Jan 14, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jtara is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 26, 2005
posts:3041
votes: 0


Block them. They don't do a thing for you. They don't have a "right" to access your images.

It would be different if they offered some benefit. For example, some kind of seal certifying that your site is free of copyright violations.

Maybe these bots will offer that, if enough webmasters just block them.

9:38 pm on Jan 14, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 18, 2005
posts:817
votes: 0


jtara wrote: "It would be different if they offered some benefit. For example, some kind of seal certifying that your site is free of copyright violations."

If webmasters block copyright bots, they won't be able to access enough sites to develop any credibility. Thus, no one will believe or honor their seals.

11:57 am on Jan 17, 2007 (gmt 0)

Administrator

WebmasterWorld Administrator jatar_k is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 24, 2001
posts:15756
votes: 0


I'm with jtara, ban 'em

search engine bots are just as bad, anything that takes from your site with no return, or over-spiders, is a pain. Block them from parts of your site, use robots.txt or .htaccess, whatever works for you.
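
For the robots.txt route, here is a minimal sketch of what the rules might look like, checked with Python's standard `urllib.robotparser`. The bot name `ExampleCopyrightBot` and the `/gallery/` paths are placeholders, not the names of any real crawler or site. Keep in mind that robots.txt is only advisory: a bot that ignores it has to be refused at the server (for example via .htaccess).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "ExampleCopyrightBot" and the /gallery/ paths are
# placeholders chosen for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleCopyrightBot
Disallow: /

User-agent: *
Disallow: /gallery/full-size/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, path):
    """Report whether the parsed rules permit `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)

# The named bot is shut out everywhere; other bots lose only the big images.
for agent, path in [
    ("ExampleCopyrightBot", "/gallery/thumbs/a.jpg"),
    ("msnbot", "/gallery/thumbs/a.jpg"),
    ("msnbot", "/gallery/full-size/a.jpg"),
]:
    print(agent, path, allowed(agent, path))
```

Keeping thumbnails crawlable while disallowing the full-size directory is one way to block bots "from parts of your site" without dropping out of the search index entirely.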

we need incredibill in this thread ;)

3:49 pm on Jan 21, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5506
votes: 5


MSN, Google and Yahoo each have multiple bots, many crawling simultaneously.

If this particular bot was an image or media bot, then denying it access may be preferable.

There are constant problems (even with these major SEs) in maintaining compliance with robots.txt for images.

5:11 pm on Jan 21, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 11, 2004
posts:1014
votes: 0


We get hit by malicious bots, and just ban the ip addresses using ipchains (iptables).
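
A rough sketch of how that kind of ban list could be built: count requests per client IP and emit an iptables command for each one over a threshold. The threshold, the sample addresses and the helper name `drop_commands` are all assumptions for illustration. The commands are only printed for review, not executed, and this handles IPv4 only.

```python
import ipaddress
from collections import Counter

def drop_commands(client_ips, threshold):
    """Build iptables DROP commands for IPv4 addresses seen more than
    `threshold` times in `client_ips`."""
    counts = Counter(client_ips)
    commands = []
    for ip, hits in sorted(counts.items()):
        if hits <= threshold:
            continue
        # Raises ValueError on anything that is not a valid IPv4 address,
        # so malformed log data never ends up in a firewall command.
        ipaddress.IPv4Address(ip)
        commands.append(f"iptables -A INPUT -s {ip} -j DROP")
    return commands

# Made-up request log reduced to client IPs (documentation address ranges).
SEEN = ["203.0.113.5"] * 500 + ["198.51.100.7"] * 3

for command in drop_commands(SEEN, threshold=100):
    print(command)
```

Before banning anything, it is worth confirming that a heavy-hitting address is not a search engine you actually want.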

The better-behaved search robots all respond to requests to slow down (some require emailing, others accept a parameter in robots.txt).
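
The robots.txt parameter in question is presumably `Crawl-delay`, a non-standard directive; that is an assumption, and whether a given bot honors it is something to check in that bot's own documentation. A minimal sketch, with a made-up 10-second value, parsed with Python's standard `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask one named bot to wait 10 seconds between requests.
# The value 10 is an arbitrary example, not a recommendation.
SLOW_DOWN_RULES = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

delay_parser = RobotFileParser()
delay_parser.parse(SLOW_DOWN_RULES.splitlines())

# The named bot gets the delay; bots without a matching rule get None.
print(delay_parser.crawl_delay("msnbot"))
print(delay_parser.crawl_delay("SomeOtherBot"))
```

An empty `Disallow:` line means "nothing is disallowed", so this file slows the named bot down without blocking anyone.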

Matt
