Purge Information from domain tools and aboutus.org?


doni

3:33 am on Mar 8, 2009 (gmt 0)

10+ Year Member



Hello everybody

I'm a total noob to this forum, and to be honest I'm not super tech-savvy, but I do my best to get by... on to the issue:

Is there any way to remove/purge your info from a site like Domain Tools or aboutus.org?

I've found another post on the site made by IncrediBill which showed me how to block the perpetrators' IP addresses... but I'm not sure if we can take it to the next level and get rid of all their stolen information.

I'm a musician, but I don't make enough money yet to be free from the employment world. The problem is these sites make it damn near impossible for me to have clean Google results for my first+lastname. I am making a general assumption that in a tough job market, employers would choose somebody who does not have any other extra-curricular activities that they work on over somebody who does; especially when it comes to music, because let's face it, all musicians are peace pipe smoking liabilities (totally not true, but I believe this is the stigma...)

Another thought: am I the only one who finds these Domain Tools-type websites horribly offensive and a huge breach of privacy laws? I actually feel sick that some rogue company would publish my backend information so they can create more advertising revenue.

Any help would be greatly appreciated. Keep up the good work, everyone.

[edited by: incrediBILL at 10:23 am (utc) on Mar. 10, 2009]
[edit reason] fixed filter issue [/edit]

incrediBILL

4:40 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wonder what would happen to either of these (and similar) sites in the event a web site owner was issued with a legal notice to remove content.

That has already happened with SE cache and Archive.org.

That's why I always advocate using NOARCHIVE in every page and blocking all the archive tools in robots.txt and .htaccess just to make sure.

A good resource on the topic and the issues around it is [noarchive.net...] which has a lot of links back to discussions on WebmasterWorld.
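
For readers who want to see what that advice looks like in practice, here is a minimal Python sketch that only builds the two pieces of text involved: the NOARCHIVE robots meta tag and a robots.txt record that turns an archive crawler away. The crawler name `ia_archiver` and the helper name `robots_block_records` are assumptions used for illustration, not a list taken from this thread; check each archive's own documentation for the name its crawler actually honors. The .htaccess side is not shown.

```python
# Sketch only: builds the text of a NOARCHIVE meta tag and robots.txt records.
# "ia_archiver" is an illustrative crawler name, not a list from this thread.

NOARCHIVE_META = '<meta name="robots" content="noarchive">'


def robots_block_records(user_agents):
    """Return robots.txt text that disallows the whole site for each named agent."""
    records = []
    for agent in user_agents:
        records.append(f"User-agent: {agent}\nDisallow: /\n")
    # A blank line separates one record from the next.
    return "\n".join(records)


if __name__ == "__main__":
    print(NOARCHIVE_META)  # goes inside <head> on every page
    print(robots_block_records(["ia_archiver"]))  # goes in /robots.txt
```

Both mechanisms are requests, not enforcement: they only affect crawlers that choose to honor them, which is why the post also mentions .htaccess blocks.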

GaryK

4:45 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My IP attorney once told me I can't be held legally responsible for what scraper sites steal from me in the event of a copyright dispute. I don't think that lets these sites off the hook. But at least it's not my problem according to her.

Edit Reason: I can't type tonight.

[edited by: GaryK at 4:46 am (utc) on Mar. 15, 2009]

incrediBILL

4:57 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Gary, there's a big difference between being held responsible and being used as evidence.

The archives can sometimes prove you're INNOCENT, but the lawyers are scraping them to find EVIDENCE, so the answer is NOARCHIVE for me:
[webmasterworld.com...]

[edited by: incrediBILL at 5:09 am (utc) on Mar. 15, 2009]

GaryK

5:08 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Gotcha. I didn't think about that aspect of it. ;)

enigma1

11:54 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



so the answer is NOARCHIVE for me...

Bill, I am not sure how this will accomplish anything.

How will anyone ever know if a spider doesn't store and archive web content? Just because it is not listed doesn't say anything.

It could easily be stored and be accessible only for "private use" and when the "time is right" will be presented.

And from what I am reading online, that is likely the case. So many "spiders" index content and do not even have a search page.

incrediBILL

5:39 pm on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How will anyone ever know if a spider doesn't store and archive web content?

You know the SEs still have that cache page even if it's not displayed, but that's not the issue. Having it internally for indexing purposes and displaying it externally for others to see are two different things. You really can't stop the internal use unless you take your site offline, but you can stop public displays.

MarkDilley

9:51 pm on Mar 16, 2009 (gmt 0)

10+ Year Member



AboutUs is dedicated to providing a useful service for the web and we take very seriously what you say about what we're doing. We may not agree with you on every point, but you can be assured that we're open to hearing what you have to say, here or anywhere.

A few of us have been distilling this forum thread. Here is where we are right now with all of your input:

* Some good ideas around the NoArchive / retro removal of algorithmic data - including the NoArchive.net link.
* We have known that we needed a clearer info page, so this conversation is prompting us to do something like a FAQ page, more visibly.
* We are working to improve our related domains algorithm.
* Recently we added an entry onto BugMeNot.com for people who don't want to log in.
* We will also work to make a no-login contact form.

We are available through many channels; please see our contact page: [aboutus.org...]

dstiles

10:40 pm on Mar 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So, Mark: basically you are going to carry on with pre-planned development and ignore the awkward points made in this forum, including copyright issues and reaction to robot rejection.

And obviously, since you haven't tried to contact me through this forum, you don't want me to send my list of domains (and no, I'm not going through your nightmare forms again).

MarkDilley

11:11 pm on Mar 16, 2009 (gmt 0)

10+ Year Member



We don't believe there is a copyright problem here and this conversation did change our direction - sorry if that wasn't clear.

incrediBILL

12:07 am on Mar 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



* Some good ideas around the NoArchive / retro removal of algorithmic data - including the NoArchive.net link.
* We have known that we needed a clearer info page, so this conversation is prompting us to do something like a FAQ page, more visibly.

Mark,

Sounds like you're prepared to make some good progress to address some of our concerns.

However, the one thing people don't like is that your site shows up when people search for their domain or brand name in the search engine.

If you won't let people remove their page containing their brand name / site name from your service, how will they be able to stop you, short of exploring legal options, from ranking alongside their website for their own brands?

That's the rub that you don't seem to be addressing.

I can easily see someone making the leap from the new law Utah is trying to pass, which would ban trademark-triggered competitive ads, to the way AboutUs and similar services piggy-back onto trademarks in the search engines:

[webmasterworld.com...]

The best way to avoid having our industry legislated is to provide an opt-out for those that want it, as simple technology today can thwart government interference tomorrow.

incrediBILL

12:37 am on Mar 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Whitelisting and re-checking is fine for a few sites or if you can automate it; automating robots.txt on IIS requires a few changes I'm not prepared to make.

You don't have to automate robots.txt to whitelist it; you can expand the list I showed to contain lots of paths and such. It's just that anything not specifically mentioned gets kicked to the curb.
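
The list referred to here is not part of this thread, so the following is only a sketch of the general whitelist pattern, written in Python: named crawlers get an empty `Disallow:` (allow everything), and a final catch-all record disallows everyone else. The crawler names in `WHITELISTED_AGENTS` and both function names are hypothetical. The standard library's `urllib.robotparser` is used to check how a compliant crawler would read the result.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist: these names are illustrative, not the list
# referred to in the post above.
WHITELISTED_AGENTS = ["Googlebot", "Slurp", "msnbot"]


def whitelist_robots_txt(allowed_agents):
    """Allow the named agents everywhere; disallow everything for all others."""
    lines = []
    for agent in allowed_agents:
        # An empty Disallow means "nothing is off limits" for this agent.
        lines += [f"User-agent: {agent}", "Disallow:", ""]
    # Anything not specifically mentioned falls through to this record.
    lines += ["User-agent: *", "Disallow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, path="/"):
    """Check a robots.txt body the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    text = whitelist_robots_txt(WHITELISTED_AGENTS)
    print(text)
    print(is_allowed(text, "Googlebot", "/page.html"))     # whitelisted crawler
    print(is_allowed(text, "SomeOtherBot", "/page.html"))  # hits the catch-all
```

This only describes what a crawler that obeys robots.txt will do; bots that ignore the file still have to be stopped at the server.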

Someone that specializes in doing this in IIS is Ocean10000 right here on WebmasterWorld; I'm sure he would give some pointers. ;)

dstiles

2:53 am on Mar 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mark, there IS a copyright problem. I did not give your organisation permission to take my text and then corrupt it - which is what you allow people to do; I did not give you permission to display my logos or other images; I did not give permission to take snapshots of my sites for display. In fact, as noted in my earlier post, some of my sites specifically prohibit such use of text, images and other content.

incrediBILL

5:15 am on Mar 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I did not give you permission to display my logos or other images

The only precedent I'm aware of is photographers who litigate over unlicensed usage, and if those images actually have Federal copyright, then statutory damages are possible.

Unfortunately, the intent of the use and fair use still come into play, and I don't think the AUP on a web site has been tested yet, but I could be wrong.

However, a "License Policy" for materials on a site has been used by photographers so that anyone copying content was sent a bill for the usage fees plainly displayed.

I'm sure the fine people at Getty could sort this out ;)

dstiles

11:10 pm on Mar 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I became aware of copyright (as opposed to just knowing it existed) when I began selling photos and writing for magazines, alas too long ago; happy days! :(

The point that a LOT of people miss is that ALL content on the internet is copyright from the moment it's written (whatever) even if it isn't published, and does not actually need a notice to make it so (I'm considering Western Hemisphere here - certain Asian countries seemingly do not subscribe to the idea that something could be someone else's property).

The purpose of copyright notices on web sites is not to CLAIM copyright but to bring the fact to the attention of potential looters who may be unaware of copyright legislation.

There are obviously reasons for waiving copyright online (eg to allow SEs to bring in trade) but there must be some return on the "investment", which sites like aboutus don't provide.

"Copyright Law fact sheet P-09 : Understanding Fair Use" gives the reason for Fair Use as preventing stifling of free speech and allowing news reporting. It says the use should be "deemed acceptable under the terms of fair dealing." This implies it should not be used for profit.

It finishes: "To avoid problems, if you are in any doubt, you are advised to always get the permission of the owner, prior to use."

I would quote the URL of the online UK copyright service but it would get slapped down. :)

It may be argued that SEs are making a profit from the information they hold, usually through advertising, but since everyone knows about the major engines the site owners have the right (and means) to prevent them taking and using the material. And, of course, there is a return for the copyright holder since SEs bring traffic to the site.

dstiles

5:03 pm on Mar 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Further to this, Mark Dilley emailed me 17th March asking for a list of my domains. I sent a list of 377 domains, most of which were active, a few of which were inactive and a few that were no longer registered but were, at that point, still in aboutus.org. Mark Dilley acknowledged receipt of the list the same day (GMT), promising to let me know when they were all removed.

Seven days later I have not received this notification. I have not tried to contact him on the grounds that it is "his move".

A test site I checked before sending the list still has the same textual information and logo displayed.

I would have thought they would have an automated take-down - paste in a list of domains, all are "blocked". Two minutes, tops?
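
To make the "paste in a list" idea concrete, here is a small Python sketch of what such a bulk block could look like. It is purely hypothetical: nothing in this thread describes how aboutus.org stores or removes pages, and the function names `parse_domain_list` and `is_blocked` are invented for illustration.

```python
def parse_domain_list(pasted_text):
    """Turn a pasted, one-per-line list into a normalized set of domains."""
    domains = set()
    for line in pasted_text.splitlines():
        # Normalize case, surrounding whitespace and a trailing dot.
        name = line.strip().lower().rstrip(".")
        if name.startswith("www."):
            name = name[4:]
        if name:  # skip blank lines
            domains.add(name)
    return domains


def is_blocked(domain, blocked):
    """True if the domain, after the same normalization, is on the block list."""
    return bool(parse_domain_list(domain) & blocked)


if __name__ == "__main__":
    blocked = parse_domain_list("Example.com\n www.example.org \n\n")
    print(sorted(blocked))
    print(is_blocked("WWW.Example.com", blocked))
    print(is_blocked("example.net", blocked))
```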

So, Mark Dilley?

GaryK

1:31 am on Mar 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seven days later I have not received this notification. I have not tried to contact him on the grounds that it is "his move".

I for one would not stand on ceremony. If I hadn't heard from him by now I'd surely get in touch with him and insist on knowing why he hadn't upheld his end of the agreement.

dstiles

2:38 am on Mar 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, surely he will read this thread again? :)

You're right, though.

GaryK

3:00 am on Mar 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have the feeling he won't be back here anytime soon. But I said that before and then he showed up again. ;)

keyplyr

6:22 am on Mar 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To his credit, Mark Dilley *did* remove my trademark, info and content from wrongly being tagged to a 301 hijacker domain shortly after I posted.

I have no problem with my stuff being used on the aboutus page that correctly links to my pages, since I allow limited license of use in newsletters, websites, etc. for the sole purpose of promoting (broadly interpreted) my website and business services.

What I will not allow is caching of my property and I have removed my site from several directories and search services for not supporting the nocache directive. IMO this is copyright infringement.

MarkDilley

4:15 am on Mar 27, 2009 (gmt 0)

10+ Year Member



Using the quick reply -

Folks, I have been buried. I have a 'post' I am working on to answer some of these questions. I am slowly working through the 300+ websites. I appreciate your passion for this and will post something more substantive... I appreciate your patience.

keyplyr

7:52 am on Mar 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Correction: intended to type "noarchive" (not "nocache") in post #:3878267 above.

dstiles

4:52 pm on Mar 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mark: So you don't have an automated script to bulk-process domains? Hmm. What will you do when you get a legal take-down with a few hours to comply? It does happen, you know.

dstiles

7:43 pm on Apr 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Update on this: some of the domains have been removed but others haven't.

Those that haven't still show the copyrighted images and text and the copy / thumbnails are out of date ('cause I've blocked their bot and they refuse to accept the hint).

Those that have been removed invite people to submit their own content, so anyone can write what they like about the sites.

An overall comment on these creatures would have to include several characters from the Shifted keys on the top row of the keyboard.

GaryK

3:34 am on Apr 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Those that have been removed invite people to submit their own content, so anyone can write what they like about the sites.

Seems like a great way for competitors to write bad stuff about you and tarnish your reputation. They write it, Google spiders it, and from then on everyone searching for things related to you gets to see bad stuff about you. Lovely!

dstiles

7:48 pm on Apr 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I tracked down a few more of this ilk last night (google for your domain name) and on the assumption they were crawling (worm-like - hah!) from the same server range made sure their server IP range was blocked - most of them were.

A few seem to be allied to aboutus but some seemed to offer owner-only protection, although I didn't check to see if it was effective. Others seemed to be happy to let anyone put stuff in but either hadn't crawled or had got blocked.

One (Alexa, from memory) said my server was down - obviously hadn't bothered to distinguish between 4xx and 5xx rejection codes. I suppose that in itself could be harmful.
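
The distinction being made is that a 4xx response (403 Forbidden, for example) means the server is up and deliberately refused the request, while only a 5xx response points to a problem on the server itself. A minimal Python sketch of how a crawler could tell them apart follows; the function name `classify_status` and the labels it returns are assumptions for illustration, not anything Alexa is known to use.

```python
def classify_status(status_code):
    """Map an HTTP status code to a coarse label a crawler could report."""
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        return "redirect"
    if 400 <= status_code < 500:
        # Client error: the server answered and refused, so it is not down.
        return "refused"
    if 500 <= status_code < 600:
        # Server error: only this class suggests the site itself has a problem.
        return "server_error"
    return "unknown"


if __name__ == "__main__":
    for code in (200, 403, 404, 503):
        print(code, classify_status(code))
```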

dstiles

10:10 pm on Apr 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mark Dilley has now contacted me to say all of the domains have now been cleaned and the content replaced by a message saying the owner does not want bot-generated info to be shown.

What it does NOT say is that the owner has requested that NO data be displayed at all and that no one should be allowed to enter comments on the domain.

For the record, I replied to him thanking him for the removal, with the rider that I obviously had not had time to check them all and trusted his word on it. I added:

"You STILL have the option for other people to comment on the domains. That is NOT good - if we detect any adverse comments there WILL be trouble."

GaryK

10:30 pm on Apr 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please keep us updated.

MarkK

3:33 pm on Jul 19, 2009 (gmt 0)

10+ Year Member



I thought I'd reply to this thread as it's fairly similar.

I found that the domain tools bot recently started to ignore my robots.txt. So things like historical thumbnails are displayed. What I found annoying is that even when it did work, it looks like it took snapshots of those periods? I do not want any of my content archived by another. More so when it's used for commercial purposes. On top of that, the way their spider follows the robots.txt has always been off.

I tried contacting domain tools and received no reply.

Any suggestions on what to do, or perhaps MarkDilley can take care of it?

wilderness

12:24 am on Jul 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Spry has been blocked for some time on my server.

A 2004 thread under a different name [webmasterworld.com]

Current UA "Jakarta Commons-HttpClient/3.1"

I have had Jakarta denied since 2003, and the Spry Class C (0-31) has been denied since the 2004 thread.
The UA also gets caught by two additional terms.
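
As a rough illustration of that kind of two-part deny rule, here is a Python sketch that rejects a request if its user agent contains a denied term or its address falls inside a denied range. The real Spry address range is not given in this thread, so `192.0.2.0/27` (a reserved documentation network covering .0 through .31) stands in for it; reading "(0-31)" as the last octet is also an assumption, and `is_denied` is a hypothetical helper name.

```python
import ipaddress

# Substring match, case-insensitive; catches "Jakarta Commons-HttpClient/3.1".
DENIED_UA_SUBSTRINGS = ["jakarta"]

# Placeholder range only: 192.0.2.0/27 is a documentation network (.0-.31),
# standing in for the real range, which is not stated in the thread.
DENIED_NETWORKS = [ipaddress.ip_network("192.0.2.0/27")]


def is_denied(user_agent, remote_ip):
    """Return True if the request matches a denied user agent or IP range."""
    ua = user_agent.lower()
    if any(term in ua for term in DENIED_UA_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in network for network in DENIED_NETWORKS)


if __name__ == "__main__":
    print(is_denied("Jakarta Commons-HttpClient/3.1", "198.51.100.7"))
    print(is_denied("Mozilla/5.0", "192.0.2.31"))
    print(is_denied("Mozilla/5.0", "192.0.2.32"))
```

On a live site the same checks would normally be written as server rules (.htaccess or the IIS equivalent) rather than application code; the sketch just shows the logic.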

Any suggestions on what to do, or perhaps MarkDilley can take care of it?

Whether or not to rely upon these bot administrators and/or internet providers is a hard-learned lesson.

From the start of administering your site(s), it's imperative that you create and consistently follow a working plan:
1) Either you initiate litigation when possible (a long-term and expensive solution).
2) Or you initiate a long-range plan and/or policy to act against (deny) all these similar types of unwanted intrusions (whether by white-listing or black-listing).

The result of the latter (after some time) will be that you simply won't require the litigation option, and that these, and many other pests, will simply be left in the past.

Many beginning webmasters find the task of comprehending this and building it into the daily administration of their website(s) overwhelming; however, the only alternative is continued litigation.

You make the choice and determine what is beneficial or detrimental to your website(s).

Woz

1:42 am on Jul 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am getting very annoyed with these AboutUS.ORG type people. I purged scraped information from their page on my site, it was reversed, I purged again, and so they lock the page. What part of Copyright do they not understand?

Of course, the other problem is that if you want to check if they have an entry for one of your sites or not, and it turns out they haven't, the mere act of checking triggers a page being built which compounds the problem.

Not Impressed!

Onya
Woz
