I'm a total noob to this forum, and to be honest I'm not super tech savvy, but I do my best to get by... on to the issue:
Is there any way to remove/purge your info from a site like Domain Tools or aboutus.org?
I've found another post on the site made by IncrediBill which showed me how to block the perpetrators' IP addresses... but I'm not sure if we can take it to the next level and get rid of all their stolen information.
I'm a musician, but I don't make enough money yet to be free from the employment world. The problem is these sites make it damn near impossible for me to have clean Google results for my first+lastname. I am making a general assumption that in a tough job market, employers would choose somebody who does not have any extra-curricular activities over somebody who does; especially when it comes to music, because let's face it, all musicians are peace pipe smoking liabilities (totally not true, but I believe this is the stigma...)
Another thought: am I the only one who finds these Domain Tools type websites horribly offensive and a huge breach of privacy laws? I actually feel sick that some rogue company would publish my backend information so they can create more advertising revenue.
Any help would be greatly appreciated. Keep up the good work everyone
[edited by: incrediBILL at 10:23 am (utc) on Mar. 10, 2009]
[edit reason] fixed filter issue [/edit]
The only way to solve the problem is to get a private registration using a proxy service, but if you do that after your domain has been registered previously without a private registration, there's history available for anyone willing to pay to find out what it was previously.
As far as I know the only way to avoid the problem is to cancel the old domain and register a new domain and make it private from the initial registration.
Using a 301 redirect you should be able to point the old domain to the new domain, then discard the old domain after a period of time.
I don't think they track an old domain being 301d to a new domain, so that's probably your only recourse.
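For anyone wondering what the 301 step amounts to, here is a minimal sketch in Python. It is only an illustration: the new domain name (new-domain.example), the port and the helper names are made up for the example, and on most hosting you would set the redirect up in the web server's own configuration rather than run a script like this.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical new domain; replace with the privately registered one.
NEW_ORIGIN = "https://new-domain.example"


def redirect_location(path: str) -> str:
    """Map a request path on the old domain to the same path on the new one."""
    if not path.startswith("/"):
        path = "/" + path
    return NEW_ORIGIN + path


class PermanentRedirect(BaseHTTPRequestHandler):
    """Answer every GET/HEAD on the old domain with a permanent redirect."""

    def do_GET(self):
        # 301 tells browsers and search engines the move is permanent.
        self.send_response(301)
        self.send_header("Location", redirect_location(self.path))
        self.end_headers()

    do_HEAD = do_GET


def serve(port: int = 8080) -> None:
    """Run the redirect server; the port is an arbitrary choice for testing."""
    HTTPServer(("", port), PermanentRedirect).serve_forever()
```

Calling serve() and requesting any path on the old host should send the visitor to the same path on the new domain, so old links keep working until the old domain is dropped.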
Typically Aboutus.org will scrape pages labelled "About" or addressed about.php, about.htm, and so on. So if you want to erase their data more effectively you need to let the IP in, but use a script to cloak it and send it to a page that no-one else will see. A short message about the wickedness of copyright infringement might be appropriate.
Of course if you don't want potential employers to look you up, your best bet is to adopt a stage name that they won't be searching for, and go in for some reputation management SEO for your real name.
I signed up for an account and began to remove the information. Within an hour some moron began replacing it. Meanwhile my account had been blocked.
I found a way of complaining (their contact form is not a model of functionality or ease of use). I was told by email to submit my domains to them and they would remove all non-domain information from their database. I submitted something like 300 domains.
I checked a handful of the domains and they seemed to be empty of info. Re-checking now I see that some of my domains now have info again, taken from the site - AND they have stolen my logos!
To be fair, other domains have a note that we did not wish to be listed. Perhaps they ran out of time to complete the removals.
At the time of submitting the domains list I did warn that if their listing, which is commonly high on Google, caused my company damage by misrepresentation I would consider legal action.
From my own experience, ANYONE can sign up to add or alter information about a domain, without proving ownership and potentially with criminally damaging effect. I suspect replacing my data was due to me completely removing the information.
Nor are they above listing domains that have no web site. A message comes up asking you to wait whilst they compile information, then asks if it is a valid domain - you tell me, you suggested it! (Yes, the domain is valid but the robot block was successful.)
These so-called information sites are proliferating and contributing to the inaccuracy of the web, as well as suppressing real sites by being listed high on search engines for stolen content. They are using OUR sites for THEIR gain. Google is partly to blame for this in allowing them to rise high: they should be listing the real sites, not someone's un-checked opinion of them on a scam site. General wikis are bad enough but when they claim to be this authoritative...
At the moment the web browser is displaying a really annoying and continuous popup on the aboutus pages saying "The google apps api key used on this site was registered for a different web site. You can generate a new key for this site at (google URL)." As if I cared!
Their site is hosted with Spry Hosting as Name Intelligence Inc on the range 66.249.16.0 - 66.249.17.255.
Their bot came in as:
IP: 66.249.16.nnn
UA: Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)
Spry has been blocked for some time on my server.
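If you block from a script rather than the server config, the check is roughly the following. This is a sketch only: the function and list names are invented for the example, and the only real data in it is the IP range and the bot name quoted above (66.249.16.0 - 66.249.17.255 works out to the single block 66.249.16.0/23).

```python
import ipaddress

# 66.249.16.0 - 66.249.17.255 from the post above is the same as this /23.
BLOCKED_NETWORKS = [ipaddress.ip_network("66.249.16.0/23")]

# Lower-case substrings to look for in the User-Agent header.
BLOCKED_UA_TOKENS = ["aboutusbot"]


def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request comes from a blocked range or bot name."""
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)
```

The IP test catches the bot even if it changes its name; the UA test catches it if it turns up from a different host but still identifies itself honestly.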
Just checked my main money site and they've got outdated information from late 2006. They also list a slew of related domains that I don't own and never have owned including php.net. I wish I owned php.net!
I don't understand how a site like this can get away with listing such badly outdated and outright incorrect information without someone making a legal issue of it. Why haven't the folks at PHP taken issue with AboutUs stating that I own PHP's domain name?
Also, how many people can afford to begin a lawsuit, especially outside their own country? Apart from large companies, of course, and then see above. :(
As I said, I blame Google for letting them (and other similar scavengers) get away with it.
<update>
I hadn't looked at AboutUs in ages, it appears I'm incorrect!
[aboutus.org...]
Not sure that'll stop the screen shots or Domain Tools though so I have their IP range blocked.
</update>
[edited by: incrediBILL at 9:47 pm (utc) on Mar. 14, 2009]
How do I prevent the bot from gathering info about my site?
Using a robots.txt file, you can choose not to have your future AboutUs.org pages initialized with selected content from your website. This doesn't mean that we won't create a Wiki Page for your website. Our users should still have the opportunity to contribute their own content describing your site, as well as adding their own reviews.
In my experience they will not alter it once it's there, as witness a complete change of purpose for a couple of my sites in the past few years. Which can be very damaging if you buy a domain that previously belonged to a baddie, especially if you've never heard of the scammers.
If you used DMCA, how can they get away with that? If they refuse to remove it, isn't their service provider/host required to remove it, considering it's rather obvious AboutUs has no rights to your content?
According to Section 4 of their Intellectual Property Policy ( [aboutus.org...] ), they do comply with DMCA letters.
In Section 2 of that policy, they argue that some content is "fair use". I don't know if section 2 supersedes section 4?
Any lawyers here?
As part of the community team at AboutUs, I wanted to respond to some of the information above.
* Is there any way to remove/purge your info from a site like aboutus.org?
** Yes, [aboutus.org...]
* I signed up for an account and began to remove the information. Within an hour some moron began replacing it. Meanwhile my account had been blocked.
** What is your account name? I'd like to see who did the blocking and find out why.
* I suspect replacing my data was due to me completely removing the information.
** You are correct, it is our policy to replace pages that are simply blanked. If someone edits the page to remove info, but leaves the headings, we do not replace it.
* I checked a handful of the domains and they seemed to be empty of info. Re-checking now I see that some of my domains now have info again, taken from the site - AND they have stolen my logos!
** Could you please let me know which pages? I'm not aware of us ever having repopulated information from a page that was formerly 'NoBotted'. I'd like to check it out.
* They also list a slew of related domains that I don't own and never have owned including php.net.
** Related Domains have never been intended to imply co-ownership. I'm not entirely sure how our bot decides what's related. Sometimes it appears to be sites you're linking to from your site or being linked to from another site. Other times, I'm not so sure. Our developers are in the process of re-writing the bot and revising the related domains algorithm.
* Robots.txt has no impact on AboutUs.org, there is no bot page, they don't care about robots standards best I can tell.
** We do care and have that addressed here: [aboutus.org...]
* At the moment the web browser is displaying a really annoying and continuous popup on the aboutus pages saying "The google apps api key used on this site was registered for a different web site. You can generate a new key for this site at (google URL)."
** Which URL are you visiting when you see this error? We would like to get this bug fixed.
* Also, how many people can afford to begin a lawsuit, especially outside their own country?
** I understand how you feel; many times it seems as if the threat of a lawsuit is the only way to get a big company to pay attention. We strive to be different from that. Community is at the heart of AboutUs, and working to resolve community concerns is what I do. I am happy to talk with you by phone or email, or even here in this public forum.
Best, Mark
[edited by: incrediBILL at 2:43 am (utc) on Mar. 15, 2009]
[edit reason] no signature links tos #13, specifics removed [/edit]
class-action suit
Nope, we can't discuss legal actions here, short of the use of DMCA for copyright takedowns, per TOS#26:
26. Claims of action, flames, and calls to action against any company or person will be removed.
If you used DMCA how can they get away with that?
DMCA doesn't stop fair use, so small snippets of text can still remain regardless of the DMCA. Be careful when using the DMCA: if you aren't on solid ground with copyright law and you get a site disabled, they can counter-sue for damages.
[edited by: incrediBILL at 9:50 pm (utc) on Mar. 14, 2009]
I'm glad to see you've added robots.txt options to your site, but does it retroactively remove content like Archive.org does?
What I mean is if I blocked the AboutUsBot today, does it remove the information previously gathered from our sites on the next visit?
[edited by: incrediBILL at 10:33 pm (utc) on Mar. 14, 2009]
* NoBot
I originally looked for removal information and could find no link to (eg) that page - or any other such page for that matter. When I tried to fill in the only form I could find it took several submissions with a password I was sure was correct before it was accepted. I was not pleased at the end of that session and even less so when all the information returned! And why should I have to password a complaints form?
From that nobot page:
"We choose not to completely remove pages from our system because AboutUs aims to be a guide to websites, and deleting a page would make us that much more incomplete."
Instead you prefer to have inaccurate, misleading and obsolete information stolen from my web sites. I did NOT give your company permission to hold extracts of my data nor to display my logos or thumbnails of my sites. And do not quote "fair use". I know what that means and you are abusing the principle if for no other reason than you permit others to modify the text to make it say what it was never intended to say but which viewers might think it did.
All of our newer sites include, in the AUP, the phrases:
"Content may not be held on another web server or used in any commercial form without the content owner's express written permission."
and
"You may not harvest or otherwise obtain or use information from this web site for commercial resale or advantage."
I think that about covers your site's abuse of mine. Have you inspected the AUP of ANY site you list?
* account name
<myaccount>- replaced by <employee> who then volunteered to remove my domains. I guess she ran out of patience part way through the list of 370+ domains.
* blanking sites
Some information to that effect would help. However, there is still the implication that ANYONE can change the entry, with drastic consequences for a web site, especially if the web site owner is blissfully unaware of you.
That, of course, assumes that anyone reads your site in the first place. Why anyone should place credence in a wiki that anyone can alter I have no idea. I certainly don't BELIEVE anything I read in a wiki without corroborating it elsewhere.
* Unremoved domains
The person named above has a full list or can copy the list to you upon request.
* related domains
So you are quite happy to associate domains that may be "bad neighbourhood" with my sites. You DEFINITELY need to work on that one. Likewise keeping up to date with obsolete domains.
Has it occurred to anyone there that if a domain returns a 4xx error then it may not exist or has blocked you for scraping? 403, for example, means you are not welcome. TAKE THE HINT! Remove the site.
* robots.txt
There is an entirely false assumption, mostly by botmasters, that a site can easily avoid being scanned by adding a line to this file. In practice the file is absolutely useless except to GUIDE major, well-known SEs (and by implication permit them to hold approved data from the site).
Most bots, if they take any notice of it at all, are usually unknown to site owners. To pursue every bot that lands on a site, test it for compliance and add it to this file is far too time consuming for webmasters. It is far easier to block by IP range or, if the bot is honest, by UA - one word entered in a server-wide file against several lines in every one of several dozen files scattered across the server for each of - oh, let's say, 5,000 bots and counting?
* annoying popup
I have no idea now - it was the page that displayed my site details.
* There is no way I can afford a long conversation with America, either in time or cash. If you really care about us then...
1) don't steal our content without EXPLICIT permission;
2) don't allow non-owners to modify the information;
3) fix all domains so that they can only be modified IF the web site's header has a specific meta tag tied to a login (as google etc do);
4) when an owner removes ALL information, keep it that way!
5) when you get a 4xx error, dump the site.
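To show that point 3 is not much to ask, here is a rough sketch of a meta-tag ownership check in Python. Everything named in it is an assumption for illustration: the meta name "site-owner-verification", the token format and the function names are invented, and this is not how AboutUs or any search engine actually implements it.

```python
from html.parser import HTMLParser


class MetaTagFinder(HTMLParser):
    """Collect the content of every <meta> tag with the given name."""

    def __init__(self, meta_name: str):
        super().__init__()
        self.meta_name = meta_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so default to "".
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == self.meta_name:
            self.values.append(attr.get("content", ""))


def owner_verified(html: str, expected_token: str,
                   meta_name: str = "site-owner-verification") -> bool:
    """True only if the page carries the token issued to the logged-in user."""
    finder = MetaTagFinder(meta_name)
    finder.feed(html)
    return expected_token in finder.values
```

The idea is that the listing site issues a token to the account, the owner pastes it into the home page's head, and edits are only accepted while the token is still found there.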
YOU may think that "community" is at the heart of aboutus. Why should anyone care? If we want information about a web site there are plenty of ways of finding out and I doubt yours is top of most people's list. Your "service" is of little help to web surfers and can be annoyingly misleading; and it is potentially dangerous to web site owners if anyone finds the incorrect information and BELIEVES it.
Whatever you may think about your "community", at the end of the day the site is a commercial undertaking. It is making money out of US, first from exploiting our registration details and then from theft of our COPYRIGHTED site content.
[edited by: incrediBILL at 2:45 am (utc) on Mar. 15, 2009]
[edit reason] removed specifics tos #13, keep it civil tos #4 [/edit]
To pursue every bot that lands on a site, test it for compliance and add it to this file is far too time consuming for webmasters.
The problem is it's a robots EXCLUSION protocol, not an INCLUSION protocol.
That's why I whitelist my robots.txt file and convert it to an INCLUSION protocol:
#allow these bots
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: Mediapartners-Google*
Disallow:

#block all other bots that ask
User-agent: *
Disallow: /
Then you stop chasing thousands of bots that honor robots.txt and you can review a month's worth of robots.txt requests at your leisure and see if anything is worth letting in.
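If you want to sanity-check a whitelist like this before uploading it, Python's standard urllib.robotparser can read the rules and tell you who gets in. This is just a sketch: the allowed() helper is a made-up name, and the rules are the ones from the post with the "block all other bots" comment on its own line.

```python
from urllib.robotparser import RobotFileParser

# The whitelist-style robots.txt from the post above.
ROBOTS_TXT = """\
#allow these bots
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: Mediapartners-Google*
Disallow:

#block all other bots that ask
User-agent: *
Disallow: /
"""


def allowed(bot_token: str, path: str = "/") -> bool:
    """Ask the parser whether a bot with this name may fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(bot_token, path)
```

With these rules, allowed("googlebot") should come back True and allowed("AboutUsBot") False. Bear in mind this only shows what a well-behaved bot would conclude; it does nothing about bots that ignore robots.txt, which still need blocking by IP or UA.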
I have felt for several years that robots.txt is, like a major part of the internet, rather antiquated (can you say that about something only about 20 years old?). It's all very well google patching and darning but it'll take a major impetus to get robots sorted out, starting with killing botnets and working up.
Aboutus.com uses iterasi.net as their "archive" link. I found several "saved copies" of my web sites there. Sent DMCA notices.
From the iterasi.net home page:
Every day you find web pages you may never see again. Which is fine, unless you actually need that information. Bookmarks don’t cut it. They lead you to where that information was — but not the information itself. With iterasi, you can save any web page and return to it anytime, from anywhere, forever.