Domain Tools Exposed

         

incrediBILL

4:29 am on Jan 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not a bad bot exactly, but there's no way to opt out of their crap, so I did an in-depth report on Domain Tools Whois and AboutUs.org. It's not reprinted here, but if you want to keep their crawlers off your site or stop people snooping with their SEO tools, everything you need to know is listed below.

These are the basic IP ranges you need to block to stop the Domain Tools Whois SEO tool, its screen shots, and the AboutUs robot.

66.249.16.*
66.249.17.*
64.246.165.* (screen shots)

You could try using robots.txt for the AboutUs bot, but it appears to read robots.txt LAST, so I just whacked the IPs.
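
For anyone who filters at the application level instead of in the server config, here is a minimal sketch of the same block list in Python. It assumes each ".*" wildcard above means a full /24 network; the names `BLOCKED_NETWORKS` and `is_blocked` are made up for illustration and are not from any particular tool.

```python
import ipaddress

# Ranges listed in this post, with each trailing ".*" written as a /24 network
# (an assumption about what the wildcard is meant to cover).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("66.249.16.0/24"),
    ipaddress.ip_network("66.249.17.0/24"),
    ipaddress.ip_network("64.246.165.0/24"),  # screen shots
]


def is_blocked(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside any of the listed networks."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

A request handler could call `is_blocked(client_ip)` and answer with a 403 when it returns True.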

Enjoy.

wilderness

7:44 pm on Jan 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{REMOTE_ADDR} ^66\.249\.([0-9]|[1-2][0-9]|3[01])\. [OR]

Works for me ;)
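
A quick way to see what that condition actually covers is to run the same pattern against every possible third octet. This is only a sketch: it assumes Python's `re` treats this simple pattern the same way Apache's regex engine does, and the helper names are hypothetical.

```python
import ipaddress
import re

# Same pattern as the RewriteCond above, with ordinary pipe characters.
PATTERN = re.compile(r"^66\.249\.([0-9]|[1-2][0-9]|3[01])\.")

# Assumed CIDR equivalent of the pattern: 66.249.0.0 through 66.249.31.255.
EQUIVALENT = ipaddress.ip_network("66.249.0.0/19")


def matched_third_octets():
    """Third octets (0-255) for which 66.249.<n>.1 matches the pattern."""
    return [n for n in range(256) if PATTERN.match(f"66.249.{n}.1")]


def cidr_third_octets():
    """Third octets covered by the /19, taken from its /24 subnets."""
    return sorted(
        int(str(sub.network_address).split(".")[2])
        for sub in EQUIVALENT.subnets(new_prefix=24)
    )
```

Both helpers return 0 through 31, so the condition is much wider than the two /24 ranges listed in the first post.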

Many thanks for the Compass.

Don

incrediBILL

9:39 pm on Jan 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They are also running screenshots from 216.145.16.*

wilderness

9:50 pm on Jan 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"216.145.16."

I already have 0-31 of that Class B (216.145.0.* through 216.145.31.*) denied.

Mokita

10:23 pm on Jan 3, 2008 (gmt 0)

10+ Year Member



You might want to add this CIDR to your denies as well: 216.145.16.0/24

As Bill says, it only asked for robots.txt after retrieving the home page, which it did twice in less than a minute with different UAs.

It then went on to request the home page again, with a third UA plus all supporting files. I had one of their IPs denied, but not the other two.
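
The "robots.txt last" behaviour can be checked mechanically from an access log. The sketch below assumes Apache combined log format and uses the two AboutUsBot lines from the log excerpt in this post as sample input; `LOG_RE` and `pages_before_robots` are hypothetical names. Because the IPs are obfuscated, keying on IP plus user agent is only approximate.

```python
import re

# Combined log format: ip, two ident fields, timestamp, request, status, size,
# referer, user agent.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<ua>[^"]*)"$'
)

SAMPLE = [
    '66.249.16.*** - - [02/Jan/2008:13:47:32 +1000] "GET / HTTP/1.1" 200 2173 "-" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"',
    '66.249.16.*** - - [02/Jan/2008:13:47:33 +1000] "GET /robots.txt HTTP/1.1" 200 2017 "http://www.example.com/" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"',
]


def pages_before_robots(lines):
    """Map (ip, user agent) to the paths it fetched before its first robots.txt hit.

    Clients that never request robots.txt are left out of the result.
    """
    seen_robots = set()
    early = {}
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue
        client = (m["ip"], m["ua"])
        if m["path"] == "/robots.txt":
            seen_robots.add(client)
        elif client not in seen_robots:
            early.setdefault(client, []).append(m["path"])
    return {client: paths for client, paths in early.items() if client in seen_robots}
```

For the sample lines, the AboutUsBot client comes back with "/" listed as fetched before robots.txt.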

It is also interesting to see that the bot sporting the IE user agent is anything but IE - note how much larger its response is than in the first two requests. I use gzip to deliver pages, which slashes the file size significantly, and all the major browsers support it. When a request comes in with a browser UA but receives the full, uncompressed file size, it is a dead giveaway that a bot is doing the asking.

66.249.16.*** - - [02/Jan/2008:13:46:53 +1000] "GET / HTTP/1.1" 200 2173 "http://whois.domain tools.com/example.com" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"
66.249.16.*** - - [02/Jan/2008:13:47:32 +1000] "GET / HTTP/1.1" 200 2173 "-" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"
66.249.16.*** - - [02/Jan/2008:13:47:33 +1000] "GET /robots.txt HTTP/1.1" 200 2017 "http://www.example.com/" "Mozilla/5.0 (compatible; AboutUsBot/0.9; +http://www.aboutus.org/AboutUsBot)"
216.145.16.*** - - [02/Jan/2008:13:50:02 +1000] "GET / HTTP/1.0" 200 7012 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"
64.246.165.*** - - [02/Jan/2008:13:50:03 +1000] "GET /style.css HTTP/1.0" 403 156 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"
64.246.165.*** - - [02/Jan/2008:13:50:03 +1000] "GET /images/pic1.jpg HTTP/1.0" 403 164 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"
64.246.165.*** - - [02/Jan/2008:13:50:04 +1000] "GET /images/file.swf HTTP/1.0" 403 164 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"
216.145.16.*** - - [02/Jan/2008:13:50:04 +1000] "GET /images/pic2.jpg HTTP/1.0" 200 22077 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"
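
The size comparison described above can be automated. The sketch below is a rough heuristic, not a definitive detector. It assumes combined log format, treats any user agent containing "MSIE" or "Firefox" as claiming to be a browser, and flags a 200 response that is at least twice the smallest 200 response seen for the same path. The threshold and all names are assumptions. The sample input is two of the log lines quoted in this post.

```python
import re

LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<ua>[^"]*)"$'
)

SAMPLE = [
    '66.249.16.*** - - [02/Jan/2008:13:46:53 +1000] "GET / HTTP/1.1" 200 2173 "http://whois.domain tools.com/example.com" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"',
    '216.145.16.*** - - [02/Jan/2008:13:50:02 +1000] "GET / HTTP/1.0" 200 7012 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; {E12EDDF0-EE40-C76D-85D0-8861BDE2E7AE}; SV1; .NET CLR 1.1.4322)"',
]

BROWSER_TOKENS = ("MSIE", "Firefox")  # assumed markers of a browser-style UA


def flag_uncompressed(lines, ratio=2.0):
    """Return (ip, path, size) for browser-UA 200 responses that look uncompressed."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and m["status"] == "200" and m["size"] != "-":
            hits.append((m["ip"], m["path"], int(m["size"]), m["ua"]))

    # Smallest 200 response per path stands in for the gzip-compressed size.
    smallest = {}
    for _, path, size, _ in hits:
        smallest[path] = min(size, smallest.get(path, size))

    return [
        (ip, path, size)
        for ip, path, size, ua in hits
        if any(token in ua for token in BROWSER_TOKENS)
        and size >= ratio * smallest[path]
    ]
```

On the two sample lines, only the 7012-byte request from 216.145.16.*** is flagged. Files that are never compressed, such as images, need more than one request per path before the comparison means anything.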

<edit>Doesn't pay to start a reply then get distracted - or other people post the same info before you :-/ </edit>

[edited by: Mokita at 10:49 pm (utc) on Jan. 3, 2008]

[edited by: volatilegx at 3:05 pm (utc) on Jan. 4, 2008]
[edit reason] obfuscated ip addresses [/edit]

incrediBILL

1:11 am on Jan 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The AboutUs wiki today accused me of confusing AboutUs and Domain Tools.

I'm not sure what the confusion is because I never claimed they were the same company.

However, they operate from the exact same nameintel IP pool, have the same street address on their WHOIS listing, share thumbnails made by Domain Tools, link to AboutUs from Domain Tools Whois... yup, the Alzheimer's set in, I'm totally confused.

[edited by: incrediBILL at 1:11 am (utc) on Jan. 4, 2008]

wilderness

3:59 am on Jan 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bill,
Word is that Wiki has a new bot that is supposed to begin on Jan 7th.

incrediBILL

4:37 am on Jan 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yup, word is they used Nutch, so if it's not one of the 585 Nutch IPs I just posted the other day, I'll be shocked.

blend27

4:15 pm on Jan 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just checked one domain there:

Whois History: 39 records have been archived since 2002

I wonder what they mean by that, because it's really big news to me.

bcolflesh

4:26 pm on Jan 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check incrediBILL's blog for more info on this - pretty interesting... one of his blog comments claims that Name Intelligence are the ones behind both.

incrediBILL

7:42 pm on Jan 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Other than the fact that they operate from the same IP pool...

Here's another connection:

One of the questions posed was about our connection with Name Intelligence. Jay Westerdal, CEO of Name Intelligence.com, in fact, recently stepped down as AboutUs CTO...

<snip>

[edited by: incrediBILL at 7:43 pm (utc) on Jan. 4, 2008]

[edited by: volatilegx at 11:09 pm (utc) on Jan. 6, 2008]
[edit reason] no blog links please [/edit]