Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
DomainCrawler
What purpose does it serve?
GaryK




msg:3768840
 9:33 pm on Oct 18, 2008 (gmt 0)

DomainCrawler/1.0 (info@domaincrawler.com; [domaincrawler.com...]

The above UA, where example.com replaces my actual domain names, has been visiting my sites once a week for the last week or so. Does anyone know what DomainCrawler.com is all about? Is anyone blocking them from crawling their site(s)? Thanks.

 

markbiz




msg:3768979
 6:20 am on Oct 19, 2008 (gmt 0)

Are your domains expiring soon? They visited one of my domains today, but there's no content on it; it just redirects to another site of mine. But this domain expires next month...

jmccormac




msg:3768981
 7:15 am on Oct 19, 2008 (gmt 0)

Blocked here.

Regards...jmcc

GaryK




msg:3769195
 7:45 pm on Oct 19, 2008 (gmt 0)

Last night this bot crawled about 30 domains, only a few of which expire soon. Based on that I added it to my banned list. Thanks, guys.

jmccormac




msg:3769755
 5:07 pm on Oct 20, 2008 (gmt 0)

It seems to be scraping/caching whois data as well.

Regards...jmcc

zeus




msg:3780922
 10:27 pm on Nov 5, 2008 (gmt 0)

Yes, there are more and more of those whois scraping/caching sites. It's getting on my nerves.

zeus




msg:3780939
 11:18 pm on Nov 5, 2008 (gmt 0)

Let's make a list of all those scrapers.

Right now I'm blocking:

<Limit GET HEAD POST>
order allow,deny
deny from 216.145.16.0/24
deny from 66.249.16.*
deny from 66.249.17.*
deny from 64.246.165.*
deny from 64.79.192.0/19
deny from 64.246.165.128/25
deny from 66.249.0.0/19
deny from 209.59.192.0/19
deny from 216.145.16.0/24
deny from 83.168.240.5
deny from 66.249.0.0/19
allow from all
</LIMIT>

But maybe we should put a name next to each range as well. Also, is there an option to block those on the whole server, instead of changing the .htaccess on each domain?

GaryK




msg:3780949
 11:49 pm on Nov 5, 2008 (gmt 0)

One of the things I like about IIS is the ability to block an IP Address or range across all domains. Surely there's a way to do that with Apache.

zeus




msg:3780953
 11:55 pm on Nov 5, 2008 (gmt 0)

Yes, I thought so. I think I did that some time ago, but I'm no server dude. I used some free software that showed my server like Windows folders, but I can't remember the software or which folder I made the changes in.

Also, let's have a list of bad scraper IPs.

I do have Apache.

jdMorgan




msg:3780986
 1:29 am on Nov 6, 2008 (gmt 0)

I'd suggest something like this.

I removed IP address range duplicates and overlaps.

I also removed the <Limit> container, since using <Limit> as shown would have allowed other methods, such as PUT and DELETE -- without any restrictions.
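As a rough illustration of that point (a sketch only, with a single address from the list above standing in for the full set), the two variants differ in which request methods the rules cover:

```apache
# Variant 1: rules apply ONLY to the listed methods.
# Requests using PUT, DELETE, etc. are not covered by these Allow/Deny lines.
<Limit GET HEAD POST>
    Order Allow,Deny
    Allow from all
    Deny from 83.168.240.5
</Limit>

# Variant 2 (use instead of Variant 1, not together with it):
# no container, so the rules apply to every request method.
Order Allow,Deny
Allow from all
Deny from 83.168.240.5
```

Where only a few methods should be exempt from a rule set, Apache also provides `<LimitExcept>`, which applies its contents to all methods other than those listed.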

I show a safer, friendlier Allow/Deny construct:

All IP addresses can access robots.txt (it is only fair to warn them that they are Disallowed, and many simple-minded robots will treat any error fetching robots.txt as permission to spider the entire site).

All IP addresses are also allowed to fetch the custom 403 error page, so that a 403 error when an IP address is blocked does not result in another 403 as the client attempts to fetch the custom 403 error document, and another 403 because that fails, and another, and another... (This avoids a looping self-inflicted Denial-of-Service attack.)

I also show an example of the syntax for blocking the "example.com" domain.

# Set EnVar to allow all IP addresses to access robots.txt & custom 403 error page
SetEnvIf Request_URI ^/robots\.txt$ allowit
SetEnvIf Request_URI ^/path-to-your-custom-403-error-page\.html$ allowit
#
# Configure so that Allow is the default state, and Allows can override Denys
Order Deny,Allow
#
# Not sure who this denies
Deny from 64.79.192.0/19
# Name-Intelligence, Whois.ess-cee, Do-main-Tools at Compass
# (64.246.165.0 is not on a /19 boundary; the /19 that contains it starts at 64.246.160.0)
Deny from 64.246.165.0/19
# More from Compass
Deny from 64.246.165.128/25
# Name-Intelligence at Spry Hosting
Deny from 66.249.0.0/19
# Domaincrawler at Chrystone AB
Deny from 83.168.240.5
# More Spry Hosting
Deny from 209.59.192.0/19
# Name-Intelligence at Compass
Deny from 216.145.16.0/24
#
# An example of hostname blocking
Deny from example.com
#
Allow from env=allowit
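
On the earlier question about blocking across all domains: one possible approach, sketched here under assumptions, is to move the same directives out of the per-site .htaccess files and into the main server configuration. The `/var/www` path is hypothetical and stands in for wherever the sites actually live, only two of the ranges are repeated for brevity, and the syntax is the Apache 2.2-style `Order`/`Deny`/`Allow` used above.

```apache
# httpd.conf, main server context (outside any <VirtualHost>)
# Assumption: every site's document root is somewhere under /var/www

# Same exemptions as in the .htaccess version
SetEnvIf Request_URI ^/robots\.txt$ allowit
SetEnvIf Request_URI ^/path-to-your-custom-403-error-page\.html$ allowit

# Tell Apache which document to serve for 403 responses
ErrorDocument 403 /path-to-your-custom-403-error-page.html

<Directory /var/www>
    Order Deny,Allow
    # Domaincrawler at Chrystone AB
    Deny from 83.168.240.5
    # Name-Intelligence at Compass
    Deny from 216.145.16.0/24
    Allow from env=allowit
</Directory>
```

A per-site .htaccess or `<Directory>` section that sets its own Order/Allow/Deny may override these rules for that site, so it is worth checking existing files after making a change like this.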

Note that adding this "Deny from example.com" directive will put the server into "reverse-DNS lookup mode." Unless your server logging is configured properly, you may find that your server now logs hostnames instead of IP addresses, which can be quite annoying. The only way to fix that is either to change the logging format (see mod_log_config) or to remove all Allows and Denys that use the "Deny from hostname" notation.
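
As a sketch of the logging-format fix, assuming the standard "combined" layout is in use: mod_log_config's `%h` logs the hostname when one has been looked up, while `%a` logs the client IP address. The `combined_ip` nickname and the log path are placeholders, not anything from this thread.

```apache
# Stock format: %h may show hostnames once hostname-based Allow/Deny is in use
# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Same fields, but %a logs the client IP address instead
LogFormat "%a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined_ip
CustomLog logs/access_log combined_ip
```

`LogFormat` and `CustomLog` belong in the server or virtual host configuration, not in .htaccess, so this needs access to httpd.conf.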

Jim

zeus




msg:3794919
 9:30 pm on Nov 26, 2008 (gmt 0)

Just got a reply to my email. I told them to stop placing all the whois content on their site just to make money. They replied with OK, but we are allowed to do that.

Here's a small snippet of GoDaddy's privacy policy:
"We will share your information in order to comply with ICANN's rules, regulations and policies."

ICANN's privacy policy says that all information must be made public. If you don't want it to be public, it's possible to get around that by purchasing "Whois Privacy Protection" from GoDaddy.

And on their own domain they also have a privacy shield.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved