
User-Agent: Nelian Pty Ltd - Spider v2.1

Owner claims he occasionally crawls as Google


GaryK

3:12 pm on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member


User-Agent: Nelian Pty Ltd - Spider v2.1 ( http://pcaccessoriesparts.com )

I have a page on one of my websites where people can download files related to user agents. This page can also be used to check for updates. However, it's a heavy page so I've set up tools to make it easy for people to check for updates without putting any noticeable load on my server.

Despite my pleas that people use the aforementioned tools there are still a lot of people who check the main page. So many that I added a clause to my Terms of Service stating that checking the main page more than once a day is a violation that will result in one or more IP Addresses being added to my ban list.
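The kind of lightweight update check described above can be sketched with a conditional HEAD request, so a client only downloads the heavy page when it has actually changed. This is an illustrative sketch, not Gary's actual tooling; the URL and the cached Last-Modified value are hypothetical.

```python
from urllib.request import Request

def build_update_check(url, last_modified=None):
    """Build a HEAD request that lets the server answer 304 Not Modified."""
    req = Request(url, method="HEAD")
    if last_modified:
        # Send the timestamp we cached from a previous fetch.
        req.add_header("If-Modified-Since", last_modified)
    return req

def needs_download(status):
    """Interpret the server's answer: 304 means our copy is still current."""
    return status != 304
```

A client that checks this way once a day puts almost no load on the server, since a HEAD answered with 304 transfers no page body.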

Let me try and be brief. Something that's hard for me to do. ;)

The above user agent has been hitting that page multiple times per day.

So I finally wrote to the webmaster and politely but firmly told him to stop it or I'll ban his entire company's range of IP Addresses. He wrote back to tell me the page isn't in my robots.txt file and that he can crawl it as often as he wants.

My reply to him made it clear my problem wasn't with him indexing downloads.asp. In fact I want search engines to index it. It's the top ranking page in all the majors using my keyword(s). My problem, as I told him, was that he was violating my Terms of Use.

Here's where it gets interesting.

He wrote back and told me that bots don't have to abide by a site's Terms of Use.

The other thing he said, and I really do not know why, is that he often crawls the web [b]spoofed as Google[/b] and that maybe I was seeing duplicate entries because both bots crawl from the same IP Address. I'm not. When he crawls as Google he uses IP Addresses from an ISP in Queensland, AU. When he crawls using the above user agent the IP Addresses are from a company called ATMLINK, INC. in Los Angeles.

Between his abuse of my Terms of Use and his admission that he spoofs Google's user agent, I have enough information to consider him worthy of being banned.

volatilegx

9:30 pm on Oct 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> He wrote back and told me that bots don't have to abide by a site's Terms of Use.

What arrogance! He's right, though. He is under no obligation (nor is anyone) to obey your terms of use. Your only remedy is to refuse service, which means banning him.

I'd have banned him in a heartbeat.
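Refusing service in this sense is usually just matching the offending user-agent string and answering 403 instead of serving the page. A minimal sketch, using the agent string from this thread; the function names are illustrative, not any particular server's API:

```python
# User-agent substrings to refuse service to (taken from this thread).
BANNED_AGENT_SUBSTRINGS = ["Nelian Pty Ltd - Spider"]

def response_status(user_agent: str) -> int:
    """Return 403 (Forbidden) for banned crawlers, 200 otherwise."""
    if any(s in user_agent for s in BANNED_AGENT_SUBSTRINGS):
        return 403
    return 200
```

Since user-agent strings are trivially spoofed (as this very thread shows), a substring ban like this is only a first line of defense; IP-range bans are the sturdier remedy.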

bobothecat

9:38 pm on Oct 3, 2006 (gmt 0)



What arrogance!

I agree... and certainly a site/IP range worthy of the 403 ban.

GaryK

12:27 am on Oct 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know the Terms are ultimately only enforceable by banning. Usually, though, when I contact someone about this problem they apologize and correct their behavior, at least to the point of not hitting the page more than once a day.

This was the first time I've had a problem with arrogance in the extreme.

BTW this guy just doesn't know when to stop typing. He claims he used to own another bot that I used to have problems with: bdncentral, an Australian company that used to have a search engine, but now seems to simply be a registrar, site designer and host. He encouraged me to Google him to see what a great person he is. Bah!

That's all I have to say about this. I've relayed the facts he shared with me. As always it's up to each of us to make our own decisions about what to do next.

wilderness

1:04 am on Oct 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Gary,
I've had the ATM Class C range denied since July of 2005.
Have a note in my records as "mail spammer", however no notes about how I determined that.

216.240.159.** - - [30/Jun/2005:19:27:45 -0700] "GET / HTTP/1.0" 200 9106
"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"

[edited by: volatilegx at 3:00 am (utc) on Oct. 4, 2006]
[edit reason] obfuscated ip address [/edit]
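For anyone keeping similar records, a line in that combined log format can be pulled apart with a short regex. A sketch; the IP in the usage example is a documentation placeholder, since the real one was obfuscated by the moderators:

```python
import re

# Fields of an Apache combined-format access log line:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```

With fields extracted, it's easy to tally hits per IP or per user-agent and spot a crawler that ignores your wishes.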

GaryK

3:02 am on Oct 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the info Don. I'm banning ATM at the C range too. It's a small enough range for me to feel comfortable doing that. For now though I just banned the single address from Australia. If he still gets through I'll widen the ban. They have two rather large C ranges.
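Banning at the C range means blocking the whole /24 network rather than a single address. A minimal sketch with Python's stdlib `ipaddress` module, using the two prefixes mentioned in this thread as illustrative entries:

```python
import ipaddress

# Illustrative ban list: the two /24 ("Class C") prefixes discussed above.
BANNED_NETS = [
    ipaddress.ip_network("216.240.157.0/24"),
    ipaddress.ip_network("216.240.159.0/24"),
]

def is_banned(ip: str) -> bool:
    """True if the address falls inside any banned /24 range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BANNED_NETS)
```

A /24 covers 256 addresses, which is why Gary calls it "a small enough range to feel comfortable" banning; widening to a /16 would block 65,536 addresses and risks far more collateral damage.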

oceaniana

5:41 am on Oct 6, 2006 (gmt 0)

10+ Year Member


Hello Gary

The truth of the whole story from the original email is listed below; there is also the original second email, since you chose not to reply to the first.

Google's terms for a crawler:
http://www.google.com/intl/en/terms_of_service.html
From Google's site, some very interesting facts; there are no search engines on the web that read your terms of use.
_______________________________________________________
Content Linked to by Google
The sites displayed as search results or linked to by Google Services are developed by people over whom Google exercises no control. The search results that appear from Google's indices are indexed by Google's automated machinery and computers, and Google cannot and does not screen the sites before including them in the indices from which such automated search results are gathered. A search using Google Services may produce search results and links to sites that some people find objectionable, inappropriate, or offensive. We cannot guarantee that a Google search will not locate unintended or objectionable content and assume no responsibility for the content of any site included in any search results or otherwise linked to by the Google Services.
________________________________________________________________

Hello

Your robots text file

User-Agent: *
Disallow: /contact-me/
Disallow: /error/
Disallow: /template/
Disallow: /tools/
Disallow: /versions/
Disallow: /stream.asp

Your robots.txt file is not going to stop any crawler from reaching that page (downloads.asp).
Full info on how to set up your robots.txt file is here:
http://pcaccessoriesparts.com/Spider.php

Or add this to your robots.txt file for a full block on our bot only, across your whole site:
User-Agent: Nelian Pty Ltd - Spider v2.1 ( http://pcaccessoriesparts.com )
Disallow: /

To block that page for all robots:
User-Agent: *
Disallow: /downloads.asp

To block that page for our robot only:
User-Agent: Nelian Pty Ltd - Spider v2.1 ( http://pcaccessoriesparts.com )
Disallow: /downloads.asp

Add this to your robots.txt file for a crawl delay:
User-agent: *
Crawl-delay: 17
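The rules suggested above can be sanity-checked with Python's standard-library robots.txt parser. A minimal sketch; the example.com host is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# The wildcard block on /downloads.asp plus the crawl delay, as suggested.
rules = """\
User-agent: *
Disallow: /downloads.asp
Crawl-delay: 17
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A compliant crawler must skip the blocked page but may fetch others,
# pausing 17 seconds between requests.
print(rp.can_fetch("AnyBot", "http://example.com/downloads.asp"))  # False
print(rp.can_fetch("AnyBot", "http://example.com/index.html"))     # True
print(rp.crawl_delay("AnyBot"))                                    # 17
```

Of course, robots.txt is purely advisory: it only constrains crawlers that choose to honor it, which is precisely the dispute in this thread.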

Your meta tags: you have two robots tags on this page, which is not a compliant way to stop indexing. You have two strings for robots:

<meta name="robots" content="noarchive">
<meta name="robots" content="index,follow">

It should be like the line below for no indexing and no following of links:
<meta name="robots" content="noindex,nofollow">


Now, after you make the necessary changes to prevent your page from being indexed, in compliance with "The Robots Exclusion Protocol" (adopted worldwide as a standard), I can assure you our spider won't have a hope of indexing materials you don't want indexed.

Full details on "The Robots Exclusion Protocol" are located at the bottom of our page here:
http://pcaccessoriesparts.com/Spider.php


Thank You
Brian Neilen
Nelian Pty Ltd



<snip: no emails can be posted anywhere on this system>

Nelian Pty Ltd - Spider v2.1 ( http://pcaccessoriesparts.com )
216.240.157.3
beaver.unixbsd.info
-----
09/27/2006 08:31:03 200 GET browsers.garykeith.com
/downloads.asp browsers.garykeith.com
Nelian+Pty+Ltd+-+Spider+v2.1+(+http://pcaccessoriesparts.com+)
09/27/2006 08:31:03 200 HEAD browsers.garykeith.com
/downloads.asp browsers.garykeith.com
Nelian+Pty+Ltd+-+Spider+v2.1+(+http://pcaccessoriesparts.com+)
09/29/2006 06:56:36 200 GET browsers.garykeith.com
/downloads.asp browsers.garykeith.com
Nelian+Pty+Ltd+-+Spider+v2.1+(+http://pcaccessoriesparts.com+)
09/29/2006 06:56:36 200 HEAD browsers.garykeith.com
/downloads.asp browsers.garykeith.com
Nelian+Pty+Ltd+-+Spider+v2.1+(+http://pcaccessoriesparts.com+)
09/30/2006 06:08:43 200 GET browsers.garykeith.com
/downloads.asp browsers.garykeith.com
Nelian+Pty+Ltd+-+Spider+v2.1+(+http://pcaccessoriesparts.com+)
09/30/2006 06:08:43 200 HEAD browsers.garykeith.com
/downloads.asp browsers.garykeith.com
Nelian+Pty+Ltd+-+Spider+v2.1+(+http://pcaccessoriesparts.com+)
09/30/2006 06:10:44 200 GET browsers.garykeith.com
/downloads.asp browsers.garykeith.com
Nelian+Pty+Ltd+-+Spider+v2.1+(+http://pcaccessoriesparts.com+)
09/30/2006 06:10:44 200 HEAD browsers.garykeith.com
/downloads.asp browsers.garykeith.com
Nelian+Pty+Ltd+-+Spider+v2.1+(+http://pcaccessoriesparts.com+)

Second email

<snip>

___________________________________________________________________

I very rarely use Internet Explorer; as stated above I use K-Meleon, and its user agent is usually set to Google. It's easy to serve up exploits based on the user-agent string, although that technique is not as widely used any more. At times when I'm testing my scripts for detection of user agents I will set my own user agent in K-Meleon, and it can stay that way for a few days. Within a week or two my testing will be complete, and I won't be on the net as actively as I am while testing.

Well, if you truly believe my engine should be banned, that does not worry me in the least; you just need to convince the billion-odd webmasters in the world to do it, while my engine will probably only get to 20,000 indexed docs.

All the best, Gary; you need it. You're too highly strung or stressed over the internet being used.

Thank You
Brian Neilen

[edited by: Brett_Tabke at 12:36 pm (utc) on Oct. 6, 2006]

Brett_Tabke

12:39 pm on Oct 6, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This is one of the risks of this forum.

3 things:

1- When you post someone's crawler agent with an IP and URL, they are 99.9% of the time going to come here and see it.
2- If you run a bot, do so respectfully; it is your responsibility to make sure it runs in a respectful manner. Don't be surprised when website owners get upset with you.
3- This thread has come to a useful end. If you guys want to talk/discuss it more, I invite you to do so in email.