Is this a new bot? Shall I ban it permanently?
[edited by: volatilegx at 8:28 pm (utc) on Nov. 15, 2004]
[edit reason] fixed link [/edit]
Their website advertises a database of business information and personnel (more or less).
If your website stands to benefit from this kind of listing, then allow it.
I'd be interested to know the IP from your log line.
Since george's canufly is down, I used two others and obtained the following IPs.
207.31.249.196
207.69.188.193
My sites have nothing to share with this bot and nothing to gain from it or its traffic.
Don
The IP information is:
207.31.251.140 - - [14/Nov/2004:07:24:51 -0500] GET /robots.txt HTTP/1.1 200 62 - NextGenSearchBot 1 (for information visit [eliyon.com...]
207.31.251.140 - - [14/Nov/2004:07:24:51 -0500] GET / HTTP/1.1 200 15403 - NextGenSearchBot 1 (for information visit [eliyon.com...]
In looking at the website and its robots.txt file, I am wondering if perhaps the error is in the robots.txt file. It currently reads:
User-agent: *
Disallow:
User-agent: *
Disallow: /trap.pl
According to the robots.txt standard at [robotstxt.org...] a disallow line is defined as: “The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.”
On the homepage of the website, there is a link to a document at /cgi-bin/trap.pl, which I am assuming is the robot trap. According to the standard, this link would not be covered by the disallow line of “/trap.pl”, because the URL path /cgi-bin/trap.pl does not start with /trap.pl. That would explain why NextGenSearchBot went ahead and visited the document.
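For what it's worth, the prefix rule quoted from the standard can be sketched in a few lines of Python. This is only a minimal illustration, assuming plain "starts with" matching on the URL path as the quoted definition describes. The helper name `is_disallowed` is hypothetical, it ignores User-agent grouping entirely, and it is not a claim about how NextGenSearchBot or any other crawler is actually implemented.

```python
def is_disallowed(path, disallow_values):
    """Return True if `path` starts with any non-empty Disallow value.

    Hypothetical helper for illustration only: it applies the plain
    prefix rule from the robots.txt standard and nothing else.
    """
    # An empty Disallow value means "nothing is disallowed", so skip it.
    return any(bool(value) and path.startswith(value) for value in disallow_values)


# Disallow values taken from the robots.txt quoted above.
rules = ["", "/trap.pl"]

print(is_disallowed("/trap.pl", rules))          # True: exact prefix match
print(is_disallowed("/cgi-bin/trap.pl", rules))  # False: path starts with /cgi-bin/

# The two examples given in the standard's own wording.
print(is_disallowed("/help.html", ["/help"]))    # True
print(is_disallowed("/help.html", ["/help/"]))   # False
```

Under that reading, a line such as Disallow: /cgi-bin/trap.pl (or Disallow: /cgi-bin/) would be needed for the trap URL to be covered.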
Again, I apologize for any inconvenience, and if I am misinterpreting the standard, I would appreciate feedback.
[edited by: volatilegx at 2:03 am (utc) on Nov. 21, 2004]
[edit reason] removed link [/edit]