NextGenSearchBot 1

Did not obey robots.txt. Is it new?

cybertime

1:04 pm on Nov 14, 2004 (gmt 0)

The user agent NextGenSearchBot 1 (for information visit [eliyon.com...] ) did not obey my robots file and got caught in my trap today.

Is this a new bot? Shall I ban it permanently?

[edited by: volatilegx at 8:28 pm (utc) on Nov. 15, 2004]
[edit reason] fixed link [/edit]

wilderness

7:23 pm on Nov 15, 2004 (gmt 0)

Is this a new bot? Shall I ban it permanently?

[#*$!...]
Their website proclaims a database of business information and personnel (more or less).
If your website stands to benefit from this kind of listing, then allow it.

I'd be interested to know the IP from your log line.
Since george's canufly is down, I used two others and obtained the following IPs.
207.31.249.196
207.69.188.193

My sites have nothing to share with, or gain from, this bot or its traffic.

Don

cybertime

8:00 pm on Nov 15, 2004 (gmt 0)

Thanks wilderness.

The ip information is:

207.31.251.140 - - [14/Nov/2004:07:24:51 -0500] GET /robots.txt HTTP/1.1 200 62 - NextGenSearchBot 1 (for information visit [eliyon.com...]
207.31.251.140 - - [14/Nov/2004:07:24:51 -0500] GET / HTTP/1.1 200 15403 - NextGenSearchBot 1 (for information visit [eliyon.com...]

wilderness

8:30 pm on Nov 15, 2004 (gmt 0)

Thanks cyber.

The dreaded Verio ;) almost as bad as ThePlanet ;)

Eliyon

7:12 pm on Nov 19, 2004 (gmt 0)

I read the posting that NextGenSearchBot did not obey the robots file at (link removed). I am a representative of Eliyon Technologies, and I am extremely concerned by this. It is our goal that NextGenSearchBot obey all robot exclusions without fail. I apologize for the inconvenience this may have caused and would like to correct the problem as quickly as possible.

In looking at the website and its robots.txt file, I am wondering if perhaps the error is in the robots.txt file. It currently reads:

User-agent: *
Disallow:

User-agent: *
Disallow: /trap.pl

According to the robots.txt standard at [robotstxt.org...] a Disallow line is defined as: "The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html."

On the homepage of the website, there is a link to a document at /cgi-bin/trap.pl, which I am assuming is the robot trap. According to the standard, this link would not be covered by the Disallow line of "/trap.pl", which would explain why NextGenSearchBot went ahead and visited the document.
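
To illustrate the prefix rule quoted above, here is a minimal Python sketch. It is not NextGenSearchBot's actual code. The function name `is_disallowed` is hypothetical, and the sketch assumes that the Disallow values from both `User-agent: *` records are simply pooled together; real parsers may handle duplicate records differently.

```python
def is_disallowed(path, disallow_values):
    """Return True if path starts with any non-empty Disallow value.

    Hypothetical helper: it only models the "starts with" rule from the
    standard and ignores User-agent record selection entirely.
    """
    # An empty Disallow value excludes nothing, so it is skipped.
    return any(value and path.startswith(value) for value in disallow_values)


# Disallow values taken from the two records in the robots.txt quoted above
# (pooling them into one list is an assumption of this sketch).
rules = ["", "/trap.pl"]

# A document at the site root named /trap.pl would be covered by the rule.
print(is_disallowed("/trap.pl", rules))

# The linked trap lives under /cgi-bin/, so the path does not start with
# "/trap.pl" and the rule does not cover it.
print(is_disallowed("/cgi-bin/trap.pl", rules))
```

Under this reading, a line such as `Disallow: /cgi-bin/trap.pl` (or `Disallow: /cgi-bin/`) would be needed for the trap URL to be excluded.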

Again, I apologize for any inconvenience, and if I am misinterpreting the standard, I would appreciate feedback.

[edited by: volatilegx at 2:03 am (utc) on Nov. 21, 2004]
[edit reason] removed link [/edit]

volatilegx

2:04 am on Nov 21, 2004 (gmt 0)

Welcome to WebmasterWorld, Eliyon. We appreciate your willingness to figure out the robots.txt problem.