Spider deliberately looking for Disallow: content

Bad bot 199.231.148.148 (URL_Spider_SQL/1.0)

         

Romeo

4:46 pm on Nov 16, 2002 (gmt 0)

10+ Year Member



Just got visited by a bot deliberately looking for content in the "Disallow:" section of my robots.txt.
It started with /robots.txt and fetched it 5 times, then fetched the main page "/" and jumped right into the disallowed stuff, where it got trapped and was blocked immediately. It tried a few other pages and then was gone.
Coming from IP address 199.231.148.148 with UA URL_Spider_SQL/1.0.
Beware.

Regards,
R.
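For anyone wanting to spot this kind of visitor in their own logs, here is a minimal sketch of the idea, not the setup used above. It assumes a hypothetical trap directory `/trap/` listed under Disallow, and that requests are already available as (IP, user agent, path) tuples; both are illustrative assumptions. It uses Python's standard `urllib.robotparser` to decide whether a requested path was off limits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /trap/ path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def find_offenders(requests, parser):
    """Return the set of IPs that requested a path robots.txt disallows for their UA.

    `requests` is an iterable of (ip, user_agent, path) tuples (assumed format).
    """
    offenders = set()
    for ip, user_agent, path in requests:
        if not parser.can_fetch(user_agent, path):
            offenders.add(ip)
    return offenders


if __name__ == "__main__":
    sample = [
        ("199.231.148.148", "URL_Spider_SQL/1.0", "/robots.txt"),
        ("199.231.148.148", "URL_Spider_SQL/1.0", "/"),
        ("199.231.148.148", "URL_Spider_SQL/1.0", "/trap/page.html"),
        ("203.0.113.5", "SomeBrowser/1.0", "/"),
    ]
    print(find_offenders(sample, build_parser(ROBOTS_TXT)))
```

The offending IPs could then be fed into whatever blocking mechanism the server already uses.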

mack

5:14 pm on Nov 16, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The spider you saw is, I think, part of an SQL search engine developed by Innerprise. The problem with this kind of spider is that it runs from the user's desktop machine, so it could be used over a dial-up line and may not have a fixed IP address. By default the bot is designed to respect robots.txt, but the user does have the option to alter this setting.

GaryK

6:32 pm on Nov 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My experience with this bot has been that it ignores robots.txt more often than it looks for it. I've also seen it looking for a robotsx.txt file. Then it goes about its business downloading everything it can.

I can confirm that I've seen it from multiple IPs, each of which traces back to one of the major ISPs, so it's no doubt desktop users who are using this thing.

This one is in my block list.
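Since the bot shows up from many different IPs, matching on the user agent is one option. A minimal sketch, assuming a hypothetical `BLOCKED_UA_SUBSTRINGS` list and a case-insensitive substring match (a bot can of course change its UA string, so this only catches the default one):

```python
# Hypothetical block list; the token is taken from the UA reported in this thread.
BLOCKED_UA_SUBSTRINGS = ("url_spider_sql",)


def is_blocked(user_agent):
    """Return True if the user agent contains any blocked token (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_UA_SUBSTRINGS)


if __name__ == "__main__":
    print(is_blocked("URL_Spider_SQL/1.0"))
    print(is_blocked("Mozilla/4.0 (compatible)"))
```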

mack

12:44 am on Nov 17, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The developer's aim for this software was to construct topic-specific search engines and/or web directories. It is compatible with a lot of other scripts such as Links 2, Links SQL and Hyperseek. When using the software you give it a list of URLs to act as starting points. It then spiders the sites and follows off-site links. What makes this bot different from others is that it will only index pages that contain keywords specified by the user.
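To illustrate the behaviour described above (start URLs, follow links, index only pages matching keywords), here is a rough sketch. It is not the product's actual code: the in-memory `pages` dictionary stands in for real HTTP fetching, and the function names are made up for the example.

```python
import re
from collections import deque


def should_index(page_text, keywords):
    """Return True if the page contains at least one keyword as a whole word."""
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    return any(keyword.lower() in words for keyword in keywords)


def crawl(pages, start_urls, keywords):
    """Breadth-first crawl over `pages`, a dict of url -> (text, [links]).

    Every reachable page is visited and its links followed, but only pages
    that match a keyword are added to the returned index list.
    """
    seen = set(start_urls)
    queue = deque(start_urls)
    indexed = []
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # stand-in for a failed fetch
        text, links = pages[url]
        if should_index(text, keywords):
            indexed.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


if __name__ == "__main__":
    demo_pages = {
        "a": ("Guitar lessons", ["b", "c"]),
        "b": ("Cooking tips", []),
        "c": ("Bass guitar reviews", ["a"]),
    }
    print(crawl(demo_pages, ["a"], ["guitar"]))
```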

Hope this throws some more light on it.

Metal480

3:48 am on Dec 9, 2002 (gmt 0)



In addition to what has already been said, URL Spider SQL is a very out-of-date first public release of what is now known as Enterprise Search. One of the bugs it contained was its inability to obey the robots.txt (it would download it but ignore it).

One correction, though, to what’s already been said: this product isn’t the one that is compatible with the third-party scripts (Hyperseek, Links, etc.). This one only works with Microsoft SQL Server. URL Spider Pro is the product that works with other database formats.