Several new spiders

New to me at least

         

volatilegx

4:04 pm on Jul 30, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



# Agada
# UA "Mozilla/4.0 (agadine3.0) www.agada.de"
81.209.140.139

# Ay-Up
# UA "Fred/0.01-dev (Fred; [ay-up.com;...] fred@ay-up.com)"
69.57.157.54

# Nusearch
# UA "NuSearch Spider www.nusearch.com"
82.68.206.22

# Peerbot.com
# UA "PEERbot www.peerbot.com"
213.239.197.150
213.239.206.109

# Terrawiz
# UA "TerrawizBot/1.0 (+http://www.terrawiz.com/bot.html)"
24.6.176.192

# uk-searcher.co.uk
# UA "uk-Searcher(HTTP://WWW.UK-SEARCHER.CO.UK)"
81.27.96.248
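
For anyone who wants to check their own logs against the list above, here is a minimal sketch in Python. The user-agent substrings and IP addresses are copied from the list; the `SPIDERS` table and the `identify` function are names made up for this illustration, and substring matching is only one assumption about how you might want to match.

```python
# User-agent substrings and IPs copied from the list above.
# SPIDERS and identify() are illustrative names, not part of any tool.
SPIDERS = {
    "Agada": ("agadine", {"81.209.140.139"}),
    "Ay-Up": ("ay-up.com", {"69.57.157.54"}),
    "Nusearch": ("nusearch spider", {"82.68.206.22"}),
    "Peerbot.com": ("peerbot", {"213.239.197.150", "213.239.206.109"}),
    "Terrawiz": ("terrawizbot", {"24.6.176.192"}),
    "uk-searcher.co.uk": ("uk-searcher", {"81.27.96.248"}),
}


def identify(user_agent, ip):
    """Return the names of listed spiders that match by UA substring or by IP."""
    ua = user_agent.lower()
    hits = []
    for name, (token, ips) in SPIDERS.items():
        # Case-insensitive substring match on the UA, exact match on the IP.
        if token in ua or ip in ips:
            hits.append(name)
    return hits
```

Matching on either the UA or the IP means a spider that changes its user-agent but keeps its address is still caught, at the cost of false positives if the address is later reassigned.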

wilderness

1:02 am on Aug 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



thanks for the heads-up Dan.
Balam was razzing me the other day implying that I was getting soft ;)
Unfortunately or fortunately?
I'll never see at least three of those spiders as a result of having most of RIPE and other non-North American ranges denied.

volatilegx

1:38 pm on Aug 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You going soft? If you're soft, I'd hate to meet a hardliner :-)

Dan

wilderness

2:48 pm on Aug 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One which is likely a rogue has a UA that ends with:
";)" I denied them a few days ago and the visitor continues to return to 403s.

I've had a problem for some time with numerous deep-links to my favicon and referrals from IconSurf. I recently removed my favicon and within a couple of days the IconSurf bot paid a visit: "http ://iconsurf.com/" "IconSurf/2.0 favicon monitor (see [iconsurf.com...]

I've had an Indy Library user catch a 403 and change to another UA, with the same timestamp (to the second) on the very next hit.
216.0.****.x (Anybody who wants the full IP, sticky me.)

Regarding co-location servers: There are more and more of these crawling out of the cracks. Many facilities are mixing their services to offer normal internet service (dial-up, broadband, T1) along with hosting and co-location. It seems a natural utilization of their computers; however, it makes identification of new rogues more difficult.

wilderness

2:50 pm on Aug 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This smiley should have read " ; )" without the blank space.

wilderness

10:01 pm on Aug 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've had two different UAs from the same IP, six days apart.
The first was an open-source tool for parsing HTML and the second (today) a new bot name.

The IP ranges fell under a Level 3 range which I had denied long ago. 64.152.xx.xx
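
Spotting an IP that shows up under more than one UA is easy to automate. A rough sketch in Python, assuming you already have (IP, user-agent) pairs pulled out of your logs; the function name `ips_with_multiple_uas` is invented here, and the addresses in the usage note are documentation placeholders, not the visitors described in this thread.

```python
from collections import defaultdict


def ips_with_multiple_uas(hits):
    """Given (ip, user_agent) pairs, return {ip: sorted UAs} for IPs seen with 2+ UAs."""
    seen = defaultdict(set)
    for ip, user_agent in hits:
        seen[ip].add(user_agent)
    # Keep only the addresses that presented more than one distinct UA.
    return {ip: sorted(uas) for ip, uas in seen.items() if len(uas) > 1}
```

For example, feeding it `[("192.0.2.10", "ToolA"), ("192.0.2.10", "BotB"), ("192.0.2.20", "ToolA")]` would report only `192.0.2.10`, since that is the one address that changed its UA.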

This IAR article appears to be related
http ://www.clickz.com/news/article.php/3387971

The article and its concept may be well intentioned; to me, however, it remains yet another third-party use of websites.

wilderness

4:56 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



62.241.33.28 - - [23/Aug/2004:08:30:03 -0700] "GET /robots.txt HTTP/1.1" 403 - "-" "amibot"
62.241.33.28 - - [23/Aug/2004:08:30:04 -0700] "HEAD /myfolder/ HTTP/1.1" 403 - "http://www.amidalla.com" "amibot"

403'd off my RIPE denials.
Checked Bull's list and site search with no returns.
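
Log lines like the two above follow the common "combined" layout (IP, timestamp, request, status, size, referer, UA). A minimal parsing sketch in Python, assuming that layout; the regular expression and the `parse_line` name are illustrative, and lines in any other format simply return `None`.

```python
import re

# Assumes the combined log layout seen in the lines above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


# First log line quoted above.
SAMPLE = ('62.241.33.28 - - [23/Aug/2004:08:30:03 -0700] '
          '"GET /robots.txt HTTP/1.1" 403 - "-" "amibot"')
fields = parse_line(SAMPLE)
```

With the fields split out, the status, referer and UA can be filtered or counted separately instead of eyeballing raw lines.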

Lord Majestic

4:59 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why are you denying access to robots.txt?

GaryK

5:15 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think he's specifically denying access to robots.txt. He's 403ing everything in certain IP Address ranges.

wilderness

5:36 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why are you denying access to robots.txt?

It's my site!
Reason enough?
If not?
Too Bad!
So sad!

wilderness

5:44 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't think he's specifically denying access to robots.txt.

Gary,
You are correct. I have the majority of RIPE denied access. (It should be noted that this is my personal choice and not something I would want to influence others to follow.)

I was provided with a solution to allow reading of robots.txt when a range or UA is denied; however, for some reason the entry fails in my htaccess.
My htaccess is quite extensive, with a very small number of redirects, and even though the courtesy of providing bots access to robots.txt is desirable, it is not a personal agenda of my sites. Rather, my agenda is keeping the undesired bots and/or visitors out.
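
The idea of denying a range while still letting everyone read robots.txt comes down to checking the path before checking the address. Below is a small Python model of that decision, not an htaccess rule and not the entry mentioned above (which was never posted). The `DENIED_NETWORKS` value is a documentation placeholder range and `status_for` is a name invented for this sketch.

```python
import ipaddress

# Placeholder range for illustration only; not anyone's real deny list.
DENIED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def status_for(ip, path):
    """Return the HTTP status this policy would give a request."""
    # robots.txt is exempted first, so denied ranges can still read it.
    if path == "/robots.txt":
        return 200
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in DENIED_NETWORKS):
        return 403
    return 200
```

The order of the two checks is the whole point: if the range test ran first, robots.txt would be 403'd along with everything else, which is what the amibot log lines earlier in the thread show.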

As has been stated many times, my preferences are quite overbearing and not applicable to the majority of websites.
Each webmaster must decide what is beneficial or detrimental to their own website.

GaryK

7:20 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I hope we don't have a misunderstanding. I was not passing judgment on your methods. I was only trying to explain why your robots.txt file was 403ed. I completely agree with you that each webmaster needs to evaluate his or her own needs and take the action he or she deems appropriate. ;)

wilderness

7:23 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Gary,
We're ok. Thanks.

Don

wilderness

9:37 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



this bot came by today.
grabbed a main page and three 3rd level pages which do NOT have links off that viewed main page.
I had an old line which was "begins with" and changed it to "contains".

63.200.38.186 - - [23/Aug/2004:10:08:28 -0700] "GET /robots.txt HTTP/1.1" 200 2599 "-" "ScSpider/0.2"
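
The difference between a "begins with" and a "contains" rule is just whether the pattern is anchored to the start of the UA string. A short Python sketch of the two, assuming a made-up token `examplebot` (the actual line that was changed was not posted); `matches` is likewise an illustrative helper.

```python
import re

TOKEN = "examplebot"  # hypothetical token, not taken from the thread

# "Begins with": anchored to the start of the user-agent string.
begins_with = re.compile("^" + re.escape(TOKEN), re.IGNORECASE)
# "Contains": the token may appear anywhere in the string.
contains = re.compile(re.escape(TOKEN), re.IGNORECASE)


def matches(pattern, user_agent):
    """True if the pattern is found in the user-agent string."""
    return pattern.search(user_agent) is not None
```

A UA such as `Mozilla/4.0 (compatible; ExampleBot/1.0)` slips past the anchored rule because it starts with `Mozilla`, while the unanchored rule still catches it.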

I haven't had anybody from Pac Bell bothering me in quite a while.
Guess they love me again ;)

Lord Majestic

9:50 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Guess they love me again

And my guess is that they noticed you as much as a big elephant notices a small ant - luckily irrational behavior like yours is a rarity and can be written off :)

wilderness

10:14 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



attempting to hijack another thread are you :(

No need to answer! The question was rhetorical.

Lord Majestic

10:27 pm on Aug 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



my friend - you have 10 out of 16 posts or 62.5% of all posts in this thread, and you did not even start it!

I would not have a chance hijacking your thread ;)

volatilegx

1:37 am on Aug 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey let's keep the discussion on-topic. As I recall, this was MY thread LOL.

wilderness

1:41 pm on Aug 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This thing has been poking around for a while and finally did a crawl, grabbing the main page and 2nd level links. (The referer, having been previously discussed, is an active URL.)

24.248.168.184 - - [28/Aug/2004:11:28:14 -0700] "GET / HTTP/1.1" 200 9690 "www.av.com" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

Main page
66.139.77.92 - - [28/Aug/2004:22:00:02 -0700] "GET / HTTP/1.1" 200 9690 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

Main page and 2nd level links
67.138.247.2 - - [29/Aug/2004:05:31:48 -0700] "GET / HTTP/1.1" 200 7326 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"