Szukacz

claims to honour robots.txt but doesn't

         

Mokita

2:11 am on Aug 2, 2006 (gmt 0)

10+ Year Member



This bot crawled one of our sites for the first time last month, and I disallowed it in robots.txt. It returned a few days later, fetched robots.txt, and then ignored it.
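Before blaming the bot, it can be worth confirming that the rule really matches its user agent. The sketch below uses Python's standard urllib.robotparser for that. The robots.txt contents are a made-up stand-in (the real file from this thread is not shown), and is_allowed is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the actual file from the thread is not shown,
# so this is only an assumed example of a rule that blocks Szukacz.
ROBOTS_TXT = """\
User-agent: Szukacz
Disallow: /

User-agent: *
Disallow: /private/
"""

# User agent string as it appears in the access log.
SZUKACZ_UA = (
    "Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; "
    "info@szukacz.pl)"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, SZUKACZ_UA, "/directory/file.htm"))
    print(is_allowed(ROBOTS_TXT, "OtherBot/1.0", "/directory/file.htm"))
```

If the first call reports that the fetch is allowed, the problem is in the rule rather than in the bot. Keep in mind that this reflects how Python's parser matches user agents; a crawler may match its name differently.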

It has been served 403s ever since, but still it comes back regularly to try again:

193.218.115.7 - - [02/Aug/2006:10:15:15 +1000] "GET /robots.txt HTTP/1.1" 200 1706 "-" "Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; info@szukacz.pl)"
193.218.115.7 - - [02/Aug/2006:10:15:16 +1000] "GET /directory/file.htm HTTP/1.1" 403 - "-" "Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; info@szukacz.pl)"
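The pattern in those two log lines (robots.txt fetched, then a page requested anyway) can be picked out of a log automatically. Here is a rough sketch, assuming the log is in Apache combined format as above. LOG_PATTERN and requests_after_robots are hypothetical names, and the script does not check whether the later path is actually disallowed; it only lists what each IP asked for after reading robots.txt.

```python
import re
from collections import defaultdict

# Apache "combined" log format, as in the two lines quoted in the thread.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE_LINES = [
    '193.218.115.7 - - [02/Aug/2006:10:15:15 +1000] "GET /robots.txt HTTP/1.1" '
    '200 1706 "-" "Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; '
    'info@szukacz.pl)"',
    '193.218.115.7 - - [02/Aug/2006:10:15:16 +1000] "GET /directory/file.htm HTTP/1.1" '
    '403 - "-" "Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; '
    'info@szukacz.pl)"',
]


def requests_after_robots(lines):
    """Map each IP to the (path, status, agent) requests it made
    after it had already fetched /robots.txt."""
    seen_robots = set()
    followups = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        ip = match["ip"]
        if match["path"] == "/robots.txt":
            seen_robots.add(ip)
        elif ip in seen_robots:
            followups[ip].append(
                (match["path"], int(match["status"]), match["agent"])
            )
    return dict(followups)


if __name__ == "__main__":
    for ip, hits in requests_after_robots(SAMPLE_LINES).items():
        for path, status, agent in hits:
            print(ip, status, path, agent.split(" ")[0])
```

Requests are grouped by IP rather than by user agent, so a bot that reads robots.txt under one UA and fetches pages under another would still show up.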

Aside from the rudeness, this site belongs to a tiny community located in Australia - I don't understand what interest its highly parochial contents would hold for anyone in Poland.

wilderness

6:48 pm on Aug 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Aside from the rudeness, this site belongs to a tiny community located in Australia - I don't understand what interest its highly parochial contents would hold for anyone in Poland.

That bot has been around a long time. In fact, I believe we once had a reply in this forum from the folks who run the thingy.

To understand that region of the world, all you need to do is compare old maps with new ones and see how the boundaries have changed multiple times over the years.

Our resident friend (Bull) will fill you in (I hope) when he gets some free time.
He's currently on a sabbatical with a dinghy in a city that has channels and channels of water ;)

Don

Mokita

1:57 am on Aug 3, 2006 (gmt 0)

10+ Year Member



That bot has been around a long time. In fact, I believe we once had a reply in this forum from the folks who run the thingy.

Thanks, wilderness. Yes, I did a search before I posted and found the thread you refer to:

[webmasterworld.com...]

In this one, you reported that it was using a blank UA to request robots.txt but its own UA to fetch a page:
[webmasterworld.com...]

However, no one had posted that it openly and definitively ignored robots.txt, even though its operators claim it honours the file:
[szukacz.pl...]

Also, the threads were quite old, 2002 and 2003, so I felt it deserved a fresh mention.

wilderness

2:24 am on Aug 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



also, the threads were quite old, 2002 and 2003

I've been coming here too long!
Hope WebmasterWorld has a solid pension plan ;)

GaryK

4:28 pm on Aug 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Szukacz first visited one of my sites in 2000. Its most recent visit was on July 22, 2006. It has always obeyed robots.txt on my sites.

Polish is not limited to Poland. :)

I have a friend in Warsaw who hosts a Polish-language site on one of my servers, which is located in the US. Hence Szukacz's frequent visits: 38 so far this year. Any mention of something Polish seems to attract Szukacz, such as a member's location showing up in threads on my model car website, or a discussion that touches on Poland.

Their user agent way back in 2000 was: Szukacz/1.5 (robot; www.szukacz.pl/jakdzialarobot.html; info@szukacz.pl). The version number is the same today, which suggests to me they're fairly happy with their software.

[I wish spell checkers checked grammar too!]

[edited by: GaryK at 4:41 pm (utc) on Aug. 3, 2006]