acont.de's ACONTBOT

         

Pfui

1:31 am on Feb 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



admin.acont.de
[hilfe.acont.de...] ACONTBOT

German bot, been around for years apparently, but I've never seen it before today, when it stopped by and log-spammed with: [spider.acont.de...]

Of course, the acont.de site has a robots.txt here [acont.de]; and on the German language bot info page, as translated via Google here [translate.google.com], they advise site owners:

'If you do not want indexing ACONTBOT, then adjust your robots.txt accordingly.'

Of course, that's all well and good if your bot bothers to ask for it in the first place...

robots.txt? NO

Can you say 4-0-3 in German?
(Vier-Null-Drei?:)

Samizdata

4:20 am on Feb 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you say 4-0-3 in German?

Verboten!

...

GaryK

6:26 pm on Feb 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've had it banned since 2000.

news7a

10:19 am on Mar 4, 2009 (gmt 0)

10+ Year Member



You banned ACONTBOT in your robots.txt, and they added your URL to their index anyway?
Could you post the URL in question and the corresponding log entries?
I think the engine respects robots.txt.

What's the deal with 4-0-3?!

tangor

10:23 am on Mar 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Pfui is pretty sharp on this...

I've seen the bot, too. It does not respect robots.txt, so I deny it. The denial is what produces the 403 response.

news7a

10:29 am on Mar 4, 2009 (gmt 0)


Thank you, tangor.
I didn't understand your last point: where does the 403 response come from?

tangor

10:41 am on Mar 4, 2009 (gmt 0)


.htaccess

Order Deny,Allow

Search WW for a wealth of info on that topic!

news7a

10:55 am on Mar 4, 2009 (gmt 0)


I know about .htaccess...
Which file was/is protected by .htaccess? The robots.txt?!
I still don't get what the link to the robots.txt topic is.
Who got the 403 response, and where?

tangor

11:16 am on Mar 4, 2009 (gmt 0)


Read this forum fully... but the short answer is:

You allow robots.txt to anyone. If they don't obey, you boot them with:

SetEnvIfNoCase User-Agent "whateveryouwantgone" ban

Then apply a Deny from env=ban.
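
For anyone following along, here is a minimal sketch of how those two steps might fit together in an .htaccess file. It assumes Apache 2.2-style access directives (the same Order/Deny family mentioned earlier in the thread) and that mod_setenvif is enabled. The "ACONTBOT" pattern is only an example match string, and the variable name `ban` is arbitrary. On Apache 2.4 the Order/Allow/Deny directives were replaced by Require, so this would need adapting there.

```apache
# Tag any request whose User-Agent contains "ACONTBOT" (case-insensitive)
# with an environment variable named "ban". The variable name is arbitrary.
SetEnvIfNoCase User-Agent "ACONTBOT" ban

# Evaluate Deny rules first; anything not denied is allowed by default.
# Tagged requests are refused and show up in the log as 403.
Order Deny,Allow
Deny from env=ban

# Keep robots.txt readable by everyone, including tagged agents,
# so a bot that does ask for it can still see the rules.
<Files "robots.txt">
    Order Deny,Allow
    Allow from all
</Files>
```

The Files block is there because of the "allow robots.txt to anyone" point above: without it, a banned bot that finally does request robots.txt would get a 403 on that file too.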

news7a

11:37 am on Mar 4, 2009 (gmt 0)


OK, so it's just a way to react if the spider doesn't respect robots.txt.

My request would be to post some details (URL, log entries, date) or to report it directly to the company (hilfe at acont.de).
That way they could respond, and if there is a bug, they could fix it.

tangor

11:43 am on Mar 4, 2009 (gmt 0)


The Deny will still give you log details; the hit shows up as a 403. As for reporting to the bot company, that's your dime (email). Don't expect any response. And if it were me, I wouldn't give any badly behaved bot the time of day, much less my email address. Ban it and get on with business!

Pfui

10:07 pm on Mar 4, 2009 (gmt 0)


@news7a: tangor is pretty sharp on this... (smiles) If I e-mailed every botrunner of every bad bot that hit even one site, I'd have no time to do anything else. And that's assuming I could find a legit address for every admin, that they'd get my e-mail, that they'd read it, that they'd agree (or, in the case of individuals using some new somethingorother, that they'd even understand), that they'd do anything about my complaint, that they'd reply...

It's simply easier, quicker, and cheaper to say:

"Hi, Robot. Here you go, here's this site's robots.txt file, a.k.a. our Terms of Use made specifically for you. See where it says 'Disallow: /'? That includes you, sorry. Goodbye."

And if they ignore the rules? I mod_rewrite them into oblivion just like any other troublemaker. Because that's all they are. Because life's too short.
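
A rough sketch of what that kind of mod_rewrite block might look like in .htaccess, assuming mod_rewrite is enabled. This is not necessarily the exact ruleset in use here, and "ACONTBOT" is only the example match string from this thread.

```apache
RewriteEngine On
# Match the bot by User-Agent, case-insensitively.
RewriteCond %{HTTP_USER_AGENT} ACONTBOT [NC]
# Leave robots.txt reachable so the rules stay visible to the bot.
RewriteCond %{REQUEST_URI} !^/robots\.txt$
# Everything else from that agent gets 403 Forbidden; the "-" means no substitution.
RewriteRule ^ - [F]
```

Both conditions must hold for the rule to fire, so a request for /robots.txt from the same agent passes through untouched.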

[edited by: Pfui at 10:10 pm (utc) on Mar. 4, 2009]

GaryK

11:16 pm on Mar 4, 2009 (gmt 0)


I suppose if you only get a few rogue bots each week it might be feasible to contact the owner of each one, assuming you could find them. That's a big assumption IMO, since most bots don't include any way of knowing who owns them, much less how to make contact.

After a few years of this, and scores of badly behaved bots visiting each week, it becomes easier to take Pfui's approach: Obey robots.txt or eat 403s. Or worse, get your IP range blocked. :)

Pfui

3:13 am on Jun 16, 2009 (gmt 0)


Undeterred by 403s and increasingly obnoxious:

admin.acont.de
[hilfe.acont.de...] ACONTBOT

robots.txt? NO

Aggressive? YES:

06/15 19:23:58
06/15 19:23:59
06/15 19:23:59
06/15 19:24:00
06/15 19:24:01
06/15 19:24:02
06/15 19:24:03
06/15 19:24:04
06/15 19:27:33
06/15 19:27:34
06/15 19:27:35
06/15 19:27:35
06/15 19:27:36
06/15 19:27:37
06/15 19:27:38
06/15 19:27:39
06/15 19:30:33
06/15 19:30:34
06/15 19:30:34
06/15 19:30:35
06/15 19:30:36
06/15 19:30:37
06/15 19:30:38
06/15 19:30:39

All with fake Ref: [spider.acont.de...]
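
Since every one of those hits carries the same Referer, that header could serve as a second match key alongside the User-Agent. A hedged sketch, again assuming Apache 2.2-style directives and mod_setenvif; the `ban` variable name is arbitrary and the pattern is just the hostname shown above:

```apache
# Tag requests whose Referer contains spider.acont.de (case-insensitive).
SetEnvIfNoCase Referer "spider\.acont\.de" ban

# Refuse tagged requests; everything else is allowed by default.
Order Deny,Allow
Deny from env=ban
```

This only changes what the server answers. It does nothing to stop the requests from arriving, so the bursts would still appear in the log as 403s.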

Pfui

11:04 pm on Jun 21, 2009 (gmt 0)


FWIW, this bot's fake referer/log spam has been: [spider.acont.de...]

Today, it's repeatedly --

06/21 13:49:51 +http://spider.acont.de
06/21 13:55:28 +http://spider.acont.de
06/21 14:01:10 +http://spider.acont.de
06/21 14:06:10 +http://spider.acont.de
06/21 15:34:35 +http://spider.acont.de
06/21 15:39:55 +http://spider.acont.de
06/21 15:45:11 +http://spider.acont.de

-- and as always:

robots.txt? NO

(I should probably drop them a leave-me-alone/cease-and-desist message. Their one bot hits more than MSN's cadre o' crawlers.)