Anyone knows this bot: "Mozilla/4.0 efp@gmx.net"?

question about new bot efp@gmx.net

         

aleksl

8:34 pm on Jan 3, 2003 (gmt 0)



This bot came today, seems to be from Germany.

Identifies itself as "Mozilla/4.0 efp@gmx.net"
Came from one IP: 66.230.140.66, which identifies as argon.oxeo.com

It hit about 20% of pages on my website with an interval of about 1.5 seconds.

GMX.net is a popular German portal.

I'll check the logs tomorrow to see whether it read robots.txt and what else it did.

Does anyone know this beast?
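For anyone who wants to run the same check on their own logs, here is a minimal sketch. It assumes the server writes the Apache/NCSA combined log format (the thread does not say which format is in use), and the name `summarize_agent` is made up for illustration. It reports how many hits a given user agent made, whether it ever requested /robots.txt, and the mean gap between its requests.

```python
import re
from datetime import datetime

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_agent(lines, agent_substring):
    """Return (hits, fetched_robots, mean_gap_seconds) for one user agent.

    mean_gap_seconds is None when the agent made fewer than two requests.
    """
    times = []
    fetched_robots = False
    for line in lines:
        m = LOG_RE.match(line)
        if not m or agent_substring not in m.group("agent"):
            continue  # unparseable line or a different client
        times.append(datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"))
        # Ignore any query string when looking for the robots.txt request.
        if m.group("path").split("?")[0] == "/robots.txt":
            fetched_robots = True
    times.sort()
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return len(times), fetched_robots, mean_gap
```

Calling `summarize_agent(open("access.log"), "efp@gmx.net")` would then give the three numbers for this bot; the file name is only a placeholder.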

eddier

4:30 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



You've said it yourself. If it takes most of the site at 1.5-second intervals, it's bound to be a spambot.

gmx.de is one of the largest German ISPs, however the IP you've listed is registered to a New York location...
So that should answer it a little bit.

It's probably just another spambot.

bird

8:08 pm on Jan 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



gmx.de is not an ISP. They offer free e-mail accounts similar to Hotmail, Yahoo etc., so this address doesn't really say anything about who is behind it.

eddier

8:24 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



I didn't know about them offering free e-mail services as well.

But I've seen gmx.de a lot in the logs. After t-online.de they are the largest German ISP as far as I can tell. That's because they offer really cheap DSL. Just check their site.

aleksl

8:38 pm on Jan 4, 2003 (gmt 0)



It's "bound to be"? Hmmm... Googlebot sometimes hits my site at 3-second intervals. They have 10,000 servers, it's very difficult to control, and I don't mind that. Some other spiders (specific to the topic of my site) sometimes come in at intervals of under 1 second. Why is that considered impolite?

eddier

9:00 pm on Jan 4, 2003 (gmt 0)

10+ Year Member



I've noticed that Google comes in small bursts. But all in all it never takes that many pages at a time. And the same applies to all major search engines. They may query 100 pages/day but it's quietly spread over the day.

If somebody grabs the whole site with 1 second intervals (or less!) following the exact order or the reverse order of your index page, then
1) The IP often has no reverse DNS.
2) It doesn't look at robots.txt and falls into the spam bot trap.
3) There is no information in the browser type about what it is.
4) If there is a reverse DNS, then checking the IP through spamcop often shows an amazing amount of spam from the IP block.
5) It's a spambot...
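Point 1 in the list above is easy to check by hand. A minimal sketch, using only Python's standard `socket` module; the function name `reverse_dns` and the injectable `lookup` parameter are illustrative choices, the latter so the check can be tried without network access. A missing PTR record is only one hint, not proof of anything.

```python
import socket

def reverse_dns(ip, lookup=socket.gethostbyaddr):
    """Return the reverse-DNS (PTR) hostname for `ip`, or None if none is found.

    `lookup` defaults to socket.gethostbyaddr, which returns a
    (hostname, aliaslist, ipaddrlist) tuple and raises socket.herror
    when the address has no reverse entry.
    """
    try:
        return lookup(ip)[0]
    except (socket.herror, socket.gaierror):
        return None
```

For the address in the first post, `reverse_dns("66.230.140.66")` would return whatever PTR record exists today, which may no longer be the argon.oxeo.com reported in 2003.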

Actually it is impolite to query 100 pages within 5 minutes. If you have just a slow server (most people don't have a big server all to themselves y'know...), it means that you are using a significant amount of processor time which is meant for humans and not for machines.
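The "100 pages within 5 minutes" rule of thumb can be turned into a simple sliding-window check. This is only a sketch: the function name `exceeds_rate` is hypothetical, timestamps are assumed to be plain numbers in seconds, and the defaults of 100 hits and 300 seconds are just the figures from the paragraph above, not an accepted standard.

```python
from collections import deque

def exceeds_rate(timestamps, max_hits=100, window_seconds=300):
    """Return True if `max_hits` or more requests fall inside any
    span shorter than `window_seconds`."""
    window = deque()
    for t in sorted(timestamps):
        window.append(t)
        # Drop requests that are too old for the window ending at t.
        while t - window[0] >= window_seconds:
            window.popleft()
        if len(window) >= max_hits:
            return True
    return False
```

A crawler fetching a page every 1.5 seconds trips this after its hundredth request, while 100 requests spread evenly over a day never come close.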

bird

11:18 pm on Jan 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



GMX is first and foremost an e-mail service, and that's what they're known for with most people.

They're definitely not a big ISP by any measure. Their DSL flat rate is actually exactly the same price (and requires the same DSL line from German Telecom) as what I pay for T-Online, which is the market leader by a huge margin (~90%).

aleksl

1:12 am on Jan 5, 2003 (gmt 0)



I could find almost nothing about it on the web.
This link shows server statistics at the University of Illinois at Urbana-Champaign. There's a hit from "mail_fetch efp@gmx.net", which I think could be the same bot....therefore successfully identified as spammer.

Here's a link:
[cen.uiuc.edu ]

spamcop.net doesn't recognize this IP. From what I see in other people's logs, it does read the robots.txt file. My ISP doesn't give me info about robots, unfortunately...

[edited by: Marcia at 5:10 am (utc) on Jan. 5, 2003]
[edit reason] no sigs or URLs please, per TOS [/edit]

wilderness

4:50 am on Jan 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<snip>Here's a link:
[cen.uiuc.edu...]

Under:
Browser Versions - All

Somebody took the time to compile a lot of useless data.
Many duplicates are just variations of browsers with additional software or plug-ins added, especially in the Mozilla section.

jmccormac

9:25 am on Jan 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've seen it on my site and, purely by instinct, I deep-sixed (blocked) it. The reason I blocked it was that the site runs on a 128K leased line and has about 64K webpages. It looked like a spambot and traced back to a hosted site.

It did read robots.txt on two sites here which may indicate that it is not a spambot.

Regards...jmcc

transistor

11:13 pm on Jan 7, 2003 (gmt 0)

10+ Year Member



It seems to me, because of the structure of the name, it is some use of the Larbin indexing bot, which often looks like this:
larbin_2.6.2 larbin2.6.2@unspecified.mail
It is my guess that someone configured it to look like a Mozilla browser.
Anyway, I think you did right banning this one; I have too.
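To pick out agent strings with this "name, then e-mail address" shape from a log, something like the following might do. It is a rough sketch: the pattern and the name `split_agent` are my own, and matching the shape does not prove the client is Larbin, only that the string has the same two-part layout as the examples in this thread.

```python
import re

# One token (the claimed product), whitespace, then something that looks
# like an e-mail address, and nothing else.
AGENT_EMAIL_RE = re.compile(
    r'^(?P<product>\S+)\s+(?P<contact>[^\s@()]+@[^\s@()]+\.[^\s@()]+)$'
)

def split_agent(agent):
    """Return (product, contact) if the agent string is 'name e-mail', else None."""
    m = AGENT_EMAIL_RE.match(agent.strip())
    return (m.group("product"), m.group("contact")) if m else None
```

Both strings quoted in this thread match, while an ordinary browser string such as "Mozilla/4.0 (compatible; MSIE 6.0)" does not.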

weesnich

12:07 pm on Jan 8, 2003 (gmt 0)

10+ Year Member



Just want to support bird.

GMX started as a free webmail service and is the biggest of its kind in Germany. You can get a GMX address with almost any fake data. Many people here use GMX e-mail addresses on a throw-away basis.
That's fine in newsgroups or for websites that require you to register, but using one as a spider ID means someone wants to stay in the shadows.

Very recently they have tried to cash in on their customer base by selling, e.g., DSL accounts. But they are only a reseller for Germany's leading DSL provider, the formerly state-owned Deutsche Telekom. GMX's offer is in no way outstanding and they don't have much market share.

citydiscounts

10:51 pm on Feb 13, 2003 (gmt 0)



Hi,

I realise that this is an old thread, but I was looking into this after seeing it in my logs too. I got in touch with the people behind the bot/spider and they said...

We are trying to build a META-Search engine. Is there any problem that came to your attention through Larbin running on our Server? Feedback is very important to us.

I wrote back to them suggesting they put up a webpage about the project or at least something to reassure people who find their bot in server logs.

Justin

jmccormac

12:10 am on Feb 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A META-search engine is essentially one that submits searches to other search engines and formats the results for the user. Thus I find the META-search engine explanation a bit hard to believe. Also the manner in which it was hitting my site (which is a 64K page directory as well as two search engines) indicated that it was just trawling webpages like a spambot. Perhaps if they published more info as per your suggestion, people may allow them access to their sites but until that happens, they are just a waste of bandwidth and directory operators will instinctively ban them.

Regards...jmcc