Identifies itself as "Mozilla/4.0 efp@gmx.net"
Came from one IP: 66.230.140.66, which identifies as argon.oxeo.com
It hit about 20% of pages on my website with an interval of about 1.5 seconds.
GMX.net is a popular German portal.
I'll check the logs tomorrow to see if it read robots.txt and what else it did...
Does anyone know this beast?
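For anyone wanting to run the same robots.txt check on their own logs, here is a minimal sketch. It assumes the server writes the common Apache-style access log format (client IP first, request line in double quotes); the function names `paths_requested_by` and `fetched_robots_txt` are made up for illustration and are not part of any tool mentioned in this thread.

```python
import re

# Matches the start of an Apache-style access log line (assumed format):
#   IP ident user [timestamp] "METHOD PATH ..."
REQUEST_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)')


def paths_requested_by(log_lines, ip):
    """Return the request paths found in log_lines for one client IP."""
    paths = []
    for line in log_lines:
        match = REQUEST_RE.match(line)
        if match and match.group(1) == ip:
            paths.append(match.group(3))
    return paths


def fetched_robots_txt(log_lines, ip):
    """True if the given IP requested /robots.txt at least once."""
    return any(
        path.split("?", 1)[0] == "/robots.txt"
        for path in paths_requested_by(log_lines, ip)
    )
```

Typical use would be `fetched_robots_txt(open("access.log"), "66.230.140.66")`, where `access.log` stands for whatever your host calls the raw log file.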
If somebody grabs the whole site at 1-second intervals (or less!), following the exact order or the reverse order of your index page, then:
1) The IP often has no reverse DNS.
2) It doesn't look at robots.txt and falls into the spam bot trap.
3) There is no information in the browser type about what it is.
4) If there is a reverse DNS, then checking the IP through spamcop often shows an amazing amount of spam from the IP block.
5) It's a spambot...
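The checklist above could be written down as a rough filter, something like the sketch below. Everything in it is an assumption for illustration: the `VisitorSummary` fields are answers you have already worked out by hand from your logs and from spamcop, and `spambot_signs` only lists which of the points apply. It does not do any lookups itself.

```python
from dataclasses import dataclass


@dataclass
class VisitorSummary:
    """Hand-collected facts about one visitor (hypothetical structure)."""
    has_reverse_dns: bool
    fetched_robots_txt: bool
    hit_bot_trap: bool
    user_agent_says_what_it_is: bool
    listed_on_spamcop: bool


def spambot_signs(visitor):
    """Return the checklist points that apply to this visitor."""
    signs = []
    if not visitor.has_reverse_dns:
        signs.append("no reverse DNS")
    if not visitor.fetched_robots_txt:
        signs.append("did not look at robots.txt")
    if visitor.hit_bot_trap:
        signs.append("fell into the bot trap")
    if not visitor.user_agent_says_what_it_is:
        signs.append("user agent does not say what it is")
    if visitor.listed_on_spamcop:
        signs.append("IP block reported on spamcop")
    return signs
```

The more points that apply, the more suspicious the visitor, but a single point on its own proves nothing.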
Actually it is impolite to query 100 pages within 5 minutes. If you have just a slow server (most people don't have a big server all to themselves y'know...), it means that you are using a significant amount of processor time which is meant for humans and not for machines.
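To put a number on "100 pages within 5 minutes", a sliding-window count over the request times of one IP is enough. This is only a sketch: the function name `exceeds_rate` is made up, the timestamps are assumed to be seconds (for example, parsed from your access log), and the limits are simply the figures from this post, not any official standard.

```python
from collections import deque


def exceeds_rate(timestamps, max_requests=100, window_seconds=300.0):
    """True if max_requests or more requests fall inside any window_seconds span.

    timestamps: request times for a single IP, in seconds, in any order.
    """
    window = deque()
    for t in sorted(timestamps):
        window.append(t)
        # Drop requests that are window_seconds or more older than the current one.
        while t - window[0] >= window_seconds:
            window.popleft()
        if len(window) >= max_requests:
            return True
    return False
```

A visitor requesting a page every 1.5 seconds reaches 100 requests in about two and a half minutes, so it trips this check; one page every 10 seconds never has more than 30 requests in any 5-minute span and does not.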
They're definitely not a big ISP by any measure. Their DSL flat rate is actually exactly the same price (and requires the same DSL line from German Telecom) as I pay for T-Online, which is the market leader by a huge margin (~90%).
Here's a link:
[cen.uiuc.edu]
spamcop.net doesn't recognize this IP. From what I see in other people's logs, it does read the robots.txt file. My ISP doesn't give me info about robots, unfortunately...
[edited by: Marcia at 5:10 am (utc) on Jan. 5, 2003]
[edit reason] no sigs or URLs please, per TOS [/edit]
Under:
Browser Versions - All
Somebody took the time to compile a lot of useless data.
Many of the duplicates are just variations of browsers with additional software or plug-ins added, especially in the Mozilla section.
It did read robots.txt on two sites here which may indicate that it is not a spambot.
Regards...jmcc
GMX started as a free webmail service and is the biggest of its kind in Germany. You can get a GMX address with almost any fake data. Many people here use GMX email addresses on a throwaway basis.
That's fine in newsgroups or for websites that require you to register, but using one as a spider ID means someone wants to stay in the shadows.
Very recently they have tried to cash in on their customer base by selling, for example, DSL accounts. But they are only a reseller for Germany's leading DSL provider, the formerly state-owned Deutsche Telekom. GMX's offer is in no way outstanding and they don't have much market share.
I realise that this is an old thread, but I was looking into this after seeing it in my logs too. I got in touch with the people behind the bot/spider and they said...
We are trying to build a META-Search engine. Is there any problem that came to your attention through Larbin running on our Server? Feedback is very important to us.
I wrote back to them suggesting they put up a webpage about the project or at least something to reassure people who find their bot in server logs.
Justin