keyplyr

msg:4114628 | 10:11 pm on Apr 12, 2010 (gmt 0) |
I have it coming from Leaseweb, Amsterdam:
85.17.226.0 - 85.17.226.255
Other Leaseweb ranges:
85.17.173.0 - 85.17.173.255
I've blocked them for quite a while.
|
jdMorgan

msg:4119150 | 12:17 am on Apr 21, 2010 (gmt 0) |
You may wish to expand on that range a bit. The RIPE route info indicates they've got the whole /16:
route: 85.17.0.0/16
descr: LEASEWEB
Jim
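
A quick way to confirm that the two /24 blocks mentioned earlier sit inside that /16 route is a minimal sketch with Python's standard `ipaddress` module. The variable names are illustrative only and do not come from the thread:

```python
import ipaddress

# Route reported by RIPE in this post.
leaseweb_route = ipaddress.ip_network("85.17.0.0/16")

# The two ranges listed in the first post, written in CIDR form.
blocked_ranges = [
    ipaddress.ip_network("85.17.226.0/24"),
    ipaddress.ip_network("85.17.173.0/24"),
]

# Map each /24 to whether it is fully contained in the /16.
covered = {str(net): net.subnet_of(leaseweb_route) for net in blocked_ranges}

for net, inside in covered.items():
    print(net, "inside", leaseweb_route, "->", inside)
```

Blocking the /16 therefore covers both narrower ranges, at the cost of also blocking every other customer in that address space.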
|
keyplyr

msg:4153090 | 6:08 pm on Jun 15, 2010 (gmt 0) |
twengabot also coming from another Leaseweb range:
94.75.247.0 - 94.75.247.255
route: 94.75.192.0/18
Robots.txt: yes
|
Staffa

msg:4155284 | 3:52 pm on Jun 19, 2010 (gmt 0) |
I saw it today coming from 94.75.247.nnn. It took robots.txt and stepped right into the honeypot. Then left to go wash its feet ;)
|
Pfui

msg:4155320 | 7:08 pm on Jun 19, 2010 (gmt 0) |
In addition to asking for but ignoring robots.txt, it appears TwengaBot has already changed its name and URL string/country. This just in:
94.75.247.24n
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)
06/19 11:51:38 /robots.txt
06/19 11:51:39 /
|
wilderness

msg:4155337 | 7:54 pm on Jun 19, 2010 (gmt 0) |
Leaseweb? No robots.txt, grabbed every page:
94.75.229.zzz - - [18/Jun/2010:14:32:48 -0600] "GET / HTTP/1.1" 200 5356 "-" "Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)"
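
For anyone pulling these entries out of raw logs, here is a rough sketch of parsing a line like the one above. It assumes the Apache "combined" log format, and the regex and field names are my own choices for illustration:

```python
import re

# Assumed layout: Apache "combined" log format.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = (
    '94.75.229.zzz - - [18/Jun/2010:14:32:48 -0600] "GET / HTTP/1.1" '
    '200 5356 "-" "Mozilla/5.0 (compatible; Purebot/1.1; '
    '+http://www.puritysearch.net/)"'
)

match = LOG_RE.match(line)
# Fall back to an empty dict if the line does not fit the assumed format.
fields = match.groupdict() if match else {}

print(fields.get("host"), fields.get("path"), fields.get("agent"))
```

The host field is matched as any non-space token, so the obfuscated "zzz" octet still parses.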
|
seoogle

msg:4174952 | 8:11 pm on Jul 22, 2010 (gmt 0) |
Interesting press release from Twenga.
leaseweb(dot)com/en/press/twenga

However, most of the Internet traffic is generated via Twenga’s crawler. "A good search engine that displays accurate results with the right products and prices needs a lot of capacity," says Anes. "At this moment the TwengaBot uses about 100 dedicated servers that together scan around 100 million web pages per day. But I expect to be needing at least 500 dedicated servers at the data center within two years." As a result, the Twenga crawler also needs a huge quantity of bandwidth. "We consume a total of 2000 Terabytes of data traffic per month. Three-quarters of that is generated by the crawler, but fortunately LeaseWeb operates a hosting network with a lot of bandwidth," laughs Anes.

Nice to know that while Twenga is expanding on the backs of small businesses they are getting a good laugh at your expense.
|
blend27

msg:4174995 | 9:07 pm on Jul 22, 2010 (gmt 0) |
Having NoScript installed in FF, and only allowing the site itself, it makes me think it is just another bottom feeder. Without a GOOG spyware allowed it is good for NARA.
|
Hedgehog_UK

msg:4182813 | 10:55 pm on Aug 5, 2010 (gmt 0) |
This new TwengaBot-Discover showed up twice today. Took 4 copies of robots.txt, then went on to grab every page on the site. Just a couple of hours later, it came back and took another complete copy of every page. It wouldn't be so bad, but the site doesn't sell anything. Same IP address reported by Pfui. As it ignores robots.txt it shouldn't have surprised me that it does not understand Crawl-delay.
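
For reference, this is roughly what a compliant crawler would be expected to do with such rules, sketched with Python's standard `urllib.robotparser`. The robots.txt content and the `/trap/` path below are hypothetical examples, not the actual file from this site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file is not shown in this thread.
robots_lines = [
    "User-agent: TwengaBot",
    "Crawl-delay: 10",
    "Disallow: /trap/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

agent = "TwengaBot-Discover"

# A well-behaved bot checks each URL before fetching it...
allowed_home = parser.can_fetch(agent, "/index.html")
allowed_trap = parser.can_fetch(agent, "/trap/page.html")

# ...and waits this many seconds between requests.
delay = parser.crawl_delay(agent)

print(allowed_home, allowed_trap, delay)
```

Note that Crawl-delay is a non-standard extension, so support varies from bot to bot even among those that do honor Disallow.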
|
Pfui

msg:4183822 | 2:19 pm on Aug 8, 2010 (gmt 0) |
Name/string change #3 -- for the worse. Again from Leaseweb:
85.17.226.11n
TwengaBot/2.0 text/*
robots.txt? Yes
(Asterisk in string is original. "n" in IP is obfuscation.)
|
blend27

msg:4183851 | 4:41 pm on Aug 8, 2010 (gmt 0) |
LEASEWEB IP ranges from RIPE: [db.ripe.net...]
Although I only have the wider ranges, far too much abuse over the years has come from that hosting company:
start_ip - end_ip
95.211.0.0 - 95.211.255.255
85.17.0.0 - 85.17.255.255
62.212.64.0 - 62.212.95.255
82.192.64.0 - 82.192.95.255
83.149.64.0 - 83.149.127.255
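
If your firewall or server config wants CIDR notation rather than start/end pairs, the ranges above can be converted with the standard `ipaddress` module. This is only a sketch, and the helper name `to_cidr` is made up for the example:

```python
import ipaddress

# Start/end pairs exactly as listed in this post.
ranges = [
    ("95.211.0.0", "95.211.255.255"),
    ("85.17.0.0", "85.17.255.255"),
    ("62.212.64.0", "62.212.95.255"),
    ("82.192.64.0", "82.192.95.255"),
    ("83.149.64.0", "83.149.127.255"),
]

def to_cidr(start, end):
    """Return the CIDR blocks that exactly cover start..end (inclusive)."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]

cidrs = [block for start, end in ranges for block in to_cidr(start, end)]

for block in cidrs:
    print(block)
```

Each of these ranges happens to be aligned on a power-of-two boundary, so every pair collapses to a single block.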
|
Web_Savvy

msg:4202669 | 10:14 am on Sep 16, 2010 (gmt 0) |
Latest appearance:
IP: 94.75.247.241
UA: TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)
Sucking up (100s of) pages, like there's no tomorrow ;-)
Will let it run for a while and see.
|
Pfui

msg:4202879 | 5:12 pm on Sep 16, 2010 (gmt 0) |
@Web_Savvy: If it ignored your robots.txt, block it ASAP. Why let some stranger abuse your resources -- content and cash -- for their own purposes, with no apparent benefit to you?
|
Dijkgraaf

msg:4202995 | 9:16 pm on Sep 16, 2010 (gmt 0) |
Project Honeypot on 94.75.247.240
Geographic Location: Netherlands
Spider First Seen: approximately 3 months, 1 week ago
Spider Last Seen: within 1 week
Spider Sightings: 4,347 visit(s)
User-Agents: seen with 4 user-agent(s)
Threat Rating: 46
First Rule-Break On: approximately 2 months, 4 weeks ago
Last Rule-Break On: within 2 months, 4 weeks
Rule Breaks: 2 web page navigation rule(s) broken by this IP

User agents seen from this IP:
TwengaBot
TwengaBot/2.0 (http://www.twenga.com/bot.html)
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html),gzip(gfe),gzip(gfe)

85.17.226.119's User Agent Strings:
metal-warrior-surfer
TwengaBot/2.0
TwengaBot/2.0 (http://www.twenga.com/bot.html)
TwengaBot/2.0 text/*
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)
WebSurfer text/*
|
dstiles

msg:4203044 | 10:52 pm on Sep 16, 2010 (gmt 0) |
Interesting. I recently enabled this bot (IPs 94.75.247.240-94.75.247.241) as an apparently reasonable SE. I'll keep an eye on it. Haven't seen it on the 85.17 range yet but 85.17/16 is leaseweb, hence banned anyway, so it may not have registered with me beyond "another leaseweb hit".
|
Web_Savvy

msg:4203689 | 6:51 pm on Sep 18, 2010 (gmt 0) |
@Pfui: Thanks for the input. Actually, this site of ours is a kind of 'working experiment' - we're deriving a lot of research data and intelligence out of all this and that's why we let these bots ply their trade awhile.
|
dstiles

msg:4204419 | 8:20 pm on Sep 20, 2010 (gmt 0) |
Twenga just followed a nofollow link into a trap in a folder that was disallowed in robots.txt. Stupid Twenga! Going to have to watch this bot. <sigh>
|
dstiles

msg:4208483 | 12:44 am on Sep 29, 2010 (gmt 0) |
Well, I gave it a fair shot. Accepted a high rate of access on the grounds they were new sites to the bot, even allowed it a minor indiscretion as regards forms, but when it hit every blocked mail-a-friend page, that's it. Banned.
|