Forum Moderators: Ocean10000 & incrediBILL

TwengaBot

     
7:51 pm on Apr 12, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2002
posts:3171
votes: 8


85.17.226.*
TwengaBot/2.0 (http://www.twenga.com/bot.html)

can't resolve the ip on a reverse-dns

it's for one of those very useful (not) shopping comparison sites

it also seems to be trying to return a cookie which seems odd, given the UA identifies it as a bot.

been hitting me quite hard.

i've banned it for now.

[edited by: incrediBILL at 9:23 pm (utc) on Apr 12, 2010]
[edit reason] Obscured IPs [/edit]

10:11 pm on Apr 12, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5803
votes: 64


I have it coming from Leaseweb, Amsterdam: 85.17.226.0 - 85.17.226.255

Other Leaseweb ranges: 85.17.173.0 - 85.17.173.255

I've blocked them for quite a while.
12:17 am on Apr 21, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


You may wish to expand on that range a bit. The RIPE route info indicates they've got the whole /16:

route: 85.17.0.0/16
descr: LEASEWEB

Jim
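
For anyone who wants to check an address against that route in a script rather than by eye, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `in_route` is just an illustration, and the test addresses are made-up hosts inside the ranges mentioned in this thread.

```python
import ipaddress

# The /16 route quoted from RIPE above.
LEASEWEB_ROUTE = ipaddress.ip_network("85.17.0.0/16")

# One of the narrower ranges reported earlier (85.17.226.0 - 85.17.226.255).
NARROW_RANGE = ipaddress.ip_network("85.17.226.0/24")


def in_route(ip, network=LEASEWEB_ROUTE):
    """Return True if the dotted-quad address falls inside the network."""
    return ipaddress.ip_address(ip) in network


# The /24 is fully contained in the /16, so blocking the /16 covers it.
print(NARROW_RANGE.subnet_of(LEASEWEB_ROUTE))

# Hypothetical hosts: one inside the /16, one in a different range.
print(in_route("85.17.173.10"))
print(in_route("94.75.247.24"))
```

Whether to block the whole /16 or only the /24s actually seen is a policy call; the sketch only shows that the wider route covers both reported ranges.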
6:08 pm on June 15, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:5803
votes: 64


twengabot also coming from another Leaseweb range:

94.75.247.0 - 94.75.247.255
route: 94.75.192.0/18

Robots.txt: yes
3:52 pm on June 19, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 24, 2002
posts:894
votes: 0


I saw it today coming from 94.75.247.nnn
Took robots.txt and stepped right into the honeypot.

Then left to go wash its feet ;)
7:08 pm on June 19, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


In addition to asking for but ignoring robots.txt, it appears TwengaBot has already changed its name and URL string/country. This just in:

94.75.247.24n
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)

06/19 11:51:38 /robots.txt
06/19 11:51:39 /
7:54 pm on June 19, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


Leaseweb?

No robots.txt, grabbed every page:

94.75.229.zzz - - [18/Jun/2010:14:32:48 -0600] "GET / HTTP/1.1" 200 5356 "-" "Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)"
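
If you want to pull the IP and user agent out of lines like that one automatically, a rough sketch follows. It assumes the usual Apache "combined" log layout, which is what the line above appears to be; the regex and the `parse_line` name are my own, not anything standard.

```python
import re

# Assumed layout: Apache "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


# The log line quoted above, with the obfuscated last octet left as posted.
sample = (
    '94.75.229.zzz - - [18/Jun/2010:14:32:48 -0600] "GET / HTTP/1.1" 200 5356 '
    '"-" "Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)"'
)

record = parse_line(sample)
print(record["ip"], record["path"], record["agent"])
```

Grouping the parsed records by IP and checking whether any of them requested `/robots.txt` is then a few more lines.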
8:11 pm on July 22, 2010 (gmt 0)

New User

10+ Year Member

joined:Feb 13, 2004
posts:13
votes: 0


Interesting press release from Twenga.

leaseweb(dot)com/en/press/twenga

However, most of the Internet traffic is generated via Twenga’s crawler. "A good search engine that displays accurate results with the right products and prices needs a lot of capacity," says Anes. "At this moment the TwengaBot uses about 100 dedicated servers that together scan around 100 million web pages per day. But I expect to be needing at least 500 dedicated servers at the data center within two years." As a result, the Twenga crawler also needs a huge quantity of bandwidth. "We consume a total of 2000 Terabytes of data traffic per month. Three-quarters of that is generated by the crawler, but fortunately LeaseWeb operates a hosting network with a lot of bandwidth," laughs Anes.

Nice to know that while Twenga is expanding on the backs of small businesses they are getting a good laugh at your expense.
9:07 pm on July 22, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1665
votes: 35


Having NoScript installed in FF, and only allowing the site itself, it makes me think it is just another bottom feeder. Without GOOG spyware allowed it is good for NARA.
10:55 pm on Aug 5, 2010 (gmt 0)

New User

5+ Year Member

joined:July 3, 2010
posts:9
votes: 0


This new TwengaBot-Discover showed up twice today. Took 4 copies of robots.txt, then went on to grab every page on the site. Just a couple of hours later, it came back and took another complete copy of every page. It wouldn't be so bad, but the site doesn't sell anything.

Same IP address reported by Pfui.

As it ignores robots.txt it shouldn't have surprised me that it does not understand Crawl-delay.
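
For comparison, here is roughly what a well-behaved crawler would conclude from a robots.txt. This is a sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `/trap/` path and the `ExampleBot` name are hypothetical, not taken from my site. It only shows what the rules say, since nothing forces a bot to honor them.

```python
from urllib import robotparser

# Hypothetical robots.txt: shut TwengaBot out, slow everyone else down.
ROBOTS_TXT = """\
User-agent: TwengaBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /trap/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Both TwengaBot user-agent strings seen in this thread match the
# "TwengaBot" group, so a compliant parser reports them as disallowed.
twenga_allowed = rp.can_fetch(
    "TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)", "/"
)

# Any other bot falls under the "*" group.
other_allowed = rp.can_fetch("ExampleBot", "/index.html")
other_delay = rp.crawl_delay("ExampleBot")

print(twenga_allowed, other_allowed, other_delay)
```

Crawl-delay is a non-standard extension, so even crawlers that respect Disallow may not understand it.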
2:19 pm on Aug 8, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Name/string change #3 -- for the worse. Again from Leaseweb:

85.17.226.11n
TwengaBot/2.0 text/*

robots.txt? Yes

(Asterisk in string is original. "n" in IP is obfuscation.)
4:41 pm on Aug 8, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1665
votes: 35


LEASEWEB IP Ranges from RIPE: [db.ripe.net...]

Although I only have wider ranges; too much abuse over the years has come from that hosting company:

start_ip - end_ip
95.211.0.0 - 95.211.255.255
85.17.0.0 - 85.17.255.255
62.212.64.0 - 62.212.95.255
82.192.64.0 - 82.192.95.255
83.149.64.0 - 83.149.127.255
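
Most firewalls and .htaccess rules want CIDR notation rather than start/end pairs. A small sketch of the conversion, using `ipaddress.summarize_address_range` from the Python standard library (the `to_cidrs` helper name is mine):

```python
import ipaddress

# The start/end pairs listed above.
RANGES = [
    ("95.211.0.0", "95.211.255.255"),
    ("85.17.0.0", "85.17.255.255"),
    ("62.212.64.0", "62.212.95.255"),
    ("82.192.64.0", "82.192.95.255"),
    ("83.149.64.0", "83.149.127.255"),
]


def to_cidrs(start, end):
    """Convert an inclusive start-end range into a list of CIDR strings."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


for start, end in RANGES:
    print(start, "-", end, "=>", ", ".join(to_cidrs(start, end)))
```

Each of these ranges happens to be aligned on a power-of-two boundary, so each collapses to a single CIDR block; an unaligned range would come back as several.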
10:14 am on Sept 16, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 30, 2004
posts:80
votes: 0


Latest appearance:

IP: 94.75.247.241
UA: TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)

Sucking up (100s of) pages, like there's no tomorrow ;-)

Will let it run for a while and see.
5:12 pm on Sept 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


@Web_Savvy: If it ignored your robots.txt, block it ASAP. Why let some stranger abuse your resources -- content and cash -- for their own purposes, with no apparent benefit to you?
9:16 pm on Sept 16, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 31, 2005
posts:1108
votes: 0


Project Honeypot on 94.75.247.240

Geographic Location: Netherlands
Spider First Seen: approximately 3 months, 1 week ago
Spider Last Seen: within 1 week
Spider Sightings: 4,347 visit(s)
User-Agents: seen with 4 user-agent(s)
Threat Rating: 46
First Rule-Break On: approximately 2 months, 4 weeks ago
Last Rule-Break On: within 2 months, 4 weeks
Rule Breaks: 2 web page navigation rule(s) broken by this IP

TwengaBot
TwengaBot/2.0 (http://www.twenga.com/bot.html)
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html),gzip(gfe),gzip(gfe)


85.17.226.119's User Agent Strings
metal-warrior-surfer
TwengaBot/2.0
TwengaBot/2.0 (http://www.twenga.com/bot.html)
TwengaBot/2.0 text/*
TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)
WebSurfer text/*
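
A quick illustration of why blocking on the user-agent string alone is not enough here. This sketch runs a case-insensitive "twengabot" match (my own choice of pattern) over the strings listed above for 85.17.226.119 and collects the ones that slip through.

```python
import re

# Assumed pattern: any user agent containing "twengabot", in any letter case.
TWENGA_UA = re.compile(r"twengabot", re.IGNORECASE)

# User-agent strings reported above for 85.17.226.119.
SEEN_AGENTS = [
    "metal-warrior-surfer",
    "TwengaBot/2.0",
    "TwengaBot/2.0 (http://www.twenga.com/bot.html)",
    "TwengaBot/2.0 text/*",
    "TwengaBot-Discover (http://www.twenga.fr/bot-discover.html)",
    "WebSurfer text/*",
]


def is_twenga(user_agent):
    """Return True if the user-agent string names TwengaBot."""
    return bool(TWENGA_UA.search(user_agent))


# Agents from the same IP that a UA-only rule would let through.
missed = [ua for ua in SEEN_AGENTS if not is_twenga(ua)]
print(missed)
```

Two of the six strings from that one IP do not mention TwengaBot at all, so a UA rule would need to be paired with an IP range block to catch them.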
10:52 pm on Sept 16, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


Interesting. I recently enabled this bot (IPs 94.75.247.240-94.75.247.241) as an apparently reasonable SE. I'll keep an eye on it.

Haven't seen it on the 85.17 range yet but 85.17/16 is leaseweb, hence banned anyway, so it may not have registered with me beyond "another leaseweb hit".
6:51 pm on Sept 18, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 30, 2004
posts:80
votes: 0


@Pfui: Thanks for the input.

Actually, this site of ours is a kind of 'working experiment' - we're deriving a lot of research data and intelligence out of all this and that's why we let these bots ply their trade awhile.
8:20 pm on Sept 20, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


Twenga just followed a nofollow link into a trap in a folder that was disallowed in robots.txt. Stupid Twenga! Going to have to watch this bot. <sigh>
12:44 am on Sept 29, 2010 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


Well, I gave it a fair shot. Accepted a high rate of access on the grounds they were new sites to the bot, even allowed it a minor indiscretion as regards forms, but when it hit every blocked mail-a-friend page, that's it. Banned.