Yodaobot 1.0 is coming

Have you noticed OutfoxBot 0.x? Last weekend it was released as YodaoBot 1.0.

         

chedong

1:23 pm on Dec 18, 2006 (gmt 0)

10+ Year Member



There have been many earlier reports here on the anonymous OutfoxBot:

OutfoxBot
For internet experiments outfoxbot. ... OutfoxBot For internet experiments. GaryK #:401697, 2:39 pm on Nov. 13, 2005 (utc 0). User Agent: OutfoxBot/0.3 (For internet experiments; http://; outfox.agent@gmail.com). IP Address: 220.181.8.* ...
[webmasterworld.com...]

OutfoxBot/0.3
outfoxbot/0.3. ... 220.181.8.102 - - [08/Nov/2005:15:16:56 -0700] "GET /robots.txt HTTP/1.0" 403 292 "-" "OutfoxBot/0.3 (For internet experiments; ... Agent: OutfoxBot/0.3 (For internet experiments; http://; outfox.agent@gmail.com) ...
[webmasterworld.com...]

What is this? OutfoxBot
I found this agent crawling around my site what is this? outfoxbot.
[webmasterworld.com...]

OutfoxBot
Disobeys robots.txt outfoxbot. ... 220.181.8.121 - - [25/Jun/2006:00:02:19 -0400] "GET /robots.txt HTTP/1.1" 200 2966 "-" "OutfoxBot/0.1 (For internet experiments; http://www.outfox.com; outfoxbot@gmail.com)" ...
[webmasterworld.com...]

OutfoxBot /0.1?
Anyone know who what where? outfoxbot /0.1? ... Referer: - Agent: OutfoxBot/0.1 (For internet experiments; http://www.outfox.com; outfoxbot@gmail.com). / Http Code: 200 Date: Apr 28 15:24:59 Http Version: HTTP/1.1 Size in Bytes: 31663 ...
[webmasterworld.com...]

The IPs come from Beijing, China, and the UA string gives no other information except a Gmail address: outfox@gmail.com.

Last weekend (Dec 15) a new Chinese search engine, Yodao.com, was released,
and OutfoxBot was renamed to YodaoBot.
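
For anyone who wants to pick both names out of their logs, here is a minimal Python sketch. It assumes the only reliable signal is the name/version token at the start of the user-agent strings quoted in this thread. The pattern and the helper name identify_bot are my own, not anything published by Yodao.

```python
import re

# Assumption: the bot announces itself as "OutfoxBot/<version>" or
# "YodaoBot/<version>", as in the user-agent strings quoted in this thread.
BOT_PATTERN = re.compile(r"\b(OutfoxBot|YodaoBot)/(\d+(?:\.\d+)*)", re.IGNORECASE)


def identify_bot(user_agent):
    """Return (lowercased name, version) for OutfoxBot/YodaoBot, else None."""
    match = BOT_PATTERN.search(user_agent)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)


if __name__ == "__main__":
    print(identify_bot("YodaoBot/1.0 (http://www.yodao.com/help/webmaster/spider/; )"))
```

This only catches requests that announce themselves, so it will miss any traffic sent under another user agent.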

I tried my crawler identification query, site:example.com crawledby,
and found that the crawler is just OutfoxBot:
phpMan: Unix Man page/ Perldoc / Info page Web Interface
On Apache/1.3.37 (Unix) mod_perl/1.29 mod_gzip/1.3.26.1a PHP/4.4.4 Under GNU General Public License 2006-12-14 04:53 @60.191.80.39 CrawledBy OutfoxBot/0.5 (for internet experiments; http://; outfoxbot@gmail.com)

Here is the new YodaoBot detection for AWStats, together with other Chinese browser and spider updates:

diff -r1.44 robots.pm
100d99
< # added TencentTraveler
180,181d178
< # added sogou spider http://corp.sohu.com/20051130/n240842344.shtml
< # added sogou test http://corp.sohu.com/20051130/n240842344.shtml
351a349
> 'lilina',
462a461
> 'gougou',
472a472,474
> 'iaskspider',
> 'hl_ftien_spider',
> 'sogou',
835d836
< 'tencenttraveler', # Must be before msiecrawler
863c864
< 'outfoxbot',
---
> 'yodaobot',
899,900d899
< 'sogou\sspider',
< 'sogou\stest',
973a973
> 'zhuaxia',
1006a1007
> 'lilina','Lilina',
1115a1117
> 'gougou','GouGou',
1125a1128,1130
> 'iaskspider','<a href="http://www.iask.com/" target="_blank">Sina Iask Spider</a>',
> 'hl_ftien_spider','<a href="http://www.hylanda.com/" target="_blank">Hylanda</a>',
> 'sogou','<a href="http://www.sogou.com/" target="_blank">Sogou Spider</a>',
1463d1467
< 'tencenttraveler','TencentTraveler', # Must be before msiecrawler.
1491c1495
< 'outfoxbot','<a href="mailto:outfox.agent@gmail.com?subject=Outfox Bot Information" title="Bot e-mail.">OutfoxBot</a>',
---
> 'yodaobot','<a href="http://www.yodao.com/help/webmaster/spider/" title="Bot e-mail.">OutfoxBot/YodaoBot</a>',
1527,1528d1530
< 'sogou\sspider','<a href="http://corp.sohu.com/20051130/n240842344.shtml" title="Bot home page [new window]" target="_blank">sogou spider</a>',
< 'sogou\stest','<a href="http://corp.sohu.com/20051130/n240842344.shtml" title="Bot home page [new window]" target="_blank">sogou test</a>',
1601a1604
> 'zhuaxia','<a href="http://www.zhuaxia.com/" target="_blank">ZhuaXia</a>',
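
To show what the diff changes, here is a rough Python sketch of ordered robot matching as I understand it: lowercase ID patterns are tried in list order against the user agent, and the first hit is looked up in a label table. The names ROBOT_SEARCH_ORDER, ROBOT_LABELS and label_robot are hypothetical stand-ins. The real robots.pm is Perl and much longer.

```python
import re

# Hypothetical, shortened stand-ins for the ordered ID list and the label
# table that the diff edits. Order matters: the first matching pattern wins.
ROBOT_SEARCH_ORDER = ["yodaobot", "iaskspider", "hl_ftien_spider", "sogou", "zhuaxia"]
ROBOT_LABELS = {
    "yodaobot": "OutfoxBot/YodaoBot",
    "iaskspider": "Sina Iask Spider",
    "hl_ftien_spider": "Hylanda",
    "sogou": "Sogou Spider",
    "zhuaxia": "ZhuaXia",
}


def label_robot(user_agent):
    """Return the label of the first robot pattern found in the UA, else None."""
    lowered = user_agent.lower()
    for robot_id in ROBOT_SEARCH_ORDER:
        if re.search(robot_id, lowered):
            return ROBOT_LABELS[robot_id]
    return None


if __name__ == "__main__":
    print(label_robot("YodaoBot/1.0 (http://www.yodao.com/help/webmaster/spider/; )"))
```

One consequence in this sketch: because the diff replaces the 'outfoxbot' pattern instead of adding 'yodaobot' next to it, older OutfoxBot user agents no longer match. Keeping both patterns may be preferable if old logs are reprocessed.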

Che Dong

[edited by: encyclo at 2:48 am (utc) on Dec. 25, 2006]
[edit reason] examplified, fixed formatting [/edit]

GaryK

11:20 pm on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The OP made no sense at all to me. However I did see:

YodaoBot/1.0 (http://www.yodao.com/help/webmaster/spider/; )

in my logs last week. It never requested robots.txt and it fetched disallowed files.

Has anyone figured out what this is yet? Thanks.
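
A check like the one described above (did the bot ask for robots.txt, and did it fetch disallowed paths?) can be sketched in Python with the standard urllib.robotparser module. This assumes combined-format access log lines. The helper name audit_bot and its parameters are made up for illustration.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumption: combined log format, with the request in the first quoted field
# and the user agent in the last quoted field.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def audit_bot(log_lines, robots_txt, bot_token):
    """Return (fetched_robots_txt, disallowed_paths_requested) for one bot."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    fetched_robots = False
    violations = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not a line this sketch understands
        path, agent = match.groups()
        if bot_token.lower() not in agent.lower():
            continue  # some other client
        if path == "/robots.txt":
            fetched_robots = True
        elif not rules.can_fetch(bot_token, path):
            violations.append(path)
    return fetched_robots, violations
```

Calling audit_bot(lines, robots_txt_text, "YodaoBot") on a day's log returns whether robots.txt was ever requested and which disallowed paths were fetched.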

Leosghost

11:34 pm on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



outfox is a scraper bot ..from www.outfox.com ..all searches on that domain lead to pages running only multiple Overture links ..disguised ( badly :) as search links ..
the yodao version appears to be a stats spammer that has dropped around 14,000 backlinks via various sites' stats and as a result is now in Alexa's top ten risers from the PRC ..current status 45K ..also appears to run as various Google bots when the fancy takes it..

[edited by: Leosghost at 11:50 pm (utc) on Dec. 24, 2006]

GaryK

11:39 pm on Dec 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the quick reply. Can I assume that YodaoBot is related to Outfoxbot?

Leosghost

12:40 am on Dec 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



that I don't know GaryK ..my Chinese reading ability is very limited ..:)

the home pages are different in their raison d'etre ..

yodao.com ( the bot home )..is interesting ..and doesn't appear to be running as a scraper ..just omnivore ..like the chinese people it eats everything but the squeal when it sits down to dinner on a site ..knows no bounds :)

I just ran some searches on it in English ..I am pleased ;-) I do very well on its index there ..:)..( with pages that shouldn't do very well anywhere ..and usually don't ..they are the ones that I was too lazy to change for a few years now ) ..don't expect it to bring me many clients though :)

Its algo looks to be very basic ..sort of AV just before the death ..all on-page ..it's susceptible to keyword stuffing bless it ..:)

Doesn't appear to be wholesale ripping ..mainly taking home pages and then random internal stuff ..from what I saw of mine and some others that I know ..most of its own index appears to be taken for now from inside the "wall" ( some of my stuff I know is linked to from inside ) ..would probably have to ask in the Asian area here on WebmasterWorld to get any real hard info on it as its "about" section is a blog .

However ..

Just ran some other searches on it for things that normally would only be found in areas of a site that are off limits to bots ..its cache holds some very interesting things for the less well intentioned to exploit ..on the basis of a quick look at what its pointy lil nose does get into and then hang on the washing line of "cache" ..

I think most people would want to block it entirely ..which might be challenging as it appears to run in various disguises including browsers and Gbots etc ..and mobile devices ..
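
For the requests that claim to be a Google bot, one way to tell a disguise from the real thing is a reverse DNS lookup followed by a forward lookup. Here is a minimal Python sketch of that check. The function name and the injectable lookup parameters are my own, added so the logic can be tried without touching the network.

```python
import socket


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    # Default lookups use the system resolver; both parameters are hypothetical
    # hooks for substituting fake resolvers when experimenting.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith((".googlebot.com", ".google.com")):
            return False  # PTR record is not in a Google crawler domain
        # The hostname must resolve back to the same address.
        return ip in forward_lookup(host)
    except OSError:
        return False  # no PTR record or failed lookup: treat as unverified
```

A request with a Googlebot user agent that fails this check can be treated like any other unidentified client. DNS lookups are slow, so results would normally be cached per IP.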

Another case of if you don't expect customers block the PRC entirely ..?

Its rise is meteoric ..but it is very indiscreet ..

[edited by: Leosghost at 12:43 am (utc) on Dec. 25, 2006]

Mokita

12:54 am on Dec 25, 2006 (gmt 0)

10+ Year Member



/Wonders why this line in the OP wasn't edited and examplified?

I tried my crawler identify query, site:hisdomain.com crawledby

[edited by: Mokita at 12:56 am (utc) on Dec. 25, 2006]

GaryK

1:13 am on Dec 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Another case of if you don't expect customers block the PRC entirely ..?

I've done that with Yahoo. I guess it's time to do the same with the PRC. I get no visitors or business from either one so why let them abuse my sites? Thanks for your insight. :)

EDIT: Hi Mokita. Merry Christmas. Sadly I could barely make sense of what the OP was trying to convey to us. All I picked up on was the sub-title about OutfoxBot becoming Yodaobot.

[edited by: GaryK at 1:15 am (utc) on Dec. 25, 2006]

Leosghost

1:18 am on Dec 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



just noticed ..very neat on-hover trick on yodao.com results pages ..if you have JavaScript enabled ..the "date of crawl" line has at the left side a small text icon .."on hover" will give you a small pop up with about the first 50 or so words from the page that is the result ..cute

and apparently there is indeed a link between it and outfox ..the yodao.com page asks to set a cookie " OUTFOX_SEARCH_USER_ID" ..duration til 2036..with a "91.164.217.249" return ..

keyplyr and incrediBILL tagged some outfox IPs here [webmasterworld.com...]
in one of the threads mentioned by the OP .. earlier this year ..

Again, input from someone in the Asian forums would be enlightening maybe ..Bill ( when you've finished with the turkey ..like to comment? ) ..or any other regulars from Asian search ..

I wondered that too Mokita ..figured admins would get to it after the mince pies :)

[edited by: Leosghost at 1:21 am (utc) on Dec. 25, 2006]

Mokita

2:00 am on Dec 25, 2006 (gmt 0)

10+ Year Member



Hi Gary,

Merry Christmas! I thought the OP was understandable - the whole essence is in the title. But he muddied it by including too much pasted info from previous threads and the many lines for updating Awstats' detection of Chinese browsers and spiders.

I blocked large swathes of PRC IPs quite some time ago. They only provide scrapers, harvesters and spambots. Also, I have Taiwan, Korea and to a lesser extent Japan on a similar footing. None of our product sites ship goods to Asia and our information sites would be of no interest to 99.9% of them either.
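
As an illustration of range blocking, here is a small Python sketch using the standard ipaddress module. The only network listed is 220.181.8.0/24, which is my assumption for the 220.181.8.* addresses quoted earlier in the thread. It is not a vetted block list, and the names BLOCKED_NETWORKS and is_blocked are hypothetical.

```python
import ipaddress

# Assumption: 220.181.8.0/24 stands in for the 220.181.8.* addresses reported
# for OutfoxBot in this thread. A real block list would hold many more ranges.
BLOCKED_NETWORKS = [ipaddress.ip_network("220.181.8.0/24")]


def is_blocked(ip):
    """Return True if the address falls inside any blocked network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print(is_blocked("220.181.8.102"))
```

In practice the same ranges usually go into the web server or firewall configuration. The sketch only shows the membership test.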

Leosghost wrote:

...figured admins would get to it after the mince pies

Dan has been eating mince pies since 19th Dec? WOW! He'll need to do some serious dieting and exercise when he gets back :O :D

[edited by: Mokita at 2:04 am (utc) on Dec. 25, 2006]

GaryK

3:27 am on Dec 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh no. Please tell me Dan's not going to be away until after the new year! I have so many messages queued it's not even funny.

Mokita

1:40 am on Dec 26, 2006 (gmt 0)

10+ Year Member



Relax Gary :)

It looks like encyclo and maybe other mods are helping each other out over the holidays.

GaryK

7:16 am on Dec 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. My messages got approved today. :)

volatilegx

2:49 pm on Dec 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry guys... was afk for three days. Had a kid in the hospital.

Mokita

11:26 am on Dec 27, 2006 (gmt 0)

10+ Year Member



Sorry guys...

Absolutely no apologies necessary. I was very concerned to hear of your untimely trauma - hope all is well now?

volatilegx

3:31 am on Dec 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, everything is good now. Parker (my oldest son) is out of the hospital and creating havoc as usual ;)

GaryK

6:04 pm on Dec 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm glad all is well now with your son Dan.