DiaGem

Another bot to ban

         

Son_House

1:48 pm on May 26, 2002 (gmt 0)

10+ Year Member



203.178.88.244
crl.skyrocket.gr.jp
DiaGem/1.1 (http://*www.skyrocket.gr.jp/diagem.html)

On the about page they call it an experimental bot and say that the documents will be used internally only.

Made about 55 requests in an hour. Gets the robots.txt file and also grabs external CSS.
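For anyone who wants to get the same kind of count out of their own logs, here is a minimal sketch. It assumes the server writes the Apache "combined" log format, which may not match your setup. The sample lines are made up for illustration (only the IP and user-agent string come from the post above), and the function name is my own.

```python
import re
from collections import Counter

# Assumed layout: Apache "combined" log format, where the user agent
# is the last quoted field on the line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_requests_by_agent(lines):
    """Count requests per user-agent product token (e.g. 'DiaGem/1.1')."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        # Keep only the leading product token so comment parts are ignored.
        product = match.group("agent").split(" ", 1)[0]
        counts[product] += 1
    return counts


# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    '203.178.88.244 - - [26/May/2002:13:05:12 +0000] "GET /robots.txt HTTP/1.0" '
    '200 120 "-" "DiaGem/1.1 (http://*www.skyrocket.gr.jp/diagem.html)"',
    '203.178.88.244 - - [26/May/2002:13:05:40 +0000] "GET /style.css HTTP/1.0" '
    '200 2048 "-" "DiaGem/1.1 (http://*www.skyrocket.gr.jp/diagem.html)"',
    '192.0.2.10 - - [26/May/2002:13:06:01 +0000] "GET / HTTP/1.0" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"',
]

if __name__ == "__main__":
    for product, total in count_requests_by_agent(SAMPLE_LINES).most_common():
        print(product, total)
```

Grouping on the product token rather than the full string keeps the count stable if the bot changes the URL in its comment part.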

wilderness

6:14 pm on May 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You have a market share in the far east?
Do you desire a market share in the far east?
Does your site stand to either benefit from the far east market or provide benefits to the far east?

My sites do neither.
As a result I have most of Japan, China, Korea, Taiwan and a few other countries denied.
In my particular case, when they visit my sites their intentions are either malicious or of no benefit to their time or my bandwidth.

mbauser2

10:38 pm on May 26, 2002 (gmt 0)

10+ Year Member



You're not the only one it hit: I was coming here to report DiaGem myself. It came out of nowhere and made 108 requests to my site on Saturday.

I haven't looked very closely at the log to see if it's doing anything especially interesting or stupid. Maybe I'll do that tonight.

mbauser2

1:29 am on May 31, 2002 (gmt 0)

10+ Year Member



Finally got around to looking at that access log. DiaGem's an odd one:

It requests .css files if they're linked from .html files via the LINK element.
It also requested a .pfr (embedded font file) that's linked to in a LINK tag, but didn't request any of the .hdml files that are LINK-linked.

(It didn't request any of the .eot or .css files that are linked from other .css files, suggesting it's not parsing CSS.)

All of my non-HTML files are flagged with the proper content-types, so it's doubly-odd that it grabbed the binary .pfr but skipped the text .hdml files.
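To make the LINK behaviour concrete, here is a rough sketch of how a crawler could collect LINK targets from the HTML alone, without ever opening the stylesheets. This is only a guess at the general approach, not DiaGem's actual code, and the file names in the sample are hypothetical.

```python
from html.parser import HTMLParser


class LinkHrefCollector(HTMLParser):
    """Collect href values from LINK elements only (anchors are ignored)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "link":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def link_hrefs(html):
    collector = LinkHrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


# Hypothetical page; the paths are placeholders, not from any real site.
SAMPLE_HTML = """
<html><head>
<LINK REL="stylesheet" HREF="/style.css">
<link rel="fontdef" href="/fonts/site.pfr">
</head><body><a href="/page2.html">next</a></body></html>
"""

if __name__ == "__main__":
    print(link_hrefs(SAMPLE_HTML))
```

A crawler built like this would fetch /style.css but never see anything that style.css itself imports, which fits the pattern in my log.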

It requested one XML file (my site's p3p.xml privacy file), presumably via the LINK element. It didn't get deep enough into my site to hit any other XML files, so I don't know if it's going after all XML content.

Like too many robots, it grabbed some PGP key files I have available for download. Those are probably useless for data-mining.

It requested dynamic URLs that were linked from static files, but didn't fall into any spidertraps. It did mangle one URL it got from a click-tracking redirect, turning http:// into http%3E/ (that is, :// became >/).

Otherwise, DiaGem looks OK to me. Didn't make any duplicate requests, didn't request any graphic files, and didn't request enough to affect my bandwidth usage, so I'll let it be for now.
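For reference, the mangled URL mentioned above is easy to decode and check. A small sketch follows; the host name is a placeholder, since only the scheme separator matters here.

```python
from urllib.parse import quote, unquote

# Placeholder host, for illustration only.
intended = "http://www.example.com/page.html"
mangled = "http%3E/www.example.com/page.html"

# %3E is the percent-encoding of ">", so the request really starts "http>/".
decoded = unquote(mangled)
print(decoded)  # http>/www.example.com/page.html

# Encoding ">" on its own gives the same escape seen in the log.
print(quote(">", safe=""))  # %3E

# Putting "://" back in place of ">/" recovers the intended URL.
repaired = decoded.replace(">/", "://", 1)
print(repaired == intended)  # True
```

Why the bot's URL handling produced ">" there is not something the log can tell me.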

mbauser2

3:28 pm on May 31, 2002 (gmt 0)

10+ Year Member




didn't request any of the .hdml files that are LINK-linked.

My bad: I looked at the log again, and DiaGem did request one .hdml file, but skipped three others it should have known about.

Son_House

6:03 am on Jun 1, 2002 (gmt 0)

10+ Year Member



Thanks for the idea, wilderness, but our site could be of interest to people in the far east, so I have no plans to block whole countries.

Thanks for the info, mbauser2. The bot was also well behaved at our site. I banned it anyway because they said it is for internal use only. If they ever do something useful that the public can access, maybe I'll unban them.
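The ban itself comes down to a check like the sketch below. This only shows the decision logic in Python; on a real server the rule would normally live in the web server's own configuration. The names are my own, and the address and agent prefix are the ones from the first post.

```python
# Values taken from the first post in this thread.
BANNED_AGENT_PREFIXES = ("DiaGem/",)
BANNED_ADDRESSES = {"203.178.88.244"}


def is_banned(remote_addr, user_agent):
    """Return True if the request matches a banned address or agent prefix."""
    if remote_addr in BANNED_ADDRESSES:
        return True
    # str.startswith accepts a tuple, so more prefixes can be added later.
    return user_agent.startswith(BANNED_AGENT_PREFIXES)


if __name__ == "__main__":
    print(is_banned("203.178.88.244", "DiaGem/1.1"))
    print(is_banned("192.0.2.10", "Mozilla/4.0 (compatible)"))
```

Matching on the agent prefix as well as the address means the ban still applies if the bot moves to another IP, and removing both entries is all it takes to unban it.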