Forum Moderators: DixonJones

bad bot, or clumsy downloaders, or something else?

"Deliverent.com webpage capture" violated robots.txt


stapel

5:46 am on Oct 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm running a bad-bot [webmasterworld.com] script to block bad bots and other naughty folks from attempting to download my site. This script drops me an e-mail when somebody or something gets caught. I just received notice of the following:

The associated user agent was jkn.com / Deliverent.com webpage capture

After visiting the two web sites mentioned in the "user agent" string, I can't say that I'm certain of what I'm looking at here. The one site appears to be some highly restrictive e-mail service that allows for controlled formatting of HTML-based messages, including complete web pages, and it's teamed up with the other site to provide bulk-mailing services (up to a million e-mails a month, I think).

Any ideas what might be going on? Is somebody maybe trying to mail my site to his clients, as though it's his own product?

Thank you.

Eliz.

jackmurphy

2:27 pm on Nov 8, 2005 (gmt 0)

10+ Year Member



Eliz,

jkn and Deliverent are sites that are owned and operated by Relevance Technology Inc.

I am a representative of Relevance Tech and I would like to take a moment to explain how we use the web capture technology in our products.

jkn is a free service that individuals can use to email a web page to a friend. Since today's web pages are so dynamic and can change at any time, it is useful to email the whole page instead of just the URL. Hence, jkn has to capture the entire web page.

Deliverent can be utilized by websites for:
1. a free send-to-friend service
2. email distribution (websites provide their own opt-in list; under no condition do we sell email addresses)

Deliverent uses the jkn web page capture in two ways.
1. it is the underlying technology for the send-to-friend button. This allows websites to use the same send-to-friend button code across their entire site with zero page-by-page configuration.
2. when a website wants to publish an email to their opt-in list, they can import a web page to email from their own site (this makes it easy for websites to have their email match the look and feel of their website).

In your case, someone probably used jkn to email a web page from your site.

I hope this explains everything you need to know, and I would be happy to address any other concerns you may have.

Jack Murphy

[edited by: jatar_k at 4:51 pm (utc) on Nov. 8, 2005]
[edit reason] removed urls [/edit]

Leosghost

2:32 pm on Nov 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld jackmurphy .. straight-talking reps are getting harder to come by here.

Mods may well delink your links .. in spite of the fact that I for one don't think that you could be considered to be spamming .. but they (the links) are not to "authority" sites in the usual sense of the word here..

stapel

4:21 pm on Nov 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jackmurphy,

Thank you for your reply.

Have you considered having your bot respect the robots.txt file, so it doesn't try to access stuff it isn't supposed to?

(Your bot wouldn't have been banned if it had only forwarded the HTML and images in question. It was banned when it tried accessing forbidden files.)
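The mechanism described above can be sketched roughly like this (illustrative Python, not the actual WebmasterWorld script; the trap path and status codes are assumptions). A path is listed as Disallowed in robots.txt, so a well-behaved client never requests it; anything that fetches it anyway gets banned:

```python
# Minimal sketch of a robots.txt honeypot trap. Assumption: "/trap/" is
# listed as Disallowed in robots.txt, so only misbehaving clients hit it.
banned_ips = set()
TRAP_PREFIX = "/trap/"

def handle_request(ip, path):
    """Return an HTTP status code for a request, banning trap visitors."""
    if path.startswith(TRAP_PREFIX):
        banned_ips.add(ip)  # a real script would also email the admin here
        return 403
    if ip in banned_ips:
        return 403          # previously caught clients stay blocked
    return 200
```

Note that a client which only fetches the requested page and its referenced assets never touches the trap; the ban fires only on an attempt to access a forbidden file.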

Eliz.

jackmurphy

5:10 pm on Nov 8, 2005 (gmt 0)

10+ Year Member



Eliz,

jkn only captures web pages for the purpose of sending out an email on behalf of a user.

The purpose of the robots.txt file is to constrain web crawlers, primarily search-engine spiders. However, jkn.com does not crawl the web. Our sole purpose is to load a web page and email it; in that capacity we are no different from the user's browser.
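Even a one-off page fetcher can honor robots.txt voluntarily before requesting a URL. A minimal sketch using Python's standard-library parser (the rules and user-agent string here are illustrative; a real fetcher would download the site's actual robots.txt):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, hard-coded for illustration.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

rp.can_fetch("Deliverent.com webpage capture", "/index.html")     # allowed
rp.can_fetch("Deliverent.com webpage capture", "/private/x.css")  # disallowed
```

A capture service could run this check on every URL it is about to request and simply skip anything disallowed.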

That said, I'm surprised that the web capture tried to access forbidden files, because the only external files it loads are *.css files (external CSS links don't work in email, so the full CSS must be included instead).
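The CSS-embedding step described above can be sketched like this (illustrative Python, not Deliverent's actual code; the fetch function is injected so the example needs no network access):

```python
import re

def embed_external_css(html, fetch):
    """Replace external stylesheet <link> tags with inline <style> blocks,
    since most mail clients ignore external stylesheet references.
    `fetch` is any callable mapping a CSS URL to its text."""
    def embed(match):
        return "<style>%s</style>" % fetch(match.group(1))
    # Naive pattern for illustration; real HTML needs a proper parser.
    return re.sub(r'<link[^>]*href="([^"]+)"[^>]*>', embed, html)
```

For example, `embed_external_css('<link rel="stylesheet" href="site.css">', my_fetcher)` yields a `<style>` block containing whatever `my_fetcher` returns for `site.css`.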

The only thing I can think of is that the web capture sometimes has to do file-validity checks when converting relative URLs to absolute URLs. This may have shown up as accessing forbidden files.

If you want, you can sticky mail the URL that was captured to me and I can look into specifically what happened.

Jack

[edited by: Receptional at 9:04 am (utc) on Nov. 9, 2005]
[edit reason] Took out personal E-mail address & URL [/edit]