HTTrack and Webpix

Both hit my site, rapid fire.
larryhatch

3:00 am on Nov 29, 2004 (gmt 0)


In my access_log file for Saturday, 27 Nov 2004, I see that somebody used HTTrack 3.0 to download virtually all my static .html files, plus a few images. The IP traced back to St. Louis, MO.

HTTrack did NOT request robots.txt. My understanding is that the free version respects robots.txt,
but the paid version can override that.

Questions:

1) What do you suppose the purpose of this is?

2) Might somebody be trying to scrape my whole site?

3) Let's say I disallow HTTrack in robots.txt. Per their website, "disallow /" is too vague!
I have to specify some directory. My whole site is in the root directory.
Can I disallow the root dir, even though that is where robots.txt lives?
Do you see the logical problem with that? I would be disallowing robots.txt too,
thus providing them a legal 'out'.

- Confused in California

wilderness

9:09 pm on Nov 29, 2004 (gmt 0)


Hello Larry,
robots.txt is a suggestion which honorable bots comply with.

Dishonorable bots and/or visitors couldn't care less whether robots.txt exists, much less what it contains.
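
To see what a compliant client makes of a blanket rule, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the "HTTrack" user-agent token, and the example.com URLs are assumptions for illustration; it shows how a standard parser reads "Disallow: /", not how HTTrack itself behaves. A crawler has to fetch robots.txt before it can apply any rule in it, so disallowing the root does not hide the file from a bot that plays by the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out the "HTTrack" token, allow everyone else.
ROBOTS_TXT = """\
User-agent: HTTrack
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if a compliant client with this token may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("HTTrack", "Googlebot"):
        for url in ("http://example.com/index.html",
                    "http://example.com/images/photo.jpg"):
            print(agent, url, allowed(agent, url))
```

robotparser compares only the part of the user-agent string before the first "/", so pass the bare token rather than a full browser-style string. None of this stops a client that never reads robots.txt in the first place.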

As for your first two questions:
Only the person or machine grabbing the pages has the answer. A webmaster who has learned to recognize traffic patterns in the visitor logs can make an educated guess.
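
As a rough illustration of spotting such a pattern, the sketch below tallies requests per IP and user agent and notes whether that client ever asked for /robots.txt. It assumes the Apache combined log format; the sample lines, the 192.0.2.x addresses, and the function name summarize are made up for the example.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption; adjust if your server differs).
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Made-up sample lines, not taken from any real log.
SAMPLE = [
    '192.0.2.10 - - [27/Nov/2004:03:00:01 +0000] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"',
    '192.0.2.10 - - [27/Nov/2004:03:00:01 +0000] "GET /page2.html HTTP/1.0" '
    '200 4096 "-" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"',
    '192.0.2.77 - - [27/Nov/2004:03:05:12 +0000] "GET /robots.txt HTTP/1.0" '
    '200 64 "-" "ExampleBot/1.0"',
]


def summarize(lines):
    """Return (ip, agent, hits, fetched_robots) rows, busiest client first."""
    hits = Counter()
    fetched_robots = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        key = (m["ip"], m["agent"])
        hits[key] += 1
        parts = m["request"].split()
        if len(parts) >= 2 and parts[1] == "/robots.txt":
            fetched_robots.add(key)
    return [(ip, agent, n, (ip, agent) in fetched_robots)
            for (ip, agent), n in hits.most_common()]


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

A client with many hits in a short span that never requested /robots.txt is worth a closer look, though the user-agent string can be changed by the person running the tool, so treat it as a hint rather than proof.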

Your best option is to look into using .htaccess.

Here are some old threads:

A Simple Beginning
[webmasterworld.com...]

Close to Perfect
[webmasterworld.com...]

DanA

9:20 pm on Nov 29, 2004 (gmt 0)


There is no paid version of HTTrack, only users who override the robots.txt rules.
HTTrack offers 30 different user agents.
You can find the author's Abuse FAQ for webmasters here:
[httrack.com...]

larryhatch

10:10 pm on Nov 29, 2004 (gmt 0)


Thanks guys!

I will be taking a hard look at .htaccess.

- Larry

wilderness

10:21 pm on Nov 29, 2004 (gmt 0)


This is a much better description, which I'm sure none of the users bother to read ;)

[httrack.com...]

About as often as visitors read the TOS, UAGs, or FAQs on the websites they visit.