My server logs for the last day and a half have a minimum of 30 entries as follows:
216.180.241.154 - - [08/Jul/2003:01:08:15 -0600] "GET /index.shtml HTTP/1.0" 200 41206 "-" "-"
All entries are for the index page (and same IP), usually in clusters of 6 to 15 hits within a few minutes, but no entries for any of the graphics or anything else on that page.
An IP WHois gives me:
OrgName: Net Depot, Inc.
OrgID: DEPO
Address: 55 Marietta St NW
Address: Suite 1720
City: Atlanta
StateProv: GA
PostalCode: 30303
Country: US
NetRange: 216.180.224.0 - 216.180.255.255
A Google search leads me to http:*//www.netdepot.org/ (a meta search engine that also accepts site submissions for its own directory?). I'm assuming that this IS the web page of the bot hitting my page, though I suppose it may just be coincidental that this search engine shares the same name as the aforementioned company.
Has anyone heard of Net Depot, or had any run-in with IPs in that range before? I don't recall submitting my site to them, but anything is possible.
I'm new at this stuff so I have a few questions, please humour me with a response even if they are laughably obvious:
(1) The fact that the index page is loaded over and over, but there are no entries in my log for graphics downloads, must mean it is a spider (or a text browser), correct?
(2) For a spider to follow all links on one page, does it have to keep GETting the page? Is that normal behaviour for any spider? Besides, is it not considered bad behaviour for a bot to make requests that often?
(3) I have read quite a few of the posts on this board over the last few days, but now that it comes to actually looking at my own logs, it seems a lot more difficult to track individual visitors' actions than I suspected. The script I am using allows me to download as much of the access log as I like, but it is difficult to wade through because it does no formatting. Can anyone suggest a good script that can help me make some sense of what is hitting my pages, which pages, and how fast? How do those of you who are reading this analyze your logs to track a specific IP's behaviour on your site without getting mired in every little 'spacer.gif' that is downloaded?
TIA for answering my questions, and offering any additional thoughts or questions I *should* be asking.
paul
(1) The fact that the index page is loaded over and over, but there are no entries in my log for graphics downloads, must mean it is a spider (or a text browser), correct?
Could just be a visitor that likes your page. Newbies do strange things. If it's a bot, it's either stuck in a loop or not very good in the first place.
(2) For a spider to follow all links on one page, does it have to keep GETting the page? Is that normal behaviour for any spider? Besides, is it not considered bad behaviour for a bot to make requests that often?
(3) I have read quite a few of the posts on this board over the last few days, but now that it comes to actually looking at my own logs, it seems a lot more difficult to track individual visitors' actions than I suspected. The script I am using allows me to download as much of the access log as I like, but it is difficult to wade through because it does no formatting. Can anyone suggest a good script that can help me make some sense of what is hitting my pages, which pages, and how fast? How do those of you who are reading this analyze your logs to track a specific IP's behaviour on your site without getting mired in every little 'spacer.gif' that is downloaded?
Perhaps some of the others can advise you on log scripts.
I use WordPad to view my raw, unconverted logs.
There are log tools: Analog, or a decent old program named Logalizer (if you can find it).
Viewing raw logs is like everything else: you learn by doing, and you become accustomed to the patterns of transgression. After a while you'll skip right over the image files that are part of your site when reading logs.
No shortcut exists for compiling stats from logs that I'm aware of. You either filter through the lines yourself or rely on software (such as Analog) to sort out the data for you.
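For anyone who would rather script the filtering than read every line, here is a rough Python sketch of the "filter through the lines" approach. It assumes the log is in the combined format shown in the first post; the function names (hits_for_ip, hits_per_minute) and the list of image extensions are made up for illustration, so adjust them to your own site.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the start of a combined-format log line such as:
# 216.180.241.154 - - [08/Jul/2003:01:08:15 -0600] "GET /index.shtml HTTP/1.0" 200 41206 "-" "-"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumed list of extensions to ignore (spacer.gif and friends).
IMAGE_EXT = (".gif", ".jpg", ".jpeg", ".png", ".ico")


def hits_for_ip(lines, ip):
    """Return (timestamp, path) pairs for one IP, skipping image requests."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("ip") != ip:
            continue
        path = m.group("path")
        # Drop any query string before checking the file extension.
        if path.split("?")[0].lower().endswith(IMAGE_EXT):
            continue
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits.append((when, path))
    return hits


def hits_per_minute(hits):
    """Count requests per minute, to make clusters of hits easy to spot."""
    return Counter(when.strftime("%Y-%m-%d %H:%M") for when, _ in hits)


# Example use (the file name is only a placeholder):
#   with open("access.log") as f:
#       hits = hits_for_ip(f, "216.180.241.154")
#   for minute, count in sorted(hits_per_minute(hits).items()):
#       print(minute, count)
```

This only shows what one IP requested and how fast; it does not tell you whether the visitor is a bot, and a tool such as Analog will still give a better overall picture.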