Forum Moderators: DixonJones
My hosting bill is starting to get a little big because of my monthly bandwidth usage. While I have no real proof of hotlinking, I'm at the point where I would like to know.
My host provides Webalizer stats. Is there some aspect of them that can provide hotlinking information?
My host generates a new site log for each day of the week. I can download this log. I've looked at it with WordPad but it is very confusing. Is there some program I could run this file through on my own machine to decipher it?
Analog is also my favorite stat program, but it can take some time to set the config file to suit your needs. There are many ways to prevent hotlinking [google.com] without having to look for the culprits.
There is also another thing to think about: these sites are linking to your site, thus contributing to its popularity. If you pull the plug, your rankings on most SEs will probably suffer.
If you want to do a quick check of who links to your site, you can query Google with
-link: www.yoursite.com/ and run through the results manually.
... these sites are linking to your site, thus contributing to its popularity. If you pull the plug, your rankings on most SEs will probably suffer.
Broadway - There's a folder named: examples, which contains a file named bigbyrep.cfg. Read through it. It shows you how to control what is displayed in each category on your report.html page.
Granted, analog is not a quick study, but it is a terrific tool and highly customizable.
<added>
This is an example of how to show columns for page requests vs. file requests by referrer in your analog.cfg file, thus showing who is remotely linking to image files:
REFERRER ON
REFCHART ON
REFCOLS PR
REFSORTBY REQUESTS
REFFLOOR 1r
REFARGSSORTBY REQUESTS
REFARGSFLOOR 1r
</added>
First you select all log records that concern images:
grep -E -i "(\.gif HTTP|\.jpg HTTP)" my.log > work.log
"my.log" is your log (of course) and work.log is a temporary file that you pass on to step 2 (the -E switch is what lets grep understand the "a|b" alternatives):
grep -E -i -v "(http://.*.yoursite\.com|cache|atomz|babelfish| \"-\")" work.log > hotlink.log
This command selects all log records that are not referred from sites that you control. Remember to replace "yoursite\.com" with your own site-id.
cache, atomz and babelfish are sites that legitimately link to my images.
2X2.30.8.160 - - [09/Mar/2004:16:07:09 +0100] "GET /your-image.jpg HTTP/1.1" 200 1692 "http://www.yoursite.com/Yourpage.htm" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; DigExt)"
I used the first grep to extract all lines with "jpg" and "gif" in them, and the second grep to extract those where the referrer did not (that's the "-v") contain "yoursite.com".
[edited by: DaveAtIFG at 4:58 pm (utc) on Mar. 22, 2004]
[edit reason] Obscured IP [/edit]
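For anyone without grep on their own machine, the same two-step filter can be sketched in Python. This is only a rough equivalent, not the poster's method: the function name find_hotlinks, the sample log lines and the example.org referrer are made up for illustration, and it assumes the log is in the combined format shown above. Like the grep version, it uses plain substring matching on the referrer.

```python
import re
from collections import Counter

# Combined log format: "METHOD path HTTP/x" status bytes "referrer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "(?P<referrer>[^"]*)"'
)
# Step 1 of the grep approach: only requests for image files.
IMAGE_RE = re.compile(r"\.(?:gif|jpe?g)(?:\?.*)?$", re.IGNORECASE)


def find_hotlinks(lines, own_hosts=("yoursite.com",),
                  allowed=("cache", "atomz", "babelfish")):
    """Count (referrer, image path) pairs whose referrer is not one of ours."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or not IMAGE_RE.search(m.group("path")):
            continue
        ref = m.group("referrer")
        if ref in ("", "-"):
            continue  # no referrer recorded, nothing to attribute
        low = ref.lower()
        # Step 2: drop referrers we control or explicitly allow (substring match).
        if any(h in low for h in own_hosts) or any(a in low for a in allowed):
            continue
        hits[(ref, m.group("path"))] += 1
    return hits


# Made-up sample lines, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [09/Mar/2004:16:07:09 +0100] "GET /your-image.jpg HTTP/1.1" '
    '200 1692 "http://www.yoursite.com/Yourpage.htm" "Mozilla/4.0"',
    '10.0.0.2 - - [09/Mar/2004:16:08:00 +0100] "GET /your-image.jpg HTTP/1.1" '
    '200 1692 "http://forum.example.org/thread/5" "Mozilla/4.0"',
    '10.0.0.3 - - [09/Mar/2004:16:09:00 +0100] "GET /your-image.jpg HTTP/1.1" '
    '200 1692 "-" "Mozilla/4.0"',
    '10.0.0.4 - - [09/Mar/2004:16:10:00 +0100] "GET /index.htm HTTP/1.1" '
    '200 5120 "http://forum.example.org/thread/5" "Mozilla/4.0"',
]

hotlinks = find_hotlinks(SAMPLE)
for (referrer, path), count in hotlinks.most_common():
    print(count, referrer, path)
```

To run it against a real log, pass an open file instead of SAMPLE, for example find_hotlinks(open("my.log", errors="replace")), after changing own_hosts to your own domain.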
You could try this, using the Command Window followed by Excel. I haven't done it in a while and hope I can remember it right. The main point of this is looking for third-party sites that are referrers of hits to your image files.
First, look inside one of your logs using WordPad or whatever and make sure *.jpg and *.gif files are actually being logged. If so, go to the command window (DOS-looking thing), change to the directory holding the log, and do this:
find ".jpg " ex040322.log > ex040322.jpegs
followed by, if you want, when it's finished:
find ".gif " ex040322.log >> ex040322.jpegs
(I'm just guessing at the logfile name and am arbitrarily using March 22 as the date of the log)
You'll end up with one file containing only jpeg file hits or only jpeg and gif hits. As a last step, open the original log and grab the line at the top starting with #Fields: and put it at the top of the new file.
Explore to the new file, right click on it in Windows, choose "Open with Excel." In Excel, choose Delimited and specify a space as the sole delimiter. Only the first 65,000 or so lines of your log will open in Excel. In the transplanted line that starts with the word "#Fields:," delete the first cell (containing #Fields:), moving everything to the right of it to the left. Now all your field names will line up vertically with the correct fields. Save this as an .xls file before you go any further.
Use Excel to Sort on the Referrer field (probably called cs(Referer), with the single-r spelling). Be sure to specify that there is a header row when you do the sort. In the sorted worksheet, scan through the Referrer column. Since it's pretty rare for an image file to have a referrer other than your own site, you'll be looking for referrers that are not your site. When you find any, the images they're grabbing will be in the cs-uri-stem and/or the cs-uri-query fields of that line.
Okay. Now it's time for the smart people in this forum to shoot this full of holes ...
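If the log is too long for Excel, the sort-and-scan step above can be approximated with a short Python script. This is a minimal sketch, assuming the log is space-delimited with a "#Fields:" header line as described; the function name external_image_referrers, the sample rows and the example.org referrer are invented for illustration, and the field names are assumptions that should be checked against your own header line.

```python
from collections import Counter


def external_image_referrers(lines, own_host="yoursite.com"):
    """Tally (referrer, image) pairs where the referrer is not our own site."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # Column names follow the "#Fields:" token, separated by spaces.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        stem = row.get("cs-uri-stem", "")
        if not stem.lower().endswith((".jpg", ".gif")):
            continue
        # Accept either spelling of the referrer column name.
        ref = row.get("cs(Referer)") or row.get("cs(Referrer)") or "-"
        if ref == "-" or own_host in ref.lower():
            continue
        counts[(ref, stem)] += 1
    return counts


# Made-up sample log, for illustration only.
SAMPLE_LOG = """\
#Fields: date time c-ip cs-method cs-uri-stem cs-uri-query sc-status cs(User-Agent) cs(Referer)
2004-03-22 10:00:01 10.0.0.1 GET /images/logo.gif - 200 Mozilla/4.0 http://www.yoursite.com/index.htm
2004-03-22 10:00:05 10.0.0.2 GET /images/photo.jpg - 200 Mozilla/4.0 http://forum.example.org/thread/5
2004-03-22 10:00:09 10.0.0.2 GET /images/photo.jpg - 200 Mozilla/4.0 http://forum.example.org/thread/5
2004-03-22 10:00:12 10.0.0.3 GET /index.htm - 200 Mozilla/4.0 -
""".splitlines()

referrers = external_image_referrers(SAMPLE_LOG)
for (referrer, image), count in referrers.most_common():
    print(count, referrer, image)
```

Because it reads the column names from the "#Fields:" line, there is no need to paste that line into a filtered copy first; the whole original log can be passed in, for example with open("ex040322.log", errors="replace").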