Forum Moderators: DixonJones
The reverse-DNS you posted from your logs shows that this is their caching proxy accessing your site, not a 'bot and not a human.
Since you don't have the original logs that apparently caused you to block this hostname or IP range, I can't advise further, except to say that IMHO it's generally not a good idea to block an ISP's cache from accessing your site. Best case, with a "look-aside" cache, you force the client browsers to re-fetch the pages/images/etc. directly from your server. Worst case, with an "in-line" cache, they won't be able to access your site at all.
Jim
winn-cache-2.server.ntli.net - - [29/Dec/2004:04:25:09 -0500] "GET /wicketsfromspace.html HTTP/1.1" 200 30661 "http://www.google.co.uk/search?q=buy+wickets+from+space&hl=en&lr=&start=20&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; DigExt; Q312461; YComp 5.0.0.0; SV1; .NET CLR 1.1.4322)"
and that's just after these entries from hours earlier
midd-cache-4.server.ntli.net - - [29/Dec/2004:05:55:23 -0500] "GET /wicketsfromspace.html HTTP/1.1" 200 29411 "http://webferret.search.com/click?wf,wickets+from+space,,www.spacewickets.com%2Fspace_wickets%2Fwicketsfromspace.html,,aol"
and
glfd-cache-4.server.ntli.net
midd-cache-4.server.ntli.net
Good grief
I believe that what you are seeing is simply ntli users requesting your pages through that ISP's caching proxies, located in the cities/regions/areas designated by "winn," "midd," and "glfd."
The ISP wants to be sure not to serve stale copies of your pages, so it will re-request a page from your server once the Expires time you have declared for that page has passed, or, if you haven't declared an expiry time, whenever their cache's default expiry time passes.
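The proxy's freshness decision can be sketched roughly like this (a simplified illustration, not NTL's actual logic; the 24-hour default TTL is an assumption for the example):

```python
from datetime import datetime, timedelta

# Assumed default TTL for pages with no Expires header (illustrative only)
DEFAULT_TTL = timedelta(hours=24)

def is_fresh(fetched_at, now, expires=None):
    """Return True if the cached copy may be served without
    re-requesting the page from the origin server."""
    if expires is not None:
        # Honour the Expires time declared by the origin server
        return now < expires
    # No Expires header: fall back to the cache's default TTL
    return now - fetched_at < DEFAULT_TTL
```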
If you want to see the effect, detect AOL users and send them a no-cache header. You'll see the load on your server go up, as AOL users fetch the same pages over and over from your server. Set the cache-control back to normal, and your server load will drop again. If you need to track 'hits' to your site, then include a small non-cacheable 1x1 transparent .gif image on each of your cacheable pages; the image will be fetched each time one of your pages is requested, bypassing the caching proxy. You'll still get the benefit of having your pages cached by the ISP, but you'll be able to track page requests based on the 1x1 image requests.
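The counting side then reduces to tallying requests for the pixel in your access log; a minimal sketch (the path /images/webbot.gif is just an assumed example name):

```python
def count_page_views(log_lines, pixel="/images/webbot.gif"):
    """Each request for the uncacheable 1x1 gif corresponds to one
    page view, even when the page itself was served from an ISP cache."""
    return sum(1 for line in log_lines if '"GET %s ' % pixel in line)
```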
Jim
The explanation is certainly true, but is it the main one? The discrepancy continues, and as it's in an unexpected direction the usefulness of the method is limited in my case. My question isn't fully on topic, but maybe readers of this thread have some further ideas?
I've also been wondering whether ISP proxies cache images too. Say a visitor initially gets my HTML page from his ISP's proxy. If he then clicks a link in the page to view an off-page picture, does the proxy request just the image file from my server? But isn't that hotlinking, which I have prevented? The HTTP protocol is outside my knowledge.
[faqs.org...]
Thanks again!
You must mark the 1x1 image as uncacheable in order for this method to work. On Apache, see mod_expires and mod_headers. You can use mod_expires to take care of default expiry time, and use mod_headers to write a specific cache-control server response header when the 1x1 image is requested.
The following code, placed in an .htaccess file in your image subdirectory, will mark all images as cacheable with must-revalidate (expiring 30 days after access), but serve a no-cache header for "webbot.gif".
# activate mod_expires, set expiry at 30 days after access
ExpiresActive On
ExpiresDefault A2592000
Header unset Cache-Control
Header append Cache-Control "must-revalidate"
<Files webbot.gif>
ExpiresDefault A1
Header unset Cache-Control
Header append Cache-Control "no-cache, must-revalidate"
</Files>
must-revalidate doesn't do exactly what its name suggests. Rather, it forces caches to strictly honor your expiry and no-cache headers, instead of serving a stale copy when they can't revalidate it.
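In other words, its effect on a cache's serve decision looks something like this (a simplified model of HTTP/1.1 behaviour, not any particular proxy's implementation):

```python
def may_serve_from_cache(age, max_age, must_revalidate, origin_reachable):
    """Decide whether a cache may answer from its stored copy.

    Fresh entries are always servable.  Stale entries normally require
    revalidation with the origin, but many caches will serve them anyway
    if the origin is unreachable -- unless must-revalidate forbids it."""
    if age <= max_age:
        return True                     # still fresh
    if origin_reachable:
        return False                    # stale: must revalidate first
    # Origin down: a stale copy is allowed only without must-revalidate
    return not must_revalidate
```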
Jim
[cableforum.co.uk...]