get 'head' is bots?

noticed new gets with this notation

         

Megaclinium

9:25 pm on Jul 4, 2008 (gmt 0)

10+ Year Member



When an IP requests "HEAD" followed by a URL, does this indicate a bot?
The same IP then issued a 'GET' for the same two URLs.

It also didn't retrieve the thumbnails displayed on the page like a normal user would, leading me to think it is a bot.

Should I ban anything that wants "HEAD"?
(slap it in the face for asking :)

Staffa

10:51 am on Jul 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to read : [webmasterworld.com...]

your answer is most likely in there.

Megaclinium

2:53 am on Jul 6, 2008 (gmt 0)

10+ Year Member



I don't see any reference to the 'head' HTTP command in that AVG discussion. It's hard to search this site now since it's not indexed.

And AVG doesn't seem to be creating 'head' requests.
I've seen the '1813' tagged on the end in other commands but not in these requests. When AVG hits my site with their pre-checker they just seem to repeat the command after adding the ending '/'.

Here's the UA end of one 'head':

.html HTTP/1.1" 200 0 "-" "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

It is doing a 'head' for the web page,
followed by a 'get' to the exact same page from the exact same IP.
Seems kind of pointless or botlike.

Maybe it is some other virus checker besides AVG?
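For anyone wanting to look for the same HEAD-then-GET pattern in their own logs, here is a rough Python sketch. It assumes the log is in Common/Combined Log Format; the sample lines, IPs, paths and the "ExampleUA" string are made up for illustration and are not taken from my real log. The function name head_then_get is just a placeholder.

```python
import re

# Matches the start of a Common/Combined Log Format line:
# IP, identity, user, [timestamp], "METHOD path protocol"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "([A-Z]+) (\S+)[^"]*"')


def head_then_get(lines):
    """Return (ip, path) pairs where a HEAD was later followed by a GET
    for the same path from the same IP."""
    seen_head = set()
    pairs = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, method, path = match.groups()
        if method == "HEAD":
            seen_head.add((ip, path))
        elif method == "GET" and (ip, path) in seen_head:
            seen_head.discard((ip, path))
            pairs.append((ip, path))
    return pairs


# Made-up sample lines (documentation-range IPs, hypothetical paths)
SAMPLE = [
    '192.0.2.10 - - [04/Jul/2008:21:00:01 +0000] "HEAD /page1.html HTTP/1.1" 200 0 "-" "ExampleUA"',
    '192.0.2.10 - - [04/Jul/2008:21:00:02 +0000] "GET /page1.html HTTP/1.1" 200 5120 "-" "ExampleUA"',
    '192.0.2.77 - - [04/Jul/2008:21:05:00 +0000] "GET /page1.html HTTP/1.1" 200 5120 "-" "ExampleUA"',
]

print(head_then_get(SAMPLE))
```

This only lists candidates to look at by hand. It says nothing about whether the visitor is a bot, a link checker or something else.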

jdMorgan

3:07 am on Jul 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That is very likely the second-generation AVG LinkScanner, as described in the two threads running here. A positive identification cannot be made based on standard log files alone, because they don't show all the information needed to be absolutely sure.

The first-generation LinkScanner used the ;1813 User-Agent string.

We are now seeing the third-generation AVG LinkScanner roll out, in response to the issues identified in those threads.

HEAD requests are also issued by caching proxies, which are commonly used by ISPs like AOL and EarthLink and by corporations. They are frequently used to check the Last-Modified date on a resource, to determine whether the cache should load a new copy from your server or can simply use the previously-cached copy, thus saving bandwidth on your server and delivering content faster to the client browser.
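As a rough illustration of that cache check, here is a minimal Python sketch using only the standard library. The function names (head_last_modified, cache_is_stale) are made up for this example, and real caching proxies follow considerably more rules than this (ETag, Cache-Control, Expires and so on), so treat it as a simplified picture of the idea rather than how any particular proxy works.

```python
import http.client
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def head_last_modified(host, path):
    """Issue a HEAD request and return the Last-Modified header, or None.

    HEAD returns the same headers a GET would, but no response body.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.getheader("Last-Modified")
    finally:
        conn.close()


def cache_is_stale(last_modified_header, cached_at):
    """Return True if the resource changed after the cached copy was stored.

    cached_at must be a timezone-aware datetime.
    """
    if last_modified_header is None:
        return True  # no date to compare against: assume a refetch is needed
    return parsedate_to_datetime(last_modified_header) > cached_at


# Example decision with a hypothetical header value and cache timestamp
cached_at = datetime(2008, 7, 1, tzinfo=timezone.utc)
print(cache_is_stale("Wed, 02 Jul 2008 10:00:00 GMT", cached_at))
```

A cache working this way would only follow up with a full GET when cache_is_stale returns True, which would produce a HEAD followed by a GET for the same URL from the same IP.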

So no, HEAD requests do not positively identify a robot. There are many uses for HEAD requests, and many user-agents use them.

Jim

[edited by: jdMorgan at 3:08 am (utc) on July 6, 2008]

Megaclinium

5:59 am on Jul 6, 2008 (gmt 0)

10+ Year Member



Thanks!

Guess I won't deep six them then