The information is automatically sent out by the browser (or other "User Agent") before it ever picks up your page. Most browsers also send information about how the user got to your site. If you have a site of your own, you should look at your raw logs. A typical entry will look like this (I'll pick a well-known robot so I don't have to obfuscate anything):
188.8.131.52 - - [09/Aug/2011:02:41:09 -0700] "GET /hovercraft/index.html HTTP/1.1" 403 1272 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, email@example.com)"
This breaks down to:
188.8.131.52 the IP address of the visitor. It could belong to anything from an individual human with a fixed IP address (typically a high-speed connection such as a cable modem) to a whole workplace going through one great big router.
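- - Two separate fields, each shown as a hyphen when empty. In the standard Apache log format the first is the identity reported by identd on the visitor's machine, and the second is the username if the visitor logged in through HTTP authentication. I have never seen text in this location.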
[09/Aug/2011:02:41:09 -0700] time to the nearest second, in brackets, here expressed in the server's local time followed by its offset from UTC (the thing that used to be Greenwich Mean Time). I happen to live in the same time zone as my server.
"GET /hovercraft/index.html HTTP/1.1" The request sent by the browser (or robot, or other) to your site. GET asks for the whole file, headers and content. HEAD asks only for the headers, that is, basic information about the page without the page itself. In general, human browsers use HTTP/1.1 while some robots may use 1.0 instead.
403 1272 The first number is the status code, the result of the request, here 403 ("Forbidden", meaning "get lost"). This particular robot never got past the .htaccess file. The second number is the size in bytes of what they were sent instead, here the custom 403 page. (The one humans see if they blunder into an index-less directory.)
"-" in quotation marks is the "Referer" (sic), meaning what they clicked to get to your site, or who asked for the file (images will typically give the page they're on as referer). Robots generally have no referer. But neither do some human browsers, and neither do bookmarks or type-ins.
"Mozilla/5.0 (compatible; DotBot/1.1; http://www.dotnetdotcom.org/, email@example.com)" again in quotes. This is the UA or "User Agent": if you're thinking strictly of humans, the browser. If a robot has an unpredictable IP address, it may be locked out by UA instead.
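If you would rather have a script pull those fields apart than read them by eye, here is a minimal Python sketch. It assumes your log uses the same layout as the entry above. The names `LOG_PATTERN` and `parse_log_line` are my own inventions, not part of any server software, and the pattern is a simplification: it will stumble on a user agent that contains an escaped quotation mark.

```python
import re
from datetime import datetime

# Assumed layout: the same field order as the sample entry above.
# The group names are illustrative, chosen for this sketch only.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not fit the pattern."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamp with its UTC offset, e.g. "09/Aug/2011:02:41:09 -0700".
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A hyphen in the size field means no bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample = (
    '188.8.131.52 - - [09/Aug/2011:02:41:09 -0700] '
    '"GET /hovercraft/index.html HTTP/1.1" 403 1272 "-" '
    '"Mozilla/5.0 (compatible; DotBot/1.1; '
    'http://www.dotnetdotcom.org/, email@example.com)"'
)

entry = parse_log_line(sample)
print(entry["ip"], entry["status"], entry["size"], entry["referer"])
```

Once each line is a dict, it is easy to count requests per user agent or list every 403, which is usually the first thing you want when deciding which robots to lock out.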