Forum Moderators: DixonJones

Puzzle about multiple hits in a server log

Am I seeing a shady SEM practice?

         

anallawalla

8:25 am on Apr 13, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I have been sent server log extracts by an overseas client but I don't have access to the server. I am seeing *hundreds* of entries like the following extract (all Google):

16/01/2003 01:22:58 aa.bb.cc.dd &q=red+widget
16/01/2003 01:22:59 aa.bb.cc.dd &q=red+widget
16/01/2003 01:27:30 aa.bb.cc.dd &q=red+widget

(identical IP address shown as aa.bb.cc.dd)

The pattern is always the same: the first two entries are a second apart or in the same second, and the third comes a few minutes later. About a third of the log consists of these, so not every visitor generates such a pattern. Sometimes the same IP address generates the same query string 5-6 times.

The search string is identical for all three entries, e.g. "domain.com", "BrandName" or "red widget software". I can't imagine why a genuine visitor via Google would go back to Google and execute the same query in this manner.

The IP addresses are from all over the world but mostly in the same state as the client. I certainly expect the client's own staff to test Google from time to time, but those would be largely internal addresses (I see a lot from 0.0.0.0 through 1.0.0.1).

My main objective is to find genuine query strings in the logs to see how visitors are finding the site. I want to understand why I am seeing this triple-hit pattern, e.g. are there people who run some script through multiple anon web surfing sites to artificially boost ranking? If the hits are from seemingly disinterested companies, what could cause such an aberration?

- Ash
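
For the stated goal of pulling genuine query strings out of the logs, here is a minimal Python sketch. It assumes each log line exposes the full Google referrer URL (the extract above shows only the trimmed `&q=` part). The function name `google_query` and the loose host check are illustrative choices, not anything taken from the client's setup.

```python
from urllib.parse import urlsplit, parse_qs

def google_query(referrer):
    """Return the decoded q= search phrase from a Google referrer URL, or None."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    # Loose check (an assumption): any host containing "google." counts as Google.
    if "google." not in host:
        return None
    # parse_qs decodes "+" and %xx escapes, so "red+widget" becomes "red widget".
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None

# Hypothetical referrer built from the query shown in the extract above.
phrase = google_query("http://www.google.com/search?hl=en&q=red+widget")
```

Referrers without a `q` parameter, or from other hosts, return `None`, so they can be skipped when tallying search phrases.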

sugarkane

9:55 pm on Apr 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A puzzler indeed. The only thing that springs to mind is that these might be image calls from the Google cache. It doesn't look like it from the log excerpts you posted, but it's all I can think of at first glance.

anallawalla

12:00 am on Apr 15, 2003 (gmt 0)


Thanks. I don't have access to any IIS server so I don't know what the logs look like, but I know these extracts were trimmed for me so that I can see only the Google search strings.

In Apache, which I use, the search string appears only on the first line (the page request), whereas the subsequent lines are for each displayed element such as an image file. Here is a suitably anonymised extract of the supplied log file. All IP addresses are fake except for 1.0.0.1, which is a different mystery. I am not getting any reply from the client on this point. - Ash

2003-02-13 13:15:52 78.160.202.120 [google.com...]
2003-02-13 13:49:08 77.28.30.157 [google.com...]
2003-02-13 13:49:11 77.28.30.157 [google.com...]
2003-02-13 13:49:14 77.28.30.157 [google.com...]
2003-02-13 14:03:53 207.244.148.251 [google.com...]
2003-02-13 14:03:53 207.244.148.251 [google.com...]
2003-02-13 14:03:54 207.244.148.251 [google.com...]
2003-02-13 14:21:16 17.111.51.212 [google.com...]
2003-02-13 15:46:25 1.0.0.1 [google.com...]
2003-02-13 15:46:26 1.0.0.1 [google.com...]
2003-02-13 15:46:27 1.0.0.1 [google.com...]
2003-02-13 15:48:26 1.0.0.1 [google.com...]
2003-02-13 18:27:08 67.227.45.29 [google.com...]
2003-02-13 18:27:08 67.227.45.29 [google.com...]
2003-02-13 18:27:09 67.227.45.29 [google.com...]
2003-02-13 23:21:46 217.96.99.47 [google.com...]
2003-02-14 05:02:35 97.104.111.3 [google.com...]
2003-02-14 05:02:35 97.104.111.3 [google.com...]
2003-02-14 05:02:35 97.104.111.3 [google.com...]
2003-02-14 05:56:45 97.34.115.94 [google.com...]
2003-02-14 05:56:45 97.34.115.94 [google.com...]

sugarkane

1:12 pm on Apr 15, 2003 (gmt 0)


Do the queries look like feasible ones that a surfer would actually use?

anallawalla

1:34 pm on Apr 15, 2003 (gmt 0)


Sugarkane,

I have solved the mystery and you were on the right track. The client sent me an unedited extract of the log for one day and I see what has happened. He had trimmed out "excess" text but this included the file names. IIS logs are slightly different.

Those multiple hits were caused by custom 404 pages. The first line was the requested file and the one immediately below was the custom 404 file, and each subsequent line was for every graphic element on the custom 404 page.

The problem is that the Google search string is shown against all those lines! Therefore the trimmed log minus the file names looked like someone was repeating the same query rapidly. Occasionally, the visitor must have tried a second page successfully after getting the 404, so this explains why I would see a subsequent entry 2-3 minutes later from the same address.

Yes, the query strings appear to be what some people might type, though it is hard to say. When I used to look at a search engine voyeur site I would shake my head, because most people don't know how to search efficiently.

- Ash
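
Given the explanation above (one Google referrer repeated across the 404 page and its graphics), a trimmed log can be de-duplicated by collapsing rapid repeats from the same IP and query. This is only a sketch: the 5-second window, the function names, and the four-field line layout (taken from the first extract in this thread) are assumptions, and untrimmed IIS logs will need different parsing.

```python
from datetime import datetime, timedelta

def parse_line(line):
    """Parse 'dd/mm/yyyy hh:mm:ss ip query' as in the trimmed extract."""
    date, time, ip, query = line.split()
    ts = datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M:%S")
    return ts, ip, query

def collapse_bursts(entries, window_seconds=5):
    """Keep the first hit of each burst; entries must be sorted by time."""
    window = timedelta(seconds=window_seconds)
    last_seen = {}  # (ip, query) -> timestamp of the most recent hit
    kept = []
    for ts, ip, query in entries:
        key = (ip, query)
        prev = last_seen.get(key)
        # A hit starts a new burst if the key is new or the gap exceeds the window.
        if prev is None or ts - prev > window:
            kept.append((ts, ip, query))
        last_seen[key] = ts
    return kept

sample = [
    "16/01/2003 01:22:58 aa.bb.cc.dd &q=red+widget",
    "16/01/2003 01:22:59 aa.bb.cc.dd &q=red+widget",
    "16/01/2003 01:27:30 aa.bb.cc.dd &q=red+widget",
]
kept = collapse_bursts([parse_line(line) for line in sample])
```

On the three sample lines this keeps the 01:22:58 and 01:27:30 entries and drops the 01:22:59 repeat. The later entry survives because it is most likely a separate page request by the same visitor.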