identifying user views resulting from spiders

         

newbie6

4:05 pm on Nov 27, 2007 (gmt 0)

10+ Year Member



I'm trying to determine the benefit/cost ratio of search engines: that is, how many real user page views I get in return for the bandwidth their bots consume.

I get 1572 hits/mo from 65.214.39.180 (registered to ask.com) and 1025/mo from 193.95.154.69 (their UK arm), most of which are for my robots.txt. All their other requests have Minefield in the user-agent field, which seems to be a Mozilla project. Are those user views or bot requests?

wilderness

4:34 am on Nov 28, 2007 (gmt 0)

newbie6

12:35 pm on Nov 28, 2007 (gmt 0)

10+ Year Member



Many thanks - I'll ban them too.

I'll try to learn how to use the search feature better.

newbie6

11:49 pm on Nov 28, 2007 (gmt 0)

10+ Year Member



I've found lots of info here about how to identify search bots, but nothing about how to identify the benefits they provide - user page views.

Here's what I think is correct so far:

For each GET *.html entry in my access log, search for bots first (case insensitive):
Google: googlebot 200/mo
MSN: msnbot 1500/mo
Yahoo: slurp 3000/mo
Ask: minefield 500/mo

Once these entries are removed, search for a user hit from the engine:
Google: .google. 10,000/mo
MSN: msn.co OR search.live 300/mo
Yahoo: search.yahoo 1000/mo

So, I seem to be getting a good (48:1) benefit from Googlebot, but less than breakeven from MSN or Yahoo. I haven't figured out how to identify ask.com user views.
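A minimal sketch of the two-pass method above, in Python. It assumes the log has already been split into (path, user agent, referrer) tuples. That tuple shape and the names classify, tally and benefit_ratio are made up for illustration. The match strings are the ones listed above.

```python
from collections import Counter

# Substrings from the lists above; matching is case-insensitive.
BOT_AGENTS = {
    "Google": ["googlebot"],
    "MSN": ["msnbot"],
    "Yahoo": ["slurp"],
    "Ask": ["minefield"],
}
ENGINE_REFERRERS = {
    "Google": [".google."],
    "MSN": ["msn.co", "search.live"],
    "Yahoo": ["search.yahoo"],
}

def classify(user_agent, referrer):
    """Return ("bot", engine), ("user", engine) or (None, None)."""
    ua = user_agent.lower()
    # Pass 1: bots are identified by the user-agent string.
    for engine, needles in BOT_AGENTS.items():
        if any(n in ua for n in needles):
            return ("bot", engine)
    # Pass 2: remaining entries count as user views if the referrer matches.
    ref = referrer.lower()
    for engine, needles in ENGINE_REFERRERS.items():
        if any(n in ref for n in needles):
            return ("user", engine)
    return (None, None)

def tally(entries):
    """entries: iterable of (path, user_agent, referrer) tuples (assumed shape)."""
    counts = Counter()
    for path, user_agent, referrer in entries:
        # Only *.html requests are counted, as in the method above.
        if not path.lower().endswith(".html"):
            continue
        kind, engine = classify(user_agent, referrer)
        if kind:
            counts[(engine, kind)] += 1
    return counts

def benefit_ratio(counts, engine):
    """User views per bot hit, or None if the bot made no counted hits."""
    bots = counts[(engine, "bot")]
    return counts[(engine, "user")] / bots if bots else None
```

A ratio below 1 means the engine sent fewer user views than its bot made page requests. Paths with query strings would need to be trimmed before the .html check.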

Has anyone already looked into this in a thread I haven't found? If not, any suggestions, particularly about sources of search-driven user traffic I haven't identified?

wilderness

12:16 am on Nov 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Theoretically, the only "gauge" we have for these stats is the referrer field of the visitor logs.

Many folks today use browsers with the referrer turned off, or they may copy and paste the link (direct access with no referrer), and proxies or content filters can strip it as well, all of which makes this difficult.

"Historically," Jeeves has never sent most websites enough visitors to justify the constant crawling its bot(s) do.
Add to that more Jeeves tools crawling from various IP ranges simultaneously, and the ratio of visitors to crawler hits is even lower.
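As a rough illustration of reading the referrer field mentioned above, here is a Python sketch. It assumes the logs are in Apache/NCSA "combined" format. The names parse_line and referrer_share are hypothetical. Counting blank referrers gives a feel for how much traffic cannot be attributed to any source.

```python
import re

# Assumed "combined" layout:
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def referrer_share(lines):
    """Return (with_referrer, without_referrer) counts over parseable lines."""
    with_ref = without_ref = 0
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines that are not in the assumed format
        if rec["referrer"] in ("", "-"):
            without_ref += 1  # referrer off, pasted link, or stripped by a proxy
        else:
            with_ref += 1
    return with_ref, without_ref
```

Entries with escaped quotes inside the request or user-agent would need a more careful parser than this pattern.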

newbie6

3:19 pm on Nov 30, 2007 (gmt 0)

10+ Year Member



Many thanks for your comments, Wilderness.

Further comments welcome any time.

[edited by: engine at 1:20 am (utc) on Dec. 3, 2007]
[edit reason] No URLs, thanks [/edit]

newbie6

4:29 pm on Dec 17, 2007 (gmt 0)

10+ Year Member



An update on determining the user views resulting from search bot hits: I have results for all significant sources I can identify at

<snip>

Two months after blocking Yahoo and MSN in robots.txt, my interim result is a 14% drop in user views versus the 3% expected. Metasearch engines aren't a significant factor. I'm looking for other sources of traffic that depend on these two bots, and would appreciate any suggestions beyond the referrer-off browsers and proxies already mentioned here.

[edited by: volatilegx at 3:54 pm (utc) on Dec. 18, 2007]
[edit reason] no urls please [/edit]

newbie6

2:28 pm on Apr 2, 2008 (gmt 0)

10+ Year Member



Hi moderator, then how do we refer to data we've found that is too extensive to post? I'd appreciate comments on my method.