
Home / Forums Index / WebmasterWorld / Website Analytics - Tracking and Logging
Forum Library, Charter, Moderators: Receptional & mademetop

Website Analytics - Tracking and Logging Forum

    
Bots and frames
manfredkooistra



 
Msg#: 4080410 posted 11:35 am on Feb 15, 2010 (gmt 0)

I have a website with frames. For the sake of the discussion, let's assume the page, as viewed in a browser, consists of two documents:

index.php : the document with the frameset, containing only one frame

frame.php : the document loaded into the single frame on index.php

Now, when index.php is called, it writes the current timestamp and the visitor's IP address (the remote address) into a database. When frame.php is called, it reads the database and compares the current timestamp and the visitor's IP with the stored entries. If there is an entry with the same IP and a timestamp equal to or earlier than the current one (now or in the past), the page is displayed. If there is no such entry, this means that frame.php was loaded directly, and the user is redirected to index.php.

What this does is make sure that frame.php is always displayed within index.php, not alone.

To make sure that the visitor cannot simply visit the frameset once and then call frame.php directly (which would work, because there would then be a database entry with this IP), I added another trick: index.php writes the value "yes" into the database when it writes the timestamp and IP; frame.php looks for a row with a matching timestamp and IP and a "yes", and then updates this row to "no".
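The handshake described above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the actual PHP code behind index.php and frame.php; the table name `hits`, its columns, and the function names are all assumptions made for the example.

```python
import sqlite3
import time

def open_db():
    # In-memory database for illustration; table and column names are hypothetical.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE hits (id INTEGER PRIMARY KEY, ip TEXT, ts INTEGER, pending TEXT)"
    )
    return db

def record_frameset_hit(db, ip, now=None):
    # What index.php does: store the visitor's IP, the timestamp and the flag "yes".
    now = int(time.time()) if now is None else now
    db.execute("INSERT INTO hits (ip, ts, pending) VALUES (?, ?, 'yes')", (ip, now))
    db.commit()

def frame_request_allowed(db, ip, now=None):
    # What frame.php does: find a "yes" row for this IP that is not in the future
    # and flip it to "no", so each frameset hit permits exactly one frame load.
    now = int(time.time()) if now is None else now
    row = db.execute(
        "SELECT id FROM hits WHERE ip = ? AND ts <= ? AND pending = 'yes' "
        "ORDER BY ts DESC LIMIT 1",
        (ip, now),
    ).fetchone()
    if row is None:
        return False  # the caller would redirect to index.php here
    db.execute("UPDATE hits SET pending = 'no' WHERE id = ?", (row[0],))
    db.commit()
    return True
```

Rows that still say "yes" afterwards are frameset hits for which the frame was never requested, which is exactly the group the question is about.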

Now, when I look at the resulting database, I see that about half the entries show that index.php has been called, but without calling frame.php, because about half of the rows have "yes". That means, frame.php has not been called, because it would have updated the row to "no".

I ran reverse DNS lookups on some of the IPs, and a few of them turn out to be crawlers like Googlebot and the bots from Yahoo and MSN. So obviously they don't follow the src to the framed page. Which is not a problem, since I have meaningful noframes content.
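The reverse DNS check can be automated along these lines. This is a sketch in Python, not something from the site itself: the function names are made up, and the suffix list reflects the hostnames the big three crawlers are generally documented to resolve to, so it is worth checking against each engine's own documentation. The forward lookup confirms that the hostname really maps back to the same IP, since a reverse record alone can be faked.

```python
import socket

# Hostname suffixes commonly documented for Googlebot, Yahoo Slurp and msnbot.
# Treat this list as an assumption and verify it before relying on it.
CRAWLER_SUFFIXES = (
    ".googlebot.com",
    ".google.com",
    ".crawl.yahoo.net",
    ".search.msn.com",
)

def looks_like_crawler(hostname):
    # Suffix match on the full hostname, so "googlebot.com.example.org" fails.
    return hostname.lower().rstrip(".").endswith(CRAWLER_SUFFIXES)

def verified_crawler(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    # Reverse lookup, suffix check, then forward lookup to confirm the IP.
    # The resolver functions are parameters so they can be swapped out in tests.
    try:
        hostname = reverse(ip)[0]
        if not looks_like_crawler(hostname):
            return False
        return ip in forward(hostname)[2]
    except OSError:
        # No reverse record, or the forward lookup failed: not verified.
        return False
```

Running `verified_crawler` over the IPs of the rows still marked "yes" would separate the confirmed search engine crawlers from everything else.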

What I'm wondering is what all the other entries are. They could be bots from other search engines, or they could be Lynx users, but for one, my page is a photography page, and all the links leading to my site make this clear, so I'm pretty sure that I wouldn't attract so many users with text browsers. Also, several hundred hits per day from different crawlers seems like a lot more search engines than I know of.

So I'm wondering if all of those visitors have frames disabled. Which would surprise me, since I have turned off JavaScript or auto-redirection in my browser, but have never even considered turning off frames. What's the point? There is no danger in frames, and I would only have problems viewing the many pages that still use frames.

So they must be bots. And since I run a photography website, those could be image harvesters or site rippers. Or any of the other kinds of bots that search for email addresses or insert comment spam or whatever.

So what are your thoughts on this? Can it be that half the hits I receive on a website with photography (nude) are bots? Or do so many users actually have frames disabled? Or what am I seeing?

I can recognize some bots, because they hit my site every minute at the exact same second, but those are very few. Most of the noframes-hits are singular.
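Spotting the "same second every minute" pattern can be done mechanically instead of by eye. A minimal sketch in Python, assuming the hits are available as (ip, Unix timestamp) pairs; the function name and the `min_hits` threshold are arbitrary choices for the example.

```python
from collections import defaultdict

def periodic_ips(hits, min_hits=5):
    """Return IPs whose hits always land on the same second of the minute.

    hits: iterable of (ip, unix_timestamp) pairs.
    min_hits: how many hits an IP needs before it is considered at all.
    """
    seconds = defaultdict(list)
    for ip, ts in hits:
        # For Unix timestamps, ts % 60 is the second within the minute.
        seconds[ip].append(ts % 60)
    return sorted(
        ip
        for ip, secs in seconds.items()
        if len(secs) >= min_hits and len(set(secs)) == 1
    )
```

This only catches strictly clockwork visitors. One-off hits, which make up most of the entries, need a different signal such as the user agent.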

 

corrideat

5+ Year Member



 
Msg#: 4080410 posted 12:14 am on Mar 7, 2010 (gmt 0)

First of all, I wouldn't serve the page the way you do, as this may cause accessibility problems like the ones you describe.

To get an idea of who these strange visitors are, however, you should log their user agents as well (beware that some bots forge the headers of real browsers).
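Once the user agent is stored next to each hit (in PHP it is available as `$_SERVER['HTTP_USER_AGENT']`), a rough tally of the rows still marked "yes" could look like the sketch below. It is written in Python purely as an illustration; the token list and the category names are assumptions, and a forged browser string will land in the "browser-like" bucket.

```python
from collections import Counter

# Substrings that self-identifying crawlers commonly include (a heuristic only).
BOT_TOKENS = ("bot", "crawl", "spider", "slurp")

def classify_agent(user_agent):
    ua = (user_agent or "").lower()
    if not ua:
        return "empty"
    if any(token in ua for token in BOT_TOKENS):
        return "declared bot"
    return "browser-like"

def tally_unframed(rows):
    # rows: iterable of (user_agent, pending) pairs; only rows still marked
    # "yes" (frameset loaded, frame never requested) are counted.
    return Counter(classify_agent(ua) for ua, pending in rows if pending == "yes")
```

A large "empty" or "declared bot" share would answer most of the question; a large "browser-like" share would point to forged headers or to something else going on.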

On the other hand, for the sake of efficiency, instead of writing the "yes" or "no" strings to the database, I would check that the difference between the two timestamps is less than, say, three seconds. Of course, you would want to do this AFTER you've figured out why half the visitors don't get the frame.
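The time-window idea might look like this. It is a sketch in Python under the assumption that frame.php can fetch the timestamp of the most recent index.php hit from the same IP; the function name and the default of three seconds are illustrative.

```python
def within_window(last_frameset_ts, now, window=3):
    # last_frameset_ts: Unix timestamp of the latest index.php hit from this IP,
    # or None if there is no such hit.
    if last_frameset_ts is None:
        return False
    age = now - last_frameset_ts
    # Accept only frame requests arriving shortly after the frameset was served.
    return 0 <= age < window
```

The trade-off is that a slow connection can take longer than the window to request the frame, and anyone who requests frame.php directly within the window gets through, so the value needs some tuning.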

