|Bots and frames|
| 11:35 am on Feb 15, 2010 (gmt 0)|
I have a website with frames. For the sake of the discussion, let's assume the page, as viewed in a browser, consists of two documents:
index.php : the document with the frameset, containing only one frame
frame.php : the document loaded into the single frame on index.php
Now, when index.php is called, it writes the current timestamp and the visitor's IP address into a database. When frame.php is called, it looks up the visitor's IP address in the database. If there is an entry with the same IP and a timestamp equal to or earlier than the current one, the page is displayed. If there is no such entry, frame.php has been requested directly, and the user is redirected to index.php.
What this does is make sure that frame.php is always displayed within index.php, not alone.
To make sure that a visitor cannot simply load the frameset once and then call frame.php directly (which would work, because a database entry with that IP then exists), I added another trick: when index.php writes the timestamp and IP, it also writes the value "yes"; frame.php looks for a row with the matching IP, a valid timestamp, and a "yes", and then updates that row to "no".
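A minimal sketch of this handshake, written in Python with an in-memory SQLite database rather than the PHP and database actually used on the site. The table name `visits`, the column names, and both function names are assumptions made for illustration only.

```python
import sqlite3
import time

def open_db():
    # Hypothetical schema: one row per hit on index.php.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE visits ("
        "id INTEGER PRIMARY KEY, ip TEXT, ts INTEGER, pending TEXT)"
    )
    return db

def record_frameset_hit(db, ip, now=None):
    """What index.php does: store the IP, the timestamp and the flag 'yes'."""
    now = int(time.time()) if now is None else now
    db.execute(
        "INSERT INTO visits (ip, ts, pending) VALUES (?, ?, 'yes')", (ip, now)
    )
    db.commit()

def allow_frame(db, ip, now=None):
    """What frame.php does: consume one unused row for this IP, if any."""
    now = int(time.time()) if now is None else now
    row = db.execute(
        "SELECT id FROM visits "
        "WHERE ip = ? AND ts <= ? AND pending = 'yes' "
        "ORDER BY ts LIMIT 1",
        (ip, now),
    ).fetchone()
    if row is None:
        return False  # no unused entry: redirect to index.php
    db.execute("UPDATE visits SET pending = 'no' WHERE id = ?", (row[0],))
    db.commit()
    return True  # entry found and marked as used: show the page
```

With this logic, the first request for frame.php after a frameset hit is allowed, and a second direct request from the same IP is refused until index.php is loaded again.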
Now, when I look at the resulting database, I see that about half the entries show that index.php has been called, but without calling frame.php, because about half of the rows have "yes". That means, frame.php has not been called, because it would have updated the row to "no".
I ran reverse DNS lookups on some of the IPs, and a few of them belong to crawlers like Googlebot and the bots from Yahoo and MSN. So obviously they don't follow the src to the framed page, which is not a problem, since I have meaningful noframes content.
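The reverse DNS check can be automated roughly as follows. This is only a sketch in Python: the hostname suffixes in `CRAWLER_SUFFIXES` are an assumption about how these crawlers name their hosts and should be checked against each search engine's own documentation, and both function names are made up for this example. The forward lookup is there because anyone who controls the reverse zone for an IP can make it claim any hostname.

```python
import socket

# Assumed hostname suffixes; verify against each search engine's documentation.
CRAWLER_SUFFIXES = {
    ".googlebot.com": "googlebot",
    ".google.com": "googlebot",
    ".crawl.yahoo.net": "yahoo",
    ".search.msn.com": "msn",
}

def classify_hostname(hostname):
    """Return a crawler name if the hostname ends in a known suffix, else None."""
    host = hostname.lower().rstrip(".")
    for suffix, name in CRAWLER_SUFFIXES.items():
        if host.endswith(suffix):
            return name
    return None

def verified_crawler(ip):
    """Reverse-resolve the IP, then confirm the hostname resolves back to it."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return None  # no PTR record or lookup failure
    if ip not in forward_ips:
        return None  # hostname does not point back at this IP
    return classify_hostname(hostname)
```

Running `verified_crawler` over the IPs of the rows still marked "yes" would separate the known search engine crawlers from everything else.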
What I'm wondering is what all the other entries are. They could be bots from other search engines, or they could be Lynx users, but my page is a photography page, and all the links leading to my site make this clear, so I'm pretty sure that I wouldn't attract so many users with text browsers. Also, several hundred hits per day, each from a different crawler, would mean far more search engines than I know of.
So they must be bots. And since I run a photography website, those could be image harvesters or site rippers. Or any of the other kinds of bots that search for email addresses or insert comment spam or whatever.
So what are your thoughts on this? Can it be that half the hits I receive on a website with photography (nude) are bots? Or do so many users actually have frames disabled? Or what am I seeing?
I can recognize some bots, because they hit my site every minute at the exact same second, but those are very few. Most of the noframes-hits are singular.
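The "same second of every minute" pattern can be picked out of the log automatically. A rough sketch in Python, assuming the rows are available as `(ip, unix_timestamp)` pairs; the function name `periodic_ips` and the threshold `min_hits` are made up for this example.

```python
from collections import defaultdict

def periodic_ips(hits, min_hits=3):
    """Return IPs whose hits all fall on the same second of the minute.

    hits: iterable of (ip, unix_timestamp) pairs.
    min_hits: minimum number of hits before an IP is considered at all.
    """
    seconds_by_ip = defaultdict(list)
    for ip, ts in hits:
        seconds_by_ip[ip].append(ts % 60)  # second within the minute
    return sorted(
        ip
        for ip, secs in seconds_by_ip.items()
        if len(secs) >= min_hits and len(set(secs)) == 1
    )
```

IPs with only one or two hits are ignored, so the many one-off noframes hits are not flagged by this check and need a different criterion.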
| 12:14 am on Mar 7, 2010 (gmt 0)|
First of all, I wouldn't serve the page the way you do, as it may cause accessibility problems like the ones you describe.
To get an idea of who these strange visitors are, however, you should log their user agents as well (beware that some bots forge the headers of real browsers).
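Once the user agent is stored next to each row, a quick tally of the rows that never reached frame.php shows who they are. A small Python sketch, assuming each row is an `(ip, user_agent, pending)` tuple with `pending` being the "yes"/"no" flag; the function name and the row layout are assumptions for illustration.

```python
from collections import Counter

def top_user_agents(rows, limit=10):
    """Count user agents among rows whose flag is still 'yes'.

    rows: iterable of (ip, user_agent, pending) tuples.
    Returns a list of (user_agent, count), most frequent first.
    """
    counts = Counter(
        ua or "(empty)"  # bots often send no User-Agent header at all
        for _ip, ua, pending in rows
        if pending == "yes"
    )
    return counts.most_common(limit)
```

Because the header can be forged, the tally is a hint rather than proof; combining it with a reverse DNS check on the IP gives a more reliable picture.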
On the other hand, for the sake of efficiency, instead of writing the "yes" or "no" strings to the database, I would check that the difference between the two timestamps is less than, say, three seconds. Of course, you want to do this AFTER you've figured out why half the visitors don't get the frame.
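The time-window check could look something like this. It is a sketch in Python, not PHP, and it assumes the most recent index.php hit per IP is available as a dictionary; the name `frame_allowed` and the `window` parameter are made up for this example.

```python
def frame_allowed(last_frameset_hit, ip, now, window=3):
    """Allow frame.php only shortly after index.php was loaded by the same IP.

    last_frameset_hit: dict mapping IP -> timestamp of its latest index.php hit.
    now: timestamp of the frame.php request.
    window: allowed gap in seconds (the frame must arrive in less than this).
    """
    ts = last_frameset_hit.get(ip)
    return ts is not None and 0 <= now - ts < window
```

The trade-off is that a visitor on a slow connection may need longer than the window to request the frame, while a direct request made inside the window still gets through, so the value of `window` has to be tuned against real traffic.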