When a user logs in, they can use the "advanced search" feature, send private messages to other users etc.
There is also a facility where a user can see which other users have read their profiles.
Recently, I have had a couple of support queries from people saying that a user has been visiting their profile hundreds of times a day! His profile is "closed", i.e. they cannot tell who it is or contact him. They are scared it is the FBI or someone trying to hack into their business...
Initially I ignored it. Then when more people complained, I looked into it.
It appears to be a bona-fide account, nothing peculiar apart from this:
1) Yes, this user has been viewing other profiles hundreds of times a day, which IS strange.
2) His IPs are: crawl-66-249-66-107.googlebot.com or similar!
3) He has systematically gone through virtually ALL the profiles on the site.
Am I missing something here? Seems like someone at Google has created a user account, has fed that username and password into Googlebot, and has therefore allowed Googlebot to crawl member-only pages.
Is this known behaviour?
Should I allow it?
Is it good for the site?
I'm worried :-(
Surely there must be a simple rational explanation of how his IP address is 66.249.66.107...?
[edited by: isorg at 8:45 pm (utc) on Oct. 7, 2005]
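One way to tell whether an address like 66.249.66.107 really belongs to Googlebot is a reverse DNS lookup followed by a forward lookup on the returned hostname. The sketch below is only an illustration under assumptions: the function name and the injectable lookup parameters are made up for this example, and the list of accepted domain suffixes may not be complete.

```python
import socket

# Assumed list of crawler domains; adjust if Google documents others.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google crawler hostname
    that forward-resolves back to the same IP."""
    # Default to real DNS lookups; callers (or tests) may inject fakes.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or lookup failure: cannot be verified.
        return False

    # The hostname must end in a Google crawler domain, not merely contain it.
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False

    try:
        # The forward lookup must lead back to the original address.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

If this returns True for the addresses in the logs, the visitor really is Googlebot and not someone faking the user agent.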
You might also consider having the user enable cookies before being able to create an account. That will stop a bunch of bots.
When I created the site all those years ago, my clients were not all that IT savvy. The site required cookies to log in, but everyone complained that Internet Explorer was not letting them access the site because their cookies were disabled by default. I think this was when Windows 2000 or XP had just come out, and IE disabled cookies by default. My clients did not know how to change the setting, and they complained bitterly that they could not access the site.
So I removed the cookies and made it so that, once logged in, they would surf the private area with ?username=xyz&sessionid=1234567 in their browser address bar. Every private page they accessed would authenticate the sessionid against the one stored in the login database.
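Roughly, the per-page check described above might look like the following. This is a minimal sketch, not the site's actual code: `LOGIN_DB` is a hypothetical in-memory stand-in for the login database, and `is_authenticated` is a name invented for the example.

```python
import hmac
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-in for the login database: username -> current session id.
LOGIN_DB = {"xyz": "1234567"}


def is_authenticated(url, login_db=LOGIN_DB):
    """Check the username/sessionid pair in the URL against the stored session."""
    query = parse_qs(urlsplit(url).query)
    username = query.get("username", [""])[0]
    session_id = query.get("sessionid", [""])[0]

    expected = login_db.get(username)
    if not username or expected is None:
        return False

    # Compare as bytes in constant time so the check does not leak
    # how many leading characters of the session id matched.
    return hmac.compare_digest(session_id.encode(), expected.encode())
```

Note that anyone who gets hold of such a URL (a search engine included) passes this check for as long as the stored sessionid stays the same.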
I Googled ["this person's username" site:the_site.com] and found 25,000 links to the site, with the person's username and sessionid.
Now, Google has somehow come to know this user's username and sessionid. Maybe he submitted his profile page to Google and did not remove the sessionid.
So Googlebot has presumably trawled through the site with his ?username= in the address bar, and this is how his name has appeared in the logs.
He has since logged out of that session, so no one can access his account. What I will try to do is reject any request whose sessionid has expired or been logged out, which for some reason is not happening on the non-sensitive private pages.
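The rejection rule I have in mind could be sketched like this. The `Session` record, its field names, and `session_is_live` are all hypothetical; the real login database will have its own layout.

```python
import time
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    expires_at: float        # Unix timestamp after which the session is dead
    logged_out: bool = False


def session_is_live(sessions, username, session_id, now=None):
    """Return True only for a matching session that has neither expired
    nor been logged out. `sessions` maps username -> Session."""
    now = time.time() if now is None else now
    session = sessions.get(username)
    if session is None or session.session_id != session_id:
        return False
    return not session.logged_out and now < session.expires_at
```

The point is that every private page, sensitive or not, would call the same check, so a stale sessionid in an indexed link is refused everywhere.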
The profile pages do not actually check whether or not the user is logged in, as they are freely accessible irrespective of membership. If the user is logged in, he can set the skins and various other options. This is why Googlebot had access to the pages.
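Since the profile pages are public anyway, one option would be to strip the login parameters from the URL whenever the session is not valid, and redirect to the clean address. A rough sketch, assuming the parameter names from the example URL above; the function name and the `id` parameter in the usage line are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that identify a logged-in user (names taken from the example URL).
SENSITIVE_PARAMS = {"username", "sessionid"}


def strip_session_params(url):
    """Return `url` with the username/sessionid query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SENSITIVE_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `strip_session_params("http://example.com/profile?id=42&username=xyz&sessionid=1234567")` gives `http://example.com/profile?id=42`, which is the form a crawler could safely be given.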
Mind you, it could be an interesting way to get more pages into the index...