|Googlebot has adopted the account of a registered forum user|
|I Will Make It|
| 3:14 pm on Jan 22, 2011 (gmt 0)|
Well... this is very strange.
I run a forum that is 2 years old.
As soon as my users register with the phpBB3 forum, they get a set of free wedding tools, which are viewable only by the owner of that account.
One of the tools is a "wishlist" for their wedding. Since you have to own the account to look at your own wishlist, it should not be possible for Google to index those pages. But Google has actually done that, and here is how:
I know for sure that this tdimmen user is a real person. I know this because, in my db, the user has filled out information that Googlebot couldn't have entered by itself.
But the tdimmen user is logged in all the time now, as in: never logs out, always active, reading forums and stuff. If I check the user, I get this information from the phpBB3 software:
tdimmen IP: 18.104.22.168 » Whois
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Does anyone know how this is possible?
To me, it really looks like Googlebot has adopted the tdimmen user account, but that's very weird: it would have to know tdimmen's password to be able to log in!?
| 4:01 pm on Jan 22, 2011 (gmt 0)|
google gets links from loads of places, not just their spider. if the user has google toolbar installed, then that will likely pick up the link. or maybe they bookmarked the page in their browser, or used google bookmarks.
if the URLs have some kind of userid on the end of the query string and google keeps crawling the url, then presumably it would look like the user was doing it
|I Will Make It|
| 6:00 pm on Jan 22, 2011 (gmt 0)|
Ok, so what could I do, or should I do?
I don't like the idea of Google indexing the parts of the site that are restricted to members only ;)
Even though I do get many more pages indexed that way :)
| 6:15 pm on Jan 22, 2011 (gmt 0)|
are they actually being included in the SERPs, or just being crawled?
if they are just being crawled, then i wouldn't worry about it. anyone can visit the URL. try it yourself. your script probably doesn't let you see the same content that he sees if you're not properly logged in.
hopefully being logged in requires more than just sticking an ID in the query string, or visiting the member's page. if it doesn't, then you've got bigger problems...
if the spider cannot see the content, but is still including them in the SERPs anyway, then all you've got to do is block those pages in robots.txt, or add a robots meta tag to the pages
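a rough sketch of the robots.txt option, in case it helps. the `/wishlist/` path and the example.com URLs are made up for illustration, so swap in whatever paths your member-only tools actually live under. it uses python's standard `urllib.robotparser` to check what a rule would block before you put it live:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "/wishlist/" is an assumed path,
# not the real layout of the site discussed in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wishlist/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/wishlist/123",
                "https://example.com/forum/"):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

the meta tag route would be a `<meta name="robots" content="noindex">` in the head of each member-only page instead.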
|I Will Make It|
| 9:16 pm on Jan 22, 2011 (gmt 0)|
Sure, the pages are unreachable unless you are logged in. This is very secure, and of course requires more than just the page ID.
But what should I do about the bot that is logged in as one of my users?
Everyone visiting the forum can see this user reading the forums 24/7. This user isn't active in the forum any more, but she uses the other tools we provide.
As in: both the real user and Googlebot will very often be logged in as the same user, since Googlebot is almost always logged in as tdimmen.
Check at the bottom of bryllupsvenner[dot]no slash bryllupsforum - and look for tdimmen.
While inside my forum, also search for this text: "Hvem er i forumet" and press that link at the bottom. You will find the username I'm talking about, with Googlebot's IP address.
| 8:43 am on Jan 23, 2011 (gmt 0)|
have you tried a "fetch as googlebot" in google webmaster tools?
i am guessing you are doing some cloaking that allows the bot to see things that a "non-verified-user" visitor cannot see.
| 12:06 pm on Jan 24, 2011 (gmt 0)|
phpBB3 identifies bots and has a special group for them; they have their own profiles and permission set.
Your issue might be related to this: [tracker.phpbb.com...]
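To illustrate the general idea of a bot list keyed on the user agent string, here is a minimal sketch. This is not phpBB's actual code; the `BOT_SIGNATURES` table and the `match_bot` function are hypothetical names made up for the example. If a forum maps each matched bot to a user record, a wrong mapping could make the bot show up under a real member's name.

```python
from typing import Optional

# Hypothetical signature table: bot name -> substring expected in the
# User-Agent header. Real forum software keeps its own list.
BOT_SIGNATURES = {
    "Googlebot": "Googlebot/",
    "Bingbot": "bingbot/",
}


def match_bot(user_agent: str) -> Optional[str]:
    """Return the name of the first bot whose signature appears in user_agent."""
    ua = user_agent.lower()
    for name, signature in BOT_SIGNATURES.items():
        if signature.lower() in ua:
            return name
    return None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(match_bot(ua))
```

Note that a user agent string alone can be spoofed, so a match like this only says what the visitor claims to be.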