I recently came across a site that seems to be serving limited functionality to search engine user agents.
I have always been extremely cautious about serving different versions of a site based on the user agent, for obvious reasons. In this example the content and pages are identical, so I don’t believe they are in any way fooling the engines for potential gain.
In the example I have seen, the restricted functionality is limited to the site’s search facility and checkout process.
My first question is why anyone would do this when pages in those areas of a site can be excluded via the robots exclusion protocol (robots.txt)?
My second question is whether anyone believes a penalty can be attached to this, when in effect you are only restricting access to areas of the site that search engines wouldn’t need to index?
I don't think this counts as cloaking. Anything that has to do with forms, for instance, should be restricted via session cookies. You don't want spiders to log in, create accounts, check out, add products to the cart or perform searches.
A good approach is to redirect spiders, and visitors who block session cookies, to a page that explains what the site requires in order to access those areas.
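As a rough sketch of that redirect rule (not taken from the site in question): the spider tokens, the restricted path prefixes and the `/requirements` page below are all made-up names for illustration, and the function only decides where to send a request rather than plugging into any particular web framework.

```python
from typing import Optional

# Hypothetical, non-exhaustive list of crawler user-agent substrings.
KNOWN_SPIDER_TOKENS = ("googlebot", "bingbot", "slurp")

# Hypothetical areas that need a session: search, cart, checkout, account.
RESTRICTED_PREFIXES = ("/search", "/cart", "/checkout", "/account")

# Hypothetical page explaining what the site requires (e.g. cookies enabled).
REQUIREMENTS_PAGE = "/requirements"


def is_spider(user_agent: str) -> bool:
    """Very rough check: does the user agent contain a known crawler token?"""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_SPIDER_TOKENS)


def redirect_target(path: str, user_agent: str, has_session_cookie: bool) -> Optional[str]:
    """Return the URL to redirect to, or None to serve the page normally.

    `path` is assumed to be the URL path without the query string.
    """
    restricted = any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in RESTRICTED_PREFIXES
    )
    if not restricted:
        # Ordinary content is served identically to everyone.
        return None
    if is_spider(user_agent) or not has_session_cookie:
        return REQUIREMENTS_PAGE
    return None
```

Note that product and content pages fall through untouched, so spiders and humans see the same thing there; only the form-driven areas are affected.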
The robots.txt file is more of a guide. It does not guarantee that spiders will not access a page. If, for example, someone posts a link to your customer account page on an external site, a spider may still follow it. In that case it should be redirected to, say, the login or create-account page.
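To illustrate the difference between the two layers, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name and the `/login` redirect are assumptions for the example. The first function shows what a spider that chooses to honour robots.txt would decide; the second shows what the server enforces whether or not the spider checked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_spider_may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """What a spider that honours robots.txt would decide for itself."""
    return parser.can_fetch(user_agent, url)


def server_response(path: str, logged_in: bool):
    """What the server enforces regardless of robots.txt.

    Returns (status_code, redirect_location or None).
    """
    if path.startswith("/account/") and not logged_in:
        # Anyone without a login, spider or human, is sent to the login page.
        return 302, "/login"
    return 200, None
```

The robots.txt check runs on the spider's side and is voluntary, so the redirect in `server_response` is the part you can actually rely on.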
Now, if your site is structured properly there should be no problem, because the forms will use the POST method and spiders will not be able to process them. For example, a search box can be visible in every case, but a spider cannot run a search because there is no plain link for it to follow (so it cannot submit the form). In addition, the script validates the posted data, which covers humans who block session cookies.
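A minimal sketch of such a search handler might look like the following. The field name `q`, the length limit and the messages are assumptions, and the function is framework-independent: it just takes the request method, the posted fields and whether a session cookie was present.

```python
from typing import Dict, Tuple

MAX_QUERY_LENGTH = 100  # Hypothetical limit for the example.


def handle_search(method: str, form: Dict[str, str], has_session_cookie: bool) -> Tuple[int, str]:
    """Return (status_code, message) for a search request."""
    # Spiders follow links with GET; only a submitted form arrives as POST.
    if method.upper() != "POST":
        return 405, "Search must be submitted through the search form."

    # Humans who block session cookies get an explanation instead of results.
    if not has_session_cookie:
        return 403, "Please enable cookies to use the search."

    # Validate the posted data before doing any work with it.
    query = form.get("q", "").strip()
    if not query:
        return 400, "Please enter a search term."
    if len(query) > MAX_QUERY_LENGTH:
        return 400, "Search term is too long."

    return 200, f"Results for: {query}"
```

With this in place the search box can stay visible on every page: a GET request to the search URL is rejected outright, and a POST without a session gets instructions rather than results.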