Google Site Search behind Login

Forum Moderators: anallawalla & bakedjake

terrydba

9:32 pm on May 23, 2014 (gmt 0)

Hello. First time post. I hope this is the most appropriate forum.

I have read that Google Site Search is not able to index documents behind a login.

I have also read that, for the general Google index, Googlebot can be allowed to index content behind a login, and that Google requires a First Click Free policy to be in place.

I have read that, for the general index, Googlebot is let through by checking the user-agent of the request.

My question is threefold: Can the Google Site Search crawl be allowed behind a login by the same method? If so, can the crawled content be added to the Site Search catalog but not the general Google index? Finally, since it is site search, is First Click Free still required if the crawl can be made to work?

Thanks!

incrediBILL

1:57 am on May 24, 2014 (gmt 0)

If you want Google to index behind a login, remember that everyone will still be able to view your documents via the Google cache. Use the meta robots NOARCHIVE tag on all your pages to avoid this issue.

Search is search: the spider needs the same access for a site search, or it won't be able to crawl. And without NOARCHIVE, the world can see the content via the cache without logging in.
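
For anyone wondering what the NOARCHIVE advice looks like in practice, here is a minimal sketch in Python. It assumes your pages pass through some code of yours before being sent; the helper names add_noarchive and noarchive_header are made up for illustration. The meta tag and the X-Robots-Tag response header are the two usual ways of sending the directive.

```python
import re

# The robots meta tag asking search engines not to keep a cached copy.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'


def add_noarchive(html):
    """Insert the noarchive meta tag right after the opening <head> tag.

    Hypothetical helper: returns the HTML unchanged if no <head> is found.
    The \\b keeps the pattern from matching <header>.
    """
    return re.sub(
        r"(<head\b[^>]*>)",
        r"\1" + NOARCHIVE_META,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


def noarchive_header():
    """Alternative for non-HTML documents (PDFs etc.): an HTTP response header."""
    return {"X-Robots-Tag": "noarchive"}
```

The header variant is handy for files that have no <head> to put a meta tag in.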

lucy24

3:31 am on May 24, 2014 (gmt 0)

If so, can the crawled content be added to the Site Search catalog and not the general Google index?

Heh, that's funny, I was just speculating about the same thing myself yesterday. The sad conclusion was that if I want a site search to include content that isn't available in the general Google search, I'd have to code my own. Site search is basically a more elegant form of the "site:" operator. It looks different to the user, but under the hood it's just ordinary Google search results, constrained to material on the present site (or material from a hand-picked list of sites, if you want to get fancy).

You can include login-required pages in a search engine's index. Details depend on your server, but the underlying concept is "Satisfy Any": visitors have to either log in or be the Googlebot.
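
A rough sketch of the "either log in or be the Googlebot" rule, written in Python rather than server config so the logic is visible. The function names and the injectable lookup parameters are made up for illustration. Since a user-agent string is trivial to fake, this sketch also does the reverse-then-forward DNS check commonly recommended for confirming a real Googlebot; treat the domain list as an assumption to check against Google's current documentation.

```python
import socket

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, user_agent, reverse_lookup=None, forward_lookup=None):
    """Return True if the request claims to be Googlebot and DNS agrees.

    reverse_lookup / forward_lookup are hypothetical hooks so the DNS
    calls can be replaced in tests; by default the socket module is used.
    """
    if "Googlebot" not in user_agent:
        return False
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)  # IP -> hostname
        if not host.endswith(GOOGLE_CRAWLER_DOMAINS):
            return False
        return ip in forward_lookup(host)  # hostname -> IPs must include the caller
    except OSError:  # covers socket.herror and socket.gaierror
        return False


def may_view(logged_in, ip, user_agent, **lookups):
    """The 'Satisfy Any' idea: a logged-in visitor OR a verified Googlebot."""
    return logged_in or is_verified_googlebot(ip, user_agent, **lookups)
```

In a real setup the DNS result would want caching, since doing two lookups on every request is slow.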

You probably don't want to do it, though. Making login-required material visible in search results to non-logged-in humans is a good recipe for creating annoyed users.