Can you show content to spiders but not to humans?

A content site wants to be indexed but does not want people to see the content for free.

         

exmoorbeast

9:57 am on Dec 3, 2006 (gmt 0)

10+ Year Member



I have a friend who has thousands of pages of content; basically it's a newspaper's online version. They want to get the pages indexed and are asking me whether they could let spiders into the content but keep out non-paying humans (i.e. non-subscribed members).

There are people currently reading the site for free because of this. Ideally we'd like the site to be fully spiderable, but when a user clicks through from the SERP, a login would come up.

Perhaps someone can help me, I would be hugely grateful.

Thanks

phranque

12:06 pm on Dec 3, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



the first part of the problem is getting all your content hidden from those who aren't already logged in.
this is usually done by having your script maintain a session for a user through cookies or parameters and checking for a session or user id before serving content.
the next part is usually accomplished by checking for known bots and allowing free access to content to these user agents rather than maintaining and checking for a session.
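
A minimal sketch of the two checks described above, assuming the server can read the request's User-Agent header and has already looked up a session user id (or None). The token list and the function names are illustrative assumptions, not a complete or current list of crawlers, and a User-Agent string can be spoofed by anyone, so this alone does not protect the content.

```python
# Illustrative substrings of crawler User-Agent headers.
# This tuple is an assumption for the example; a real list needs upkeep.
KNOWN_BOT_TOKENS = ("googlebot", "slurp", "msnbot")


def is_known_bot(user_agent):
    """Return True if the User-Agent contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)


def may_view_full_content(user_agent, session_user_id):
    """Known bots skip the session check; everyone else must be logged in."""
    if is_known_bot(user_agent):
        return True
    return session_user_id is not None


if __name__ == "__main__":
    bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(may_view_full_content(bot_ua, None))
    print(may_view_full_content("Mozilla/5.0 (Windows)", None))
```

The point of the design is that bots never get a session at all; the login requirement applies only to requests that do not look like a known crawler.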

rytis

2:34 pm on Dec 3, 2006 (gmt 0)

10+ Year Member



In other words, getting free advertising he he.

Nope. You either buy AdWords (or whatever advertising brings you visitors) and sell your content to the visitors those ads bring in, or you give your content away free to everyone and have it searchable through the search engines.

Showing different content to SE robots and humans is called cloaking and is a big no-no from the SE point of view; if caught, it could lead to the site being banned from the SEs altogether.

You can have part of your content free and searchable and part accessible to those who pay.

R

exmoorbeast

5:59 pm on Dec 3, 2006 (gmt 0)

10+ Year Member



Phranque - great answer, thank you.

you said

the first part of the problem is getting all your content hidden from those who aren't already logged in.
this is usually done by having your script maintain a session for a user through cookies or parameters and checking for a session or user id before serving content.
the next part is usually accomplished by checking for known bots and allowing free access to content to these user agents rather than maintaining and checking for a session.

I don't really need to worry about the first bit as the system could be turned on when there was no one on the site, like late at night.

The second bit is very interesting: do you have to have known IP addresses, or can you just see it's a bot because it's not using a browser, i.e. by user agent?

Once again, thank you so much for your reply.

pageoneresults

6:02 pm on Dec 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are people currently reading the site for free because of this. Ideally we'd like the site to be fully spiderable, but when a user clicks on the SERP, then a login would come up.

I'd consider "partial articles". Allow the pages to be indexed but only provide a snippet from the article and then they would have to login to view the full article.

They want to get pages indexed and are asking me if they could let spiders into the content, but not non-paying humans.

You'd have to cloak to do this, and it's going to be somewhat risky: not only from the SE standpoint, but if you are not doing it properly, that content is also at risk of becoming publicly available.

Leosghost

6:27 pm on Dec 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Or ..make a "taster paragraph" or summary page for each article available to all bots and surfers alike ..

with a "further details" link at the bottom of each page ..which opens on to your login page ..

using the cookie option for your subscribers ..

simply ban all bots ( search engine and otherwise ) and all non logged in users ( subscribers ) from your main content ..

this way is more work ..but is fool proof ..and carries no risk of dupe content penalties
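
A rough sketch of the taster approach, assuming the article is available as plain text and the server knows whether the visitor is logged in. The 50-word cutoff, the `/login` path and the function names are made up for the example. Bots and non-subscribers get exactly the same teaser, so there is nothing to cloak.

```python
def teaser(article_text, max_words=50):
    """Return the first max_words words of the article as a taster."""
    words = article_text.split()
    if len(words) <= max_words:
        return article_text.strip()
    return " ".join(words[:max_words]) + " ..."


def render_page(article_text, logged_in, login_url="/login"):
    """Subscribers get the full article; everyone else (bots included)
    gets the teaser plus a link to the login page."""
    if logged_in:
        return article_text
    return teaser(article_text) + "\nFurther details: " + login_url


if __name__ == "__main__":
    story = "word " * 200
    print(render_page(story, logged_in=False))
```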

exmoorbeast

11:27 pm on Dec 3, 2006 (gmt 0)

10+ Year Member



Thank you all.

Is it cloaking if you just detect that it's a bot and don't create a session requiring a login?

leadegroot

12:28 am on Dec 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is it cloaking if you just detect that it's a bot and don't create a session requiring a login?

No, that's fine.
In fact, some SE reps were asked exactly that at a recent conference and they literally said 'yes, please do that' (this included Google)
Can't find a reference right now :(

pageoneresults

2:34 pm on Dec 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In fact, some SE reps were asked exactly that at a recent conference and they literally said 'yes, please do that' (this included Google).

Anything to assist the bot with a "clean" indexing is in your best interest and that of the search engine.

I've also heard similar statements like that above. Think about it, you've now taken the extra step to make sure that the bot crawls only what it should be crawling.

phranque

3:11 pm on Dec 4, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I don't really need to worry about the first bit as the system could be turned on when there was no one on the site, like late at night.

The second bit is very interesting: do you have to have known IP addresses, or can you just see it's a bot because it's not using a browser, i.e. by user agent?


you do need to worry about the first bit because your system has to be "on" for the "usual web browser visitors" simultaneously with the "search engine spider visitors".
the second bit is typically done with known user agent names rather than ip addresses, but it depends on your requirements.
please see this recent post for starters:
[webmasterworld.com...]

Is it cloaking if you just detect that it's a bot and don't create a session requiring a login?

it is not cloaking and it is required practice with SE spiders for which sessions are counterproductive.
bots cannot maintain a session with cookies, and session ids in the urls cause massive duplicate content problems.
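
To illustrate the duplicate content point: if a session id rides in the query string, every crawl of the same page yields a different URL. A small sketch, assuming the site uses a query parameter for sessions (the parameter names below are guesses, not taken from any particular system), showing how such URLs collapse to one once the session parameter is left out for bots:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; adjust to whatever the site uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_params(url):
    """Return the URL with any session-id query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    first = strip_session_params("http://example.com/story?id=7&sid=abc123")
    second = strip_session_params("http://example.com/story?id=7&sid=zzz999")
    print(first == second)
```

Not handing out session ids to spiders in the first place achieves the same thing; the function just shows why the ids are the source of the duplicates.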

exmoorbeast

3:49 pm on Dec 4, 2006 (gmt 0)

10+ Year Member



Thanks for all the help - much appreciated. As soon as I mentioned cloaking, the tech team ran a mile, but I'm now trying to get them round to the SE's way of thinking.

Cheers

ronburk

7:10 pm on Dec 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ideally we'd like the site to be fully spiderable, but when a user clicks on the SERP, then a login would come up.

Just like the NY Times, and other newspapers. I see no problem with emulating their example. Let the bots in (e.g., identify Googlebot by IP address range), tell them not to cache the content, and require a login of all but the approved bots.
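
A sketch of those three steps, under loud assumptions: the network below is a reserved documentation range used as a placeholder, NOT a real Googlebot range (the real ranges would have to come from the search engines themselves), and the function names and the "LOGIN REQUIRED" marker are invented for the example. The `noarchive` robots meta tag is the usual way to ask engines not to show a cached copy.

```python
import ipaddress

# Placeholder network from the reserved documentation range.
# This is NOT a real crawler range; substitute verified ranges.
APPROVED_BOT_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

# Asks search engines not to offer a cached copy of the page.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'


def is_approved_bot_ip(remote_addr):
    """Return True if the client address falls inside an approved network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a parseable IP address
    return any(addr in network for network in APPROVED_BOT_NETWORKS)


def respond(remote_addr, logged_in, article_html):
    """Approved bots get the article with noarchive; others need a login."""
    if is_approved_bot_ip(remote_addr):
        return NOARCHIVE_META + "\n" + article_html
    if logged_in:
        return article_html
    return "LOGIN REQUIRED"


if __name__ == "__main__":
    print(respond("192.0.2.15", False, "<p>story</p>"))
    print(respond("203.0.113.9", False, "<p>story</p>"))
```

Matching on IP range is harder to fake than a user agent string, at the cost of keeping the range list up to date.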

Many (most?) people on WebmasterWorld have a perception of "cloaking" based on pure superstition. Whenever someone uses the word "cloaking", the easiest thing is to immediately ask, "Are you trying to deceive the search engines?" If the answer is "no", then just go about your business and don't worry about what any random person's definition of "cloaking" might be. Allowing SE bots to bypass login requirements is clearly not deceptive, and has been accepted practice for years.

Identifying SE robots and treating them specially when serving content is an absolute requirement for a great many legitimate website applications.