

Paywall content and getting indexed in Google.

Can you eat your cake and have it too?

8:20 pm on Jun 5, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 16, 2001
posts: 2024
votes: 4


Interesting article in Bloomberg about the Wall Street Journal (WSJ) ranking lower after implementing its paywall [bloomberg.com]. Is it possible for news/content sites to eat their cake and have it too? I would think there is a way to whitelist robots (Googlebot) with full access to articles while presenting a paywall to new users.
9:35 pm on June 5, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15638
votes: 795


The details will obviously depend on your server type. But yes, it's extremely easy to set up multiple access criteria: “To see this content, you must either be logged in or be the googlebot.” The question is whether you’d want to. As a human, it disgusts me when something promising comes up in a search--and then when I click on the link I'm presented with two lines of text and a “You must be logged-in to read this article” message. Do a lot of humans use search engines to find content on sites they already subscribe to?
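A rough sketch of that either/or gate in Python (the request fields and helper name are assumed for illustration, not from any particular framework):

# Serve the full article to logged-in subscribers OR to a crawler
# verified as the real Googlebot; everyone else gets the teaser.
# "request" is a hypothetical object exposing the session flag,
# client IP, and User-Agent header.
def can_view_full_article(request):
    if request.user_is_logged_in:
        return True
    # A User-Agent check alone is not enough; see the DNS
    # verification sketch later in the thread.
    return is_verified_googlebot(request.client_ip, request.user_agent)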
10:51 pm on June 5, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 16, 2001
posts: 2024
votes: 4


The Wall Street Journal stated they want to have a paywall and still have their content fully indexed by Googlebot. WSJ obviously found SE referral traffic to be more important than serving non-subscribed visitors. From a technical standpoint, I don't understand how you can allow Googlebot from Google, yet prevent someone from spoofing Googlebot and viewing the full content for free. Makes me think WSJ perhaps did not have a competent web server admin and lost out on revenue from search engine traffic because their change was implemented poorly.
11:42 pm on June 5, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15638
votes: 795


I don't understand how you can allow Googlebot from Google, yet prevent someone from spoofing Googlebot

Any website worth its salt will categorically block visitors who claim to be the Googlebot but come from a non-Google IP.
2:17 am on Oct 3, 2017 (gmt 0)

New User

joined:Sept 14, 2017
posts:10
votes: 0


Thanks Lucy24, how can I do that for my site?
6:27 am on Oct 3, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3473
votes: 76


>>Thanks Lucy24, how can I do that for my site?

the usual way is to test the UA for the expected googlebot string - there are different googlebots, and you may only want to allow some of them, or you may want to allow them all.
if the UA is a googlebot, then do a reverse and forward DNS lookup to check it is coming from a google IP address. you can also maintain a list of IP addresses that google uses for bots, or a combination of the two - for instance, do the reverse/forward dns lookup and cache the result for a set period of time, so that next time you check the cache first before doing the lookup (as it is obviously quicker). a sketch of this is below.
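A minimal sketch of that check in Python (the function name, cache policy, and TTL are illustrative assumptions, not from any particular framework):

import socket
import time

# Simple in-process cache: ip -> (verdict, timestamp)
_verified = {}
CACHE_TTL = 24 * 60 * 60  # re-verify each IP once a day

def is_verified_googlebot(client_ip, user_agent):
    # Cheap test first: only do DNS work if the UA claims to be Googlebot.
    if "Googlebot" not in user_agent:
        return False

    cached = _verified.get(client_ip)
    if cached and time.time() - cached[1] < CACHE_TTL:
        return cached[0]

    verdict = False
    try:
        # Reverse lookup: the IP must resolve to a Google-owned hostname.
        host = socket.gethostbyaddr(client_ip)[0].lower()
        if host.endswith(".googlebot.com") or host.endswith(".google.com"):
            # Forward lookup: the hostname must resolve back to the
            # same IP, otherwise the PTR record could be forged.
            ips = {info[4][0] for info in socket.getaddrinfo(host, None)}
            verdict = client_ip in ips
    except (socket.herror, socket.gaierror):
        verdict = False  # no PTR record, or forward lookup failed

    _verified[client_ip] = (verdict, time.time())
    return verdict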
4:13 pm on Apr 15, 2018 (gmt 0)

New User

joined:Apr 15, 2018
posts:1
votes: 0


Any website worth its salt will categorically block visitors who claim to be the Googlebot but come from a non-Google IP.


And how on earth do you know all the IP addresses Google bots are using? You might know some ranges, but if you're not the one deploying their server farms, you have no idea about all the ranges.
10:32 pm on Apr 15, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9714
votes: 925


You don't have to know all the ranges, only the ranges of the bots you allow. That's a much smaller slice.
2:06 am on Apr 16, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15638
votes: 795


And how on earth do you know all the IP addresses Google bots are using?
If you have personally met the bona fide Googlebot crawling English-language content from a range other than 66.249.blahblah, many people hereabouts would like to hear about it. A few years back, Google said they were going to start using other, non-ARIN ranges. But I can't remember anyone posting hard evidence that they were actually doing so.
4:18 am on Apr 16, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11682
votes: 205


you shouldn't rely solely on whitelisting IP ranges.

this is the proper way to verify googlebot IPs:
[support.google.com...]
7:32 am on Apr 16, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:Mar 25, 2018
posts:500
votes: 101


And how on earth do you know all the IP addresses Google bots are using?

You don't have to. Since you are talking about a dynamically generated page (PHP or other), at the beginning of your script you just do a reverse DNS lookup on the IP address of the client accessing the page, and the resulting hostname has to be "xxxx.googlebot.com." (or "xxxx.google.com."). Then, to be sure, you do a forward DNS lookup on that hostname and you have to get back the same IP address. That's all, and this is explained here: [support.google.com...]

Now, the thing is, serving different content to Googlebot and to human visitors is a gray area. Big sites might not risk anything, but publishers like us risk getting our sites banned by doing so...
8:15 am on Apr 16, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11682
votes: 205


there's no risk of being banned if the implementation is done properly.

How to indicate paywalled content

Publishers should enclose paywalled content with structured data to help Google differentiate paywalled content from the practice of cloaking, where the content served to Googlebot is different from the content served to users. If no structured data is provided to indicate the paywall, the paywall may be mistaken for a form of cloaking and the content could be removed from Google.

For detailed specifications on implementing the structured data, visit our Developer documentation.

We encourage publishers to experiment cautiously with different amounts of free sampling so as not to unintentionally degrade user experience and reduce traffic coming from Google.

source: [support.google.com...]

perhaps wsj's implementation provides a "degraded user experience".


google's Developer documentation:
Subscription and paywalled content [developers.google.com]
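for what it's worth, the markup that documentation describes is JSON-LD on the article page. a minimal sketch of generating it in Python - the ".paywall" CSS selector is an assumed class name wrapping the gated part of the article, and the helper function is hypothetical:

import json

def paywall_structured_data(headline):
    # NewsArticle markup flagging the page as paywalled, per Google's
    # paywalled-content structured data docs; cssSelector points at
    # the element that holds the subscriber-only text.
    markup = {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "isAccessibleForFree": "False",
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": "False",
            "cssSelector": ".paywall",
        },
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(markup, indent=2)
            + '\n</script>')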