Forum Moderators: open
For over five years I have been running a fairly successful paysite with articles. It has generated enough money to make a living from, although it hasn't been my primary occupation. For the last two years, however, income has been dropping as competition has grown fiercer.
I decided to focus my efforts on getting a better Google listing just three months ago. My PageRank was already 7, but I wanted a higher position on the main keyword in my industry - I thought _that_ would be the big killer.
Reading up on WebmasterWorld, I looked at Google and realised that very few of my pages were actually indexed. I changed the URLs to appear parameter-less (not "page.php?var=value", but "page/var/value"), and already I could see Googlebot crawling deeper than ever before.
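The poster doesn't show how the parameter-less URLs map back to the script; here is one minimal sketch in Python (the site itself runs PHP, and the function name and the alternating name/value layout are my assumptions, not the poster's code):

```python
def parse_pretty_url(path):
    """Map a crawler-friendly path like "/page/var/value" back onto
    the script and parameters it stands for ("page.php?var=value").
    Assumes name/value segments alternate after the script name."""
    segments = path.strip("/").split("/")
    script = segments[0] + ".php"
    params = dict(zip(segments[1::2], segments[2::2]))
    return script, params
```

In practice this mapping would usually be done with a server rewrite rule rather than in the application itself.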
Next I did something pretty smart. On my main page I made links to specific googlefied article list pages.
If a user clicks on the link, the script can see he has an http_referer, and he is immediately forwarded to the _proper_ article list.
However, if Google goes to the googlefied page, it will see a special list of links to articles (Googlebot never sends an http_referer, so it's not forwarded). Google visits an article from there. The article is shown, the _entire_ article (which people have to pay to see)... but the lines are shuffled! So even if users see the Google cache, it's useless to them.
The link to the shuffled article page is now listed on Google. When a user clicks on the link, he has an http_referer and immediately gets forwarded to the proper article page - where he has to pay :)
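The referer-cloaking flow described above can be sketched roughly like this, in Python for illustration (the actual site is PHP; the function name, the return shape, and the redirect target are all illustrative assumptions):

```python
import random

def serve_article_list(http_referer, article_url, article_lines):
    """Referer cloaking as described above: a visitor arriving with a
    Referer header is forwarded to the real (paid) page; a crawler,
    which sends no Referer, gets the full article with its lines
    shuffled so the cached copy is useless to readers."""
    if http_referer:
        # Real visitor clicked through from somewhere: forward them.
        return ("redirect", article_url)
    # No referer: assume a spider and serve the shuffled full text.
    shuffled = list(article_lines)
    random.shuffle(shuffled)
    return ("body", "\n".join(shuffled))
```

The key property is that the spider still indexes every word of the article, so searches match, while the shuffled order makes the cached page unreadable.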
The result of this work? Now over 80,000 pages have been indexed (articles in shuffled and normal mode + all sorts of other pages), and my traffic has just about doubled! People don't just arrive after having searched for "keyword1 keyword2" but also for specific keywords _in_ the articles.
As a consequence, my income has just about doubled :)
So thank you to everyone on WebmasterWorld for giving advice. I hope you can use this advice yourselves.
P.S. The forwarding technique described above is known as "referer cloaking". It's not officially allowed by Google, but I have done it for two months now without any problems, and I have read of others who have done it without any penalties.
Unfortunately, cloaking has a bad reputation for supplying spiders with different, and sometimes misleading, information than the users see - which is often not the case. Sometimes, but not always.
In your case you have made intelligent use of what would be termed cloaking to index your information, thereby increasing the quality of the search engines' databases while protecting your property and income. Any search engine would be hard pressed to think otherwise.
Of course, another option would be to include the no-cache tag, but that would only apply to Google, not other engines, so in this case I think you have made the better choice.
Well Done. Definitely food for thought.
Onya
Woz
OTOH, I have to admit that it was rather clever :)
I think his pages are ranked correctly because he offers on his website the article that the user sees in Google. Nobody can blame him just because he wants money for his work. Many people believe that everything on the Internet will always be free, but I don't think that's right. More and more offerings cost a little bit of money, so why not his articles?
I changed the urls to appear parameter-less (not "page.php?var=value", but "page/var/value") and already I could see googlebot crawling deeper than ever before.
Without a doubt, that is the most important thing anyone with a dynamic site can do. Not only for Google but for every engine out there. CGI form-based (GET or www-urlencoded) URLs simply don't work for promotion purposes today. Get rid of them like the plague.
Yes, I am officially the webmaster, not really the SEO, but then again, we are only two people working on the page, so I guess I have nearly every title you can come up with :)
As for Google not wanting this kind of behaviour, I can understand GilbertZ's point.
Just curious: how does WSJ cloak their articles? I can't find any of their articles right now on news.google.com, but several times I have clicked on a news item and been forwarded to a pay page. Does anyone know how that works?
Personally I think my pages and WSJ pages should be indexed, even if they are not free. The question is how they should be ranked or listed on a search results page. If I were really looking for a piece of info, I would rather have a paysite come up as a result than no results at all. I think Google would agree on that, at least.
new_shoes
My point was not about paid vs free at all. If the content were exactly the same I would have no qualms, but here he is presenting randomly mixed-up text, and his on-page factors could be totally different from what the users see - and we know how important that is to ranking.
It *could be* that users would go to a page where they have high expectations of getting good freely available (and that's not necessarily the same as "free") content based on a snippet, only to find out they have to pay or go through some rigmarole for it. Indeed, the page probably won't contain those words from the snippet at all, so to my mind that is misleading.
Generally I'm in favour of search engines being able to find material that is behind passwords, as the "hidden web" is enormous and information does have a value. But in this case, my initial feeling is that it is a bit deceptive. Now if there were a way for the Google listings to indicate that this is a paid article (like Northern Light, for example), it would solve that problem, but that model is completely different and, in that case, not even close to being proved as a winner ;)
On the Web, I usually want to find info that minute. It's not that paying for it is a problem for me; it's the fact that I have to wait for it, or pay for something I only use once without being sure it will actually answer my question. If I have to wait for a book to be delivered, or spend time evaluating a service for its cost-effectiveness when I only need one bit of info, I'm off elsewhere, and maybe to another search engine.
Yes! I agree. However, I also did something else, to increase my ranking.
I actually changed the urls from:
/page.php?article_id=111
to
/page/111_the_title_of_the_article.html
The script ("page", or actually "page.php") takes the one parameter and strips everything after the first "_". This means I get the proper article id, 111. Everything after the first "_" is just for Google.
The reason I did this was because:
- match in url seems to increase ranking
- match is longer and stands out
- user will see the article title twice: once from the <title> and once from the url
- I used .html rather than .php, and I have read that Google has a preference for it.
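The id-extraction described above is simple enough to sketch; this is a Python illustration (the real script is PHP, and the function name is mine):

```python
def extract_article_id(path):
    """Recover the article id from a URL like
    /page/111_the_title_of_the_article.html: only the digits before
    the first "_" matter; the rest is keyword text for the engines."""
    slug = path.rsplit("/", 1)[-1]     # "111_the_title_of_the_article.html"
    return int(slug.split("_", 1)[0])  # "111" -> 111
```

Because everything after the first "_" is ignored, the title portion of the URL can change freely without breaking the link.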
Anyway, I wrote another post just about this and how to implement it. You can read it here. [webmasterworld.com]
- new_shoes
Well, I have added some extra precautions. If you have already initiated your session (i.e. seen other pages on my site), you will be forwarded. Googlebot initiates a session with every single hit.
Yes, some of my users have the referer turned off (in some cases the ISP seems to turn it off), but in my estimation it's very rare.
If you should end up on the googlefied article page, it is explained that the page you are looking at is just for indexing purposes and there is a big link to click on to view the actual (payware) article.
I have chosen not to use user_agent, as this kind of cloaking is much more likely to be detected by Googlebot. From what I have read, Googlebot sends out "fake" requests with other user_agents to verify that cloaking is not used.
So no, the method is not fool-proof.
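Combining the precautions above, the forwarding decision boils down to one check; a minimal Python sketch (function and parameter names are my own, not the poster's):

```python
def forward_visitor(has_referer, session_existed):
    """Decide whether to forward a request to the real (paid) page.
    Anyone who sends a Referer, or who already has a session from
    browsing the site, is forwarded. Googlebot sends no Referer and
    starts a fresh session on every hit, so only new-session,
    no-referer requests stay on the indexing page."""
    return has_referer or session_existed
```

As the poster says, this heuristic is not fool-proof: a first-time human visitor with the referer disabled looks exactly like a crawler, which is why the indexing page carries an explanation and a link to the real article.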
OTOH I expect that many sites use referer-checking for authentication or tracking, and more and more users will feel "forced" to enable sending the referer when they see that sites don't work as intended - just as has become the case with cookies. Try browsing with cookies off - most sites won't work (e.g. Hotmail, CNN, etc.).