Forum Moderators: Robert Charlton & goodroi


Is Google Actually Creating Your Thin Content and then Penalizing You?


incrediBILL

8:34 pm on Apr 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here's one scenario I've run across, and it could be happening to you!

Does your site have a home-brew search page, where you query your own local database and display the results?

Did you have that search page blocked in robots.txt and set to meta NOINDEX?

No?

Got AdSense and Google Analytics on those search results pages?

Bummer dude.

I recently stumbled across tens of thousands of duplicate pages in Google, maybe hundreds of thousands: pages that logically should never have been there, because they didn't exist as crawlable URLs. You have to submit a query into a form and POST it to get a list of results. There are no links to this content from the website or from the outside world, yet it exists in the index.

Apparently, thanks to the use of either AdSense or Analytics, maybe both, Google also saw every page of search results and indexed that crap, and just kept indexing until it overflowed into a mind-blowing number of pages. Nobody would ever know they had been indexed, because they obviously don't generate any traffic.

If you have a custom search, I'd suggest you check ASAP whether the results of those searches are being indexed, because that could be all that stands between you and a thin content penalty.

Regardless, block your site search pages with robots.txt and meta NOINDEX just to make sure Google's other tools don't feed Googlebot any data it shouldn't have.
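
For anyone who wants the concrete version, here's a minimal sketch, assuming the search script lives under a /search/ directory (adjust the path to your own setup). In robots.txt:

  User-agent: *
  Disallow: /search/

And in the <head> of each results page, so the page itself carries the instruction:

  <meta name="robots" content="noindex">

(See Robert Charlton's caveat later in the thread about when combining the two can backfire.)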

Live and learn.

Pjman

9:17 pm on Apr 11, 2011 (gmt 0)

10+ Year Member Top Contributors Of The Month



This is why I always have my search in a single directory and block that directory in robots.txt. I learned that a few years back. Good share.

Shatner

10:11 pm on Apr 11, 2011 (gmt 0)

10+ Year Member



I'm 99% sure this is what happened to me and it's why I was Pandalized.

netmeg

10:28 pm on Apr 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, I always noindex search results, for myself and my clients.

(Also sorting URLs, pagination URLs, tags, and anything else that can create tons and tons of crap.)
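
For the sort and pagination URLs, one common pattern is noindex,follow, which keeps the junk pages out of the index while still letting the links on them be crawled. A sketch only; which pages emit the tag is up to your templates:

  <meta name="robots" content="noindex, follow">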

Shatner

10:33 pm on Apr 11, 2011 (gmt 0)

10+ Year Member



In my case it wasn't just search results, but also Google creating bogus tag pages which never existed and contained no content, and also creating bogus URLs for regular pages which didn't have "canonical" on them.

I have now noindexed them all and put canonicals on everything, but since Pandalization is apparently permanent, it's too late to solve the problem that Google created and there was probably no point in doing it.
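
For reference, the canonical tag Shatner mentions is a single line in the <head> of each page, pointing at the preferred version of the URL (example.com here is only a placeholder):

  <link rel="canonical" href="http://www.example.com/preferred-page.html" />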

deadsea

1:28 am on Apr 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Shatner: From a google post about panda today: "As sites change, our algorithmic rankings will update to reflect that." [googlewebmastercentral.blogspot.com...]

I hope that means that Google will soon start re-ranking some pandalized sites that have made the appropriate changes.

Robert Charlton

1:35 am on Apr 12, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Is Google Actually Creating Your Thin Content and then Penalizing You?

Good questions.

Did you have that search page blocked in robots.txt ...

In this discussion on Google Commerce Search 3.0, I mentioned my surprise that none of the sites Google featured as examples had their search pages blocked....

Google Commerce Search 3.0 Now Adds Instant Search
http://www.webmasterworld.com/google/4289079.htm [webmasterworld.com]

I was half-wondering what was going on. The results pages displayed as query landing pages look good enough and useful enough that, with a bit of work, they conceivably might not in fact be considered thin. And it was hard to read between the lines in what Google marketing was saying.

I continue to assume that Google Search Quality does not like to index search pages.

...block your site search pages with robots.txt and meta NOINDEX...

Use both???

incrediBILL

2:01 am on Apr 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use both???


Take no chances.

If one fails or becomes inoperative, you have a safeguard.

It's easier than addressing the mess you could end up with in the event of a malfunction!

Robert Charlton

5:27 am on Apr 12, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Use both???


Take no chances.

If one fails or becomes inoperative, you have a safeguard.

Brilliantly strategic, given the particular goal we're discussing here, which is keeping the content of pages in a search directory from being indexed. Thanks for adjusting the way I generally look at this.

We're not worrying here about keeping URLs to individual search pages out of the index, nor about preserving PageRank flow.

Just to caution the casual reader that using both robots.txt and meta noindex together is not always a good idea, I suggest these two threads....

Robots.txt blocking and Google's behavior
http://www.webmasterworld.com/google/4284732.htm [webmasterworld.com]

robots.txt - Google's JohnMu Tweets a tip
http://www.webmasterworld.com/google/4143083.htm [webmasterworld.com]
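
The short version of the caution in those threads: a robots.txt block stops Googlebot from fetching a page at all, so it can never see a meta noindex sitting on that page, and already-indexed URLs can linger as URL-only listings. Where a page has to stay fetchable so the noindex can be read, the same instruction can also be sent as an HTTP header, e.g. with Apache's mod_headers in an .htaccess file inside the search directory (a sketch, assuming Apache with mod_headers enabled):

  Header set X-Robots-Tag "noindex"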

dstiles

10:26 pm on Apr 12, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I generally have a search form on every page, if the site has a search at all.

I block the target directory in robots.txt and also put a noindex,nofollow meta tag on the search pages.

I also wrap a bit of code around the form to prevent it from being seen or submitted by non-browsers, and/or trap non-browsers on the search page and 405 them.
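
A rough sketch of that kind of trap, assuming Apache with mod_rewrite (this version returns a 403 via the F flag rather than dstiles's 405, and the user-agent test is purely illustrative; real bot filtering needs more care than one pattern):

  RewriteEngine On
  # refuse search-page requests from clients without a browser-like user agent
  RewriteCond %{HTTP_USER_AGENT} !(Mozilla|Opera) [NC]
  RewriteRule ^search/ - [F,L]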

Maybe paranoia, but I do NOT trust ANY search engine to get it right.

Shatner

12:30 am on Apr 13, 2011 (gmt 0)

10+ Year Member



>>>@deadsea Shatner: From a google post about panda today: "As sites change, our algorithmic rankings will update to reflect that"

Don't believe it. I fixed all of the above-mentioned issues more than a month ago. No change, except that things continue to get slightly worse.

crobb305

12:44 am on Apr 13, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I hope that means that Google will soon start re-ranking some pandalized sites that have made the appropriate changes.


Shatner and Deadsea,

I assume you guys were Pandalized here in the U.S. back in February. Do you still find your sites Pandalized on the new international Panda that rolled out yesterday (like google.es, .fr, .se, etc.)? I ask because I see my site in full recovery on all of those: top rankings, with sitelinks. I see no recovery in the U.S. as of yet. It makes me wonder whether the international Panda incorporated some or all of the changes that I/we have made as of the time of deployment (i.e., within the past 72 hours). Otherwise, something else is going on with my site and I can't put my finger on it.

walkman

12:50 am on Apr 13, 2011 (gmt 0)



"As sites change, our algorithmic rankings will update to reflect that"

Keep dreaming. You're penalized for X weeks or months automatically; not one person has said "we're back" in the seven weeks since. The bigger joke is this: "we’ve found the algorithm is very accurate at detecting site quality."

realmaverick

1:13 am on Apr 13, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unfortunately for me, my search pages are /searchterm-search.html

Is it possible to block *-search.html pages as a wildcard?

The pages are already noindexed but still indexed.

Not quite sure how to get past this one.
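
For what it's worth, Googlebot (though not every crawler) does support * and $ wildcards in robots.txt, so a pattern along these lines should match that URL scheme:

  User-agent: Googlebot
  Disallow: /*-search.html$

The caveat, picked up by AG4Life below: once the pages are robots.txt-blocked, Googlebot can no longer fetch them to see the noindex, so the already-indexed URLs may take even longer to drop out.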

crobb305

1:32 am on Apr 13, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Regarding my post above, it looks like there are still some Google sites running pre-Panda English results, such as Google Israel, Google Hong Kong, and Google Egypt; essentially any site that doesn't primarily use the Western alphabet still seems to be running the old stuff. So that explains what I was seeing on international searches.

AG4Life

2:01 am on Apr 13, 2011 (gmt 0)

10+ Year Member



I had a penalty for almost exactly this about a year ago; it lasted a very, very long six months.

But it wasn't AdSense/Analytics that made the mistake; it was me. A silly mistake on my part: doing 302 redirects (and sometimes even a double 302 redirect) when I should have been doing 301 redirects or no redirects at all. I've also now made these search result pages noindex, and Google has removed all of them from the search index (it took a long time, though you can speed things up a bit by requesting a faster crawl rate in G).
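
For anyone unsure of the difference: a 302 says "temporarily moved," so Google may keep the old URL indexed, while a 301 says "permanently moved" and transfers the old URL's standing to the new one. In Apache, the permanent variant is a one-liner (the paths here are placeholders):

  Redirect 301 /old-search.html http://www.example.com/new-page.html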

If those pages are already in the index, or somehow getting in without having actual links pointed to them, using robots.txt to block them is actually counterproductive: you're basically telling Google that, now that it has indexed a bunch of useless pages, it should stop visiting them to determine whether they are still there, still useful, or have been noindexed. Google will then have no clue what to do with these pages, and will probably keep them in the index for as long as possible.

What you want is to fix the problem. Noindex any newly created search result pages, then get Google to recrawl, making sure that when Google revisits one of the old search pages it gets the correct response (noindex the pages if they still exist, or 404 them; be wary of redirects).
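
If the dead search URLs follow a recognizable pattern, Apache's mod_rewrite can serve that response wholesale; a 410 Gone (the G flag) is an even stronger removal signal than a 404. A sketch, with /oldsearch/ as a placeholder path:

  RewriteEngine On
  # tell crawlers these retired search pages are gone for good
  RewriteRule ^oldsearch/ - [G,L]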

Also, double-check whether you're linking to these search results pages: perhaps a tag cloud somewhere, "related searches" links, or improper redirects.