Msg#: 4526979 posted 2:31 pm on Dec 11, 2012 (gmt 0)
My main website is completely custom designed, by me, and since I started it before CMSes such as WordPress became popular, I never made the transition, so as to avoid the headache of redirects and so on. Plus I like having full control.
I built a basic search facility so people can search through articles and reviews, the usual. Are there any repercussions, in terms of SEO rankings, to having querystrings such as /?page=xxx&query=xxx?
I see a lot of sites, especially WordPress ones, that have a search box, so am I just being a bit paranoid? Do I need to make the search result pages non-indexable? Would Googlebot submit lots of queries and index the results, which might end up having a negative effect? Or is this a myth from ages ago?
Msg#: 4526979 posted 5:00 pm on Dec 11, 2012 (gmt 0)
You NEED to use robots.txt or noindex to prevent Google from crawling or indexing those pages. Google has been known to delist entire sites that have indexed search result pages.
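Assuming your search results come back on URLs like /?page=search&query=..., a minimal robots.txt rule might look like the sketch below. This relies on Google's wildcard support (* matches any sequence of characters), and the exact parameter name is just a placeholder for whatever your script actually uses:

```
User-agent: *
# Block any URL whose querystring carries the search parameter
Disallow: /*query=
```

Note this only stops compliant crawlers from fetching those pages; it doesn't, by itself, guarantee they stay out of the index if other sites link to them.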
On the Google support forums, there was a thread that Matt Cutts participated in. The site owner was asking why he had been penalized. The site had a box on the home page that looked like a search box (although it was meant for entering numbers). Matt Cutts gave examples of putting #*$! terms into the box and showed that the site gave back indexable pages full of products. It didn't matter that those pages had nothing to do with the #*$! terms; the product list was just a default list, and you got the same page back for any term that wasn't the kind of number the script was expecting.
You don't want to be in the situation where a Google reviewer types "Viagra" into your search box, finds an indexable page, and imposes a penalty on your entire site.
Msg#: 4526979 posted 6:23 pm on Dec 11, 2012 (gmt 0)
Either robots.txt OR noindex should be sufficient.
It doesn't make sense to use both. If you put it in robots.txt, there is no way for Googlebot to crawl it to find out that it is noindex. There does not appear to be a way to tell Google that you don't want something crawled and you also don't want it indexed. If they can't crawl something they generally don't index it (so robots.txt is sufficient for this case), but if a page that is in robots.txt gets enough external links they may index it based on the anchor text and context of those links alone.
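If you go the noindex route instead (leaving the pages crawlable so Googlebot can actually see the directive), the logic is simple to bolt onto a custom site. Here's a hypothetical Python sketch; the 'query' parameter name and the helper are assumptions, not anything from the original site:

```python
# Sketch: emit a noindex meta tag on search-result pages only, so
# crawlers can fetch them but are told not to index them.
from urllib.parse import urlparse, parse_qs

NOINDEX_TAG = '<meta name="robots" content="noindex">'

def robots_meta_for(url):
    """Return a noindex meta tag for search-result URLs, else an empty string."""
    params = parse_qs(urlparse(url).query)
    # Treat any request carrying a 'query' parameter as a search-result page.
    if "query" in params:
        return NOINDEX_TAG
    return ""

print(robots_meta_for("/?page=search&query=widgets"))  # search page: noindex
print(robots_meta_for("/articles/some-review"))        # normal page: nothing
```

The same directive can also be sent as an X-Robots-Tag HTTP response header if you'd rather not touch the page templates.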