
Google SEO News and Discussion Forum

    
Should I disallow search results pages from my website?
nestman




msg:4499501
 11:38 pm on Sep 24, 2012 (gmt 0)

For years Google has been indexing many of my search results pages, which contain links to the articles on my site about essential oils, like this:

example.com/searchResults.php?query=lemon
example.com/searchResults.php?query=lavender
example.com/searchResults.php?query=clove

etc.

The thought recently occurred to me that perhaps I should disallow searchResults.php in my robots.txt file, since all these pages do is show a table of results that link to the various articles. What do the pros say on this forum? Should I continue to let Google do this? Or help direct them to the actual content by filtering out the search results pages?

Thank you!

 

Andy Langton




msg:4499508
 12:04 am on Sep 25, 2012 (gmt 0)

It's a tricky question, nestman, and one I imagine many SEOs have wrestled with at one point or another.

I would suggest building a bit of an evidence base for your decision. First up, do your search result pages attract search traffic?

Sgt_Kickaxe




msg:4499519
 1:14 am on Sep 25, 2012 (gmt 0)

I would not disallow any page on your website with robots.txt anymore; instead, use the noindex meta tag or header response. If you disallow crawling of any of those links, they will end up indexed with a message saying they couldn't be crawled, and you get no credit for the links on the page.
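
For anyone who wants to see the two noindex options side by side, here is a minimal Python sketch. The function names and the http:// scheme are assumptions made for illustration; the script path comes from the example URLs in the opening post. It returns either an X-Robots-Tag response header or a robots meta tag for search result URLs, and nothing for other pages.

```python
from urllib.parse import urlsplit

# Path of the site-search script, taken from the example URLs in the thread.
SEARCH_SCRIPT = "/searchResults.php"


def is_search_result(url: str) -> bool:
    """True when the URL points at the site-search script, whatever the query."""
    return urlsplit(url).path == SEARCH_SCRIPT


def robots_headers(url: str) -> dict:
    """Header option: extra HTTP response headers to send for this URL."""
    if is_search_result(url):
        return {"X-Robots-Tag": "noindex"}
    return {}


def robots_meta(url: str) -> str:
    """Meta tag option: markup to place in the page <head> for this URL."""
    if is_search_result(url):
        return '<meta name="robots" content="noindex">'
    return ""
```

Either option only works if the crawler is allowed to fetch the page, since both the header and the tag travel with the response.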

nestman




msg:4499522
 1:24 am on Sep 25, 2012 (gmt 0)

Can you elaborate on why I would want credit for links that I'm considering filtering out anyway? Thank you.

tedster




msg:4499541
 3:34 am on Sep 25, 2012 (gmt 0)

I don't know that links on the page would be a major factor here. In my view, that's clutching at PageRank circulation a bit too intensively. However, I do agree that a noindex meta might be a good way to go, given how aggressively Google works to "discover" new URLs today.

At the same time, we have had advice from Google engineers - for several years - that they don't want to index site search URLs when that leads googlebot into an infinite crawl space. I've always used robots.txt to block the crawling and never suffered any issues that I could see.

Leaving site search open to crawling usually means having googlebot throw all kinds of keywords at your search form, and usually to no good result. That said, Andy's question is pretty important, I think. Does the present situation bring you any good traffic?

mslina2002




msg:4499558
 5:45 am on Sep 25, 2012 (gmt 0)

I always "noindex, follow" my search results pages simply because you can have umpteen variations of the same thing, and we know people and bots can't spell. Otherwise you risk having pages of the same thing in the index, causing dupe content.

example.com/searchResults.php?query=lavender
example.com/searchResults.php?query=Lavender
example.com/searchResults.php?query=Lavendar+oil
example.com/searchResults.php?query=lavender+oils
example.com/searchResults.php?query=lavender+Oils&page=2

I would suggest building a bit of an evidence base for your decision. First up, do your search result pages attract search traffic?

I never thought to investigate this aspect. If the answer was 'yes' what would you recommend?

jinxed




msg:4499562
 6:01 am on Sep 25, 2012 (gmt 0)

I would add the rel=canonical tag to the results pages to just point to the main search page, otherwise it just gets messy.
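
A rough Python sketch of what this suggestion amounts to: every search results URL emits a canonical link pointing at the bare search page, with the query string dropped. The function name and the http:// scheme are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Build a rel=canonical tag that points at the URL without its query string."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop the query and fragment.
    bare = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{bare}">'


print(canonical_link("http://example.com/searchResults.php?query=lavender+Oils&page=2"))
```

Whatever the query, the tag names the same target, so all the variants are declared to be versions of one page.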

lucy24




msg:4499567
 6:09 am on Sep 25, 2012 (gmt 0)

Leaving site search open to crawling usually means having googlebot throw all kinds of keywords at your search form

Isn't that the textbook case of what the Exclude Parameters function in GWT is intended for?

Simsi




msg:4499642
 9:51 am on Sep 25, 2012 (gmt 0)

I would add the rel=canonical tag to the results pages to just point to the main search page


^ That's what I do - strikes me as the best way.

jinxed




msg:4499660
 10:40 am on Sep 25, 2012 (gmt 0)

It also means that all the major search engines will follow the same instructions. If you use the GWT option then this will only notify G.

Ralph_Slate




msg:4499802
 3:17 pm on Sep 25, 2012 (gmt 0)

I would remove the search results from Google - they aren't really bringing any value to Google's index, are they? The only reason someone would think about leaving them there is that they might get a random hit from Google - but that's a lousy reason, and will likely hurt you more than help you.

1script




msg:4499828
 4:37 pm on Sep 25, 2012 (gmt 0)

I would add the rel=canonical tag to the results pages to just point to the main search page

This sounds counter-productive or at the very least redundant. There should really be no canonical version of a search page - it should all be no-indexed. Otherwise someone can link to a page with a URL like www.example.com/search.php?q=VERY_BAD_WORD and there you have it - your search page now has an inbound link with VERY_BAD_WORD in the anchor.

Here is another scenario which has already killed one of my sites (a few years ago, but I'm sure it's still going to be detrimental now): You have a search page on the site that accepts GET requests and hence has a different URL for each search. Someone (a competitor or just a curious hacker) can link to a "not found" search page using any number of combinations of anchors and URLs (just changing q=xyz at the end of the URL) and there you go, your site has just picked up plenty of new pages, all duplicates.

In my case the "helpful" search script also added a part saying "We could not find xyz, would you like to check other searches that we think are relevant?" and linked to a couple more search pages. That eventually snowballed into 2M+ nearly identical pages that Google still (5 years later) thinks my site has. The site has had what seems like EVERY penalty Google has ever devised (-950, -800, -50, you name it) and is still lingering mostly on pages 2-3-5 despite having been in positions 1-3 for years before the incident.

Anyway, a search page has really no information worthy of being indexed by itself, so it needs to be no-indexed and also disallowed in robots.txt to conserve your crawling budget. Just make sure you implement no-index before disallowing it in robots so the bots have a chance to read the html and see the no-index tag.

Planet13




msg:4499885
 7:16 pm on Sep 25, 2012 (gmt 0)

just from a USER'S point of view:

I find search results pages in the index pretty unhelpful.

Unfortunately, with Google's love of big brands, I seem to get lots of them in the SERPs when I do a search for certain keywords.

What seems even worse to me is that it APPEARS like Google might be indexing text content from AdSense ads that appear on those search pages, too.

So I have seen results where the snippet in the SERP featured text from an AdWords ad, or where I have gone to a page that was listed in the SERPs and found that the matching keyword ONLY appeared in an AdWords ad.

Simsi




msg:4500140
 11:07 am on Sep 26, 2012 (gmt 0)

Otherwise someone can link to a page with a URL like www.example.com/search.php?q=VERY_BAD_WORD and there you have it - your search page now has an inbound link with VERY_BAD_WORD in the anchor


But surely someone could link to *any* page of your site with .php?VERY_BAD_WORD in the URL? And as your search will probably return "No results found for VERY_BAD_WORD" Google would/could see that it's not relevant to your site anyway so I would have thought this doesn't matter myself.

Also the Canonical will truncate the page URL at .php anyway so you are doubly telling Google the VERY_BAD_WORD isn't relevant. Now, what Google does with that of course is in the lap of the Gods lol.

nestman




msg:4501358
 9:26 pm on Sep 28, 2012 (gmt 0)

1script,

I went ahead and made the search results page a "noindex, follow" as was suggested. Can you explain why it's important to implement noindex before disallowing the page in robots.txt? How long should I wait before I disallow the page in robots.txt?

Thanks!

Dymero




msg:4501359
 9:38 pm on Sep 28, 2012 (gmt 0)

@nestman:

Bots that pay attention to the robots.txt will not visit any URLs that are blocked, so they never get to read the noindex tag and the pages won't be deindexed.

As for how long to wait, I'd hold tight until all existing results have disappeared. I don't know how big your site is, but I had to noindex search result pages for a site I work on. I'm on week 18 now and there are still a few lingering pages.

Just do a site:example.com/searchResults.php in G and wait until there are no more results.

nestman




msg:4501388
 11:23 pm on Sep 28, 2012 (gmt 0)

Is it bad to have over 3,000 pages that are blocked by robots.txt? Is it more important for pages to be blocked or deindexed?

lucy24




msg:4501439
 2:54 am on Sep 29, 2012 (gmt 0)

How long should I wait before I disallow the page in robots.txt?

Forever. The moment you disallow the page, search engines can no longer see the "noindex" directive, so you're right back where you started.

You might think that indexing a page when you haven't seen a single word of its content is pretty pointless-- but what the search engine can see is the linking text from anyone in the world that links to the page. And so far there's no such thing as an inverse nofollow. ("If a link leads to this page, pretend you never saw it.") At least not in g###.
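
To make the interaction concrete, here is a small Python sketch using the standard library's urllib.robotparser. The PAGES dictionary, the function name and the http:// scheme are hypothetical stand-ins for a real site and a real crawler. It only models the point made above: a well-behaved bot checks robots.txt first and, if blocked, never downloads the HTML that carries the noindex tag.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical stand-in for the live site: URL -> HTML the server would return.
PAGES = {
    "http://example.com/searchResults.php?query=lemon":
        '<head><meta name="robots" content="noindex, follow"></head>',
}


def crawler_sees_noindex(robots_lines, url, agent="Googlebot"):
    """Return True only if the bot may fetch the page AND finds noindex in it."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch(agent, url):
        # Blocked by robots.txt: the page is never fetched, the tag never read.
        return False
    return "noindex" in PAGES.get(url, "")


url = "http://example.com/searchResults.php?query=lemon"
open_rules = ["User-agent: *", "Disallow:"]
blocking_rules = ["User-agent: *", "Disallow: /searchResults.php"]

print(crawler_sees_noindex(open_rules, url))      # crawling allowed
print(crawler_sees_noindex(blocking_rules, url))  # crawling blocked
```

With the open rules the tag is visible to the bot; with the Disallow rule in place the same page keeps its tag but the bot has no way of knowing.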

tedster




msg:4501453
 4:46 am on Sep 29, 2012 (gmt 0)

Is it bad to have over 3,000 pages that are blocked by robots.txt?

Not at all. In fact, since a robots.txt Disallow rule acts like a pattern match (it means "do not crawl a URL that begins like this"), even one Disallow rule can block a potentially infinite number of URLs.
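
The "begins like this" behaviour can be checked with Python's standard urllib.robotparser. This is only a sketch: the robots.txt content, the article URL and the http:// scheme are assumptions, and the search URLs are the ones quoted earlier in the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing a single rule for the search script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /searchResults.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = [
    "http://example.com/searchResults.php?query=lavender",
    "http://example.com/searchResults.php?query=Lavendar+oil",
    "http://example.com/searchResults.php?query=lavender+Oils&page=2",
    "http://example.com/articles/lavender.html",  # assumed article URL
]

# One Disallow line covers every URL whose path starts with the rule's prefix.
for url in urls:
    allowed = parser.can_fetch("Googlebot", url)
    print("crawl" if allowed else "skip ", url)
```

Every query variation shares the same path prefix, so the single rule covers all of them while the article page stays crawlable.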
