Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

How to de-index search pages?

         

Samsam1978

9:23 pm on Oct 15, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hi guys, ohhh I am all confused! All my search pages have been accidentally indexed, and obviously I don't want them indexed. I am getting all confused with my tags. Is putting this in the header enough?

<meta name="robots" content="noindex,follow"/>
<link rel="canonical" href="https://mysite.com/search/keyword/" />

Or do I also need to put in these?

<link rel="next" href="https://www.mysite.com/?q=keyword/keyword&page=1" />
<link rel="prev" href="https://www.mysite.com/?q=keyword/keyword&page=1" />

Thank you for your help

lucy24

10:02 pm on Oct 15, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



At the risk of sounding like That Other Forum, isn't this still the same question as this one [webmasterworld.com]?

Samsam1978

10:12 pm on Oct 15, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



No, because I read that I need to use this in the search page rather than in robots.txt.

Samsam1978

10:13 pm on Oct 15, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks though, Lucy, for your response on the other post. So you think I should just disallow in robots.txt, or use the meta tag?

phranque

12:03 am on Oct 16, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



if you are noindexing the page i don't see a need for those link elements.
the link rel next/prev would be useful if you were indexing a collection of documents.
the link rel canonical would be useful if you wanted "this" document indexed under another url.
if anything, the link rel canonical may send an unwanted signal.
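e.g. the head of a search results page would then carry just the robots meta element (a sketch, keeping the noindex,follow from the first post and dropping everything else):

```html
<head>
  <!-- allow crawlers to fetch the page, but keep it out of the index -->
  <meta name="robots" content="noindex,follow"/>
  <!-- no rel="canonical" and no rel="next"/rel="prev" link elements here -->
</head>
```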

Samsam1978

12:19 pm on Oct 16, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



Yes, that is what I thought. Thanks so much :-)

MayankParmar

6:13 pm on Oct 22, 2017 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Add this to your robots.txt:

Sitemap: https://www.example.com/sitemap_index.xml
User-agent: *
Disallow: /go/
Disallow: /?s=*
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Allow: /wp-admin/images/

phranque

8:27 pm on Oct 22, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



MayankParmar please explain how your post is relevant to Samsam1978's problem.

MayankParmar

4:02 pm on Oct 23, 2017 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



phranque, Sam said that his website's search pages are getting indexed. If Disallow: /?s=* is added to robots.txt, Google won't be able to index the search pages, as they are disallowed :D And once access is blocked, Google will deindex them (hopefully).

timemachined

11:20 am on Oct 25, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Yes, but you don't disallow if you want pages to drop out of G automatically; you noindex. Only disallow if they're not already in the index.

If disallowing in robots.txt, then you need to manually / bulk remove the known pages via webmaster tools yourself, because once you disallow, G won't recrawl them and notice any change. An answer previously gleaned from here.

phranque

12:26 pm on Oct 25, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



MayankParmar:

Disallow: /?s=* won't exclude robots from crawling the search urls used by Samsam1978, and as timemachined stated, exclusion from crawling will not deindex the url.

instead the url will show this description in search results:
A description for this result is not available because of this site's robots.txt


the other Disallow/Allow directives are irrelevant to the problem statement.
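for illustration, patterns that would actually match the search urls shown in the first post (assumed from Samsam1978's examples) would look more like this; note that the * wildcard is a google extension, not part of the original robots.txt standard, and the caveat above still applies: disallowing only blocks crawling, it does not deindex urls already in the index.

```
User-agent: *
Disallow: /search/
Disallow: /*?q=
```

so for pages that are already indexed, the reliable route is to leave them crawlable and serve the noindex robots meta tag until they drop out.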