Remove unwanted, already indexed pages from Google search results

         

anthonyinit

12:45 am on Dec 17, 2020 (gmt 0)

10+ Year Member



Hello,
Google has already indexed my website's pages and they appear in search results. However, I want to remove some pages from Google and tell Google not to reindex them.

Google has already indexed pages like:
https://www.example.com/index.php?ccode=BG
https://www.example.com/index.php?ccode=RU
https://www.example.com/index.php?ccode=DE
https://www.example.com/index.php?ccode=ES

How do I remove these URLs from Google and keep them from being reindexed?

not2easy

1:23 am on Dec 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Do the pages physically exist or are they dynamically generated?

If they are all in one directory, you can remove them using X-Robots-Tag headers, whether they are dynamically generated or physical pages.

anthonyinit

2:19 am on Dec 17, 2020 (gmt 0)

10+ Year Member



I believe these are dynamically generated pages; I don't see any such pages in my directories. When I select a language, e.g. RU or DE, my URL changes accordingly.

If I choose the RU language, my URL changes to https://www.example.com/index.php?ccode=RU

not2easy

3:53 am on Dec 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you wish to remove only the list of URLs above, which appear to be dynamically generated versions of one single page, the easiest way is to prevent them from being created. Since those appear to be versions of the home page of a site, you would not want to use noindex, either as a metatag or as an X-Robots-Tag header. That would noindex the entire site, and I don't think that is your goal.

Didn't you recently ask how to prevent crawling [webmasterworld.com] of a different set of pages using robots.txt?

If these pages are on the same site you were asking about before, you might want to consider a few options: Rewrite index files to use only one version, or find the source of those unwanted language files and disable them.

This part:
?ccode=BG
is a query string. I'd be looking for a rewrite rule that disregards query strings on pages named "/index.php" as part of the domain's canonical rewrite.

Do you have a canonical RewriteRule so that requests for http://example.com/ or https://example.com/ go to https://www.example.com/ in your .htaccess file? If so, the index page rewrite rule belongs before that rule. If not, you should look through the Apache [webmasterworld.com] forum for what you need or ask for assistance there.

It is common to rewrite /index.php or /index.html pages to their directory URL so that this requested URL:
https://www.example.com/index.php?ccode=BG
would go to
https://www.example.com/
If you have no other way to prevent those unwanted language queries you might consider that.
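
As a rough sketch of the kind of rule described above, assuming Apache 2.4 with mod_rewrite enabled and .htaccess overrides allowed (none of which has been confirmed in this thread), something like the following would 301-redirect direct GET or HEAD requests for /index.php, with or without a query string, to the directory URL. The hostname is the example one used in this thread. Try it on a test copy of the site first.

```apache
RewriteEngine On

# Look at what the client actually requested, so the internal
# DirectoryIndex mapping of "/" to "/index.php" does not cause a loop.
# Limiting this to GET and HEAD leaves form POSTs to index.php alone.
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\s/index\.php[?\s]
# QSD (Apache 2.4+) discards the query string, so ?ccode=BG is dropped.
RewriteRule ^index\.php$ https://www.example.com/ [R=301,L,QSD]
```

Keep in mind that this also stops the language selection from working if it depends on the ccode parameter, so it only makes sense when those versions are not needed.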

When I select a language, e.g. RU or DE, my URL changes accordingly.
If you NEED to offer those language selections that generate the other versions of your pages, then I have no idea how to remove them from the index, because if people can visit the pages, they can be indexed. Crawling can be controlled, but as far as I know you can't prevent indexing if you cannot generate a noindex metatag.

phranque

9:00 am on Dec 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



How do I remove these URLs from Google and keep them from being reindexed?

you should include a meta robots noindex element in the head of these (html) documents:
<meta name="robots" content="noindex">

or include an X-Robots-Tag: noindex header among the HTTP response headers sent with these documents.

see Google Search Central documentation:
Block search indexing with 'noindex' [developers.google.com]

not2easy

11:48 am on Dec 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The problem is that they are dynamically generated and do not physically exist as pages. According to the example URLs shown in the OP here, they are auto-generated language versions of the site's homepage.

phranque

12:06 pm on Dec 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It doesn't matter how it is generated, as long as the response for each of these URLs includes the proper meta element or comes with the proper response header.

not2easy

12:12 pm on Dec 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



But noindexing "index.php" is not a helpful thing imho.

phranque

8:20 pm on Dec 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



you don't noindex the script - you noindex the URLs which contain those query strings.

phranque

11:01 pm on Dec 17, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



btw if this is on an Apache server you could solve this entirely in .htaccess using mod_setenvif and mod_headers directives.
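
A minimal sketch of that idea, assuming Apache 2.4 (needed for SetEnvIfExpr, which is part of mod_setenvif) with mod_headers enabled. The variable name NOINDEX_CCODE is made up for illustration. The expression form is used because the Request_URI attribute of plain SetEnvIf does not include the query string:

```apache
# Flag any request whose query string contains a ccode parameter.
# NOINDEX_CCODE is an arbitrary name chosen for this example.
SetEnvIfExpr "%{QUERY_STRING} =~ /(^|&)ccode=/" NOINDEX_CCODE

# Send the noindex header only on flagged requests, so plain "/" and
# "/index.php" without a ccode parameter stay indexable.
Header set X-Robots-Tag "noindex" env=NOINDEX_CCODE
```

The response headers can then be checked with something like curl -I "https://www.example.com/index.php?ccode=RU". For Google to see the header at all, these URLs must not be blocked from crawling in robots.txt.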

anthonyinit

2:09 am on Dec 20, 2020 (gmt 0)

10+ Year Member



Thank you both for your input...

I guess this is really a complicated issue for me. However, I'm going to take some time to look into it, consider all your options, and see what works best. Before that, I want to know one thing.

Is it OK if Google indexes my website with
https://www.example.com/index.php?ccode=BG
https://www.example.com/index.php?ccode=RU
https://www.example.com/index.php?ccode=DE
https://www.example.com/index.php?ccode=SA

including
https://www.example.com/index.php


Will this hurt my ranking?

I'm really confused about why Google only shows my different language pages in search results. Image below:
https://ibb.co/MVFJKwQ


All those results are from pages:
https://www.example.com/index.php?ccode=PH
https://www.example.com/index.php?ccode=RU
https://www.example.com/index.php?ccode=TU
https://www.example.com/index.php?ccode=JP

I do have other important pages like contact.php, FAQ.php, and Terms.php, but none of those pages show up in Google's results. So I'm thinking that if I get rid of those language pages, Google might index and show my important pages.

That's only my guess; I'm not an expert here :(