Forum Moderators: Robert Charlton & goodroi


Urgent Help Needed: Google Indexing Blocked Parametric Pages, Impacting SEO

Appreciate Immediate Help


Khushbu2410

6:13 am on Aug 3, 2023 (gmt 0)



Hello Webmaster Community,

I'm currently encountering a significant problem with Google indexing a large number of parameter-based URLs on my website, despite these pages being blocked in my robots.txt file.

Here's some brief context: my website has only 10,000 to 12,000 pages, but Google's crawler seems to be treating many parametric URLs as unique pages and indexing them. Today the indexed-page count is 26K, while the actual number of pages is no more than 11K.

This situation is negatively impacting my website's SEO rankings, and I'm struggling to understand why these blocked URLs are being indexed. The number of these indexed pages continues to rise every 3 days.

Could anyone help me troubleshoot this issue and suggest any potential solutions? Why would Google index pages that are explicitly disallowed in the robots.txt file? Should I modify my current setup in any specific way? For further information: canonical tags have already been implemented.

I appreciate any advice or suggestions, and I'm happy to provide more information if needed. This issue is currently impacting my site's performance, so urgent help would be greatly appreciated. Thanks

lucy24

3:47 pm on Aug 3, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why would Google index pages that are explicitly disallowed in the robots.txt file?
Because a robots.txt Disallow is not the same as a noindex. In fact, they are mutually exclusive: if a search engine isn't allowed to crawl a page, it will never see any noindex directive associated with that page.

999 times out of 1000, this makes no difference, because unless you have the world's most captivating and distinctive linking text, nobody will ever come across those pages in an ordinary human search. Sure, they will show up in GSC as "Indexed, though blocked by robots.txt", but this has no real-life effect.

If it is a canonicalization issue, there are other ways to address the problem. One way, of course, is to let the URLs be crawled and use a "noindex" and/or "canonical" directive.
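To make the distinction concrete, here is an illustrative sketch of the two mechanisms (the patterns are hypothetical, not taken from the poster's site):

```
# robots.txt — blocks CRAWLING only. Google can still index a
# blocked URL from external links, without ever fetching it.
User-agent: *
Disallow: /*?

# noindex — blocks INDEXING, but only takes effect if the crawler
# is allowed to fetch the page and see one of:
#   <meta name="robots" content="noindex">   (in the HTML head)
#   X-Robots-Tag: noindex                    (as an HTTP response header)
```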

Khushbu2410

4:10 pm on Aug 3, 2023 (gmt 0)



Thanks Lucy for the answer. I get your point, but here's the concern: a canonical is already in place for domainname.com/pageurl, yet the crawler is finding, or rather generating, thousands of parametric pages. Ideally these pages do not exist, and there is no way to reach those parametric URLs from the website itself. It's producing URLs like domainname.com/pageurl?param=1, domainname.com/pageurl?param=1&param=2, domainname.com/pageurl?param=1&param=2&param=3, and so on. The permutations and combinations seem endless. How do I add a noindex directive in this case? I have no idea how many such URLs there are, as more seem to appear every day, and I can't add noindex to the parameter-free pageurl because I want that page indexed. It would be great if someone here could spell out the negative effects I'm likely to see from this, and the best possible solution.
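One way to cover an unbounded set of parameter URLs without enumerating them is to attach the noindex as an HTTP header to every request that carries a query string. A rough sketch, assuming the site runs on Apache 2.4+ with mod_headers enabled (whether that matches this site's stack is an assumption):

```apache
# Rough .htaccess sketch (Apache 2.4+, mod_headers required):
# any URL with a non-empty query string gets a noindex header;
# the clean, parameter-free URL is untouched and stays indexable.
<If "%{QUERY_STRING} != ''">
    Header set X-Robots-Tag "noindex"
</If>
```

Note that, per lucy24's point above, this only works if the parameter URLs are removed from the robots.txt Disallow — Googlebot has to be able to fetch them to see the header.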

not2easy

4:23 pm on Aug 3, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hi Khushbu2410 and welcome to WebmasterWorld [webmasterworld.com]

When multiple variations or parameters are being generated, it's best to make sure your sitemap lists only your preferred version and, as you have done, to add a canonical so the other variations show their relationship to it.

As lucy24 said, do not block crawling if you wish to have the canonical page indexed.

Beyond that, can you determine what is generating the parameters? What can you tell from your access logs regarding those requests with parameters? Is the parameters problem a new development or have these been building over time?

If the parameters serve no purpose, you may be able to rewrite those requests. Determining the cause would help suggest a fix.

Khushbu2410

5:11 pm on Aug 3, 2023 (gmt 0)



Thanks for your response, but we have debugged every option that could be generating the parameters and found nothing in the code. The access logs suggest those requests trace back to ads we ran long ago on social media (Instagram and Facebook). This is a recent problem that we started seeing in the second week of July. Before this, we were stuck with a different problem: around a million of these non-existent parametric pages were showing in GSC as crawled but not indexed. So we blocked them in robots.txt, and after that these parametric pages started showing up as indexed. The root cause is simply unknown; they are just URLs from the site with parameters appended as a suffix. It looks super messy and scary.

not2easy

6:16 pm on Aug 3, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If the pages do not exist, you might find a better answer handling those requests with a rewrite. Assuming your site is hosted on an Apache server, the parameter suffixes can be stripped so the request is sent to the original URL without parameters.

Visit our Apache forum: [webmasterworld.com...] to get help with that.
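As a rough sketch of that kind of rewrite, assuming Apache 2.4+ with mod_rewrite enabled (the directives and flags are real; whether this suits the site is an assumption):

```apache
# Rough .htaccess sketch (Apache 2.4+, mod_rewrite required):
# 301-redirect any request carrying a query string back to the
# same path with the query string discarded ([QSD] flag).
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]
```

Be aware this blanket rule strips every parameter, including any the site legitimately uses (ad-tracking parameters, for example), so those would need to be excluded in the RewriteCond.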

Nutterum

10:20 am on Aug 9, 2023 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'd say (and this may involve some back-end work) create a canonicalisation script where, for any given parameter being used, the generated URL points to the appropriate non-parameter URL. This avoids the duplicate-content problem while still keeping the current structure and rules of your website.
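A minimal sketch of that kind of back-end helper, in Python (the function name and URLs are illustrative, not from the site in question): derive the canonical for any incoming request by dropping the query string, so every parameter permutation emits the same canonical tag.

```python
# Hypothetical sketch: compute the canonical URL for a request by
# stripping the query string and fragment, so that every parameter
# permutation of /pageurl points back at the one clean URL.
from urllib.parse import urlsplit, urlunsplit

def canonical_url(request_url: str) -> str:
    """Return request_url with its query string and fragment removed."""
    scheme, netloc, path, _query, _fragment = urlsplit(request_url)
    return urlunsplit((scheme, netloc, path, "", ""))

# The page template would then emit, for every variation:
#   <link rel="canonical" href="...the value returned above...">
print(canonical_url("https://www.example.com/pageurl?param=1&param=2"))
# -> https://www.example.com/pageurl
```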

There are some corner cases where, depending on the products, Google may decide to show a parameter page, but that is rare and should not be considered cannibalization. But if you see clicks or orders on those pages, or feel a drop in traffic due to the thin content of these indexed pages, the above solution should fix everything right up.

lucy24

4:03 pm on Aug 9, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



should not be considered cannibalization
Hmmm.