Forum Moderators: Robert Charlton & goodroi


Duplicate Content problem because of html?PageSpeed=noscript

         

trocobob

2:10 pm on Sep 23, 2015 (gmt 0)

10+ Year Member



Hi

I'm facing a duplicate meta tag problem because of html?PageSpeed=noscript

For example:

example.com/woman/105124.html?PageSpeed=noscript
example.com/woman/105124.html

I want to block it via robots.txt, but I don't know the correct pattern.

For the print pages I have this rule: Disallow: /*.html?print$

But for PageSpeed I don't know the exact one.

Any help would be appreciated.

Thanks

[edited by: aakk9999 at 3:37 pm (utc) on Sep 23, 2015]
[edit reason] Corrected spelling of exemple.com to be example.com [/edit]

lucy24

6:25 pm on Sep 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As an alternative, you could go into the URL Parameters area of WMT ("Search Console") and tell them not to crawl URLs that contain the "PageSpeed" parameter.

Wilburforce

7:44 pm on Sep 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Interestingly, I have just started getting duplicate meta descriptions and duplicate title tags for, e.g.:

/mypage.htm
/mypage.htm?newwindow=true

/anotherpage.htm
/anotherpage.htm/?iframe=true&width=100%&height=100%

There are no internal links that use parameters at all (and no scripts on the site that generate them), so I assume they are in some backlinks from somewhere (although none are listed by Google). Presumably the iframes come from scrapers. Because these are valid query strings, the server returns a 200 (and the page), so Google flags this as duplicate content.

The URL parameters tool is fairly useless for this: the only way I can see to block these is to add rel="canonical" to every single page.
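For reference, the canonical approach mentioned above is a single link element in the head of each page, pointing at the clean URL; every parameter variant then declares the same preferred version. A minimal sketch, using the example paths from this thread (the domain is a placeholder):

```html
<!-- In the <head> of /mypage.htm, served identically for
     /mypage.htm?newwindow=true and any other parameter variant -->
<link rel="canonical" href="https://www.example.com/mypage.htm">
```

On a templated site this is usually one change in the page template rather than an edit to every single page, but that depends on how the pages are generated.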

trocobob

8:13 pm on Sep 23, 2015 (gmt 0)

10+ Year Member



Thanks lucy24. I searched for it in WMT but I didn't find it.

The canonical solution is not possible for me, as the page itself is generated with "html?PageSpeed=noscript".

So any way to block it from robots.txt would be appreciated.

Regards

not2easy

8:43 pm on Sep 23, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Disallow: /?PageSpeed=noscript
should work. I would check a few to be sure by using the tester here: [support.google.com...] or in your GSC (old GWT). The tool to add the URL parameter, as lucy24 suggested, is in your GSC, and it does answer the need to prevent crawling of those URLs in case the robots.txt doesn't work the way you want. Query parameters can be tricky in robots.txt - sometimes accidentally blocking unintended URLs - but they are easy to deal with using URL Parameters.

aakk9999

8:50 pm on Sep 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



^^^ Probably just a typo but:
Disallow: /?PageSpeed=noscript
will only disallow the home page with this parameter.

To disallow all pages with this parameter, you need to either remove the first slash or add an asterisk after it, i.e. for Google either of these would do:

Disallow: /*?PageSpeed=noscript
or
Disallow: ?PageSpeed=noscript
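For anyone who wants to sanity-check these patterns offline, here is a rough Python sketch of Google-style wildcard matching ("*" matches any run of characters, a trailing "$" anchors the rule to the end of the URL). This is an approximation for illustration, not Googlebot's actual matcher - the testers mentioned above remain the authoritative check:

```python
import re

def robots_rule_matches(rule, path):
    """Rough sketch of Google-style robots.txt path matching:
    '*' matches any sequence of characters, and a trailing '$'
    anchors the rule to the end of the URL path."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape regex metacharacters, then restore '*' as a wildcard
    pattern = re.escape(rule).replace(r"\*", ".*")
    pattern = "^" + pattern + ("$" if anchored else "")
    return re.match(pattern, path) is not None

# The corrected rule blocks the parameterized URL but not the clean one
print(robots_rule_matches("/*?PageSpeed=noscript",
                          "/woman/105124.html?PageSpeed=noscript"))  # True
print(robots_rule_matches("/*?PageSpeed=noscript",
                          "/woman/105124.html"))                     # False
```

The same function shows why the earlier print rule needed its "$": without the anchor, `/*.html?print` would also match URLs where more text follows the parameter.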

Wilburforce

9:30 am on Sep 24, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@trocobob

The URL Parameters tool is in the Crawl section of GSC.

That is probably the best way to go for what you need to do, but as this is essentially a canonical issue, you might also note Google's advice ([support.google.com ], next to the lightbulb about half-way down the page): "Don't use the robots.txt file for canonicalization purposes".

trocobob

10:39 pm on Sep 24, 2015 (gmt 0)

10+ Year Member



Thank you aakk9999 and Wilburforce.

I have tried blocking the URLs via robots.txt, as I don't need them.
I will check the results in the next few days.

Thank you