
Forum Moderators: Robert Charlton & goodroi


Using 410s to quickly stop Google indexing navigation filter urls?

     
10:28 pm on Aug 1, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 363
votes: 204


Last year a dev left a setting on, meaning that Google indexed and followed all of my layered navigation filter URLs. I only have some 450 products, but as I have quite a few attributes the number of combinations was enormous, and it caused lots of problems that I'm still dealing with a year on. Blocking them in robots.txt caused me issues, as Google wanted to recheck them, so I quickly stopped that and ended up allowing them but with noindex, nofollow. I have also made hierarchy changes since then, so I still get hundreds of 404s per day, mostly these filter URLs. I'm not going to 301 them; I have just been leaving them as 404s and marking them as fixed.

My question is: would I be better off adding an nginx rule to return a 410 for all /filter URLs? (I don't want any indexed.) Would this be faster, and would it be safe to do? It logically seems to be the right thing.
I read a thread some years ago that seemed to point this way [webmasterworld.com...], but it was years ago, so I wanted to check whether anyone had current advice on this.
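For reference, a blanket rule of this kind is only a few lines of nginx. This is just a sketch, assuming all the affected URLs sit under a single /filter/ prefix (adjust the prefix to match the real URL structure):

```nginx
# Minimal sketch: answer 410 Gone for everything under /filter/.
# Assumes every URL beneath this prefix is permanently removed.
location ^~ /filter/ {
    return 410;
}
```

The `^~` modifier makes this prefix match take priority over any regex locations, so no other rule accidentally serves these paths.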
1:50 am on Aug 2, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


would i be better to add a nginx rule to return a 410
Yes, if the files no longer exist, a 410 is the proper server response.

Will it help with Google's error reporting? No.

This has been discussed many times. We don't really know how Google processes 410s or how using a 410 affects indexing or ranking, but we do have plenty of evidence that Google still reports 410s as errors in the Google Search Console.
2:11 am on Aug 2, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 363
votes: 204


OK, I will give it a try. I've read that some people found URLs were removed from the index faster with a 410 than with a 404, which Google retries a few times first; that would make sense. I will report back with my results.
4:57 am on Aug 2, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10456
votes: 1091


410 is very specific, as in NO LONGER THERE, where 404 is ambiguous in that regard ... not found one day might be found the next. Use the 410 ... and expect g to keep coming back year after year, testing to see whether a url it once found is there ... or not.

Note: bing does the same thing, but respects a 410 more swiftly.
8:05 am on Aug 2, 2018 (gmt 0)

New User from EE 

joined:Dec 28, 2016
posts:19
votes: 3


What do these URLs look like?
Imho, if the filter URLs include parameters, you can tell Googlebot not to crawl those via the URL Parameters tool in Google Search Console. If the URLs look the same as ordinary product pages with some additional information, then why not set up canonical URLs? Of course, if you've already removed them, then 410 is the way to go.
3:05 pm on Aug 2, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 363
votes: 204


The URLs don't (mostly) contain parameters but are of a format such as /filter/numberdays/1,3,4,6/type/attributeA,attributeB,attributeC,attributeD. For example, a product might have a "day" filter with options 1-7 and a "type" filter with several options, so every possible combination of selections was indexed. With that and the category structure, it resulted in millions of filter URLs being indexed, and my site has just 450-500 products. Obviously there are various reasons why that isn't good, and it did kill my SEO for a while, but it is recovering bit by bit.
I'm thinking that 410ing them should help with crawl budget, but I wanted to make sure Google didn't see 410s as something they could penalise me for in any way. As I suspected, though, it does seem that most think 410 is the way to go here. I don't want any filters indexed; all filters that remain are set to noindex anyway, so I can't see there being any harm if Google thinks they are gone.

I just need to work out how to handle this. As there are still live filters in place, I can't just 410 all /filter/* URLs to the whole world. I'm thinking of setting up an nginx rule that returns a 410 to the Google and Bing bots for all /filter URLs. Anyone got any better suggestions?
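One way to express a bot-only rule in nginx is a `map` on the User-Agent, along the lines of the sketch below. The /filter/ prefix and the user-agent substrings are assumptions, and note that serving bots a different status code than visitors is a form of cloaking, which carries its own risk:

```nginx
# In the http {} context: flag requests from known crawlers.
map $http_user_agent $is_se_bot {
    default        0;
    ~*googlebot    1;
    ~*bingbot      1;
}

# In the relevant server {} block: 410 for crawlers only,
# while human visitors still get the live filter pages.
location ^~ /filter/ {
    if ($is_se_bot) {
        return 410;
    }
    # ...normal filter handling for everyone else...
}
```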

The GSC parameters tool has given very inconsistent results in my experience. I do use it (not in this case, but for other things).
8:30 am on Aug 3, 2018 (gmt 0)

New User from EE 

joined:Dec 28, 2016
posts:19
votes: 3


If the filter URLs will still be available to visitors, I would suggest using canonicals. I don't think Google would consider it quite correct if you showed these pages to visitors but a 410 to Googlebot when it tries to visit the same pages.
1:46 pm on Aug 3, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 363
votes: 204


The problem is there are hundreds of thousands (it was millions) in Google's index, and I have since simplified the category structure, so a lot of these are in fact gone; of course, some are still valid. You might be right, though: Google might see it as suspicious. Thinking about it a bit more, I think I should create a more complex rule that 410s the ones that are definitely gone (the ones appearing in GSC errors) and leave the live ones as noindex, nofollow. That should have the effect I want, without the risk of confusing Google.
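A targeted rule like that could be a regex `location` listing only the retired URL patterns. The paths below are purely hypothetical placeholders for whatever patterns no longer exist after the hierarchy change:

```nginx
# Sketch: 410 only filter URLs under categories removed in the
# restructure (the category names here are hypothetical examples).
location ~ ^/(old-category|discontinued-range)/filter/ {
    return 410;
}

# Live filter URLs fall through to normal handling and keep
# their on-page noindex directives.
```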
5:55 pm on Aug 4, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:May 1, 2018
posts: 102
votes: 17


A 410 "not there"? But only for the search engine bots? How about using a canonical URL on these pages?
6:26 pm on Aug 4, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 363
votes: 204


How about using a canonical url on these pages?


They actually are not there. The ones that are do have a canonical set, and are also set to noindex, nofollow, but it is the hundreds of thousands that got indexed in a misconfiguration that fill up GSC with 404s and use up my crawl budget. I'm looking for the fastest way to get them out of Google's index history so it doesn't keep returning to try crawling them.
2:50 am on Aug 5, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10456
votes: 1091


Don't hold your breath! G returns time and again, for eons (sic, that is correct, though tongue-in-cheek). It never forgets a url it has met. The best you can do is ride it out. A 410 will eventually reduce the noise, but it will take time.
3:37 am on Aug 5, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


410 not there? Only for the search engine bots? How about using a canonical url on these pages?
A canonical tag is not a panacea for getting SEs to apply juice to new pages.

A canonical should only be used when there is sufficient shared content with the duplicate page. Most of the main text content of the duplicate page must also appear in the canonical page. [webmasters.googleblog.com...]
2:25 pm on Aug 5, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 363
votes: 204


Found an interesting, albeit a bit old, video on the subject from Google...

Does Google treat 404 and 410 status codes differently?
Matt Cutts - Google Webmasters - Apr 14, 2014
trt 2:55

https://youtu.be/xp5Nf8ANfOw [youtu.be]


[edited by: Robert_Charlton at 10:55 pm (utc) on Aug 5, 2018]
[edit reason] Added title, running time, and source info to video link [/edit]

7:24 am on Aug 6, 2018 (gmt 0)

New User from EE 

joined:Dec 28, 2016
posts:19
votes: 3


they actually are not there - the ones that are do have a canonical set, and are also set to noindex, nofollow...

I think John Mueller recently said that this is incorrect: having both of these at the same time might mess up Google's logic. Read this thread - [reddit.com...]
4:42 pm on Aug 6, 2018 (gmt 0)

Preferred Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 363
votes: 204


@cr1m - now that IS interesting info, thanks. I need to re-evaluate things, because I do have noindex and rel=canonical on my filter pages, and it sounds like that isn't a good idea. So it sounds like removing the noindex is the way to go.
The problem I have with this, though, is that I don't want Google to index these URLs, partly because I don't want them in the index, but also because there are exponentially so many of them that Google seems to be perpetually crawling them, using up my server resources, Google's crawl budget, etc. It seems pointless for both of us. They would just see that the canonical is the parent category anyway and not index them. But 410ing them seems like a scary thing to do, as Google might see it as a form of hiding them: by the very nature of filter URLs, they will be accessible from the parent page, which is crawlable, so it could damage that.

There is so much guesswork involved in trying to do what is right for Google. While there are "advice" and guidelines, they are often cryptic and sometimes seem contradictory.