Forum Moderators: phranque


410 Gone for a directory and all its content

how to set it up

         

Lorel

6:44 pm on Jun 11, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I need to 410 a directory and all its content. I'm using the following, but GWT is still indexing the pages within the directory:

RewriteRule ^foldername(/.*)?$ - [G,NC]

I searched Google and Bing but can't find the answer.

wilderness

10:39 pm on Jun 11, 2016 (gmt 0)




RewriteRule ^name/ - [G]

Regarding NC: I'm not sure you want to 410 any directory other than the specific one you chose, and with NC a differently cased name would match as well.
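For reference, a minimal sketch of how the simplified rule might sit in a root-level .htaccess file, assuming mod_rewrite is enabled and that "foldername" stands in for the real directory name. The (/|$) ending is an optional variation, not part of the rule above: it also catches a request for the bare directory name without a trailing slash. [G] implies [L], so no extra flag is needed.

```apache
# Root-level .htaccess; "foldername" is a placeholder for the real directory.
RewriteEngine On

# 410 Gone for the directory itself and everything beneath it.
# No NC flag: differently cased names fall through to the usual 404.
RewriteRule ^foldername(/|$) - [G]
```

It is usually safest to put this near the top of the file, ahead of any redirects or CMS rewrites, so nothing else handles the request first.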

lucy24

11:28 pm on Jun 11, 2016 (gmt 0)




As wilderness points out, part of your pattern is superfluous-- but the extra bit isn't harmful or counterproductive, just unnecessary. And I agree: if they're misspelling (that includes mis-casing) the name, they really don't deserve anything but the 404 they'd get anyway.

but GWT is still indexing the pages within the directory

How long has it been? A single 410 response won't instantly remove a URL from any search engine's index. If you want it de-indexed right away, use the URL Removal feature of WMT/GSC. They will re-request URLs in the directory periodically, so make sure you never remove the [G]. My personal experience--ymmv--is that a 410 makes the Googlebot go away faster than a 404. It doesn't seem to have any effect on Bing. (Here I think Google is correct, since a 410 won't happen unless you did it on purpose.)

wilderness

11:57 pm on Jun 11, 2016 (gmt 0)




GWT is still indexing the pages within the directory


I have some 410s that the major SEs have been requesting for more than a decade.
You'd think they'd stop after getting the message a ka-zillion times.

lucy24

5:42 am on Jun 12, 2016 (gmt 0)




You'd think they'd stop after getting the message a ka-zillion times.

There's definitely a difference in behavior between Bing and Google. By now, an overwhelming majority of the requests for URLs I moved at the end of 2013 are coming from Bing*; Google picks up its 410 or 301 and eventually gets the message. I don't think it's necessarily that Bing is slower on the uptake, though. It feels more as if they're tracking long-term behavior: "They say it's gone, but is it really?"


* About 3:1 Bing:Google, which obviously doesn't correspond to their respective overall crawl rates. Right at this instant it's mostly the dotbot requesting URLs they haven't seen in several years. But that's transitory.

tangor

8:24 am on Jun 12, 2016 (gmt 0)




True comments, and metrics much like what I've seen over the last year. I don't know of a single bot/crawler yet that forgets a URL, though most (unless they are evil bots!) get the message after a time and let off for 9 months to a year before starting over.

Andy Langton

8:43 am on Jun 12, 2016 (gmt 0)




Just to check: does "Fetch as Google" come back with a 410 response for the URLs in question?

Lorel

3:48 pm on Jun 13, 2016 (gmt 0)




@lucy24

As wilderness points out, part of your pattern is superfluous-- but the extra bit isn't harmful or counterproductive, just unnecessary. . . A single 410 response won't instantly remove a URL from any search engine's index.


Thanks. I thought a simple 410 rule wouldn't be enough to cover the contents of a directory. I also didn't realize that a 410 wasn't instant and permanent (like a 301).

lucy24

10:02 pm on Jun 13, 2016 (gmt 0)




Nothing is permanent. Search engines will keep re-requesting URLs periodically, no matter how often they've received a 300-class or 400-class response. "Sure it's been redirecting since 2009, but that's not to say it will redirect today." (This applies to humans too. A browser is supposed to remember a 301 response for a while, and make its requests accordingly, but it will not change or delete existing bookmarks.) Removing the directory in GWT/GSC-- and the equivalent for any other search engines that are important to you-- won't make them stop crawling forever, but it will remove the relevant files from the search engine's index right away. And that, in turn, will cut back on humans landing splat on a 410 after following a promising search-engine lead.

Make sure you've got a good 410 page. Depending on the individual situation, you may or may not choose to use the same physical page as for 404s. But it has to be a separate ErrorDocument directive.
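A minimal sketch of the separate directives, assuming hypothetical page locations /errors/404.html and /errors/410.html. Pointing both directives at one file would also work if you prefer a shared page.

```apache
# Paths are placeholders; keep the error pages outside the 410'd directory
# so the [G] rule does not also match the error page itself.
ErrorDocument 404 /errors/404.html
ErrorDocument 410 /errors/410.html
```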