Forum Moderators: Robert Charlton & goodroi

Deleted pages 301 to the homepage... and potential problems?

     
5:45 pm on Apr 29, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Jan 8, 2019
posts: 89
votes: 2


The site I took over has a WP plugin called "All 404 Redirect to the Homepage."

My understanding is that this:

1) Confuses users, who think they landed on the right page
2) Sacrifices the opportunity to tell users they landed on the wrong page and to offer a solution

But a more specific problem I have is this...

I 410d nearly 1000 crap pages and need them deindexed. They're causing keyword cannibalization and index bloat.

If this plugin is auto-redirecting 401ed pages to the homepage (which it is) does that mean the 410 status is not being communicated to Google? I heard 410 tells Google to de-index faster.

I'm checking the indexation of these 410 pages. Some are gone now others are still there.

Thanks!
10:44 pm on Apr 29, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 1124
votes: 134


I heard 410 tells Google to de-index faster.


410 is the correct response for a page that has gone and is never coming back, but won't necessarily prevent googlebot checking whether it is there at some time in the future, especially if there are backlinks to it that the linking site has not removed.

Responding with a 301 for a page request that should return 4xx is not good practice, and returning an error page with a 200 (a "soft" 404) is discouraged by Google (see [support.google.com]).

I don't see any particular problem using the WP redirection plug-in for redirection (i.e. for pages that remain substantially the same, but whose URL has changed). I wouldn't recommend using it for anything else, as in your example.

Best practice is to have an error page that both gives an appropriate message to the user and returns the correct status code for each specific common 4xx error (e.g. 404 for Not Found, or 403 for Forbidden). I'm pretty sure there is a WP plug-in that will do this, but it is only marginally more difficult to do on an Apache server using .htaccess/PHP.

Whatever else you do, and however you decide to go about it, I recommend that you stop redirecting missing pages to your Home page or to an error-page that does not return 404.


[edited by: Robert_Charlton at 2:34 am (utc) on Apr 30, 2019]
[edit reason] Fixed typo per poster request. [/edit]

11:02 pm on Apr 29, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 1124
votes: 134


P.S. Sorry, in response to your question about how the plug-in might affect response codes, I should also have mentioned that there are many free response-code checkers online (try searching for http status codes checker).

I would have recommended this one, but (mods please note) it doesn't seem to be working:

[freetools.webmasterworld.com]
11:30 pm on Apr 29, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Jan 8, 2019
posts: 89
votes: 2


@Wilburforce

This helps a lot, thank you!

We did 410 a lot of pages. And I double checked that they had no incoming or internal links first. I 301ed any backlinked pages I wanted to get rid of.

If you were trying to get rid of a lot of pages that get no traffic and have no links, would you say 401 is an ok route? I chose it because I wanted to speed up Google removing them from the index (which I read about in a Yoast post).

Thanks.
11:33 pm on Apr 29, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15814
votes: 848


If this plugin is auto-redirecting 401ed pages to the homepage (which it is) does that mean the 410 status is not being communicated to Google?
Exactly. Only one status code can be returned; if the site is sending out a 301--or, worse, a 302--then no 404 or 410 is being received. Ordinarily I would recommend checking your raw server logs, but in the case described here, the response code sent by your server is not necessarily the response code received by the visitor. More exactly: a 200 in your server logs could mean anything. A 300- or 400-class response in server logs should be correct.

Note, however, that if you’re returning the 410 response yourself, in the appropriate section of your htaccess--exact placement of the directive is crucial--then the request will never reach WP. In that case, things are happening the way you want.

While searching for just the right tool, you can spot-check by randomly requesting some URLs in your own browser and seeing where you end up. In particular, see whether the address bar shows the originally requested URL or something else.

I heard 410 tells Google to de-index faster.
What I can say with confidence is that it causes Google to stop crawling faster--and if you've got huge numbers of pages you want them to stop requesting, this by itself is an advantage.

Edit: I hope when you say 401 it's just a typo for 410 (Gone). A 401 response is something entirely different.
1:24 am on Apr 30, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Jan 8, 2019
posts: 89
votes: 2


@lucy24

Definitely a typo.

Good advice!

I'm going to deactivate the plugin and make sure a 404 error page is set up, too.
4:05 am on Apr 30, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11824
votes: 237


I'm checking the indexation of these 410 pages. Some are gone now others are still there.

i wouldn't expect these to disappear from the index until some time after they were crawled by googlebot (and received a 410 status code).
you should examine your server access logs to determine which urls google has crawled and received a 410 and then determine which (or how many) of those are still indexed before being concerned.
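
A rough sketch of that log check, in Python. It assumes the Apache "combined" log format and log lines in chronological order; the function names and the sample paths in the usage note are invented for illustration. Matching on the "Googlebot" user-agent string does not prove a request really came from Google, so treat the result as a first pass only.

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" log format. Adjust the pattern if your
# server writes a different format.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_statuses(lines):
    """Map each requested path to the status codes sent to Googlebot-labelled requests."""
    seen = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        # The user-agent string can be spoofed; this only filters by label.
        if match and "Googlebot" in match.group("agent"):
            seen[match.group("path")].append(int(match.group("status")))
    return dict(seen)


def confirmed_gone(lines):
    """Return the paths whose most recent Googlebot-labelled request received a 410."""
    statuses = googlebot_statuses(lines)
    return sorted(path for path, codes in statuses.items() if codes[-1] == 410)
```

Feeding it the lines of an access log (for example `confirmed_gone(open("access.log"))`, where the file name is a placeholder) gives the list of URLs to compare against what is still indexed.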
6:14 am on Apr 30, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10326
votes: 1058


It might take time for g to implement any changes, keep that in mind.

Check your raw logs for progress.

If a url is genuinely gone, 410 is the only applicable response, and keep that in force for ... eternity?
4:54 pm on Apr 30, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Jan 8, 2019
posts: 89
votes: 2


@tangor @phranque

Thanks! My question is, can Google even recognize them as 410 Gone if they are 301ing to the homepage? I'd imagine Google is just seeing a 301, not a 410.
5:09 pm on Apr 30, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15814
votes: 848


Any given URL is either 301 or 410. It can't be both. That's why it's important to check what response a request for these deleted pages actually gets. Did you at some point check GSC for “soft 404”s? When huge numbers of requests get redirected, they know what's up.

If you need guidance on the exact structuring of the htaccess rules that return the intended response, wander over to the Apache subforum. (Or IIS if you're using one of those weird alternative WP versions.)
5:09 pm on Apr 30, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10326
votes: 1058


If you are 301ing to the home page, you aren't using best practice and in fact are creating soft 404s ... which g definitely does not like.
5:21 pm on Apr 30, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Jan 8, 2019
posts: 89
votes: 2


@tangor @lucy24

Thanks! Definitely needed that insight about htaccess as my dev skills are limited.
6:55 pm on Apr 30, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:10326
votes: 1058


Just checking ... this plugin turns any 404 into a 301 redirect? Might want to rethink that!

404 pages exist for a reason. 410 for a different reason. In most cases redirection should be for moved content, or expanded/updated content related to the original url.
7:35 pm on Apr 30, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 1124
votes: 134


@TomSnow

If you want to know what response codes your deleted pages are returning I still think your best bet is to use one of the free online response-checkers, as it will tell you what code is being returned for any specific URL request: if you enter (e.g.)

https://www.mysite.com/deleted-page.htm


you will get to see the response Google gets when requesting that page.

As a couple of other posters have already pointed out, it will be a single code for each individual URL, although where redirection is involved some of the checkers will also provide the response of the destination URL, and some provide even more detail.

If you have access to your raw server logs, you can also check these for responses that have already been returned.
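
If you would rather check from your own machine, here is a minimal Python sketch of what such a checker does: request a URL without following redirects, record the status, and step through any Location headers by hand. The function names, the User-Agent string and the hop limit are my own choices, not part of any particular tool. It sends HEAD requests, which some servers answer differently from GET, so switch the method if the results look odd.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url, timeout=10.0):
    """Send one HEAD request; return (status, Location header or None) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"User-Agent": "status-check-sketch"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def follow_chain(url, fetch=fetch_once, max_hops=5):
    """Return [(url, status), ...] for each hop until a non-redirect response or max_hops."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return chain
```

Calling `follow_chain("https://www.mysite.com/deleted-page.htm")` on a correctly configured site should give a single hop with 404 or 410. A 301 followed by a 200 on the Home page means something is redirecting the request instead.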
5:05 pm on May 1, 2019 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Jan 8, 2019
posts: 89
votes: 2


@tangor

The redirect is 404 to 301. But I asked Dev to 410 these pages to stop Google from crawling them. I'm not sure if a) Dev 404ed instead of 410, which is why the plugin is creating redirects or if b) Dev 410ed the pages, and plugin turns 404s AND 410s into 301s.

@wilbur

Thanks! Ran that test and these deleted pages are sending a 301/200 response, which is a soft 404 yes?
6:07 pm on May 1, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15814
votes: 848


The redirect is 404 to 301.
TomSnow, it sounds as if you yourself aren't the coder, so let's make sure you understand what's going on.

WordPress (and other major CMS) work like this: If there is a request for any file--not just pages but everything, though we're mainly concerned with pages--that does not physically exist on the server, then WP internally rewrites to its generic /index.php page. This page then does all the CMS stuff: pull in headers, footers, navigation elements, styles from here and there, and most importantly consult the database for content corresponding to the requested URL, and finally sends out a page. The user doesn't know, and in fact can't know, that they're receiving anything other than a hand-rolled “example.com/pagename”.

The catch is the “consult the database for content corresponding to the requested URL” part. Depending on how the CMS is set up--combination of plugins and site settings--if there is no applicable material in the database, the site might do any of three things:
-- return a 404 response and display a 404 page, which might be either the server default or a customized page you've made yourself
-- serve up an empty page with the global headers and footers and navigation, but no page-specific content
-- issue a redirect to some other named page, most often the site's front page

If you want to do anything else, such as returning a 410, you need to manually edit your htaccess or configuration file with instructions about specific requests, for example (assuming htaccess on an Apache server):
RewriteRule ^onegonepage - [G]

RewriteRule ^othergonepage https://www.example.com/specialnewpage [R=301,L]
If these rules are placed before the CMS rules, the requests will never reach the CMS, and the three options above will never come into play. Instead, the server itself will send out the response you've told it to send out.
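
With around a thousand deleted pages, typing one [G] rule per URL by hand is error-prone. The Python sketch below generates the lines from a list of paths. It is an illustration under assumptions, not a tested deployment script: the function names are invented, it anchors each pattern with `/?$` (stricter than the unanchored example above, and tolerant of a trailing slash), and it rejects paths containing whitespace instead of trying to quote them. The output still has to be pasted into htaccess before the CMS rules, with the rewrite engine already enabled.

```python
# Characters that have special meaning in a regular expression and need a
# backslash when they appear literally in a RewriteRule pattern.
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")


def escape_for_rewrite(path):
    """Backslash-escape regex metacharacters in a literal URL path."""
    if any(ch.isspace() for ch in path):
        raise ValueError(f"path contains whitespace, handle it by hand: {path!r}")
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in path)


def gone_rules(paths):
    """Build one 'RewriteRule ^path/?$ - [G]' line per deleted path."""
    rules = []
    for path in paths:
        # Patterns in htaccess are matched without the leading slash.
        relative = path.strip("/")
        if not relative:
            # An empty pattern would match the front page; refuse to emit it.
            raise ValueError("refusing to generate a 410 rule for the site root")
        rules.append(f"RewriteRule ^{escape_for_rewrite(relative)}/?$ - [G]")
    return rules
```

For example, `print("\n".join(gone_rules(["/old-page/", "deleted-page.htm"])))` prints two rules, one per path, ready to review before they go into the file.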
6:43 pm on May 1, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 1124
votes: 134


which is a soft 404 yes?


Strictly, no: a soft 404 tells the user the page isn't there, but returns 200.

Your arrangement doesn't tell users anything: they just find themselves - without any explanation - on your Home page when they meant to be somewhere else, while the URL returns 301 (telling Google that the page has permanently moved).

Redirection should return 301, and both Google and the user should find themselves on substantially the same page, but at a different address. The destination page/URL should return 200.

What is wrong with your current arrangement is that your Home page isn't the new address of an old page or a mistyped URL.

Best practice for missing/mistyped URLs is to tell the user the page isn't there (using a custom error page), and return 404 or 410. You can do this in a number of ways, including .htaccess and PHP (and in all probability, but I don't have any personal experience of it, WP).
11:00 pm on May 1, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11824
votes: 237


Strictly, no: a soft 404 tells the user the page isn't there, but returns 200.

not necessarily:
A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn’t exist.

source: [webmasters.googleblog.com...]
11:07 pm on May 1, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11824
votes: 237



The redirect is 404 to 301.

no.
it isn't.

according to the http protocol a redirection status code is 3XX (e.g. 301, 302, etc.)
The 3xx (Redirection) class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request.

source: https://tools.ietf.org/html/rfc7231#section-6.4

even if the Location header were provided in a 404 response it would be ignored as no further action is required for a Client Error class of status code.
11:08 pm on May 1, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11824
votes: 237


then WP internally rewrites to its generic /index.php page

to be precise, the web server is doing the internal rewrite to a WP script.
6:08 am on May 2, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 1124
votes: 134


A soft 404 is when a web server returns a response code other than 404 (or 410) for a URL that doesn’t exist.


Strictly, no. That would make every 301 a "soft 404".

The referred post continues: "A common example is when a site owner wants to return a pretty 404 page with helpful information for his users, and thinks that in order to serve content to users he has to return a 200 response code". No other example is given.

My own source ([support.google.com]):

"A soft 404 is a URL that returns a page telling the user that the page does not exist and also a 200-level (success) code."
7:29 am on May 2, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11824
votes: 237


My own source


okay i'll see your source and raise you an unambiguous direct quote from john mueller:
So the the 301 redirect from all pages to the home page, that would be something that we see as a soft 404s.


source: 4:52 into this Webmaster Hangout [youtube.com]
9:14 am on May 2, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 7, 2006
posts: 1124
votes: 134


an unambiguous direct quote from john mueller


OK, I'll concede that one.

I've never done that myself, so haven't seen what GSC makes of it, but presumably that therefore means a 301 from a missing URL to the Home page would be flagged as a soft 404 in GSC.

Evidence, anyone?
11:47 am on May 2, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11824
votes: 237


Evidence, anyone?

i've seen it, but admittedly not recently.
however that quote was from ~30 months ago so pretty good chance it's still current information.
1:36 pm on May 2, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4466
votes: 332


Evidence: Google's Soft 404 info page [support.google.com]

Why does it matter?

Returning a success code, rather than 404/410 (not found) or 301 (moved), is a bad practice. A success code tells search engines that there’s a real page at that URL. As a result, the page may be listed in search results, and search engines will continue trying to crawl that non-existent URL instead of spending time crawling your real pages.

They consider a soft 404 to be much worse than a 404 error because it misleads and delivers a page that was not requested.
5:42 pm on May 2, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15814
votes: 848


fwiw: Look at your logs any time you've instituted an unusual number of redirects--of any kind. You'll see an upsurge in google requests for /random-string-of-letters.html. This is Google's way of confirming that your site still issues 404s when appropriate. The whole thing is clearly programmatically triggered, based purely on an increase in number of redirects.

the 301 redirect from all pages to the home page, that would be something that we see as a soft 404s
Note, however, that this doesn't mean a “soft 404” is defined as a 301 when a 404 is warranted. It’s simply the best-known category of that nebulous Google creation, the “soft 404”. Look through GSC and you'll find other things they call “soft 404”, some of which one could take serious issue with.

:: shuffling papers ::

Here from the horse's mouth [support.google.com]:
A soft 404 is a URL that returns a page telling the user that the page does not exist and also a 200-level (success) code. In some cases, it might be a page with little or no content--for example, a sparsely populated or empty page.
Note that this definition--with no visible datestamp--explicitly excludes 301 responses. Go figure. (But don't waste time trying to understand what “telling the user” means in this context. You will not succeed.)

Edit: Sorry, not2easy, we quoted different parts of the same page so I didn't realize you had already provided the same link. Oh well.


Recommendation: Snip off part of this thread so we can get back to Jon, whoops, Tom Snow's problem.
 
