

Mysterious 404 URL is indexed while 200 version is not.


TomSnow

6:22 am on Aug 28, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



Something weird is going on.

I redesigned a website and went live.

We changed a lot of URLs using 301s, which resolved just fine.

But I just noticed that many of my blogs aren't indexed. Instead, a 404 page with a URL that looks almost exactly like the blog's URL is indexed.

The strange 404 URL that nearly mimics my blog URLs has the date in the URL instead of /blog/.

Example...

This is the 200 URL I want indexed: apples.com/blog/red-apples

But this URL, which returns a 404, is indexed instead: apples.com/4/5/2020/red-apples

My problems are:

1) I don't know the origin of the 404 URL.

2) I don't know why the 404 URL is indexed while the 200 URL is not.

Any ideas?

not2easy

1:18 pm on Aug 28, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Is this a WordPress blog? The origin of the URL would be the /archives/ pages that WP creates. If it is another CMS such as Blogger, it may have a similar structure. That would need to be handled in the Admin > Settings area, and it may have been caused by other settings.

lucy24

3:17 pm on Aug 28, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you please clarify what, exactly, you mean by 404 here? Clearly the URL is not intended to exist. But equally clearly it does exist (not2easy explains one way this can happen), and has served content to the Googlebot, or there would be nothing to index. Or do you mean that the SERP shows the content of the site’s actual 404 page, attributed to the unwanted URL?

When you visit these URLs yourself, do you get a 404 response? It isn’t enough to see your expected 404 page; does the address bar show the originally requested URL, while the response header (as shown by the tool of your choice in any major browser) is 404?
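For anyone who would rather script that check than read it off the browser tools, here is a minimal sketch using only the Python standard library. It makes one request without following redirects, so the status it reports belongs to the URL that was actually requested. The names `NoRedirect`, `fetch_status` and `describe` are made up for this illustration, and the labels returned by `describe` are just one way to summarise the result.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Keep urllib from silently following 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def fetch_status(url, timeout=10):
    """Return (status_code, Location header or None) for one GET request."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(
        url, headers={"User-Agent": "status-check-sketch"}
    )
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx, 4xx and 5xx responses all arrive here.
        return err.code, err.headers.get("Location")


def describe(status, location=None):
    """Turn a status code into a short label (labels are illustrative)."""
    if status == 404:
        return "hard 404"
    if status == 410:
        return "gone"
    if 300 <= status < 400:
        return f"redirect to {location}"
    if status == 200:
        return "200 (a soft 404 if the body is the error page)"
    return f"other ({status})"
```

Calling `describe(*fetch_status(url))` with the full address of the unwanted URL (scheme included) gives a one-line answer. A 200 that displays the error page is the "soft 404" case, which is different from a real 404 response.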

TomSnow

3:37 pm on Aug 31, 2020 (gmt 0)

5+ Year Member Top Contributors Of The Month



@lucy24

The response header is 404. The URL is a WordPress archive URL that takes the slug of my 200 URL and adds a date to the file path. The 404 URL is in the SERPs while the 200 "version" is not.

I'm going to turn off the archive pages in admin>settings, then 410 any remaining archive URLs that are cannibalizing my pages that return 200.
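A rough sketch of how the remaining archive URLs could be picked out of a list (a crawl export or the indexed URLs, say) before deciding what to 410. The date layout in the pattern is an assumption inferred from the single example in this thread, `/4/5/2020/red-apples`, and the `/blog/` prefix comes from the same example. `archive_slug` and `map_archive_urls` are hypothetical helper names.

```python
import re
from urllib.parse import urlsplit

# Assumed layout, inferred from one example in the thread:
# /<month>/<day>/<4-digit year>/<slug>
DATE_ARCHIVE = re.compile(r"^/(\d{1,2})/(\d{1,2})/(\d{4})/([^/]+)/?$")


def archive_slug(url):
    """Return the slug if the URL path looks like a date archive, else None.

    Pass a bare path ("/4/5/2020/red-apples") or a full URL with a scheme;
    without a scheme, urlsplit treats the host name as part of the path.
    """
    match = DATE_ARCHIVE.match(urlsplit(url).path)
    return match.group(4) if match else None


def map_archive_urls(urls, blog_prefix="/blog/"):
    """Map each date-archive URL to the blog path that shares its slug."""
    targets = {}
    for url in urls:
        slug = archive_slug(url)
        if slug is not None:
            targets[url] = blog_prefix + slug
    return targets
```

The resulting mapping lists which unwanted URL shadows which blog post, which is handy for spot-checking before anything is changed on the server.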

not2easy

4:22 pm on Aug 31, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The archive page URL should use a canonical tag to the original permalink URL. Do you use a plugin to manage 'seo' functions? It should provide a setting for canonical handling, particularly because WP creates new URLs for the same content. If you use Tags or Categories, those are additional URLs.
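To confirm what a page actually declares, the canonical link can be read straight out of its HTML. This is a minimal sketch with the Python standard library; `CanonicalFinder` and `find_canonical` are names invented for the example, and it only looks at `<link rel="canonical">` in the markup, not at any HTTP `Link` header.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens and a valueless
        # attribute is reported as None, hence the "or" fallback.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Feed it the source of an archive page: if the result is `None`, or is the archive URL itself rather than the permalink, the canonical setting is not doing its job.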

jpalmer

2:14 am on Sep 4, 2020 (gmt 0)

10+ Year Member



Hi Tom,

Looks like you found a solution (your reply to @lucy24), but as an additional tip, if you haven't already done so, download Screaming Frog SEO Spider and crawl the site's URL. The free version is limited to 500 URLs (including file calls), so if you have a large site, you may have to split the crawl by top-level (TL) page and subdirectory to stay under the free limit, and then export the individual crawl results to a master spreadsheet.

SF will give you a good overview of each page's status, including META tags, response codes, canonical status, follow/nofollow directives and more.

When I'm tracking down issues, I find SF provides a very good baseline for uncovering glitches.