Does the canonical URL point to a document that looks like a customized "not found" page?
Must the content on a set of pages be similar to the content on the canonical version?
Yes. The rel="canonical" attribute should be used only to specify the preferred version of many pages with identical content (although minor differences, such as sort order, are okay).
An error message with "200 OK" status is a "soft 404".
You should avoid that. You should send 404 status.
Google will still follow the links from the 404 page.
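For anyone who wants to audit their own pages for this, here is a rough sketch of a soft-404 check. The phrase list and the function name `is_soft_404` are my own assumptions for illustration; nobody outside the search engines knows exactly how they detect soft 404s.

```python
# Phrases that suggest an error page. This list is an illustrative
# assumption, not something any search engine publishes.
ERROR_PHRASES = ("not found", "no longer available", "page does not exist")


def is_soft_404(status_code: int, body: str) -> bool:
    """Return True when an error-looking page was served with 200 OK."""
    if status_code != 200:
        # A real 404 or 410 is an honest error, not a soft 404.
        return False
    text = body.lower()
    return any(phrase in text for phrase in ERROR_PHRASES)
```

Feed it the status code and body of each fetched URL; anything it flags is a candidate for a real 404 or 410 status.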
Welcome to WebmasterWorld, SabrinaScherer!
Thanks for the answers and the warm welcome!
Yes, but you want to keep the link juice. Why should Google show 404s in search results, right? At least in the long run [youtube.com].
You would probably do better for the user by either 301 redirecting to the canonical or simply showing the content from the page referred to in the link rel canonical.
Ahem. "No longer available" is a 410 ("I took it away on purpose") not a 404 ("Sorry, can't find it"). You can make a separate 410 page or send users to the same physical page as for 404; use your judgement.
If the page content has simply moved to a new URL then of course a 301 is appropriate.
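To make the 301 / 410 / 404 distinction concrete, here is a minimal sketch of the decision logic. The paths and the `MOVED`, `GONE` and `LIVE` tables are hypothetical; in practice you would do this in your server config or application router.

```python
from http import HTTPStatus

# Hypothetical site data, for illustration only.
MOVED = {"/old-widget": "/widgets/new-widget"}  # content now lives elsewhere
GONE = {"/discontinued-widget"}                 # removed on purpose
LIVE = {"/", "/widgets/new-widget"}             # pages that still exist


def choose_response(path):
    """Return (status, redirect_target) for a requested path."""
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]  # 301
    if path in GONE:
        return HTTPStatus.GONE, None                      # 410
    if path in LIVE:
        return HTTPStatus.OK, None                        # 200
    return HTTPStatus.NOT_FOUND, None                     # 404
```

The point is that a 200 is only returned for pages that really exist; everything else gets a status that says what happened to it.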
I think you may have misunderstood what a "soft 404" is.
If a page is gone, you can't expect to return a 200 forever without consequences. Better to deal with it upfront.
@lucy24 that's exactly what best practices say, but Google still treats a 410 as a 404.
On some level they must know the difference, because they really do stop crawling after a while.
:: quick glance to see what forum we're in ::
Bing otoh doesn't seem to care. I've got pages that have been 410'd for ages but they still come by regularly. Not as often as for active pages, but still several times a month for pages that rarely changed when they were active.
Google respiders every URL forever, just in case content re-appears some time in the future.
They re-spider pages that last returned 410 on a lower priority than those that returned 404.
When Google finds a page that returns 404, they return to check it twice in the next 48 hours then do not return for several months.