
Soft 404 in GWT

   
11:02 am on Sep 20, 2013 (gmt 0)

10+ Year Member



Hi Guys,

Recently I've received my first Soft 404 in GWT.
I use a VB forum and some of my forums are private, which means that guests have to register in order to read the threads. I do let guests see the thread titles and links, though, which is probably what Googlebot is following.

So here are my questions:
1. What HTTP header should I return if not 200? (it's not a 404 either, the content exists but not for guests)

2. Would allowing Googlebot to crawl be ok? I don't think that's cloaking and I see many sites doing this. Or will it penalize me somehow?

If anyone knows of a vBulletin plugin for this, let me know :)

Thanks!
9:03 pm on Sep 20, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Speaking as a user: it is VERY, VERY annoying when a search brings up a promising result, and you go there only to find the site requires a login. As noted elsewhere, people are not likely to create an account purely to see a page. Possible but not likely.

Excluding search engines from pages they've previously crawled is generally not a good idea. I'd go with a "noindex" meta.

Some sites do use a "satisfy any" structure that allows registered users and also certain named search engines. See above about annoying users :(
9:52 pm on Sep 20, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



1. What HTTP header should I return if not 200?

403 Forbidden

2. Would allowing Googlebot to crawl be ok? I don't think that's cloaking and I see many sites doing this. Or will it penalize me somehow?

I wouldn't for exactly the reason Lucy24 states about visitors finding the page in the results and not being able to access it. -- It may not have a *direct* ranking impact, but it could certainly have a negative impact on visitor behavior and the willingness of visitors to visit your site for future queries even if the page returned for the next search is wide-open for everyone.

Also, it's been stated on a number of occasions that Googlebot should see exactly what a "normal visitor" sees when they visit. Since "normal visitors" are "Forbidden" from entering without an account, serving Googlebot the correct 403 Forbidden header is the way I think I would go. [The same goes for visitors who aren't logged in: they'll likely never notice if you continue serving the same page and just change the header.]
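
For illustration, a minimal Python sketch of the "same page, different header" idea. This is not vBulletin code; `guest_response`, `LOGIN_PAGE` and the parameters are hypothetical names, and it only shows that the status line can change to 403 while the body a guest sees stays the same.

```python
import html
from http import HTTPStatus

# Hypothetical "please register" template shown to guests (an assumption).
LOGIN_PAGE = (
    "<html><body><h1>{title}</h1>"
    "<p>Please register or log in to read this thread.</p>"
    "</body></html>"
)


def guest_response(thread_title, is_private, is_logged_in, thread_html=""):
    """Return (status_line, body) for a thread request."""
    if is_private and not is_logged_in:
        # Same visible page as before, but the status now says "Forbidden".
        status = HTTPStatus.FORBIDDEN
        body = LOGIN_PAGE.format(title=html.escape(thread_title))
    else:
        status = HTTPStatus.OK
        body = thread_html
    return f"{status.value} {status.phrase}", body
```

Googlebot and an anonymous human would both fall into the first branch, so neither is treated differently from the other.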
11:18 pm on Sep 20, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



the reason it shows up as a soft 404 is that you have a large number of urls returning essentially the same content, which would be the page template with only the thread title varying between pages.
this fact alone and/or perhaps some wording on the page is triggering the soft 404 signal.

i would allow googlebot to crawl (and follow links) and provide a noindex with the response.
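
A rough sketch of what "provide a noindex with the response" could look like, in Python for illustration only. The function name `noindex_teaser` is hypothetical. It sends the directive twice, once as an `X-Robots-Tag` response header and once as a robots meta tag; either one alone should be enough, and the page stays a crawlable 200 so links can still be followed.

```python
import html


def noindex_teaser(thread_title):
    """Return (status_line, headers, body) for the guest teaser page."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Header form of the directive.
        ("X-Robots-Tag", "noindex"),
    ]
    body = (
        "<html><head>"
        # Meta-tag form of the same directive.
        '<meta name="robots" content="noindex">'
        f"<title>{html.escape(thread_title)}</title>"
        "</head><body><p>Register to read this thread.</p></body></html>"
    )
    return "200 OK", headers, body
```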
1:56 am on Sep 21, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



the reason it shows up as a soft 404 is that you have a large number of urls returning essentially the same content, which would be the page template with only the thread title varying between pages.

My thought too.

i would allow googlebot to crawl (and follow links) and provide a noindex with the response.

Interesting -- I think I like it -- +1 for Sure -- Thanks!
11:29 pm on Nov 21, 2013 (gmt 0)

10+ Year Member



In an attempt to remove the soft 404 from WMT, I recently started serving 404 responses for deactivated pages. This was followed by a big drop in rankings.
I have now reverted to serving the soft 404 and will see if the site recovers.
4:00 am on Nov 22, 2013 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



I have now reverted to serving the soft 404 and will see if the site recovers.

How are you serving soft 404? Redirecting the traffic to a particular page/home page which returns 200 OK?
5:28 am on Nov 22, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



I recently started serving 404 responses for deactivated pages

But, but, splutter, that's not what a 404 is for anyway. A "deactivated" page gets a 410.
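
To make the 404/410 distinction concrete, here is a small Python sketch. The `PRODUCTS` table and `status_for_product` are made-up names for illustration, not anyone's actual setup: a page that was deliberately taken down maps to 410 Gone, and an ID that never existed maps to 404 Not Found.

```python
from http import HTTPStatus

# Hypothetical catalogue: product id -> state (an assumption for the example).
PRODUCTS = {101: "active", 102: "deactivated"}


def status_for_product(product_id):
    """Pick the response status for a product page request."""
    state = PRODUCTS.get(product_id)
    if state == "active":
        return HTTPStatus.OK
    if state == "deactivated":
        # The page existed and was removed on purpose.
        return HTTPStatus.GONE
    # The id is unknown: there was never such a page.
    return HTTPStatus.NOT_FOUND
```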
4:02 pm on Nov 22, 2013 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Not the way I'd go about it, glitterball, but good luck to you.

<spluttering with lucy24>
8:03 pm on Nov 22, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



<splutter>i would allow googlebot to crawl (and follow links) and provide a noindex with the response.</splutter>
1:12 am on Nov 23, 2013 (gmt 0)

10+ Year Member



I'm sure it's not an ideal way of dealing with the inactive widget pages, but my site has now recovered (which may or may not be related to this change).

Perhaps google applies a small penalty if a large number of 404s suddenly appear on a site?
1:26 am on Nov 23, 2013 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



Are you sure it was a drop in rankings and not a drop in traffic when you made the change?
1:43 am on Nov 23, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



What worries me is the exact wording
I recently started serving 404 responses

This implies some kind of deliberate action. Ordinarily a 404 is the default response when you take no action. What happens if the php-or-similar gets a request containing a bad parameter or out-of-range value?
9:41 pm on Nov 23, 2013 (gmt 0)

10+ Year Member



Are you sure it was a drop in rankings and not a drop in traffic when you made the change?

Yes

If it's an invalid ID in the querystring, it also responds with an "Invalid Product id" type message rather than a 404.
10:38 pm on Nov 23, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Reading this thread prompted me to take a look at my GWT. I also have one soft 404. It apparently, if I understand the message correctly, is caused by a strange link from someone else:

    http://www.theirsite.com/www/mysite.com/

How can I fix this? Is it even possible?
10:59 pm on Nov 23, 2013 (gmt 0)

    WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



    it also responds with an "Invalid Product id" type message rather than a 404.

    Those aren't mutually exclusive. You need to distinguish between what a human sees -- which can be absolutely anything -- and the server response header. If a search engine asks for a garbage URL, it needs to get a 404. Otherwise it just thinks you have a bunch of identical pages.* Conversely, if your php-or-similar script returns a 404 header, you also need to include some form of the physical 404 page, because it won't happen automatically.
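
A sketch of that point in Python, as a bare WSGI app. The thread is about php, so treat this purely as an illustration; `VALID_IDS` and the `id` query parameter are assumptions. The human still sees the "Invalid Product id" message, while the status line says 404, and the script has to send that body itself.

```python
from urllib.parse import parse_qs

VALID_IDS = {"1", "2", "3"}  # hypothetical set of existing product ids


def app(environ, start_response):
    """WSGI app: the visible message and the status are set separately."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    product_id = query.get("id", [""])[0]

    if product_id in VALID_IDS:
        status, text = "200 OK", f"Product {product_id}"
    else:
        # Friendly message for the human, 404 for the crawler.
        status, text = "404 Not Found", "Invalid Product id"

    body = text.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library should do.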

    I also have one soft 404. It apparently, if I understand the message correctly, is caused by a strange link from someone else

    Nobody else can "cause" a soft 404. It happens within your own site. You wouldn't be the first person to be confused by the terminology, though. The expression "soft 404" is google's way of describing a request that should lead to a "no such page" 404 but instead gets a redirect (301/302 followed by 200) to some other page. You typically see this in sites that redirect all bad requests to the home page.
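
As a rough way to reason about this, the Python sketch below classifies the chain of status codes you get back when you request a URL you know does not exist on your own site. `classify_missing_url_response` and its labels are hypothetical; it simply treats any chain that ends in 200 as a soft 404, whether or not a redirect came first.

```python
def classify_missing_url_response(status_chain):
    """Classify the status codes returned for a URL known NOT to exist.

    status_chain lists the codes in order, e.g. [301, 200] for a
    redirect to the home page followed by a normal page.
    """
    if not status_chain:
        raise ValueError("status_chain must contain at least one status code")

    final = status_chain[-1]
    if final in (404, 410):
        return "not found"   # the server admits the page is missing
    if final == 200:
        return "soft 404"    # missing page ends up looking like a real one
    return "other"
```

A site that redirects every bad request to the home page would show up here as `[301, 200]` or `[302, 200]`.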


    * I recently found this illustrated in a pretty entertaining way when I tried one of those "similar pages" tools. I fed in two URLs and was told they were 100% identical. This would be because, er, both requests led to my 403 page (online tools, by their nature, live on server farms). Unblock the IP and all was well.
     
