Forum Moderators: Robert Charlton & andy langton & goodroi

Soft 404 in GWT

     
11:02 am on Sep 20, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 14, 2004
posts:602
votes: 0


Hi Guys,

Recently I've received my first Soft 404 in GWT.
I run a vBulletin forum and some of my forums are private, which means that guests have to register in order to read the threads. I let guests see the thread titles and links, though, which is probably what Googlebot is following.

So here are my questions:
1. What HTTP header should I return if not 200? (It's not a 404 either: the content exists, just not for guests.)

2. Would allowing Googlebot to crawl be ok? I don't think that's cloaking and I see many sites doing this. Or will it penalize me somehow?

If anyone knows of a vBulletin plugin for this, let me know :)

Thanks!
9:03 pm on Sept 20, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13539
votes: 403


Speaking as a user: it is VERY, VERY annoying when a search brings up a promising result, and you go there only to find the site requires a login. As noted elsewhere, people are not likely to create an account purely to see a page. Possible but not likely.

Excluding search engines from pages they've previously crawled is generally not a good idea. I'd go with a "noindex" meta.

Some sites do use a "satisfy any" structure that allows registered users and also certain named search engines. See above about annoying users :(
9:52 pm on Sept 20, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:July 19, 2013
posts:1097
votes: 0


1. What HTTP header should I return if not 200?

403 Forbidden

2. Would allowing Googlebot to crawl be ok? I don't think that's cloaking and I see many sites doing this. Or will it penalize me somehow?

I wouldn't for exactly the reason Lucy24 states about visitors finding the page in the results and not being able to access it. -- It may not have a *direct* ranking impact, but it could certainly have a negative impact on visitor behavior and the willingness of visitors to visit your site for future queries even if the page returned for the next search is wide-open for everyone.

Also, it's been stated on a number of occasions that Googlebot should see exactly what a "normal visitor" sees when they visit. Since "normal visitors" are "Forbidden" from entering without an account, serving Googlebot [and even not-logged-in visitors, since they'll likely never notice if you keep serving the same page and just change the header] the correct 403 Forbidden header is the way I think I would go.
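
A minimal sketch of the 403 suggestion above, assuming a plain Python WSGI handler rather than vBulletin's actual PHP code. The names `PRIVATE_PREFIX`, `is_logged_in`, and `call`, the cookie check, and the page bodies are all hypothetical. The point is only that a guest can still be shown the same login prompt while the status line changes from 200 to 403.

```python
from wsgiref.util import setup_testing_defaults

PRIVATE_PREFIX = "/private/"  # hypothetical URL prefix for members-only forums


def is_logged_in(environ):
    # Assumption: any "session=" cookie marks a logged-in member.
    return "session=" in environ.get("HTTP_COOKIE", "")


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path.startswith(PRIVATE_PREFIX) and not is_logged_in(environ):
        # Same human-readable page as before; only the status changes.
        body = b"<h1>Members only</h1><p>Please log in or register.</p>"
        status = "403 Forbidden"
    else:
        body = b"<h1>Thread</h1><p>Thread content.</p>"
        status = "200 OK"
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call(path, cookie=""):
    """Invoke the app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    if cookie:
        environ["HTTP_COOKIE"] = cookie
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Calling `call("/private/thread-1")` exercises the guest path, and passing a cookie such as `"session=abc"` exercises the member path.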
11:18 pm on Sept 20, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10750
votes: 44


the reason it shows up as a soft 404 is that you have a large number of urls returning essentially the same content, which would be the page template with only the thread title varying between pages.
this fact alone and/or perhaps some wording on the page is triggering the soft 404 signal.
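
To illustrate why templated pages can look like duplicates, here is a rough sketch using Python's standard `difflib`. The template text and thread titles are made up, and this is not how Google measures similarity; it only shows that two pages differing in nothing but the title score as nearly identical under a simple character-level comparison.

```python
from difflib import SequenceMatcher

# Hypothetical guest-view template: only the title varies between threads.
TEMPLATE = (
    "<html><head><title>{title}</title></head><body>"
    "<h1>{title}</h1>"
    "<p>You must be registered and logged in to view this thread.</p>"
    "<p>Register now to join the discussion, or log in above.</p>"
    "</body></html>"
)


def similarity(a, b):
    # autojunk=False: the default heuristic treats frequent characters
    # as junk on inputs of 200+ items, which distorts HTML comparisons.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


page_a = TEMPLATE.format(title="Widget pricing")
page_b = TEMPLATE.format(title="Holiday schedule")
score = similarity(page_a, page_b)
```

A `score` close to 1.0 means the two responses are almost the same document, which is the situation described above.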

i would allow googlebot to crawl (and follow links) and provide a noindex with the response.
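
A small sketch of the noindex approach, assuming a Python handler that builds the guest view of a private thread. The function name `guest_preview` and the markup are hypothetical. It sends the directive both ways that are commonly documented: a robots meta tag in the HTML and an `X-Robots-Tag` response header. Either one alone should be enough.

```python
from html import escape


def guest_preview(thread_title):
    """Build (status, headers, body) for a guest hitting a private thread."""
    title = escape(thread_title)
    body = (
        "<html><head>"
        '<meta name="robots" content="noindex">'  # in-page directive
        f"<title>{title}</title>"
        "</head><body>"
        f"<h1>{title}</h1>"
        "<p>Log in or register to read this thread.</p>"
        "</body></html>"
    )
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("X-Robots-Tag", "noindex"),  # header form of the same directive
    ]
    return "200 OK", headers, body
```

Note that the URL must stay crawlable for this to work: if robots.txt blocks it, the crawler never fetches the page and so never sees the noindex.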
1:56 am on Sept 21, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:July 19, 2013
posts:1097
votes: 0


the reason it shows up as a soft 404 is that you have a large number of urls returning essentially the same content, which would be the page template with only the thread title varying between pages.

My thought too.

i would allow googlebot to crawl (and follow links) and provide a noindex with the response.

Interesting -- I think I like it -- +1 for Sure -- Thanks!
11:29 pm on Nov 21, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:July 7, 2003
posts:538
votes: 11


In an attempt to remove the soft 404 from WMT, I recently started serving 404 responses for deactivated pages. This was followed by a big drop in rankings.
I have now reverted to serving the soft 404 and will see if the site recovers.
4:00 am on Nov 22, 2013 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


I have now reverted to serving the soft 404 and will see if the site recovers.

How are you serving soft 404? Redirecting the traffic to a particular page/home page which returns 200 OK?
5:28 am on Nov 22, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13539
votes: 403


I recently started serving 404 responses for deactivated pages

But, but, splutter, that's not what a 404 is for anyway. A "deactivated" page gets a 410.
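
The 404 versus 410 distinction can be sketched as a small lookup, assuming the application can tell "never existed" apart from "existed but was deliberately removed". The `PAGES` table and `status_for` function are hypothetical.

```python
from http import HTTPStatus

# Hypothetical page registry: id -> True if active, False if deactivated.
PAGES = {101: True, 102: False}


def status_for(page_id):
    if page_id not in PAGES:
        return HTTPStatus.NOT_FOUND  # 404: no such page
    if not PAGES[page_id]:
        return HTTPStatus.GONE       # 410: existed, removed on purpose
    return HTTPStatus.OK             # 200: live page
```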
4:02 pm on Nov 22, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12929
votes: 200


Not the way I'd go about it, glitterball, but good luck to you.

<spluttering with lucy24>
8:03 pm on Nov 22, 2013 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10750
votes: 44


<splutter>i would allow googlebot to crawl (and follow links) and provide a noindex with the response.</splutter>
1:12 am on Nov 23, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:July 7, 2003
posts:538
votes: 11


I'm sure it's not an ideal way of dealing with the inactive widget pages, but my site has now recovered (which may or may not be related to this change).

Perhaps google applies a small penalty if a large number of 404s suddenly appear on a site?
1:26 am on Nov 23, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:July 19, 2013
posts:1097
votes: 0


Are you sure it was a drop in rankings and not a drop in traffic when you made the change?
1:43 am on Nov 23, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13539
votes: 403


What worries me is the exact wording
I recently started serving 404 responses

This implies some kind of deliberate action. Ordinarily a 404 is the default response when you take no action. What happens if the php-or-similar gets a request containing a bad parameter or out-of-range value?
9:41 pm on Nov 23, 2013 (gmt 0)

Preferred Member

10+ Year Member

joined:July 7, 2003
posts:538
votes: 11


Are you sure it was a drop in rankings and not a drop in traffic when you made the change?

Yes

If it's an invalid ID in the querystring, it also responds with an "Invalid Product id" type message rather than a 404.
10:38 pm on Nov 23, 2013 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 1, 2004
posts:1258
votes: 0


Reading this thread prompted me to take a look at my GWT. I also have one soft 404. It apparently, if I understand the message correctly, is caused by a strange link from someone else:

http://www.theirsite.com/www/mysite.com/

How can I fix this? Is it even possible?
10:59 pm on Nov 23, 2013 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13539
votes: 403


it also responds with an "Invalid Product id" type message rather than a 404.

Those aren't mutually exclusive. You need to distinguish between what a human sees -- which can be absolutely anything -- and the server response header. If a search engine asks for a garbage URL, it needs to get a 404. Otherwise it just thinks you have a bunch of identical pages.* Conversely, if your php-or-similar script returns a 404 header, you also need to include some form of the physical 404 page, because it won't happen automatically.
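
A minimal sketch of keeping the friendly message while sending the right status, written in Python rather than the poster's actual stack. `KNOWN_IDS`, `handle`, and the `id` query parameter are assumptions for illustration.

```python
from urllib.parse import parse_qs

KNOWN_IDS = {1, 2, 3}  # hypothetical set of valid product ids


def handle(query_string):
    """Return (status, body) for a product request such as 'id=2'."""
    raw = parse_qs(query_string).get("id", [""])[0]
    try:
        product_id = int(raw)
    except ValueError:
        product_id = None  # missing or non-numeric id

    if product_id in KNOWN_IDS:
        return "200 OK", f"<h1>Product {product_id}</h1>"

    # The status tells crawlers the URL is bad; the body still
    # explains the problem to a human visitor.
    return "404 Not Found", "<h1>Invalid Product id</h1><p>Try the product index.</p>"
```

The script has to produce the error body itself here, since the server's own 404 page is not involved when the application sets the status.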

I also have one soft 404. It apparently, if I understand the message correctly, is caused by a strange link from someone else

Nobody else can "cause" a soft 404. It happens within your own site. You wouldn't be the first person to be confused by the terminology, though. The expression "soft 404" is google's way of describing a request that should lead to a "no such page" 404 but instead gets a redirect (301/302 followed by 200) to some other page. You typically see this in sites that redirect all bad requests to the home page.
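
One way to check your own site is to request a URL you know does not exist and look at the status codes that come back. The sketch below only classifies such a chain of codes and does no network access; the function name `classify` and the labels are made up. It also flags a 200 served directly for a missing page, which is commonly treated as a soft 404 as well as the redirect case described above.

```python
REDIRECTS = (301, 302, 303, 307, 308)


def classify(chain):
    """Classify the status codes returned for a URL known not to exist.

    `chain` lists the codes in order, e.g. [301, 200] for a redirect
    to a page that then answers 200.
    """
    if not chain:
        raise ValueError("empty response chain")
    final = chain[-1]
    redirected = any(code in REDIRECTS for code in chain[:-1])
    if final in (404, 410):
        return "hard 404"
    if final == 200:
        return "soft 404 (redirect)" if redirected else "soft 404 (direct 200)"
    return "other"
```

For example, a site that sends every bad request to its home page would produce a chain like `[301, 200]`.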


* I recently found this illustrated in a pretty entertaining way when I tried one of those "similar pages" tools. I fed in two URLs and was told they were 100% identical. This would be because, er, both requests led to my 403 page (online tools, by their nature, live on server farms). Unblock the IP and all was well.
     
