
Google SEO News and Discussion Forum

    
Soft 404 in GWT
followgreg




msg:4611247
 11:02 am on Sep 20, 2013 (gmt 0)

Hi Guys,

Recently I've received my first Soft 404 in GWT.
I use a VB forum, and some of my forums are private, which means that guests have to register in order to read the threads. I let guests see the thread titles and links, though, which is probably what Googlebot is following.

So here are my questions:
1. What HTTP header should I return if not 200? (it's not a 404 either, the content exists but not for guests)

2. Would allowing Googlebot to crawl be ok? I don't think that's cloaking and I see many sites doing this. Or will it penalize me somehow?

Anyone who knows of the vBulletin plugin let me know :)

Thanks!

 

lucy24




msg:4611437
 9:03 pm on Sep 20, 2013 (gmt 0)

Speaking as a user: it is VERY, VERY annoying when a search brings up a promising result, and you go there only to find the site requires a login. As noted elsewhere, people are not likely to create an account purely to see a page. Possible but not likely.

Excluding search engines from pages they've previously crawled is generally not a good idea. I'd go with a "noindex" meta.

Some sites do use a "satisfy any" structure that allows registered users and also certain named search engines. See above about annoying users :(

JD_Toims




msg:4611452
 9:52 pm on Sep 20, 2013 (gmt 0)

1. What HTTP header should I return if not 200?

403 Forbidden

2. Would allowing Googlebot to crawl be ok? I don't think that's cloaking and I see many sites doing this. Or will it penalize me somehow?

I wouldn't for exactly the reason Lucy24 states about visitors finding the page in the results and not being able to access it. -- It may not have a *direct* ranking impact, but it could certainly have a negative impact on visitor behavior and the willingness of visitors to visit your site for future queries even if the page returned for the next search is wide-open for everyone.

Also, it's been stated on a number of occasions that Googlebot should see exactly what a "normal visitor" sees when they visit. Since "normal visitors" are "Forbidden" from entering without an account, serving Googlebot [and even not-logged-in visitors, since they'll likely never notice it if you continue serving the same page and just change the header] the correct 403 Forbidden header is the way I think I would go.
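
A minimal sketch of the 403 idea, assuming a Python WSGI front end rather than vBulletin itself (which is PHP). The `/private/` prefix, the page bodies, and the `is_logged_in` cookie check are hypothetical placeholders, not anything from the original poster's setup. Guests still get the same "please register" page; only the status line changes:

```python
PRIVATE_PREFIX = "/private/"  # hypothetical URL prefix for members-only forums

GUEST_PAGE = b"<h1>Members only</h1><p>Please register or log in to read this thread.</p>"
THREAD_PAGE = b"<h1>Thread</h1><p>Thread content goes here.</p>"


def is_logged_in(environ):
    """Hypothetical check; a real forum would validate the session server-side."""
    return "session=" in environ.get("HTTP_COOKIE", "")


def app(environ, start_response):
    """WSGI app: guests asking for a private thread get the usual page with a 403."""
    path = environ.get("PATH_INFO", "/")
    if path.startswith(PRIVATE_PREFIX) and not is_logged_in(environ):
        status, body = "403 Forbidden", GUEST_PAGE
    else:
        status, body = "200 OK", THREAD_PAGE
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` from the standard library.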

phranque




msg:4611468
 11:18 pm on Sep 20, 2013 (gmt 0)

the reason it shows up as a soft 404 is that you have a large number of urls returning essentially the same content, which would be the page template with only the thread title varying between pages.
this fact alone and/or perhaps some wording on the page is triggering the soft 404 signal.

i would allow googlebot to crawl (and follow links) and provide a noindex with the response.
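
A rough sketch of the noindex suggestion, in Python for illustration only (the helper names and page markup are made up, and a vBulletin site would do the equivalent in its PHP templates). It shows the two usual places a noindex can go: an `X-Robots-Tag` response header or a robots meta tag in the page head. Using `noindex` on its own, without `nofollow`, leaves the links on the page followable:

```python
from html import escape

NOINDEX_META = '<meta name="robots" content="noindex">'


def with_noindex_header(headers):
    """Return a copy of a (name, value) header list with X-Robots-Tag: noindex set."""
    kept = [(name, value) for name, value in headers if name.lower() != "x-robots-tag"]
    kept.append(("X-Robots-Tag", "noindex"))
    return kept


def guest_page(thread_title):
    """Build the guest view of a private thread with a noindex meta tag in <head>."""
    return (
        "<!DOCTYPE html><html><head>"
        f"<title>{escape(thread_title)}</title>{NOINDEX_META}"
        "</head><body><p>Please register or log in to read this thread.</p></body></html>"
    )
```

Either mechanism should be enough on its own; the header variant is handy when the template is hard to change.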

JD_Toims




msg:4611511
 1:56 am on Sep 21, 2013 (gmt 0)

the reason it shows up as a soft 404 is that you have a large number of urls returning essentially the same content, which would be the page template with only the thread title varying between pages.

My thought too.

i would allow googlebot to crawl (and follow links) and provide a noindex with the response.

Interesting -- I think I like it -- +1 for Sure -- Thanks!

glitterball




msg:4625268
 11:29 pm on Nov 21, 2013 (gmt 0)

In an attempt to remove the soft 404 from WMT, I recently started serving 404 responses for deactivated pages. This was followed by a big drop in rankings.
I have now reverted to serving the soft 404 and will see if the site recovers.

aakk9999




msg:4625301
 4:00 am on Nov 22, 2013 (gmt 0)

I have now reverted to serving the soft 404 and will see if the site recovers.

How are you serving soft 404? Redirecting the traffic to a particular page/home page which returns 200 OK?

lucy24




msg:4625319
 5:28 am on Nov 22, 2013 (gmt 0)

I recently started serving 404 responses for deactivated pages

But, but, splutter, that's not what a 404 is for anyway. A "deactivated" page gets a 410.
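
To make the 404-versus-410 distinction concrete, here is a small Python sketch. The `PageState` enum and the `PAGES` table are hypothetical stand-ins for whatever the site's database records about each page:

```python
from enum import Enum


class PageState(Enum):
    ACTIVE = "active"
    DEACTIVATED = "deactivated"  # existed once, removed on purpose


# Hypothetical catalogue: page id -> state. Unknown ids are simply absent.
PAGES = {1: PageState.ACTIVE, 2: PageState.DEACTIVATED}


def status_for(page_id, pages=PAGES):
    """Pick the HTTP status line for a requested page id."""
    state = pages.get(page_id)
    if state is PageState.ACTIVE:
        return "200 OK"
    if state is PageState.DEACTIVATED:
        return "410 Gone"       # deliberately removed
    return "404 Not Found"      # never existed, or unknown
```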

netmeg




msg:4625411
 4:02 pm on Nov 22, 2013 (gmt 0)

Not the way I'd go about it, glitterball, but good luck to you.

<spluttering with lucy24>

phranque




msg:4625460
 8:03 pm on Nov 22, 2013 (gmt 0)

<splutter>i would allow googlebot to crawl (and follow links) and provide a noindex with the response.</splutter>

glitterball




msg:4625501
 1:12 am on Nov 23, 2013 (gmt 0)

I'm sure it's not an ideal way of dealing with the inactive widget pages, but my site has now recovered (which may or may not be related to this change).

Perhaps google applies a small penalty if a large number of 404s suddenly appear on a site?

JD_Toims




msg:4625502
 1:26 am on Nov 23, 2013 (gmt 0)

Are you sure it was a drop in rankings and not a drop in traffic when you made the change?

lucy24




msg:4625504
 1:43 am on Nov 23, 2013 (gmt 0)

What worries me is the exact wording
I recently started serving 404 responses

This implies some kind of deliberate action. Ordinarily a 404 is the default response when you take no action. What happens if the php-or-similar gets a request containing a bad parameter or out-of-range value?

glitterball




msg:4625572
 9:41 pm on Nov 23, 2013 (gmt 0)

Are you sure it was a drop in rankings and not a drop in traffic when you made the change?

Yes

If it's an invalid ID in the querystring, it also responds with an "Invalid Product id" type message rather than a 404.

icedowl




msg:4625582
 10:38 pm on Nov 23, 2013 (gmt 0)

Reading this thread prompted me to take a look at my GWT. I also have one soft 404. It apparently, if I understand the message correctly, is caused by a strange link from someone else:

http://www.theirsite.com/www/mysite.com/

How can I fix this? Is it even possible?

lucy24




msg:4625584
 10:59 pm on Nov 23, 2013 (gmt 0)

it also responds with an "Invalid Product id" type message rather than a 404.

Those aren't mutually exclusive. You need to distinguish between what a human sees -- which can be absolutely anything -- and the server response header. If a search engine asks for a garbage URL, it needs to get a 404. Otherwise it just thinks you have a bunch of identical pages.* Conversely, if your php-or-similar script returns a 404 header, you also need to include some form of the physical 404 page, because it won't happen automatically.
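
As an illustration of sending the friendly message and the 404 header together, here is a minimal Python WSGI sketch. The `PRODUCTS` table and the `id` query parameter are assumptions for the example, not the actual site's code:

```python
from urllib.parse import parse_qs

PRODUCTS = {"1": "Blue widget", "2": "Red widget"}  # hypothetical product table


def product_app(environ, start_response):
    """WSGI app: an unknown or malformed id gets a friendly page AND a 404 status."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    product_id = query.get("id", [""])[0]
    name = PRODUCTS.get(product_id)
    if name is None:
        status = "404 Not Found"
        body = b"<h1>Invalid Product id</h1><p>Sorry, we could not find that product.</p>"
    else:
        status = "200 OK"
        body = f"<h1>{name}</h1>".encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The visitor sees the same "Invalid Product id" page either way; the only change is that the script writes the body itself and sets the status line to 404.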

I also have one soft 404. It apparently, if I understand the message correctly, is caused by a strange link from someone else

Nobody else can "cause" a soft 404. It happens within your own site. You wouldn't be the first person to be confused by the terminology, though. The expression "soft 404" is google's way of describing a request that should lead to a "no such page" 404 but instead gets a redirect (301/302 followed by 200) to some other page. You typically see this in sites that redirect all bad requests to the home page.
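
One way to self-check for this, sketched in Python: request a URL you know does not exist, note the status codes along the way (for example with a header-checking tool), and classify the chain. The function name and labels are made up for illustration, and this is only a rough test of your own site, not a description of how Google decides:

```python
def classify_probe(hops):
    """Classify the response chain for a URL that is known not to exist.

    `hops` is the ordered list of HTTP status codes seen while following the
    request, e.g. [301, 200] for a redirect that ends on a normal page.
    """
    if not hops:
        raise ValueError("need at least one response")
    final = hops[-1]
    if final in (404, 410):
        return "hard 404"   # what a garbage URL should get
    if final == 200:
        # A bad URL ended on a normal page, directly or via redirects.
        return "soft 404"
    return "other"
```

For a site that redirects every bad request to the home page, the recorded chain would be something like `[301, 200]`, which this sketch labels a soft 404.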


* I recently found this illustrated in a pretty entertaining way when I tried one of those "similar pages" tools. I fed in two URLs and was told they were 100% identical. This would be because, er, both requests led to my 403 page (online tools, by their nature, live on server farms). Unblock the IP and all was well.


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    © Webmaster World 1996-2014 all rights reserved