

WordPress CMS 404 Question: Follow, No Index?

What code should I add?

         

aced84

12:31 pm on May 24, 2008 (gmt 0)

10+ Year Member



Hi there,

I am using WordPress 2.5 as my CMS.

I heard an SEO expert say that, for SEO reasons, we shouldn't allow Google or other robots to index our 404 pages.

And IF Google somehow does index some of our 404 error pages, it could very well mean that our blog is not returning the proper 404 Not Found HTTP status code, which could be a big problem.

Someone suggested that if we don't want our 404s indexed by Google, we should add this to our WordPress header.php.
Code:

<?php if(is_404())echo '<meta name="robots" content="noindex" />';?>

If we extend that with an else branch, all of our pages except the 404 pages will be indexed. The code would be:
Code:

<?php
if ( is_404() )
    echo '<meta name="robots" content="noindex" />' . "\n";
else
    echo '<meta name="robots" content="follow, all" />' . "\n";
?>

Question 1: Is that the correct code to add?
Question 2: WHICH PART of header.php should I insert the code into?

Thanks in advance for your help.

ergophobe

4:06 pm on May 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I certainly don't bill myself as an SEO expert, but I'm skeptical of the expert who was giving you advice.

WP should send a valid 404 header for any missing page. It does on all my installs. You should not have to do anything in order to make that happen.

Also, you don't want to stop the search engines from getting 404s, because that's how they identify bogus URLs. [edit: And those pages should get dropped from the search engine indexes, whether noindexed or not. Unless you have some unusual problem, such as pages that should be 404s somehow getting indexed, where you need to send users to a valid page but still want crawlers and users to get a 404, you shouldn't need to do anything like this.]

The bigger issue with WP is the potential for duplicate content in the archives, author archives, category pages and such. Generally, I like to "follow,noindex" these pages rather than disallowing them in robots.txt (which stops crawlers from fetching those pages at all, so their links can't be followed either).
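If you did want to apply "noindex, follow" to archive pages by hand rather than with a plugin, a hypothetical header.php excerpt might look like the one below. It is only a sketch: it assumes a theme whose header.php has a normal <head> section (typically ending with a wp_head() call), and it relies on WordPress's is_archive() conditional tag, which in current WordPress versions is true for category, tag, author and date archives. It only runs inside a WordPress theme, not as a standalone script.

```php
<?php
// Hypothetical header.php excerpt: place it inside <head>, e.g. just before wp_head().
// is_archive() is a WordPress conditional tag, so this only works within a theme.
if ( is_archive() ) {
    // Keep archive listings out of the index, but let crawlers follow
    // their links through to the individual posts.
    echo '<meta name="robots" content="noindex, follow" />' . "\n";
}
?>
```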

Rather than hacking WP, I usually use one of two excellent plugins:
- Joost de Valk's robots meta plugin
- Headspace2, which does a zillion things, among them allowing you to noindex certain pages.

Just my $0.02

Tom

[edited by: ergophobe at 4:00 pm (utc) on May 27, 2008]

aced84

4:28 pm on May 24, 2008 (gmt 0)

10+ Year Member



Hi Ergophobe,

Thank you for your reply. However, the "SEO expert" I mentioned isn't necessarily one; I just assumed that person was an SEO expert ;)

Anyway, I don't quite get what you meant, sorry about that; I am not a tech-savvy person when it comes to PHP coding. I read about the NO-INDEX approach in a post on another forum. Let me know if it is OK to post the link here; otherwise I will PM you the link. I need help with that.

I think my title is absolutely wrong. It shouldn't be DISALLOW ROBOT INDEXING; it should be FOLLOW, NO-INDEX.

Thanks for the reminder, but I can't edit the post title now, can I?

ergophobe

2:21 am on May 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, but my guests just arrived so I'll have to be brief.

You should not need to add any code. The 404s in WP should work without adding code, but for the other times when you want to add No Index, either of those plugins will do it and will save you the hassle of editing your code every time you upgrade WP.

Sorry for the brief response... but just google for those plugins and play with them. If that doesn't make sense, I'll be checking in again on Monday.

>> I can't edit the post title now

Nope, but I can :-) Hopefully that's a better version.

Cheers and have a great weekend!

aced84

3:43 pm on May 26, 2008 (gmt 0)

10+ Year Member



Thanks for your suggestions again, ergophobe. I have tried Headspace2, but it is too complicated for a non-tech-savvy person like me. LOL

I guess I will just leave it as it is for now. Thanks anyway. Have a nice day.

ergophobe

4:55 pm on May 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No no... don't give up just yet.

Step 0: download and install Firefox if you haven't already.

1. Install the LiveHTTPHeaders extension [livehttpheaders.mozdev.org]

2. In Firefox, go to Tools -> LiveHttpHeaders to open a window that shows you the headers: the back-and-forth chatter between your browser and the server that negotiates what will finally be sent to your browser and how. If the box already has a bunch of gibberish in it, hit the "Clear" button.

3. Very next thing, type into the address bar http://example.com/adfewoxfhjelxixneojs, where example.com is your domain and adfewoxfhjelxixneojs is any old gibberish.

4. Go back to your LiveHttpHeaders window and look for a bunch of stuff that looks like this (the important parts are the GET request line and the HTTP status line of the response):


http://example.com/ASDFESasdfee

GET /ASDFESasdfee HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 404 OK
Date: Mon, 26 May 2008 16:37:48 GMT
Server: Apache/1.3.41 (Unix) mod_gzip/1.3.26.1a mod_log_bytes/1.2 mod_bwlimited/1.4 mod_auth_passthrough/1.8 FrontPage/5.0.2.2635 mod_ssl/2.8.31 OpenSSL/0.9.7a
Cache-Control: no-cache, must-revalidate, max-age=0
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Pragma: no-cache
X-Pingback: http://example.com/xmlrpc.php
Set-Cookie: PHPSESSID=8782e6dbac201a2a455ad2cad4f6bbc0; path=/
Last-Modified: Mon, 26 May 2008 16:37:49 GMT
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
----------------------------------------------------------

The first important part is your request (the GET line); the second is the server's response status line. In this case, taken from an actual WordPress site, it's

HTTP/1.x 404 OK

This is good, but not great. It's good because it sends a 404. It's not great because it labels the 404 as "OK", whereas ideally the text would be "Not Found".

This is not crucial, though, because the HTTP specification says that only the numeric code matters and the text after it (the "reason phrase") is only a recommendation. This is because the spec can't foresee every possible language, and it's absolutely legit for, say, a French site to return

HTTP/1.x 404 Pas Trouvé

I used to actually go to the trouble of hacking WP or Drupal to make them send "Not Found" but, given that the spec doesn't require it, I've decided to become less OCD in my middle age.
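To make the "only the number matters" point concrete, here is a minimal PHP sketch (the function name http_status_code() is made up for illustration) that reads the three-digit code from a status line and ignores whatever reason phrase follows it. "HTTP/1.x" is how LiveHTTPHeaders displays the version; a real response line would normally say HTTP/1.0 or HTTP/1.1, and the pattern accepts any of them.

```php
<?php
// Hypothetical helper: extract the numeric status code from an HTTP status line.
// Only the three-digit code is read; the reason phrase after it
// ("OK", "Not Found", "Pas Trouvé", ...) is ignored entirely.
function http_status_code(string $statusLine): ?int
{
    // Expected shape: "HTTP/<version> <3-digit code> <optional reason phrase>"
    if (preg_match('#^HTTP/\S+\s+(\d{3})(?:\s|$)#', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null; // not a status line
}

// The WordPress capture above, the French variant and the "ideal" wording
// all report the same status code.
echo http_status_code('HTTP/1.x 404 OK'), "\n";          // 404
echo http_status_code('HTTP/1.1 404 Pas Trouvé'), "\n";  // 404
echo http_status_code('HTTP/1.1 404 Not Found'), "\n";   // 404
```

This is also how a crawler treats the response: "404 OK" and "404 Not Found" are the same 404 as far as the protocol is concerned.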

So try that much for starters. If that works, then we can move on to the Headspace2 stuff.

I think it's worth it for you to figure this out. Assuming that you are actually going to the trouble of putting information on your site that another human being might care about, it's worth taking a little trouble to make sure that you're not throwing obstacles in front of the search engines.
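If you'd rather not install a browser extension, roughly the same check as the LiveHTTPHeaders steps can be scripted with PHP's built-in get_headers() function. This is a sketch under some assumptions: PHP 7.1 or later run from the command line, allow_url_fopen enabled, and example.com standing in for your own domain; the helper name status_line_from_headers() and the file name check404.php are hypothetical.

```php
<?php
// Sketch: print the HTTP status line the server sends for a made-up URL.
// Assumes PHP 7.1+, CLI usage and allow_url_fopen enabled.

// get_headers() returns the response headers as an array with the status
// line first, or false if the request could not be made at all.
function status_line_from_headers($headers): ?string
{
    return (is_array($headers) && isset($headers[0])) ? $headers[0] : null;
}

// Usage: php check404.php http://example.com/adfewoxfhjelxixneojs
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $line = status_line_from_headers(@get_headers($argv[1]));
    // A WordPress install that handles missing pages correctly should print a 404 line.
    echo $line ?? 'request failed', "\n";
}
```

If the server redirects the gibberish URL, get_headers() follows the redirect and returns the headers of every response in order, so the first line is still the status the server gave for the URL you actually typed.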

aced84

9:32 am on May 27, 2008 (gmt 0)

10+ Year Member



Hi ergophobe,

Wow, what a long post! I really appreciate your reply and your effort to help me figure out the whole thing.

I can only say THANK YOU for that.

However, my 404 error page is actually working well. Whenever I test it, whether with an old page that I have removed or a non-existent page, it works just fine.

The reason I asked about this in the first place was that I wanted to take a "preventive" step against problems that could happen in the future. I read about the method on another forum, found it too hard to understand, and posted the question here just to double-check whether what I'd read there was the correct thing to do.

After hearing opinions here and there, I think I will take your advice NOT TO HACK header.php. I will just leave it alone, and if anything comes up in the future, I will surely come back and ask for help :)

Thanks a million for your kind reply.

I have flagged this thread, so that it could be a reference for me, in the future :)

Have a nice day, my friend ...

ergophobe

3:46 pm on May 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You're welcome. I guess I missed the fact that you were already sure that you were getting valid 404s. That being the case, I would definitely not hack the header.php.

It's hard for me to imagine why I would want to noindex my 404 page. If a URL is returning a 404, there's no reason on earth that a search engine should index that anyway.

Good luck with your site!