the status returned is 200 OK which is wrong
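Actually, it's right. It has to do with the way different handlers treat trailing path info (the part after the extension): by default Apache rejects it for a plain static file and returns a 404, while a script handler such as php generally accepts it and serves the page. But that's no comfort to you here.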
To make this work I have:
AddHandler application/x-httpd-php5 .php .html .htm .shtml
Meaning that your files have .html extension but you're parsing them all as php? But the sole purpose of the php is to add those includes?
<tangent>
If so, is this really the most efficient way to do it? I'd have thought an SSI would use fewer resources.
</tangent>
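For comparison, a minimal sketch of the SSI route. It assumes mod_include is available on the host and that the shared menu lives at a hypothetical /includes/menu.html; nothing in this thread confirms either.

```apache
# Assumes mod_include is loaded and the host allows Options and FileInfo
# overrides in .htaccess.
Options +Includes

# Run .html files through the SSI filter instead of the php handler.
AddOutputFilter INCLUDES .html

# Each page would then pull in the shared pieces with a directive like:
#   <!--#include virtual="/includes/menu.html" -->
# (/includes/menu.html is a placeholder path, not the real one.)
```

With this in place the AddHandler line would only need to cover files that really contain php.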
if I remove:
AddHandler application/x-httpd-php5 .php .html .htm .shtml
from the .htaccess I lose my menus and footers but I do get a 404 error for files listed like the one above
If you remove the line, your .html pages are no longer being parsed as php and the path-info rules change. But that doesn't address the underlying problem.
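If you wanted the php-parsed .html pages to reject trailing path info the way static files do, Apache's AcceptPathInfo directive is the relevant knob. A minimal sketch, assuming the host allows FileInfo overrides in .htaccess and that php runs in a way that honors the setting. With some CGI-style php setups it may have no effect, or may break pages, so try it on one test directory first.

```apache
# Keep parsing .html as php, as in the existing .htaccess.
AddHandler application/x-httpd-php5 .php .html .htm .shtml

# Reject requests that carry extra path info after a real file,
# e.g. /dir/file1.html/file2.html, with a 404 instead of a 200.
AcceptPathInfo Off
```

This only changes the status code returned for the bogus URLs. It does not explain where they come from.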
I see three things:
the "real" or "base" URL
the added stuff after the .html extension
the material to be included
Some cursory experimenting (me again in your logs!) suggests that the added parts exist as filenames in their own right. So what you've got, legitimately, is
example.com/dir/file1.html
and
example.com/dir/file2.html
which is somehow turning into
example.com/dir/file1.html/file2.html
Who's requesting these bogus pages? Only Googlebot, or other visitors as well? One thing you can do is add a line to your .htaccess. Assuming you've already got mod_rewrite in place:
RewriteRule ^([^.]+\.html). http://www.example.com/$1 [R=301,L]
Don't cut and paste: that's one possible wording, and may not be optimal for your site. The idea is simply to grab any request with stuff after "html" and forcibly redirect to a form without the added stuff.
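Spelled out a little more, the idea could look like the sketch below. It assumes the .htaccess sits in the document root, that www.example.com stands in for the real hostname, that no directory name contains a dot, and that no legitimate URL on the site has anything after ".html".

```apache
RewriteEngine On

# Capture everything up to and including the first ".html", require a slash
# plus at least one more character after it, then redirect to the captured part.
#   /dir/file1.html/file2.html  ->  /dir/file1.html
RewriteRule ^([^.]+\.html)/. http://www.example.com/$1 [R=301,L]
```

Requiring the slash after ".html" is a deliberate narrowing: a bare /dir/file1.html never matches, so the redirect cannot loop on its own target.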
I don't see the connection between the included files and the garbage URLs. I kinda suspect there are two separate and unrelated issues. A php include of the kind you're using is pretty generic, not the kind of thing you can blame on a different host using a different php version.