
Google Webmaster Tools and error log returning 404, not sure why

brandon0401

3:37 am on Aug 29, 2007 (gmt 0)

10+ Year Member



I am using PHP with multiple PHP includes, and mod_rewrite for
pretty URLs of the form http://www.example.com/main_dir/259561/name.html.
The URLs work perfectly in the browser, and the pages even get indexed in Google,
but I'm sure I'm being limited. Google Webmaster Tools is showing 404s
like
http://www.example.com/main_dir/259561/function.include
and
http://www.example.com/main_dir/259561/function.main

but I do not get 404 errors for my other rewritten pages, which are
just of the form http://www.example.com/main_name.

In my error log it only shows as
[Tue Aug 28 21:07:37 2007] [error] [client 38.99.44.104] File does not exist: /home/user/public_html/main_dir/259561/

These seem to come mostly from IP 38.99.44.104, so I'm assuming that is Googlebot.

There are no references anywhere to that directory without the ending part of the URL.

I have tested, and all the includes seem to be working fine (tested with
if statements on the @include(dirname(__FILE__)."/config.php"); statement),
and I do not get 404s from regular users in the log.

I have tried setting the headers at the top with header("HTTP/1.1 200 OK");
header ( 'Status: 200' ); with no luck.

I have checked my header responses using Googlebot emulators and get a 200 response for my pages, and a 404 for my error pages, as expected.

I am stumped and have no clue why I would be getting a 404 in the logs.

Any ideas for me here? Thanks in advance!

Brandon

brandon0401

3:51 am on Aug 29, 2007 (gmt 0)

10+ Year Member



Just to add some info, here is my .htaccess code:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R,L]
RewriteRule ^dir/(.*)/(.*)\.html$ url.php?url=$1&text=$2
RewriteRule ^main_dir/(.*)/(.*)\.html$ comment.php?id=$1&text=$2

jdMorgan

12:39 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



38.99.44.*** is NOT a Googlebot IP address. It belongs to a major backbone provider, and is used by a lot of scrapers and other undesirables. In fact, I have the entire 38.* IP range blocked, except for the 38.114.104.0/24 range, in order to allow GigaBot to spider.
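
A block like the one described above could be sketched in .htaccess roughly as follows. This is only an illustration, not the actual configuration in use: it assumes Apache 2.2-style access directives (Order/Deny/Allow) and that the host allows them in .htaccess.

```apache
# Assumption: Apache 2.2-style access control (on Apache 2.4 these
# directives are only available through mod_access_compat).
# With "Order Deny,Allow", Deny rules are evaluated first and a
# matching Allow rule overrides them; everything else is allowed.
Order Deny,Allow

# Block the whole 38.* range
Deny from 38

# ...except the range mentioned above for GigaBot
Allow from 38.114.104.0/24
```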

The IP address you cited is currently being used by cuill.com. They deploy the badly-behaved "Twiceler" robot, which doesn't fetch or heed robots.txt.

In short, unless your script is doing "GETs" of these included files using the HTTP protocol instead of reading them locally from the server filesystem, and unless you are seeing errors in Google Webmaster Tools, I flat wouldn't worry about this.

Jim

jdMorgan

2:59 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Note also that your first rule has several problems, one major. You must use a 301-Moved Permanently redirect if you wish to avoid duplicate-content issues:

RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

It would also improve efficiency if you added the [L] flag to the last two rules, and perhaps made the ".*" patterns more specific.
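
Putting those suggestions together, the whole file might look something like the sketch below. The tighter patterns are assumptions based on the example URL posted earlier (a numeric id followed by a name with no slashes in it), so adjust them to whatever your URLs really contain.

```apache
Options +FollowSymLinks
RewriteEngine on

# Redirect the non-www hostname to www with a permanent (301) redirect
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Assumption: neither path segment contains a slash
RewriteRule ^dir/([^/]+)/([^/]+)\.html$ url.php?url=$1&text=$2 [L]

# Assumption: the id segment is numeric, as in /main_dir/259561/name.html
RewriteRule ^main_dir/([0-9]+)/([^/]+)\.html$ comment.php?id=$1&text=$2 [L]
```

The [L] flag stops rule processing for the current pass once a rule matches, and "[^/]+" keeps each captured segment from running across a slash the way ".*" can.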

Jim

brandon0401

5:13 pm on Aug 29, 2007 (gmt 0)

10+ Year Member



Thanks for the reply.

Yeah, the problem is I am seeing 404s in Webmaster Tools.

I just pointed that out from the logs because I see a lot of it from that IP.

When you say HTTP for the include, do you mean

<?php include("http://www.example.com/include.php");?>

Because I am doing that.

Is that why? Why would it cause Webmaster Tools to report the URL with the file name cut off and function.main or function.include at the end, like above?

Thanks for the tip on the rewrite.

Do
RewriteRule ^dir/(.*)/(.*)\.html$ url.php?url=$1&text=$2
RewriteRule ^main_dir/(.*)/(.*)\.html$ comment.php?id=$1&text=$2

look right?
And what about RewriteRule ^example index.php?category=160

Thanks in advance.

jdMorgan

6:23 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<?php include("http://www.example.com/include.php");?>

You should not use a full URL unless this included file is located on another server. Fix this, and it's likely your problem will go away.

I'm no PHP expert, so I'll refer you to our PHP forum for details... :)

Jim

brandon0401

7:25 pm on Aug 29, 2007 (gmt 0)

10+ Year Member



So you think Google would read it like that? Thanks.

g1smd

7:35 pm on Aug 29, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google does not and cannot directly read any of the "include" statements.

The server includes the content from the other page before anything is sent out to the browser.

Users and bots cannot see the join.