Forum Moderators: phranque

Increase in 404 messages from G Search Console

     
9:55 am on Jul 10, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


Not sure if this is the correct place to post this.

I have received a Google Search Console message (message type [WNC-655201]) which says that there has been an increase in "404" pages on one of my websites. Currently there are 97 404s identified, all within a week.

The pages identified do not exist.

Google is trying to access the pages because it claims they are linked to from other pages on my website. Those linking pages shouldn't exist (I didn't create them intentionally, anyway). However, when I type the url of a linking page into my browser, the page comes up. It's all screwed up, though, because the .css file is not found, even though it is there.

The url of each linking page is in the following format:

http://www.example.co.uk/directorya/subdirectorya/filenamea.php/filenameb.php

I have not knowingly created that page although I have created:

http://www.example.co.uk/directorya/subdirectorya/filenamea.php

The erroneous "filenameb.php" is always another filename from subdirectorya.

What concerns me most is that the two urls (albeit the first one is mis-constructed) are effectively duplicate content. Aside from the 97 identified in Google Search Console, there are hundreds of other urls I can mis-construct in the same way, and each one apparently exists.

Anyone any idea what is going on?

[edited by: phranque at 11:28 am (utc) on Jul 10, 2017]
[edit reason] exemplified urls for clarity (example.com isn't linked in this forum) [/edit]

10:35 am on July 10, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:11540
votes: 702


Sometimes the error reports don't make sense. I've never received a message like that, but every couple of days there are a couple of 404s in that report for pages that have never existed on my server.

Some are obviously backlink typos, but others are a mystery.

The ones that look correct but return a 404 when you click them usually have a space at the end that you don't see.

The ones that really irritate me are incorrect URLs that are found, according to GSC, on that same incorrect URL... yeah right.
11:33 am on July 10, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11293
votes: 135


since those paths probably don't exist on the server i would assume those requests are being internally rewritten to a script which is serving the duplicate content.
is this your php script or is it a CMS of some type?
12:43 pm on July 10, 2017 (gmt 0)

Preferred Member

10+ Year Member

joined:May 18, 2005
posts:420
votes: 1


I have been experiencing the same issue (with similar erroneous URLs) for years. Sometimes I manage to block the reported URLs via robots.txt. Be careful when you do this and test the robots.txt file in Webmaster Tools to avoid blocking legitimate URLs.

I wouldn't worry too much about this however.
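Before uploading a robots.txt change, the rules can also be checked locally with Python's standard-library urllib.robotparser. The Disallow line below is a hypothetical example: its trailing slash blocks everything below filenamea.php/ while leaving filenamea.php itself crawlable. This parser only does plain prefix matching; Googlebot also understands * and $ wildcards, so a rule that relies on them still needs Google's own tester.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: block anything *below* filenamea.php/ while leaving
# filenamea.php itself crawlable (note the trailing slash).
ROBOTS_TXT = """\
User-agent: *
Disallow: /directorya/subdirectorya/filenamea.php/
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.example.co.uk/directorya/subdirectorya/filenamea.php",
        "http://www.example.co.uk/directorya/subdirectorya/filenamea.php/filenameb.php",
    ):
        print(parser.can_fetch("Googlebot", url), url)
```

With this rule the real page should come back as fetchable and the path-info URL as blocked; any legitimate URL that unexpectedly prints False is a sign the rule is too broad.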
1:13 pm on July 10, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


I have no CMS. The website is hand coded with plain vanilla php.

When you say "those paths probably don't exist on the server" that means the php page doesn't exist on the server? If so, then no it doesn't exist. I've never published them and when I look on the server they simply aren't there.
1:16 pm on July 10, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


"I have been experiencing the same issue (with similar erroneous URLs) for years. Sometimes I manage to block the reported URLs via robots.txt."


When you type those erroneous URLs into your browser does the page appear?
1:50 pm on July 10, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14735
votes: 430


Sounds like there's a bug in your hand coded php that's creating those 404 pages under a limited circumstance.

Have you run Xenu link sleuth with 404 error reporting and redirection logging turned on to see if it can reproduce the error?
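Roughly speaking, a link checker like Xenu fetches a page, resolves every link on it against that page's own URL, and repeats for each new URL. The sketch below does the same over a tiny in-memory site so it runs offline. Everything in it is hypothetical: the pages and links, the stray trailing slash in "filenamea.php/" standing in for a bug in the hand coded php, and the fake fetch function, which assumes the server (as an Apache + PHP setup typically does) still serves an existing .php file when extra path segments follow it.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

BASE = "http://www.example.co.uk"

# Hypothetical site: each real file and the relative links in its HTML.
# The stray "/" in "filenamea.php/" on filenameb.php is the assumed typo.
SITE = {
    "/directorya/index.php": ["subdirectorya/filenamea.php"],
    "/directorya/subdirectorya/filenamea.php": ["filenameb.php", "../index.php"],
    "/directorya/subdirectorya/filenameb.php": ["filenamea.php/", "../index.php"],
}


def fetch(path):
    """Fake server: return the links of the first real file found along the path,
    ignoring any extra segments after it (PHP accepting path info). None means 404."""
    segments = path.split("/")
    for i in range(2, len(segments) + 1):
        candidate = "/".join(segments[:i])
        if candidate in SITE:
            return SITE[candidate]
    return None


def crawl(start_url, limit=500):
    """Breadth-first crawl. Returns (all URLs seen, {404 URL: page that linked to it}).
    limit caps the crawl, because path-info bugs can generate URLs without end."""
    found_on = {start_url: None}
    queue = deque([start_url])
    broken = {}
    while queue and len(found_on) < limit:
        url = queue.popleft()
        links = fetch(urlsplit(url).path)
        if links is None:
            broken[url] = found_on[url]
            continue
        for href in links:
            target = urljoin(url, href)  # resolved against the URL actually requested
            if target not in found_on:
                found_on[target] = url
                queue.append(target)
    return set(found_on), broken


if __name__ == "__main__":
    seen, broken = crawl(BASE + "/directorya/index.php")
    for url in sorted(seen):
        print(url)
    for url, source in broken.items():
        print("404:", url, "linked from", source)
```

Started from /directorya/index.php, this reaches six URLs. Two are filenamea.php with extra path glued on, and the single 404 (/directorya/subdirectorya/index.php, a "../index.php" link resolved one level too deep) is reported as linked from one of those duplicates, which is the same shape as the Search Console report in the opening post.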
5:32 pm on July 10, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


Thanks, I will investigate Xenu link sleuth and see what happens.

At the moment I do get the feeling that I am the cause of the misconstructed urls and by some peculiar circumstance they translate into a page which can be reached even though it doesn't really exist.
7:27 pm on July 10, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


OMG!

Ran Xenu and although it didn't identify what the problem was, it did identify a massive amount of bad links. Truly appalling.

I think whatever is causing the 404s is probably less important than correcting the massive amount of bad links I have in this particular website. Priorities have suddenly changed.

Thanks @martinibuster.
8:41 pm on July 10, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14719
votes: 616


Commenting strictly on the OP, so there will be some overlap:

You've got at least three different but intersecting problems.

#1 is the one you posted about: Google thinks you've got nonexistent pages, and the reason it thinks so is that it's following links from other pages of yours.

#2 relative links. This is definitely the reason your URLs in the form /pagename.php/otherpage.php are showing up without CSS: the browser is looking for the obviously nonexistent /pagename.php/somestyle.css. It may also be the reason you're getting so many bad URLs: once someone is at /pagename.php/otherpage.php, any relative links may well lead them to request /pagename.php/thirdpage.php.

#3 wrongly interpreted path info, assuming you're on an Apache server with default AcceptPathInfo [httpd.apache.org] settings. These are handled differently in .php URLs than in .html URLs at the server level, before the request ever reaches your php.
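A much-simplified model of #2 and #3 together. It is an assumption-laden sketch, not Apache's actual algorithm: the file set is hypothetical (the opening post's names plus an assumed style.css), and it only encodes the point from the AcceptPathInfo documentation that, by default, script handlers (PHP among them, on a typical setup) accept trailing path info while the handler for ordinary static files rejects it.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical document root: the opening post's file names plus an assumed stylesheet.
EXISTING = {
    "/directorya/subdirectorya/filenamea.php",
    "/directorya/subdirectorya/filenameb.php",
    "/directorya/subdirectorya/style.css",
}

BAD_URL = "http://www.example.co.uk/directorya/subdirectorya/filenamea.php/filenameb.php"


def serve(path, existing=EXISTING):
    """Simplified model, not Apache's real code: walk the path until an existing file
    is found; whatever follows it is PATH_INFO. PHP accepts PATH_INFO, the default
    handler for static files rejects it. Returns (status, handling file, PATH_INFO)."""
    segments = path.split("/")
    for i in range(2, len(segments) + 1):
        candidate = "/".join(segments[:i])
        if candidate in existing:
            path_info = path[len(candidate):]
            if path_info and not candidate.endswith(".php"):
                return 404, None, path_info
            return 200, candidate, path_info
    return 404, None, ""


if __name__ == "__main__":
    # The "page that shouldn't exist" is really filenamea.php with PATH_INFO attached.
    print(serve(urlsplit(BAD_URL).path))
    # A document-relative stylesheet link is resolved against the URL in the address bar...
    css_url = urljoin(BAD_URL, "style.css")
    print(css_url, serve(urlsplit(css_url).path))
    # ...whereas a root-relative link resolves the same way from any URL.
    print(urljoin(BAD_URL, "/directorya/subdirectorya/style.css"))
```

Under this model the bad URL is answered by filenamea.php with PATH_INFO "/filenameb.php", which is why it appears to exist. The relative style.css link becomes .../filenamea.php/style.css, which is also answered by filenamea.php, so the browser gets HTML back where it expected CSS and the page renders unstyled. Whether a given server behaves exactly like this depends on its configuration.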

If it were me, the very first thing I'd do is (ymmv of course on the details):
RewriteCond %{REQUEST_URI} ^(/[^.]+\.php)
RewriteRule \.php/ https://www.example.com%1 [R=301,L]
Regardless of other circumstances, Google now has these imaginary URLs in its memory and will keep requesting them forever, so you need to redirect them to the intended form. This redirect will, sadly, have to remain in place for ever, even after the need for it has passed. Personally I wouldn't bother about the CSS unless Google is obsessively fond of your stylesheets.
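The two rewrite lines above can be dry-run in Python before touching the live .htaccess. This sketch only reproduces their regular expressions and the %1 back-reference, not mod_rewrite itself. The origin defaults to the opening post's http://www.example.co.uk (an assumption, since that site is not on https yet); the rule as written sends visitors to https://www.example.com.

```python
import re

# The two patterns from the rule above. In .htaccess, RewriteRule sees the path
# without its leading slash; the unanchored \.php/ pattern matches either way.
RULE_PATTERN = re.compile(r"\.php/")
COND_PATTERN = re.compile(r"^(/[^.]+\.php)")


def redirect_target(request_uri, origin="http://www.example.co.uk"):
    """Return the Location a 301 would point to, or None if the rule would not fire.
    request_uri mirrors %{REQUEST_URI}: the path only, without the query string."""
    if not RULE_PATTERN.search(request_uri):
        return None
    match = COND_PATTERN.match(request_uri)
    if match is None:
        # e.g. a directory name containing a dot: [^.]+ cannot get past it
        return None
    return origin + match.group(1)  # %1 = first capture group of the RewriteCond


if __name__ == "__main__":
    for uri in ("/directorya/subdirectorya/filenamea.php/filenameb.php",
                "/directorya/subdirectorya/filenamea.php"):
        print(uri, "->", redirect_target(uri))
```

Because the target always stops at the first .php and never contains .php/, a redirected request cannot match the rule a second time, so it does not loop. URLs whose directory names contain a dot fall through unredirected, which is one of the details where ymmv.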

And then you can pinpoint the source of the error. I once started getting requests for /directory//more-stuff-here with a double slash. I did eventually figure out what I'd done wrong, but the Googlebot still had to get redirected.
9:38 pm on July 10, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


Thanks @lucy24, yes I know I need to resolve the problem in my OP.

#1 - correct

#2 - spot on about the relative links. All my internal links are relative and the extra filename.php is causing many of them to fail (including images and css).

#3 - I'm not technical so I will need to take some time to digest and investigate what you suggest. But it does seem to be correct.

Thanks very much for the suggested solution. It is very much appreciated.

BTW: your comment "wrongly interpreted path info, assuming you're on an Apache server with default AcceptPathInfo [httpd.apache.org] settings. These are handled differently in .php URLs than in .html URLs at the server level before it ever reaches your php"

was a big surprise to me. I claim absolutely no technical expertise, but from the little I know that's definitely an eye-opener.
2:23 am on July 11, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11293
votes: 135


"When you say 'those paths probably don't exist on the server' that means the php page doesn't exist on the server? If so, then no it doesn't exist. I've never published them and when I look on the server they simply aren't there."

yes, that's what i meant.
if those paths to the .php pages "simply aren't there" that typically implies an internal rewrite.

"I'm not technical so I will need to take some time to digest and investigate what you suggest."

is there a file named .htaccess in the document root directory?
6:00 am on July 11, 2017 (gmt 0)

Preferred Member

10+ Year Member

joined:May 18, 2005
posts:420
votes: 1


"When you type those erroneous URLs into your browser does the page appear?"

Yes.
6:26 am on July 11, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


Yes there is an .htaccess file. It has the following:

# Leverage Browser Caching by setting HTTP header expires
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access plus 1 year"
ExpiresByType image/jpeg "access plus 1 year"
ExpiresByType image/gif "access plus 1 year"
ExpiresByType image/png "access plus 1 year"
ExpiresByType text/css "access plus 1 month"
ExpiresByType application/pdf "access plus 1 month"
ExpiresByType text/x-javascript "access plus 1 month"
ExpiresByType application/javascript "access plus 1 month"
ExpiresByType text/javascript "access plus 1 month"
ExpiresByType application/x-shockwave-flash "access plus 1 month"
ExpiresByType image/x-icon "access plus 1 year"
ExpiresDefault "access plus 1 days"
</IfModule>
# End of leverage browser caching settings

#Redirect any non www. page requests to be www.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
#End non www. page requests

#RewriteCond %{HTTPS} off
#RewriteRule (.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]



# Use the GZIP Apache module
<ifModule mod_gzip.c>
mod_gzip_on Yes
mod_gzip_dechunk Yes
mod_gzip_item_include file \.(html?|txt|css|js|php|pl)$
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime ^application/x-javascript.*
mod_gzip_item_exclude mime ^image/.*
mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
</ifModule>

# Enable DEFLATE
<IfModule mod_deflate.c>
AddOutputFilter DEFLATE js css
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
Header append Vary User-Agent
</IfModule>

The https rewrite is commented out - it's there for the future when I convert to https.

[edited by: phranque at 9:47 am (utc) on Jul 11, 2017]
[edit reason] unlinked urls for clarity [/edit]

9:50 am on July 11, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11293
votes: 135


are there any additional .htaccess files in any of the subdirectories of the requested paths?
2:26 pm on July 11, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


No, I've checked every directory on the server and no other .htaccess file is present.

What I have realised is the php files which link to missing (404) files are always the same 7. I've been through those seven files line by line and nothing appears amiss. The next step is to examine them on the server just in case they are different there.

One unusual aspect is that on 16th April the 404 report showed around 100 404 files; this then dropped to 15 404 files on 17th April. It remained at that level until the beginning of July, when it went up to 97 files. I don't remember doing anything to clear the 404 files, but it is possible.

I also republished the entire site on the 28th June, just before the 97 404 files were highlighted.

Too many probably false leads.
2:23 pm on July 12, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11293
votes: 135


What does "republished" mean?
8:40 am on July 15, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


I made some changes to the templates used in almost all pages. So I ftp'd the entire site to the server.
4:28 pm on July 15, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14719
votes: 616


"I also republished the entire site on the 28th June, just before the 97 404 files were highlighted."

Aha. If you look very closely, you'll probably find the one tiny little typo that leads to a cascade of imaginary URLs.
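Part of "looking very closely" can be automated. A minimal sketch that scans the local .php files for href, src and action values containing .php/ (a page name with more path glued on) or stray leading/trailing whitespace (keyplyr's invisible trailing space). It is a line-by-line regular-expression heuristic, not an HTML parser: it assumes quoted attribute values on a single line and will miss URLs assembled piece by piece in PHP. The directory passed to scan is up to you; none is given in the thread.

```python
import re
from pathlib import Path

# Heuristic: quoted href/src/action values on one line. Not a real HTML parser.
ATTR = re.compile(r"""\b(?:href|src|action)\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def suspicious_links(text):
    """Yield (line number, value) for link values that look like the typos discussed here."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for value in ATTR.findall(line):
            if ".php/" in value or value != value.strip():
                yield lineno, value


def scan(root):
    """Print every suspicious link found in the .php files under root."""
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, value in suspicious_links(text):
            print(f"{path}:{lineno}: {value!r}")


if __name__ == "__main__":
    sample = '<a href="filenameb.php">ok</a>\n<a href="filenamea.php/">typo</a>'
    print(list(suspicious_links(sample)))
    # scan("path/to/local/copy")  # hypothetical path to the local copy of the site
```

Pointing it at the seven linking files first keeps the output short; anything it prints is only a candidate to inspect by eye, not proof of the culprit.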

Sometimes GSC will tell you how they learned about URLs with a non-200 response.

:: detour to refresh memory, with further digression as they suddenly claim wholly spurious "soft 404" on two pages, one of them the front page so why don't they rename it again to GWTF ::

Um. And sometimes, I guess, they won't.
5:10 pm on July 15, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


"Aha. If you look very closely, you'll probably find the one tiny little typo that leads to a cascade of imaginary URLs."


Probably correct. I'm searching and because I have it in the back of my mind, the EUREKA moment will hopefully come one day when I'm doing something completely unconnected.
9:41 pm on July 15, 2017 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11293
votes: 135


I would recommend crawling the site with Xenu's Link Sleuth or a similar tool and looking for clues there.
10:41 am on July 16, 2017 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 29, 2005
posts:2064
votes: 97


Done that, phranque, on the basis of martinibuster's previous suggestion.

Can't find any clue to the problem but it did allow me to clear up a whole pile of faulty internal and external links. That was definitely worth the effort.
1:26 am on July 27, 2017 (gmt 0)

Junior Member from US 

5+ Year Member

joined:Dec 23, 2008
posts:156
votes: 4


"I made some changes to the templates used in almost all pages. So I ftp'd the entire site to the server."
Ahhh... But, are there any orphan files on the server?
Or, did you clean out all the html/php files first -- before you ftp'd the entire site to the server?
 
