Forum Moderators: Robert Charlton & goodroi

404 in GWT to pages that have never existed on my website

     
9:31 am on Feb 2, 2015 (gmt 0)

Full Member

10+ Year Member

joined:Apr 26, 2009
posts: 286
votes: 6


Since about 25 January I have been getting an unusual number of 404 crawl errors in GWT, all of which point to strange pages on my site that were never there. I mean they never existed at any point since the site started, back in 2002.
What is so strange about it is that GWT usually shows the page on which the broken links appeared, but this time there is nothing to refer to, so I simply cannot see where they come from. Naturally I went through my site logs to see if I could find any trace of these links, but there is nothing there, except for several hundred IPs from China that are probing my site for loopholes in various areas and getting either 404 or 403 responses to their queries. To be on the safe side, I have blocked all of the IPs that I could trace via .htaccess.
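As a rough illustration of the log check described above, here is a minimal Python sketch that counts 403 and 404 responses per client IP. It assumes the server writes the Apache "combined" log format; the regular expression and the function name `count_denied_by_ip` are made up for this example and are not taken from the poster's setup.

```python
import re
from collections import Counter

# Matches the start of an Apache "combined" log line (assumed format):
# client IP, identity, user, [timestamp], "request line", status code.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) '
)


def count_denied_by_ip(lines, statuses=("403", "404")):
    """Count requests per client IP whose status code is in `statuses`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that do not fit the assumed format are skipped silently.
        if match and match.group("status") in statuses:
            counts[match.group("ip")] += 1
    return counts
```

Passing an open log file to `count_denied_by_ip` and calling `.most_common(20)` on the result would list the busiest offenders, which could help decide which addresses are worth blocking.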

I understand that 404s cannot harm the site or SEO in general, which is fine, but is it safe to assume that I should just leave it to Google to decide what to do with all of these errors?

I have checked the entire site and DB for possible injections and nothing was found. I also tried fetching the site in GWT to see if I could find any trace of cloaking or anything else in the source code that should not have been there in the first place, but again found nothing.

Here are a few of these URLs:

http://www.example.com/products/variable-voltage-widgets/variable-voltage-accessories/v-scope-vv-products.html
http://www.example.com/index.php/green-widget


I do not use any unsafe methods in my PHP and all of the queries are checked prior to processing them so that anything that is not in my DB will automatically fire up 404.
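To make the "not in the DB means 404" rule concrete, here is a small sketch in Python rather than the poster's PHP. The `KNOWN_PATHS` dictionary is a hypothetical stand-in for the database lookup, and `resolve` is an invented name; it is only meant to show the shape of the check, not the site's actual code.

```python
from http import HTTPStatus

# Hypothetical stand-in for the site's database of valid pages.
KNOWN_PATHS = {
    "/products/blue-widget.html": "Blue widget",
}


def resolve(path, known_paths=KNOWN_PATHS):
    """Return (status, body); any path missing from the lookup gets a 404."""
    title = known_paths.get(path)
    if title is None:
        # Unknown URL: answer with a real 404 status, not a 200 "not found" page.
        return HTTPStatus.NOT_FOUND, "Not Found"
    return HTTPStatus.OK, title
```

The point of the design is that the status code itself says 404, so a crawler requesting an invented URL such as /index.php/green-widget gets an unambiguous answer.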

I am open to suggestions and your experience on this subject.

[edited by: aakk9999 at 1:03 pm (utc) on Feb 2, 2015]
[edit reason] widgetised URLs [/edit]

1:15 pm on Feb 2, 2015 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


Hi AlexB77,

Are you on shared hosting? I had a situation similar to yours, and the problem was that the hosting company made a short-lived mistake so that another website on the same server temporarily pointed to my client's website. This created URLs with my client's domain but the other site's URL paths.

This short-lived mistake lasted long enough (a couple of hours) for Googlebot to come and harvest all the URLs. The mistake was corrected, but in the meantime Google had harvested the "new URLs", and when it retries them it gets served a 404.

These 404s appearing in WMT were what alerted us to the problem. We then checked the logs, going back a month or so, and that is where we found Googlebot requesting these URLs. Searching the web with an "inurl:/*path-comes-here" query showed us a website that had the same URL path. Checking the hosting of that website, we saw it was hosted by the same hosting company, and upon enquiring we learned it was physically on the same server (not the same IP, since the client's site used https and had its own IP).
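For anyone wanting to repeat that log check, the following Python sketch pulls out requests for the reported paths whose user agent claims to be Googlebot. It assumes Apache "combined" format logs; the pattern and the name `googlebot_hits` are illustrative only. The user-agent string can be spoofed, so treat this as a first filter, not proof that the request came from Google.

```python
import re

# Apache "combined" log line (assumed format), including referer and user agent.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, reported_paths):
    """Yield (time, path, status, referer) for Googlebot requests to reported paths."""
    wanted = set(reported_paths)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        if "Googlebot" in match.group("agent") and match.group("path") in wanted:
            yield (
                match.group("time"),
                match.group("path"),
                int(match.group("status")),
                match.group("referer"),
            )
```

The timestamp of the earliest hit is the interesting part: it shows when the crawler first learned of the URL, which narrows down the window to ask the host about.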

This happened about 3 months ago. Our client's site was not harmed. Every now and then we go to WMT and "Mark as fixed" all these URLs, and every time there are fewer and fewer of them reported.

There may be other reasons for such URLs, but if you are seeing many of them with paths that look like the full URL paths (rather than getting 404 because of URLs being truncated), then this is certainly the aspect I would investigate first.

1:39 pm on Feb 2, 2015 (gmt 0)

Full Member

10+ Year Member

joined:Apr 26, 2009
posts: 286
votes: 6


Hi aakk9999,

I am actually on a dedicated server with only one site currently hosted on it, which is large enough to require the entire server.

As for the rest, yes, this is exactly what I did in the first place. I started to dig deep into my access logs but found no trace of the URLs that were reported by GWT. Instead I discovered a bunch of other URLs of a similar kind, which were not reported by GWT at all. I have searched with "inurl:/ " but found no site using that particular URL structure that I could report the problem to, so at this point I am totally lost, as I have run out of options, at least for now.

4:58 pm on Feb 2, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15809
votes: 847


anything that is not in my DB will automatically fire up 404

That's all you need to do. If the site is database-driven the requests may not show up in logs as 404. But as long as you're certain the visitor is receiving a 404 you're good to go.

I have run out of options

Honestly, you don't need to do anything. As far as you're concerned, Google is making up URLs out of the clear blue sky and then informing you that they don't exist.

Proverb: Even Google Nods. They're currently plaguing me with requests in the form
/realdirectory//realsubdir/real-file-path
with double slash. All you can do is feed them the correct response and wait for them to ... well, not go away exactly, but slow down in their asking.
 
