Here's the error:
ON YOUR SITE, Widget Network
ERROR CODE 404 MISSING URL
OCCURRED ON Wed Nov 5 19:31:55 2003
WHEN THE URL /****-***-***-pic.html WAS REQUESTED
BY A USER AT 64.68.80.13
THE BROWSER WAS Googlebot/2.1 (+http://www.googlebot.com/bot.html)
I asterisked out the bad words. I'm not sure where Googlebot is finding links to these pages. Any ideas?
[edited by: ciml at 9:45 am (utc) on Nov. 6, 2003]
[edit reason] Widgetised. [/edit]
1. You used to have a page named xxx-xxx.html.
2. There is a link somewhere on the web to xxx-xxx.html.
3. Someone put up a link to xxx-xxx.html at some point with a typo in it, and Googlebot found it.
Most of the time when I discover these 404s, it is either because I changed the page name and forgot to put up a 301 redirect, or because Inktomi still has an old page listed and Googlebot is finding their link to it.
Ink can take years (really) to get rid of an old page :(
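For anyone unsure what the 301 fix amounts to, here is a minimal sketch in Python. It assumes you keep a table of renamed pages; the paths and the resolve() helper are made up for illustration, and on a real server you would normally do this in the server config rather than in application code.

```python
# Hypothetical table of renamed pages (old path -> new path).
# These paths are placeholders, not pages from the site discussed above.
REDIRECTS = {
    "/old-widget.html": "/new-widget.html",
}


def resolve(path):
    """Return (status, headers) for a requested path that no longer exists."""
    target = REDIRECTS.get(path)
    if target is not None:
        # Permanent redirect: tells crawlers to replace the old URL with the new one.
        return 301, {"Location": target}
    # No known replacement, so the request really is a 404.
    return 404, {}
```

A crawler that follows the 301 should eventually drop the old URL in favour of the new one, which is why forgetting the redirect leaves you with recurring 404s.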
I've been getting around 30 bad 404 hits a day from Google, all relating to the same topic. They all have porn webpage names (which is why I asterisked out the webpage name in my first post). I have no clue why, and they all seem to be random requests.
And no, it didn't use to be a porn site. ;)
When you found a page, did you search the source code to see if your domain was listed?
All you can do is keep searching, because the IP is Googlebot's and Google isn't going to tell you where they found the link to your site.
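If you do turn up a suspect page, checking its source for your domain can be scripted. This is only a rough sketch using Python's standard html.parser; links_to() and LinkFinder are names I made up, and it only looks at href attributes on anchor tags, so links built by scripts would be missed.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collects href values from <a> tags that mention a given domain."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value can be None for attributes written without a value
            if name == "href" and value and self.domain in value:
                self.matches.append(value)


def links_to(html, domain):
    """Return every anchor href in the page source that contains the domain."""
    finder = LinkFinder(domain)
    finder.feed(html)
    return finder.matches
```

Save the page source, pass it in as a string along with your own domain, and see whether any of the returned links match the URLs showing up in your 404 reports.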
BTW, the entry you posted originally doesn't actually look like a raw log file. Is this info from some third party software or something your host provides instead of the raw logs?
Finally, if the only 404s you are finding for these pages are from Googlebot, you can be pretty sure the links are fairly obscure; otherwise there would be more 404s from other visitors trying to access them, in which case you might find a referer.
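If you can get at the raw logs, something like the sketch below would show who is requesting each missing page and with what referer. It assumes the logs are in the common Apache "combined" format, which may not match your host's setup; the regex and the summarize_404s() name are my own.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_404s(lines):
    """Map each missing path to the set of (referer, user agent) pairs that hit it."""
    hits = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        # Lines that do not parse are skipped rather than guessed at.
        if m and m.group("status") == "404":
            hits[m.group("path")].add((m.group("referer"), m.group("agent")))
    return dict(hits)
```

Feed it the lines of your access log. If a path only ever shows up with a "-" referer and the Googlebot user agent, that fits the obscure-link theory; any other referer is a lead worth following.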
The entry I first posted is from a CGI program that intercepts 404s (or any other error) and logs them in its own file. It can then email me with the error, etc.
Yes, so far they only seem to be from Googlebot.