Googlebot Requesting Non-Existent File

Google Bot Weirdness

         

cpals

11:43 pm on Nov 5, 2003 (gmt 0)

10+ Year Member



I was looking through my error logs and saw some pages that Googlebot was apparently trying to reach.

Here's the error:
ON YOUR SITE, Widget Network
ERROR CODE 404 MISSING URL
OCCURRED ON Wed Nov 5 19:31:55 2003
WHEN THE URL /****-***-***-pic.html WAS REQUESTED
BY A USER AT 64.68.80.13
THE BROWSER WAS Googlebot/2.1 (+http://www.googlebot.com/bot.html)

I asterisked out the bad words. I'm not sure where it's finding links to these pages. Any ideas?

[edited by: ciml at 9:45 am (utc) on Nov. 6, 2003]
[edit reason] Widgetised. [/edit]

nancyb

7:52 pm on Nov 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It can be any one or a combination of these reasons:

1. you used to have a page named xxx-xxx.html
2. there is a link somewhere on the web to xxx-xxx.html
3. someone put up a link to xxx-xxx.html at some time that was a typo, and googlebot found it

most of the time when I discover these 404s it is either because I changed the page name and forgot to put up a 301 redirect, or because Inktomi still has an old page listed and googlebot is finding their link to it

ink can take years (really) to get rid of an old page :(
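For the renamed-page case, here is a rough sketch in Python of the decision a server makes when a 301 is in place. The mapping and the page names are made up for illustration; on a real site this would normally be done in the web server's own redirect configuration rather than in code like this.

```python
# Hypothetical mapping of renamed pages: old path -> new path.
REDIRECTS = {
    "/old-widgets.html": "/widgets.html",
}


def respond(path, existing_pages, redirects=REDIRECTS):
    """Return (status_code, location) for a requested path.

    existing_pages is the set of paths that currently exist on the site.
    """
    if path in existing_pages:
        return 200, None
    if path in redirects:
        # Permanent redirect: tells crawlers the page has moved for good.
        return 301, redirects[path]
    # No page and no redirect: this is what shows up as a 404 in the logs.
    return 404, None
```

A request for a renamed page then gets a 301 pointing at the new name, and only paths that never existed fall through to 404.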

cpals

5:26 am on Nov 7, 2003 (gmt 0)

10+ Year Member



Maybe I didn't phrase my comments right. ;)

I've been getting about 30 404 hits a day from Google, all relating to the same topic. They all have porn webpage names (which is why I asterisked out the webpage name in my first post). I have no clue why, and they all seem to be random requests.

And no, it didn't use to be a porn site. ;)

seofreak

5:52 am on Nov 7, 2003 (gmt 0)

10+ Year Member



looks like a competitor is trying to screw you up by linking to you from a porn site .. no worries tho' since you don't link back to them.

cpals

6:41 am on Nov 7, 2003 (gmt 0)

10+ Year Member



No way to find out the origination of these requests?

nancyb

7:20 am on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I may be being dimwitted again, but you might try searching G for the html file named in your logs. sorry if you already tried that ;)

course if it's a porn site you might have to try somewhere else as well.

cpals

2:26 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



Well, all of my logs show the Google Bot crawling for these porn pages. I meant in reply to seofreak, whether I could find out who was trying to make it look like I have these pages on here.

seofreak

3:53 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



as nancy suggested .. search for the ****-xxx.html page on G and see

cpals

5:20 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



I found this website (texts.random-link) and all it does is list what look like porn keywords on a page. I didn't find my page listed anywhere, but I'm wondering if it has anything to do with it.

nancyb

5:45 pm on Nov 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



does "texts.random-link" represent the name of a file in your logs that googlebot couldn't find? Did you search for the actual file name you originally posted?

When you found a page, did you search the source code to see if your domain was listed?
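If you save the page source to a file, something like this Python sketch can do the checking for you. It only looks at `href` attributes on `<a>` tags, and the function and class names are just ones picked for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_to_domain(html, domain):
    """Return the hrefs in html whose host is domain or a subdomain of it."""
    collector = LinkCollector()
    collector.feed(html)
    domain = domain.lower()
    hits = []
    for href in collector.links:
        # Relative links have no hostname, so they are skipped.
        host = (urlparse(href).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            hits.append(href)
    return hits
```

Pass it the saved source and your own domain; an empty list means no plain links to you were found on that page, though links built by scripts would be missed.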

All you can do is keep searching for something because the IP is googlebot's and google isn't going to tell you where they found the link to your site.

BTW, the entry you posted originally doesn't actually look like a raw log file. Is this info from some third party software or something your host provides instead of the raw logs?

Finally, if the only 404s you are finding for these pages are from googlebot you can be pretty sure they are fairly obscure links, else there would be more 404s from others trying to access them - in which case you might find a referer.
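To check that from raw logs, here is a minimal Python sketch. It assumes the server writes the common "combined" access log format (with referer and user agent at the end of each line); if your host uses a different format the pattern would need adjusting.

```python
import re
from collections import defaultdict

# Pattern for one line of the combined log format (an assumption about
# how the server is configured).
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_404s(lines):
    """Map each 404 path to the set of (agent, referer) pairs that asked for it."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("status") == "404":
            seen[m.group("path")].add((m.group("agent"), m.group("referer")))
    return dict(seen)
```

If a missing page only ever shows one agent and a referer of "-", the only visitor was a crawler that sent no referer. Any entry with a real referer gives you a page to go and look at.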

cpals

6:19 pm on Nov 7, 2003 (gmt 0)

10+ Year Member



texts.random-link.us is a site where I found a lot of the 404 page names that were coming up in my logs. Maybe just a coincidence.

The entry I first posted is from a cgi program that intercepts 404s (or any error) and logs them in its own file. It can then email me with the error, etc.
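For anyone curious how a script like that can work, here is a rough Python sketch, not the actual program. It assumes an Apache-style setup where the error handler receives `REDIRECT_STATUS` and `REDIRECT_URL` in its environment alongside the usual CGI variables. The function name and the extra referer line are additions for illustration.

```python
import time


def format_error_entry(environ, when=None):
    """Build one log entry from CGI-style environment variables.

    environ is a dict such as os.environ; when is an optional
    time.struct_time (defaults to the current local time).
    """
    when = time.localtime() if when is None else when
    return "\n".join([
        "ERROR CODE " + environ.get("REDIRECT_STATUS", "unknown"),
        "OCCURRED ON " + time.strftime("%a %b %d %H:%M:%S %Y", when),
        "WHEN THE URL " + environ.get("REDIRECT_URL", "unknown") + " WAS REQUESTED",
        "BY A USER AT " + environ.get("REMOTE_ADDR", "unknown"),
        "THE BROWSER WAS " + environ.get("HTTP_USER_AGENT", "unknown"),
        # Crawlers often send no referer, so this line may be empty of clues.
        "THE REFERER WAS " + environ.get("HTTP_REFERER", "none sent"),
    ])
```

A real handler would call this with `os.environ`, append the result to its log file, and then print the error page.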

Yes, so far they only seem to be from Google Bot.