Forum Moderators: Robert Charlton & goodroi
Example: 66.249.65.238 - - [13/Sep/2006:17:48:08 -0500] "GET /parse.php?url=http://www.website.com/n-policy.htm HTTP/1.1" 200 37 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
The pages Googlebot requests are not on my server at all; they are all broken. It takes part of a real URL and tries to fetch it.
The site is 100% static. We do not use PHP, and we have a valid, working custom 404 page, but Googlebot still gets a 200 OK status.
What on earth is happening?
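To see how many of these requests there are, a short script can pull them out of the access log. This is a minimal Python sketch that assumes the log uses Apache's "combined" format, which the line above appears to follow. It also assumes a hand-maintained set of paths the site really serves (`known_paths`, hypothetical). It lists Googlebot requests that got 200 OK for a path outside that set. Matching on the user-agent string alone is a simplification, because any client can send a Googlebot user-agent.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Set

# Apache/NCSA "combined" log format (assumed):
# host ident authuser [time] "METHOD path PROTO" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    path: str
    status: int
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Parse one combined-format log line; return None if it doesn't match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    return Hit(m["host"], m["path"], int(m["status"]), m["agent"])


def suspicious_googlebot_hits(lines: Iterable[str], known_paths: Set[str]) -> Iterator[Hit]:
    """Yield Googlebot requests answered with 200 OK for a path the site doesn't have."""
    for line in lines:
        hit = parse_line(line)
        if hit is None:
            continue
        # Compare the path without its query string (e.g. drop "?url=...").
        path_only = hit.path.split("?", 1)[0]
        if "Googlebot" in hit.agent and hit.status == 200 and path_only not in known_paths:
            yield hit
```

Fed the log line quoted above with `known_paths={"/n-policy.htm"}`, the generator yields the `/parse.php?url=...` request, because `/parse.php` is not a page the site knows about but still came back 200.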
If you're getting a 200-OK response, then either the parse.php page *is* there, or you have an incorrectly-configured server.
> We do not use php
It doesn't matter whether *you* use it. What matters is whether parse.php exists, and whether PHP is enabled.
Use your Control Panel to disable PHP if possible. If you can't, contact your host and ask them to look into this.
Jim
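A quick way to tell which case applies is to request a URL that cannot exist and check the status code. A correctly configured server with a custom 404 page should still answer 404, not 200. This is a minimal sketch, assuming Python 3 and only the standard library. The random `.php` file name is just an arbitrary probe. `urlopen` follows redirects, so a 404 handler that redirects to an error page will show up here as 200, and that redirect is itself a misconfiguration. One example is an Apache `ErrorDocument` pointing at a full `http://` URL.

```python
import urllib.error
import urllib.request
import uuid


def status_of(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for a GET request to url (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code lives on the exception.
        return err.code


def missing_page_status(base_url: str) -> int:
    """Request a random, almost certainly nonexistent .php path under base_url."""
    probe = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}.php"
    return status_of(probe)
```

For example, `missing_page_status("http://www.example.com")` should return 404. If it returns 200, the server is answering for pages that don't exist.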
I am just wondering why Googlebot is going after "parse.php"-style URLs, or where on earth it found them...
How did Googlebot find me?
The website is for ASP.NET development only. It has never had a single link to it anywhere, and it has only been up for 5 days.
Yet today I see this:
Googlebot 3+1 26.95 KB 16 Sep 2006 - 02:34
SurveyBot 0+1 472 Bytes 13 Sep 2006 - 17:25
Do the bots go out and check URLs from WHOIS or something?
I also started to use noindex and noarchive on my test pages for the same reason.
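For reference, those directives go in the page head as `<meta name="robots" content="noindex, noarchive">`. On a folder full of test pages it is easy to miss one. This minimal Python sketch uses the standard `html.parser` module to list the robots directives a page declares. The function name is illustrative only, and the check covers only the meta tag, not any HTTP headers.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> Set[str]:
    """Return the lowercased robots directives declared in an HTML page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives
```

A test page is covered when `{"noindex", "noarchive"} <= robots_directives(page_html)` is true.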
Suspected and proven: using the advanced features of the Google Toolbar will cause Googlebot to visit pages within a matter of hours...
That is very interesting...
Easier than submitting your site to Google, and automatic :D
I learn something new here every day. Pretty soon I will be smart.
I had wondered how Google found a couple of unlinked pages in our online store site a while back, pages I had forgotten to link to. I guess that is how. I had just assumed there was a link from some other page.
It does not really matter if Google indexes the site, most of the pages are "Lorem Ipsum" stuff anyway for now.
[edited by: Wlauzon at 11:21 pm (utc) on Sep. 17, 2006]
And they blame us for duplicate content? Let's say you have a page online named test1.html, and your testing page, where you do your tweaks or whatever, is test2.html. Supposedly only you know it's there: no links to it, nothing. Then the bot comes along and fetches the page you didn't delete, and now you have duplicate content! Did you submit this page? No. Did you link to it in any manner? No. Is it in your Sitemaps? No.
The only crime you committed was using a test file and leaving it on your server, but that was a page only you knew about, and that's the way it should have stayed. So I guess we'll have to remove our toolbar so we can test our pages?
That's why I test in a separate folder; that folder is then also denied by robots.txt or sometimes placed behind a password control.
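The robots.txt half of that setup is easy to check before relying on it. Python's standard `urllib.robotparser` parses a robots.txt body and reports whether a given user agent may fetch a URL. The `/test/` folder and the `example.com` URLs below are hypothetical. robots.txt only asks well-behaved crawlers to stay out and does not hide the folder from anyone. That is why password protection is the stronger of the two options.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a /test/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `crawl_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/test/test2.html")` is `False`, while the live page `http://www.example.com/test1.html` is still allowed.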
We do the same thing with a development-only site. For $8 a month, it is much easier for us to keep it totally separate from our real sites, and we don't care if we have to wipe the whole site for some reason.
I am sure that Google saw some duplicate content there: 100 pages or so, and 98 of them are identical Lorem Ipsum pages ;p
Anyhow, I noticed that for each request I made, the toolbar not only sent Google the page I was on, but also passed along the last query I made through it. How weird is that?
Could Google possibly be using this to link related sites, topics or content somehow? Geez, it was passing the query along from page to page.
I don't mind passing the pages to them, but passing a toolbar query along with the site I'm on kind of bothers me. I certainly don't want our site associated with something off-topic.
Bye-bye Google Toolbar, and possibly Analytics. I'm gonna miss ya.
That's my experience too. Not good if you're testing pages you don't want anyone to ever find. Now I use robots.txt to prevent that from happening and I've also removed the Google toolbar.