
Google SEO News and Discussion Forum

    
Weird Googlebot behavior - asking for non-existent URLs
asher02




msg:3082549
 7:45 am on Sep 14, 2006 (gmt 0)

I just checked my logs and saw that Googlebot is requesting URLs containing parse.php.

Example: 66.249.65.238 - - [13/Sep/2006:17:48:08 -0500] "GET /parse.php?url=http://www.website.com/n-policy.htm HTTP/1.1" 200 37 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The pages Googlebot requests are not on my server at all. They are all broken: it takes part of a real URL and tries to fetch it.

The site is 100% static and we do not use PHP. We have a valid, working custom 404 page, but the status Googlebot gets is still 200 OK.

What on earth is happening?
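
To see how many such requests there are, the log can be filtered for Googlebot hits on parse.php that received a 200. This is only a minimal sketch: it assumes the server writes the Apache "combined" log format shown in the example line, and the function names are made up for illustration.

```python
import re

# Pattern for the Apache "combined" log format seen in the quoted log line.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one combined-format log line, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


def suspicious_googlebot_hits(lines, needle="parse.php"):
    """Yield entries where a Googlebot user agent got a 200 for a path containing `needle`."""
    for line in lines:
        entry = parse_log_line(line)
        if (
            entry
            and "Googlebot" in entry["agent"]
            and needle in entry["path"]
            and entry["status"] == 200
        ):
            yield entry
```

Feeding it the lines of an access log (for example `suspicious_googlebot_hits(open("access.log"))`, where the file name is an assumption) lists every request that should have been a 404 but was not.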

 

jdMorgan




msg:3084956
 10:34 pm on Sep 15, 2006 (gmt 0)

> The pages googlebot request are not on my server at all they are all broken

If you're getting a 200-OK response, then either the parse.php page *is* there, or you have an incorrectly-configured server.

> We do not use php
It doesn't matter whether *you* use it. What matters is whether parse.php exists, and whether php is enabled.

Use your Control Panel to disable php if possible. If not, contact your host and ask them to look into this.

Jim
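
One way to test for the misconfiguration described above is to request a URL that cannot exist and see what status comes back. A rough sketch, assuming plain HTTP GET requests are enough; `fetch_status` and `returns_real_404` are hypothetical helper names, not part of any server tool.

```python
import urllib.error
import urllib.request
import uuid


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET request to `url`.

    urlopen follows redirects, so this is the status of the final response.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return error.code


def returns_real_404(base_url, timeout=10, fetch=fetch_status):
    """Request a random .php path that cannot exist; True if the answer is 404 or 410."""
    probe_url = base_url.rstrip("/") + "/" + uuid.uuid4().hex + ".php"
    return fetch(probe_url, timeout) in (404, 410)
```

If `returns_real_404("http://www.website.com")` comes back False, the server is answering 200 (or something else) for missing pages, which matches the log line in the first post. The `fetch` parameter exists only so the check can be exercised without touching the network.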

asher02




msg:3086186
 5:46 am on Sep 17, 2006 (gmt 0)

My 404 is valid. I just checked Google Webmaster Tools, and all these broken URLs are listed under "URL not found" with a 404. I hope it is just a glitch and that it will have no after-effect.

I am just wondering why Googlebot is going after "parse.php" kinda URLs, or where on earth it found them....

Wlauzon




msg:3086284
 9:21 am on Sep 17, 2006 (gmt 0)

Speaking of weird..

How did Googlebot find me?

The website is for ASP.NET development only, has never had a single link to it anywhere, and has only been up for 5 days.

Yet today I see this:

Googlebot 3+1 26.95 KB 16 Sep 2006 - 02:34
SurveyBot 0+1 472 Bytes 13 Sep 2006 - 17:25

Do the bots go out and check URLs from whois or something?

g1smd




msg:3086292
 9:43 am on Sep 17, 2006 (gmt 0)

Suspected but not proven: both whois data and Google Toolbar queries for PageRank for every page that you visit.

asher02




msg:3086313
 10:31 am on Sep 17, 2006 (gmt 0)

Suspected and proven: using the advanced Google Toolbar will cause Googlebot to visit pages in a matter of hours. I have seen it happen more than once: I work on a test page that has a "test" name, and within hours I see Googlebot fetching that page. I started deleting my test pages while working on them to keep Googlebot from fetching them.

I also started to use noindex and noarchive on my test pages for the same reason.
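
Whether a test page really carries those directives can be checked by parsing its HTML. A minimal sketch using only the standard library; it looks at `<meta name="robots">` and `<meta name="googlebot">` tags, and the class and function names are invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from robots/googlebot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html_text):
    """Return the set of robots directives declared in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives
```

A page is covered when `{"noindex", "noarchive"} <= robots_directives(page_html)` holds. Note that a crawler has to fetch the page to see these tags, so they limit indexing rather than crawling.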

Wlauzon




msg:3086805
 11:19 pm on Sep 17, 2006 (gmt 0)

> Suspected and proven: using the advanced Google Toolbar will cause Googlebot to visit pages in a matter of hours....

That is very interesting...

Easier than submitting your site to Google, and automatic :D

I learn something new here every day. Pretty soon I will be smart.

I wondered how Google in the past had found a couple of unlinked pages that I had forgotten to make links to in our online store site a while back; I guess that is how. I had just assumed that I had a link from some other page.

It does not really matter if Google indexes the site; most of the pages are "Lorem Ipsum" stuff anyway for now.

[edited by: Wlauzon at 11:21 pm (utc) on Sep. 17, 2006]

Bewenched




msg:3086923
 2:44 am on Sep 18, 2006 (gmt 0)

I've had a lot of requests from Google that are not valid URLs as well. Some of them have the file extension chopped off and some have the extension like /myfile....

These are shown in Webmaster Tools but exist nowhere in my sitemaps or on the site.

Testing for a valid 404? Maybe...

Bewenched




msg:3086928
 2:51 am on Sep 18, 2006 (gmt 0)

> I wondered how Google in the past had found a couple of unlinked pages that I had forgotten to make links to in our online store site a while back; I guess that is how. I had just assumed that I had a link from some other page.

And they blame us for duplicate content? Let's say you have a page online named test1.html, but your testing page, where you do your tweaks or whatever, is test2.html. Supposedly only you know it's there; there are no links to it. Then the bot comes along and fetches that page that you didn't delete, and now you have duplicate content! Did you submit this page? NO. Did you link to this page in any manner? NO. Is it in your sitemaps? NO.

The only crime you committed was using a test file and leaving it on your server, but only you knew about that page, and that's the way it should have been. So I guess we'll be faced with having to remove our toolbar so we can test our pages?

g1smd




msg:3087772
 7:24 pm on Sep 18, 2006 (gmt 0)

That's why I test in a separate folder; that folder is then also denied by robots.txt or sometimes placed behind a password control.
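
The robots.txt part of that setup can be sanity-checked with Python's standard `urllib.robotparser`. In this sketch the `/test/` folder name and the example.com URLs are placeholders, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a /test/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the rules above, `is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/test/page.html")` is False while the same call for `/index.html` is True. robots.txt only asks well-behaved bots to stay out; the password control is what actually keeps everyone else away.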

Dead_Elvis




msg:3087840
 8:26 pm on Sep 18, 2006 (gmt 0)

> So I guess we'll be faced with having to remove our toolbar so we can test our pages?

Nope, just block Google with robots.txt or NOINDEX.

:)

Wlauzon




msg:3087952
 10:05 pm on Sep 18, 2006 (gmt 0)

> That's why I test in a separate folder; that folder is then also denied by robots.txt or sometimes placed behind a password control.

We just do the same thing on a development-only site. For $8 a month, it makes it much easier for us to have it totally separate from our real sites, and we don't care if we have to do a total wipe of the site for some reason.

I am sure that Google saw some duplicate content there - 100 pages or so, and 98 are all identical Lorem Ipsum pages ;p

Bewenched




msg:3088308
 6:52 am on Sep 19, 2006 (gmt 0)

I was doing some testing tonight to make sure that 301s were being done right, so I was running a header checker (a Firefox plugin).

Anyhow, I noticed that for each request I made, it not only sent Google the page that I was on but also passed along the last query I made through my toolbar. How weird is that?

Could Google possibly be using this to link related sites, topics or content somehow? Geez, it was passing it from page to page.

I don't mind passing the pages to them, but passing a toolbar query along with the site that I'm on kinda bothers me. I certainly don't want our site associated with something off topic.

bye bye google toolbar and possibly analytics .. I'm gonna miss ya.
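
For the 301 check mentioned at the top of this post, the same information a header-checking plugin shows can be read with a short script. A sketch under the assumption that one HEAD request per URL is enough; the function names are illustrative, and URLs containing a username or password are not handled.

```python
import http.client
from urllib.parse import urlsplit


def head_without_redirects(url, timeout=10):
    """Send one HEAD request and return (status, Location header or None).

    http.client never follows redirects, so a 301 is reported as a 301.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def is_permanent_redirect_to(url, expected_target, timeout=10):
    """True if `url` answers 301 with a Location header equal to `expected_target`."""
    status, location = head_without_redirects(url, timeout)
    return status == 301 and location == expected_target
```

A 302 where a 301 was intended, or a Location header pointing at the wrong page, shows up immediately in the returned tuple.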

g1smd




msg:3089028
 6:45 pm on Sep 19, 2006 (gmt 0)

Are you sure that wasn't just the referrer being sent?
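
For what it's worth, an ordinary Referer header from a search results page already carries the query in its URL, which is easy to mistake for the toolbar "passing" it. A small sketch of pulling the terms out; the `q` parameter name and the example URL are assumptions for illustration, and nothing here shows what the toolbar itself sends.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referrer(referrer, parameter="q"):
    """Return the search terms found in a referrer URL's query string, or None."""
    if not referrer or referrer == "-":
        return None
    values = parse_qs(urlsplit(referrer).query).get(parameter)
    return values[0] if values else None
```

For a referrer such as `http://www.google.com/search?q=blue+widgets&hl=en` this returns `blue widgets`. If the header checker shows the query only inside a Referer line like that, it is the browser sending it, not the toolbar.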

BillyS




msg:3089161
 8:34 pm on Sep 19, 2006 (gmt 0)

>>Suspected and proven: using the advanced Google Toolbar will cause Googlebot to visit pages in a matter of hours

That's my experience too. Not good if you're testing pages you don't want anyone to ever find. Now I use robots.txt to prevent that from happening and I've also removed the Google toolbar.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved