CGI pages

Not crawled when they don't have parameters!

         

Bones

1:24 am on Jan 3, 2004 (gmt 0)

10+ Year Member



Just been through my logs and noticed that Googlebot crawls every link on my site, except those which end in: .cgi (dynamic pages with NO parameters)

mypage.cgi - won't get crawled.

mypage.cgi?a=1&b=2 will get crawled.

I did a quick search on Google and couldn't find any straightforward .cgi pages in the SERPs. However, pages with extensions like .pl, .asp and .php are happily crawled even when they don't have parameters.

Is this my imagination? Any reason for it?
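For anyone wanting to run the same check on their own logs, here is a minimal sketch. It assumes an Apache combined-format access log and that the crawler identifies itself with "Googlebot" in the user-agent field; the function name `tally_cgi_hits` is made up for illustration and is not from any poster's setup.

```python
import re
from collections import Counter

# Pulls the request path and user-agent out of a combined-format log line.
# Assumption: the log uses the common "request" status bytes "referer" "agent" layout.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def tally_cgi_hits(lines):
    """Count Googlebot requests for .cgi URLs, split by whether a query string is present."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # not a parsable line, or not Googlebot
        path, _, query = match.group("path").partition("?")
        if not path.endswith(".cgi"):
            continue  # only .cgi scripts are of interest here
        counts["with_params" if query else "without_params"] += 1
    return counts
```

Calling `tally_cgi_hits(open("access.log"))` (the file name is a placeholder) returns a counter with `with_params` and `without_params` keys. A `without_params` count of zero over several weeks would match the pattern described above.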

dirkz

3:42 pm on Jan 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google is very cautious with dynamic pages. They just don't want to wreak havoc with your server. mypage.cgi does indeed look very suspicious. But there could be other factors at work. Maybe the Googlebot responsible for mypage.cgi was scheduled for something else and will come back tomorrow.

Dynamic pages are crawled more slowly.

Bones

5:52 pm on Jan 3, 2004 (gmt 0)

10+ Year Member



I just used "mypage.cgi" as an example - although I'm not quite sure why you think mypage.cgi would be very suspicious! :)

As a real example, why is:

boardpower.cgi not crawled, when:

boardpower.cgi?cookie=logout is?

The second one looks far more 'dodgy' than the first.

I've checked through my logs for the last couple of weeks and seen the same pattern throughout, so I don't think it's a case of sitting tight and waiting for Googlebot to come back tomorrow.

Googlebot generally picks up a couple of hundred pages quite happily from my site each day, 99% of which have parameters, and yes, I'm quite glad Googlebot doesn't hammer the server to get them.

I guess I'm the only one seeing this. *shrug*

dirkz

7:58 pm on Jan 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> although I'm not quite sure why you think mypage.cgi would be very suspicious! :)

My point was this: CGIs are normally meant for parameters. Standalone ones look as if the parameters were forgotten, and Googlebot is always cautious about that (can you append some empty parameters?)

Google once crushed an online game session just by calling some CGIs (if I remember this story correctly).

Bones

8:08 pm on Jan 6, 2004 (gmt 0)

10+ Year Member



I'm not sure I agree, but I think I see what you're getting at now.

Personally, I tend to write my CGI scripts so the 'meaty' bit (for want of a better word) is displayed when calling the script itself (i.e., .cgi without parameters), then any functionality is achieved by using parameters (i.e., .cgi?action=dofunkystuff). I'd imagine a lot of other script writers do the same. I find it hard to believe Google intentionally ignores .cgi when they wouldn't ignore .pl, which could probably be the exact same script.

Adding on bogus parameters may work, but it's a bit of an ugly-looking bodge. I can't help but think Googlebot just doesn't crawl dynamic pages quite as well as I originally thought. :o/

Jesse_Smith

2:20 am on Jan 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use mod_rewrite to change it to

mypage-a-1-b-2.html

and you can bet you'll get them indexed!
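To make the mapping concrete, here is a rough Python sketch of the translation such a rewrite would have to perform, turning the static-looking name back into the script call. It is not Apache configuration, and the thread gives no actual rule; the name-value-name-value pattern and the function name `static_to_cgi` are assumptions based only on the `mypage-a-1-b-2.html` example.

```python
import re
from urllib.parse import urlencode

# Assumed URL scheme: <script>-<name>-<value>[-<name>-<value>...].html
# Hyphens therefore cannot appear inside the script name, parameter names or values.
STATIC_URL = re.compile(
    r"^(?P<script>[A-Za-z0-9_]+)(?P<pairs>(?:-[A-Za-z0-9_]+-[A-Za-z0-9_]+)+)\.html$"
)

def static_to_cgi(name):
    """Map 'mypage-a-1-b-2.html' to 'mypage.cgi?a=1&b=2'; return None if the name does not fit."""
    match = STATIC_URL.match(name)
    if not match:
        return None
    # "-a-1-b-2" -> ["a", "1", "b", "2"] -> [("a", "1"), ("b", "2")]
    parts = match.group("pairs").lstrip("-").split("-")
    params = list(zip(parts[0::2], parts[1::2]))
    return f"{match.group('script')}.cgi?{urlencode(params)}"
```

Note that this scheme only covers URLs that carry parameters; a bare `mypage.html` does not match the pattern, so the parameterless script would need its own rule.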

dirkz

5:39 am on Jan 7, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I can't help but think Googlebot just doesn't crawl dynamic pages quite as well as I originally thought. :o/

That's a fact. See GoogleGuy's post.
[webmasterworld.com...]

Bones

7:24 pm on Jan 7, 2004 (gmt 0)

10+ Year Member



Indeed, that is another way around it. That wasn't really what I was asking though - I was just curious why Googlebot ignores these pages. ;o)

dirkz

8:42 am on Jan 8, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Is this my imagination? Any reason for it?

No answer for your original question.

g1smd

7:33 pm on Jan 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you have many internal or external links that point to the cgi but without any parameters?

If not, then add some, sharpish.

Bones

12:20 am on Jan 15, 2004 (gmt 0)

10+ Year Member



Yes, lots of internal, a few external.

If anyone can find a .cgi page, without any parameters, currently listed on Google - please sticky me. Maybe I can figure out why mine are being ignored.

GodLikeLotus

12:49 am on Jan 15, 2004 (gmt 0)

10+ Year Member



Google does exactly what it wants, unless you are a big spender.

They do what they want and when they want.

Worried about duplicate content or dodgy redirects? Well, no fear.

Search for caskets:
[google.com...]

The top result is no more than a 302 redirect to another site. It does have a listing from DMOZ so it must be right, even if it's a link to a redirect. They are however good AdWords spenders, that must make it right.