Google and .cgi

documents Google does not index

         

Lisa

11:14 pm on Apr 3, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was wondering why Brett's [webmasterworld.com...] was not indexed. I did a search for CGI in the URL, and this is what I found: inurl:cgi [google.com]. There were no documents ending with .cgi.

So it appears to me that Google does not index .cgi. Am I wrong? If I am right, what other documents does Google ignore?

MarkHutch

11:21 pm on Apr 3, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe because there is this meta tag in the page.....

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

Lisa

11:23 pm on Apr 3, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ok, that explains Brett's page and perhaps my whole question... hmmm

hutchins13

4:27 am on Apr 4, 2002 (gmt 0)

10+ Year Member



It appears that Google will index URLs containing ".cgi":

[google.com...]

You might have to look through several pages of results to see what you're looking for. One note: I believe they are much harder to get indexed than htm, html, etc.

Key_Master

4:38 am on Apr 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

This tag prevents Googlebot from caching the page; it does not stop the page from being indexed. I'd assume (without looking) that Brett has disallowed active.cgi in robots.txt.

Believe me when I say this, Google indexes .cgi pages. In fact, I have a few hundred of them I would like removed from their index.

Macguru

4:44 am on Apr 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Maybe because there is this meta tag in the page..... <META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

This tag asks Google not to cache the page.
[google.com...]

Google will index CGI-generated pages. Preventing it from doing so is generally done with the robots meta tag or the robots.txt exclusion file.
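To make the difference between NOARCHIVE and NOINDEX concrete, here is a minimal Python sketch that reads the robots-style meta tags out of a page and reports which directives are present. The class and function names (RobotsMetaParser, robots_directives) are made up for illustration, and the sample page is only the tag quoted in this thread, not Brett's actual markup.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> works too.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of lowercase directives found in the page's robots meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page containing only the tag quoted in this thread.
page = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"></head></html>'
found = robots_directives(page)
print("noarchive" in found)  # caching is declined
print("noindex" in found)    # indexing is not declined by this tag
```

A page carrying only NOARCHIVE asks for no cached copy; it would take a NOINDEX directive (or a robots.txt rule) to keep the page out of the index.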

Key_Master beat me!

(edited by: Macguru at 4:51 am (utc) on April 4, 2002)

MarkHutch

4:46 am on Apr 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just took a quick look at the robots.txt file and YES, he does have /cgi-bin/ disallowed.
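For anyone who wants to check this kind of rule without reading the file by eye, here is a rough Python sketch using the standard library's urllib.robotparser. The robots.txt body below is a stand-in containing only the /cgi-bin/ rule mentioned here, the example.com URLs are placeholders, and is_allowed is a hypothetical helper; none of it is copied from the real site.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: just the one rule discussed in this thread.
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_body, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URLs; the real paths on the site may differ.
print(is_allowed(robots_txt, "Googlebot", "http://www.example.com/cgi-bin/active.cgi"))
print(is_allowed(robots_txt, "Googlebot", "http://www.example.com/active.htm"))
```

Under that assumed rule, anything beneath /cgi-bin/ is off limits to a compliant bot, while an .htm page outside that directory stays crawlable.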

Brett_Tabke

9:04 am on Apr 4, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Because the .htm version is indexed...

http://www.google.com/search?q=active+posts [google.com]

I did have a "noindex" on it for a while because there were 4-5 variations on the URL that could result in indexing. So, to stop dupes, I did some different things. It's a special page because that is the only way the bot has to get at some of the content around here (since it can't read cookies for navigation...)

Slade

3:59 am on Oct 2, 2002 (gmt 0)

10+ Year Member



I know this is an old thread, but...

I noticed that I have active.htm and active.cgi in my browser's dropdown.

I'm just curious why both are even on the page. Wouldn't it make sense to just change the .cgi link to .htm and consolidate PR?

Or do they do different things that I'm not seeing?

Brett_Tabke

7:05 pm on Oct 2, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'm not stressed about it. I just put the htm version up on the nav bar to allow proxy caches to cache the page. That page is so busy that I'm more concerned about resources than promotion. As long as bots can follow links from the first page of the active list, that's fine by me.