| 8:44 pm on Nov 26, 2002 (gmt 0)|
Welcome to WebmasterWorld [webmasterworld.com], Dean.
If you have plenty of PageRank in the page that links to the missing page, then you should normally expect it to be spidered despite the cgi-bin or .cgi URL components.
.ctl may be a different matter. We discussed this topic recently [webmasterworld.com] but I don't think a definitive answer was reached.
I don't want to jump to an unfounded conclusion here, but it appears that Google doesn't crawl URLs that look like they might have unknown file extensions. The WWW approach would be to use <A href="/whatever.ctl" type="text/html">, but I don't think I've ever seen it used by any user agent application, or any HTML document.
There should be no reason not to use .gif URLs for HTML and .html URLs for GIFs, as long as the Web server advertises the content-type correctly. Seeing as both IE5 and Googlebot seem to guess content-type from URLs, this becomes a moot point.
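To make the distinction concrete, here is a minimal Python sketch contrasting the two strategies: guessing from the URL's extension versus trusting the Content-Type the server advertises. The extension table and both function names are hypothetical, and this is not a description of how IE5 or Googlebot actually work; it only shows why an unfamiliar ending such as .ctl gives an extension-guessing client nothing to go on.

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical, deliberately tiny extension table for illustration only.
KNOWN_EXTENSIONS = {
    ".html": "text/html",
    ".htm": "text/html",
    ".gif": "image/gif",
}

def guess_from_url(url):
    """Guess a content type from the URL path's extension, or None if unknown."""
    path = urlparse(url).path
    _, ext = posixpath.splitext(path)
    return KNOWN_EXTENSIONS.get(ext.lower())

def type_from_header(content_type_header):
    """Return the media type the server advertised, ignoring parameters."""
    return content_type_header.split(";", 1)[0].strip().lower()
```

A client that only guesses from the URL gets no answer for /whatever.ctl, while one that reads the header would see text/html if the server sends it.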
| 9:47 pm on Nov 26, 2002 (gmt 0)|
Hi Dean, and welcome to WebmasterWorld,
>formulation of the URL leads to its exclusion? Which part?
Too many "." imho.
| 3:51 pm on Nov 27, 2002 (gmt 0)|
>>formulation of the URL leads to its exclusion? Which part?
>Too many "." imho.
Perhaps Googlebot expects a file extension after the first "."? I don't know; I don't think the data bears that out. If you drop this in the Google search box: "allinurl:s.cgi" you get 16,900 results. Not all have multiple "." in the URL but many do. (You can probably substitute any letter of the alphabet for "s" and get some results.)
The file extension of ".ctl" may be a more likely offender. Is there any way of searching for specific extensions in Google?
Thanks for your help.
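For anyone who wants to check their own URLs against the two guesses above (too many "." versus an odd final extension), here is a rough Python sketch. The function name and the example URL are made up for illustration, and nothing here reflects what Googlebot really does; it just pulls out the pieces being discussed.

```python
from urllib.parse import urlparse

def dot_summary(url):
    """Return (dot count, text after first '.', final extension) for the URL path."""
    path = urlparse(url).path  # host name dots are not counted
    dots = path.count(".")
    after_first = None
    after_last = None
    if dots:
        # Text between the first "." and the next "/" (or the end of the path).
        after_first = path.split(".", 1)[1].split("/", 1)[0]
        # Text after the final "." in the last path segment, if it has one.
        last_segment = path.rsplit("/", 1)[-1]
        if "." in last_segment:
            after_last = last_segment.rsplit(".", 1)[1]
    return dots, after_first, after_last

# Hypothetical URL shaped like the ones under discussion.
print(dot_summary("http://www.example.com/cgi-bin/s.cgi/00/page.ctl"))
```

For that made-up URL, the text after the first "." is a normal-looking "cgi" and only the final extension is "ctl".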
| 6:08 pm on Nov 27, 2002 (gmt 0)|
Yes! The filetype: operator doesn't just work with the types in the drop-down list on the advanced search page.
This is fun, though:
| 7:17 pm on Nov 27, 2002 (gmt 0)|
>Yes! The filetype: operator doesn't just work with the types in the drop-down list on the advanced search page.
>This is fun, though:
More fun? Try your initials, maybe. I got 314 for my filetype. Maybe we'll end all our pages with that extension.
| 7:25 pm on Nov 27, 2002 (gmt 0)|
db, are any of the problem URLs linked to from a static/regular URL?
| 7:30 pm on Nov 27, 2002 (gmt 0)|
Use "+com" as the search and you'll find many .ctl URLs.
But all of them have ?something after the .ctl
Google search for +com filetype:ctl [google.com]
| 7:33 pm on Nov 27, 2002 (gmt 0)|
>db, are any of the problem URLs linked to from a static/regular URL?
Yes. All or almost all of them are linked from static pages within our own site. A significant number of them are linked from other sites and pages (that is, Google-indexed sites and pages).
| 6:18 pm on Dec 2, 2002 (gmt 0)|
On Nov. 26 ciml said:
>If you have plenty of PageRank in the page that links to the missing page, then you should normally expect it to be spidered despite the cgi-bin or .cgi URL components.
>.ctl may be a different matter. We discussed this topic recently but I don't think a definitive answer was reached.
Today I looked for pages in Google with hfs.cgi/00/ in the URL [allinurl:hfs.cgi/00/]. There are now 257 pages in the index, 256 of which have a .ctl extension. So it appears that Google has started to crawl and index these pages within the last few days.
However, the pages have no PR--according to the Toolbar at any rate (which perhaps I ought to take with more than a few grains of salt). Why is that, when our site itself has a PR of 9?
| 2:12 pm on Dec 3, 2002 (gmt 0)|
Good news, Dean. You needn't worry about the Toolbar; your listings seem to be from the Everflux [webmasterworld.com] so the PR won't show until the next Google Update [webmasterworld.com]. Some types of URLs don't show, but without "?" characters I think yours will.
The +com filetype:ctl [google.com] search doesn't show a sudden proliferation of .ctl endings; I wonder why?
- You are the only person who uses URLs ending in .ctl
- Other people will get their URLs ending in .ctl indexed at the full update.
- URLs ending in .ctl weren't the problem, and it just took you a long time to get yours indexed.
I don't think it's the last one.