I have found shtml, php, cgi buried deep in the results, but none in the top 100.
joined:Apr 13, 2001
I have 185 static pages indexed by Fast. Very thorough job on the static pages. Some of these pages are doorways into my cgi-bin, where the dynamic pages start. Once you get going on the dynamic pages, there's no stopping.
Fast has deftly avoided any and all attempts at venturing into the cgi-bin. Just as well; at one GET per minute it would take them 83 days, and that's assuming that they're smart enough not to fetch stuff they already have. Google isn't this smart, so why would Fast be this smart?
Fast is definitely afraid of the "deep web."
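For anyone who wants to check the 83-day figure, here is a rough sketch of the arithmetic. The page count is an assumption: it is back-calculated from 83 days at one GET per minute rather than taken from the post, and the function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def crawl_days(page_count, gets_per_minute=1.0):
    """Days needed to fetch page_count pages at a steady rate, never refetching a page."""
    return page_count / gets_per_minute / MINUTES_PER_DAY


# Assumed page count: 83 days' worth of minutes, i.e. one page per minute.
assumed_pages = 83 * MINUTES_PER_DAY

print(assumed_pages)              # number of dynamic pages implied by the estimate
print(crawl_days(assumed_pages))  # days at one GET per minute
```

Any refetching of pages the crawler already has would push the real figure higher.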
joined:Apr 13, 2001
This special directory of 134 *.txt documents is one of the most popular on my site. It's rich research from the 1980s. They aren't deep into my site -- they're at www.domain.org/gw/*.txt
None are in Fast! And when I do Fast searches for things that bring in lots of hits, like "faq" with "txt", I don't see any *.txt files coming back in the links.
Those 134 docs have been posted since early 1998, unchanged.
In June 2000, I freaked out for a while because Google was ignoring files that did not have any extension at all. I was recommending to someone on another site that they change their filenames to *.txt so that Google would be happy. Well, it didn't matter, because Google suddenly got much less fussy last summer.
Now it appears that even a proper *.txt extension is not cool for Fast. I just checked my friend's log, and Fast is happily crawling plain-text files with no extension at all (my friend wisely ignored my advice of last June!). Is this a bug in Fast? Perhaps by fetching but not indexing "robots.txt" they get confused and throw out all *.txt files?
Am I doing something wrong? Should I have stayed in bed this morning?
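A log check like the one described above could be sketched roughly as follows. This assumes an Apache-style access log in which the request appears as a quoted "GET /path HTTP/1.0" string and the crawler's user-agent is somewhere on the same line. The function name and the agent substring you pass in are placeholders, not anything Fast documents. Keep in mind that a fetch in the log only shows the file was crawled, not that it was indexed.

```python
import re
from collections import Counter
from os.path import splitext

# Pulls the requested path out of a quoted request such as "GET /gw/file.txt HTTP/1.0".
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+)')


def fetched_extensions(log_lines, agent_substring):
    """Count requests per file extension for log lines that mention agent_substring."""
    counts = Counter()
    needle = agent_substring.lower()
    for line in log_lines:
        if needle not in line.lower():
            continue  # not the crawler we are interested in
        match = REQUEST_RE.search(line)
        if not match:
            continue  # line has no recognizable request
        path = match.group(1).split("?", 1)[0]  # drop any query string
        ext = splitext(path)[1].lower() or "(none)"
        counts[ext] += 1
    return counts
```

Reading a log file and passing its lines with the crawler's user-agent substring gives a per-extension tally, so you can see at a glance whether *.txt files are being requested at all.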
It seems, or I should say it has been proven and documented, that many of the SEs won't even touch a page that has a character like this in the URL.