FAST-WebCrawler/3.6 or 3.5 fetched my robots.txt on the following dates:
What can/should I do? I am in DMOZ and have links to my pages. My pages rank well in Google, Altavista, Teoma, ... Could you have a look at my "robots.txt" -- it's valid according to the Robots.txt Validator [searchengineworld.com].
Your robots.txt looks OK to me... The only thing I see is that your
three Disallows for "Microsoft URL Control" are not likely to work -
those user-agents likely won't check robots.txt at all. You should
probably block these in .htaccess (for Apache server) instead.
Does Fast index any of the sites that link to your site? If so, they
should pick up your site quickly.
In your log files, what server code does your server return when Fast
requests robots.txt?
Have you recently moved your site?
Have you tried Fast's "Submit a Site" process?
Another "picky" thing about robots.txt is that it is a Unix-format file.
Make sure you don't have carriage-return/linefeed pairs as the end-of-line
characters. Most robots.txt validators will catch this problem, though, so
I assume it is not your problem. If it is, and you're on a PC, you can edit
the file in MS Word and use the "Save as" options, specifying ASCII text,
LF only.
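If you'd rather check the line endings with a script than a word processor, here is a minimal Python sketch. The function names and the sample rules are made up for illustration; it only looks at the raw bytes and says nothing about how any particular robot treats CR/LF.

```python
def find_crlf_lines(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that end in CR LF."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if line.endswith(b"\r")
    ]


def to_unix_line_endings(data: bytes) -> bytes:
    """Rewrite CR LF pairs (and any lone CR) as a single LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical robots.txt content saved with DOS line endings.
sample = b"User-agent: *\r\nDisallow: /cgi-bin/\r\n"
print(find_crlf_lines(sample))       # [1, 2]
print(to_unix_line_endings(sample))  # b'User-agent: *\nDisallow: /cgi-bin/\n'
```

To run it against a real file, read it in binary mode (for example `open("robots.txt", "rb").read()`) so Python does not translate the line endings before you inspect them.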
So, good question - Anyone else?
Jim
>Your robots.txt looks OK to me... The only thing I see is that your three Disallows for "Microsoft URL Control" are not likely to work - those user-agents likely won't check robots.txt at all. You should probably block these in .htaccess (for Apache server) instead.
I do use .htaccess for some 301s but haven't figured out blocking UAs.
>Does Fast index any of the sites that link to your site? If so, they should pick up your site quickly.
Yes, other pages linking to me are in.
>In your log files, what server code does your server return when Fast requests robots.txt?
200 OK
>Have you recently moved your site?
No. But I only started adding real content and getting links to it a couple of months ago.
>Have you tried Fast's "Submit a Site" process?
I am sure I submitted a page or two a couple of months ago (free submit) and did so again a couple of days ago. I might be wrong because I don't keep notes...
>Another "picky" thing about robots.txt is that it is a Unix-format file;
I am using Linux myself and checked again but everything seems right.
>So, good question - Anyone else?
Thanks for your help. You see, I think I double-checked everything and really can't find anything. :(
Well, I just read the thread Kudos to AllTheWeb - Customer Service [webmasterworld.com], so maybe there's hope after all. I used the AllTheWeb.com: Send Feedback to FAST [alltheweb.com] form a couple of days ago. But I also wanted to be sure that there's not a general problem (one that could affect other search engines as well).
Guess I will just have to wait and hope that Google will never start ignoring me. It's just that I don't want to put all my eggs in one basket ...
I looked at the source of a couple of your pages, and I still don't see anything
wrong. However, I should add that I'm not familiar with XHTML. I also searched
for your site on Fast using your URL, and your name and the primary subject of
your site. No luck.
I have a couple of observations and maybe they will help...
The DOCTYPE statement should probably not be broken into two lines. (I know the
W3C HTML validator is very picky about spacing and capitalization of letters in
DOCTYPE). Did you try the validators at www.w3c.org?
The DOCTYPE says it is English, but the other tags say it is Deutsch (It's
both, yes, but these may need to agree)
You have several repeats in your meta keywords which gain nothing, and reduce
the value of the ones that follow. (Not related to your problem, but true).
Try adding one blank line at the end of your robots.txt. I'm not at all sure
it's required, but the specification calls the line break a "record separator",
so maybe some robots think the record didn't end?
These are all pure guesses, but I agree it is not a good idea to have all
of your eggs in one basket, and Fast looks like a nice second or third basket.
Fast is usually fast. I made some changes to my site, and within a week, Fast
had found them and updated its index. So, if you do fix something, you
should see results soon.
It looks like I will have to bookmark your site and come back to read about
URL filtering, etc. Very nice browser feature!
Jim
>The DOCTYPE statement should probably not be broken into two lines. (I know the
>W3C HTML validator is very picky about spacing and capitalization of letters in
>DOCTYPE).
That shouldn't be a problem. A line break between the Formal Public Identifier (FPI) and the system identifier is completely legitimate, and in fact, sorta traditional. A few very old browsers have trouble with it, but no legitimate validator or robot should complain. Besides, it doesn't sound like Fast is even requesting the HTML files, so the SGML probably isn't the problem.
>The DOCTYPE says it is English, but the other tags say it is Deutsch (It's
>both, yes, but these may need to agree)
They're not supposed to agree (in this case). The language code in the FPI identifies the language used to create the markup language, not the language of the document content. HTML's DTDs are all written in English, so the HTML FPIs always use EN.
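To make the FPI structure concrete, here is a rough Python sketch that splits one into its "//"-separated fields. The XHTML 1.0 Strict identifier is only an example (I'm assuming it, not quoting it from luma's pages), the class and function names are made up, and the sketch handles only the four-field form used by the W3C's HTML and XHTML identifiers.

```python
from typing import NamedTuple


class FormalPublicIdentifier(NamedTuple):
    registration: str                # "+" registered owner, "-" unregistered
    owner: str                       # e.g. "W3C"
    text_class_and_description: str  # e.g. "DTD XHTML 1.0 Strict"
    language: str                    # language the DTD itself is written in


def parse_fpi(fpi: str) -> FormalPublicIdentifier:
    """Split a four-field FPI on its '//' separators."""
    parts = fpi.split("//")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '//'-separated fields, got {len(parts)}")
    return FormalPublicIdentifier(*(part.strip() for part in parts))


# Example identifier, assumed for illustration.
fpi = parse_fpi("-//W3C//DTD XHTML 1.0 Strict//EN")
print(fpi.language)  # EN
```

The last field stays EN no matter what language the page content is in; the content language is declared separately, for instance with a lang attribute on the html element.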
Looking at the source code for luma's home page, I'd personally be more concerned about starting the page with an empty comment tag. Starting a page with a comment feels like bad karma to me.
Don't panic, your site is not the only one not being fully spidered. A couple of weeks back, Brett mentioned that FAST's spider wasn't crawling as normal. It could be that FAST are changing the crawl schedule, or they might be stopping the free crawling, or maybe something else. We'll have to wait and see.
>Did you try the validators at www.w3c.org
>Try adding one blank line at the end of your robots.txt
>I'd personally be more concerned about starting the page with an empty comment tag. Starting a page with a comment feels like bad karma to me.
>we'll have to wait and see.
Thanks for all of your help.
I am not sure what you are talking about. I don't have an empty comment tag, do I?
As it turns out, you don't. I was using my roommate's computer last week (mine was dismantled while I worked on a hardware problem), and I didn't realize the advertising filter he's using with IE alters the source code of HTML pages.
access.log.33.2:66.77.73.254 - - [13/Aug/2002:15:04:43 +0200]
"GET /widgets/blue.html HTTP/1.0" 200 21053 www.domain.com "-"
"FAST-WebCrawler/3.6/FirstPage (crawler @fast.no;
http*//fast.no/support.php?c=faqs/crawler)" "-"
It fetched two pages but no robots.txt. The last time "FAST-WebCrawler/3.6 (atw-crawler at fast dot no; http*//fast.no/support/crawler.asp)" fetched robots.txt was on July 23rd. It didn't fetch any pages then.
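For anyone who wants to run the same check on their own logs, here is a small Python sketch that lists what a crawler requested and whether robots.txt was among the requests. It assumes one log entry per line (the entry above is wrapped only for posting) and a quoted request field followed by the status code; the function names are made up, so adjust the pattern to your server's log format.

```python
import re

# Matches the quoted request and the status code that follows it.
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def crawler_requests(lines, agent="FAST-WebCrawler"):
    """Yield (path, status) for each request whose log line names the agent."""
    for line in lines:
        if agent not in line:
            continue
        match = REQUEST.search(line)
        if match:
            yield match.group("path"), int(match.group("status"))


def fetched_robots_txt(lines, agent="FAST-WebCrawler"):
    """Return True if the agent requested /robots.txt at least once."""
    return any(path == "/robots.txt" for path, _ in crawler_requests(lines, agent))


# The log entry from this post, joined back into a single line.
sample = [
    '66.77.73.254 - - [13/Aug/2002:15:04:43 +0200] '
    '"GET /widgets/blue.html HTTP/1.0" 200 21053 www.domain.com "-" '
    '"FAST-WebCrawler/3.6/FirstPage (crawler @fast.no; '
    'http*//fast.no/support.php?c=faqs/crawler)" "-"',
]
print(list(crawler_requests(sample)))  # [('/widgets/blue.html', 200)]
print(fetched_robots_txt(sample))      # False
```

Because the match is on the substring "FAST-WebCrawler", it covers both the plain 3.6 crawler and the FirstPage variant; pass a longer agent string to tell them apart.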
So, is the Fast-Firstpage crawler the regular guy or did they finally read my e-mail and send some special bot? What do you think, (when) will those pages make it in the index?
It's very strange that the crawler didn't ask for the robots.txt. It should do that every time it accesses the site. Are you sure?
Actually I haven't noticed the "first page" crawler before. It could be anything from a regular crawler to a special bot they use to check a site out manually (I doubt that - manual checks would be too time consuming).
First page could mean a bot for new sites that they haven't got in the database yet?
Come on guys and girls, help luma out and check if you have the first page crawler in your logs:
"FAST-WebCrawler/3.6/FirstPage (crawler @fast.no; http*//fast.no/support.php?c=faqs/crawler)" "-"
I can't get the URL to the crawler info to resolve either.
You should be able to access FAST's crawler pages:
FAST-WebCrawler/3.6 atw-crawler at fast dot no;
FAST Web Crawler >> FAQs
[fast.no...]
FAST-WebCrawler/3.6/FirstPage crawler @fast.no
FAST Customer Support
[fast.no...]
That second address gets redirected to [fast.no...], where you'll find a link to FAST's Web Crawler FAQ (see above).
>What do you think, (when) will those pages make it in the index?
Luma - no predictions here. My impression is that Fast is working on something that has somewhat slowed down its crawling cycles and its pickup of new sites.
That's it. I will completely ignore FAST (what a funny name for something this slow) until they crawl and list all of my site.
<random google success story>
A friend of mine finally got his domain online. I linked to two of his pages (frameset and deep) on Aug. 28th. On Sept. 1st those two pages were in!
</random google success story>