WebManager

msg:220423 | 6:47 pm on Nov 11, 2002 (gmt 0) |
Do you actually have a robots.txt? If your robots.txt is not set up for any particular purpose, you might be better off deleting it. <major edit> [edited by: WebManager at 6:51 pm (utc) on Nov. 11, 2002]
|
CuriousWeb

msg:220424 | 6:50 pm on Nov 11, 2002 (gmt 0) |
webmanager - yes sorry meant to say that I don't have a robots.txt so I assume that this is OK. Google has no problems spidering the site...
|
WebManager

msg:220425 | 6:57 pm on Nov 11, 2002 (gmt 0) |
CuriousWeb, Unfortunately it may be that the site isn't considered relevant / important enough due to their algo. I had a site that was visited again and again by Google - and it sometimes showed up in their minty fresh updates for a week or so - and then disappeared again. It was highly relevant to a human reader, but I had to make some basic changes to the index page to convince the spider. Are your titles, keywords and content such that what your site claims to be about is clear to a semi-intelligent spider?
|
NFFC

msg:220426 | 7:01 pm on Nov 11, 2002 (gmt 0) |
Curious [I was too ;)] it will be worth looking at how your server is set up to deal with 404's, I think that is where you may find the problem. You on a raq?
|
CuriousWeb

msg:220427 | 7:20 pm on Nov 11, 2002 (gmt 0) |
WebManager - It is a recruitment site so I figure it should be fairly relevant (and obvious) I think. NFFC - yes raq and it's a custom 404 page. Not really my area so any major do's and don't I'd appreciate...
|
NFFC

msg:220428 | 7:44 pm on Nov 11, 2002 (gmt 0) |
>do's and don't I'd appreciate...
In some cases I get a prompt for a cert in Linux/Mozilla on the 404 pages of raq's, maybe just that an image is being called from the server root. If you don't have access to the server then it may be wise to upload a valid robots.txt and avoid that particular problem.
|
CuriousWeb

msg:220429 | 8:09 pm on Nov 11, 2002 (gmt 0) |
>If you don't have access to the server then it may be wise to upload a valid robots.txt and avoid that particular problem.
Think I'll do that and see if it changes anything. Thanks
|
NFFC

msg:220430 | 8:17 pm on Nov 11, 2002 (gmt 0) |
Have a look here first [searchengineworld.com...]. For what should be a very simple file, it can be hard to get right [don't I know it!]. On a side note, Fast seem to be very responsive to email. If you still have a problem after this I would mail them; I'd be surprised if you didn't get a helpful reply.
|
CuriousWeb

msg:220431 | 8:31 pm on Nov 11, 2002 (gmt 0) |
Thanks NFFC... Just added

User-agent: *
Disallow:

for the time being so it doesn't 404. If nothing changes I'll get onto Fast to see if they can do anything...
|
CuriousWeb

msg:220432 | 6:10 pm on Nov 22, 2002 (gmt 0) |
Well I emailed Fast and just got back this reply:

<<After looking into this, we found that your robots.txt file is set to:

User-agent: *
Disallow:

The asterisk in the User-agent area effectively blocks the crawlers from indexing any crawled information. A suggestion would be to replace the asterisk with the user-agent name or names of the robots you are trying to block. As long as FAST is not on your robots.txt user-agent list, then we would be able to index the information we crawl. We hope that is helpful to you.>>

Now I believe that shouldn't be the case...or am I missing something?
|
mayor

msg:220433 | 10:52 pm on Nov 22, 2002 (gmt 0) |
They probably read your email too quickly (err too Fast).

This should allow all robots:

User-agent: *
Disallow:

This should disallow all robots:

User-agent: *
Disallow: /

I recommend writing them back for confirmation. I think they erred in their response.
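
If you want to check the two cases yourself rather than take anyone's word for it, here is a minimal sketch using Python's standard urllib.robotparser. The user-agent string "FAST-WebCrawler" and the example.com URL are only illustrative assumptions, not something confirmed in this thread, and other parsers may not behave identically to this one.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value: nothing is disallowed, so every robot may crawl.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Disallow of "/" covers every path, so every robot is blocked.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if this robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical crawler name and URL, for illustration only.
    agent = "FAST-WebCrawler"
    page = "http://example.com/jobs/"
    print("empty Disallow:", is_allowed(ALLOW_ALL, agent, page))
    print("Disallow: /   :", is_allowed(BLOCK_ALL, agent, page))
```

The asterisk only says which robots the record applies to; it is the Disallow line that decides what is blocked.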
|
heini

msg:220434 | 10:58 pm on Nov 22, 2002 (gmt 0) |
Yup, what Mayor says... ;)
|
CuriousWeb

msg:220435 | 12:07 am on Nov 23, 2002 (gmt 0) |
thanks, yep I checked the tutorial to be sure I wasn't being stuuuupid and replied back to them. they replied straight away saying that they would manually force the crawler to crawl my site. :) Hoping for traffic now...
|