Trouble with Fast spidering my site

help!

   
6:40 pm on Nov 11, 2002 (gmt 0)

10+ Year Member



OK I have a site that is about 6 months old. I get visited by Fast's spider quite regularly (for the last 3 months at least) but it only requests robots.txt and then leaves. So far it hasn't even taken the index page let alone crawled deeper into the site.

I've had a good read around this forum but can't see what I'm missing. I have some good incoming links, including DMOZ, meta tags and titles in place.

Can there be something obvious that I'm missing?

TIA

6:47 pm on Nov 11, 2002 (gmt 0)



Do you actually have a robots.txt?

If your robots.txt is not set up for any particular purpose, you might be better off deleting it.

<major edit>

[edited by: WebManager at 6:51 pm (utc) on Nov. 11, 2002]

6:50 pm on Nov 11, 2002 (gmt 0)

10+ Year Member



WebManager - yes, sorry, I meant to say that I don't have a robots.txt, so I assumed that was OK. Google has no problems spidering the site...

6:57 pm on Nov 11, 2002 (gmt 0)



CuriousWeb,

Unfortunately it may be that the site isn't considered relevant / important enough due to their algo. I had a site that was visited again and again by Google - and it sometimes showed up in their minty fresh updates for a week or so - and then disappeared again.

It was highly relevant to a human reader, but I had to make some basic changes to the index page to convince the spider.

Are your titles, keywords and content such that what your site claims to be about is clear to a semi-intelligent spider?

7:01 pm on Nov 11, 2002 (gmt 0)

WebmasterWorld Senior Member nffc is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Curious [I was too ;)], it will be worth looking at how your server is set up to deal with 404s; I think that is where you may find the problem. Are you on a raq?

7:20 pm on Nov 11, 2002 (gmt 0)

10+ Year Member



WebManager - It is a recruitment site, so I figure it should be fairly relevant (and obvious).

NFFC - yes, a raq, and it's a custom 404 page. Not really my area, so any major do's and don'ts I'd appreciate...

7:44 pm on Nov 11, 2002 (gmt 0)

WebmasterWorld Senior Member nffc is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>do's and don'ts I'd appreciate...

In some cases I get a prompt for a cert in Linux/Mozilla on the 404 pages of raqs; it may just be that an image is being called from the server root. If you don't have access to the server then it may be wise to upload a valid robots.txt and avoid that particular problem.
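
To make the 404 point concrete, here is a minimal sketch of how a crawler might turn the HTTP status of a /robots.txt request into a crawl policy. How Fast's spider actually handles this is not known from this thread. The 401/403 and other-4xx branches follow the convention used by Python's urllib.robotparser; the function name and the returned labels are hypothetical, chosen only for illustration.

```python
def policy_for_robots_response(status: int) -> str:
    """Map the HTTP status of a /robots.txt request to a crawl policy.

    Assumption: this mirrors one common convention, not Fast's real logic.
    """
    if status in (401, 403):
        # Access denied: treat the whole site as off limits.
        return "disallow-all"
    if 400 <= status < 500:
        # A plain 404 (no robots.txt) normally means "no restrictions".
        return "allow-all"
    if 200 <= status < 300:
        # A real file was served: its rules have to be parsed.
        return "parse-rules"
    # Redirects, server errors and anything else: no safe conclusion.
    return "undecided"


if __name__ == "__main__":
    for code in (200, 403, 404, 500):
        print(code, policy_for_robots_response(code))
```

A custom 404 setup that answers with something other than a clean 404 would land in one of the other branches, which is why uploading a valid robots.txt sidesteps the question.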

8:09 pm on Nov 11, 2002 (gmt 0)

10+ Year Member



>If you don't have access to the server then it may be wise to upload a valid robots.txt and avoid that particular problem.

Think I'll do that and see if it changes anything. Thanks

8:17 pm on Nov 11, 2002 (gmt 0)

WebmasterWorld Senior Member nffc is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Have a look here first: [searchengineworld.com...]
For what should be a very simple file, it can be hard to get right [don't I know it!].

On a side note, Fast seem to be very responsive to email. If you still have a problem after this I would mail them; I'd be surprised if you didn't get a helpful reply.

8:31 pm on Nov 11, 2002 (gmt 0)

10+ Year Member



Thanks NFFC...

Just added

User-agent: *
Disallow:

for the time being so it doesn't 404.

If nothing changes I'll get onto Fast to see if they can do anything...

6:10 pm on Nov 22, 2002 (gmt 0)

10+ Year Member



Well I emailed Fast and just got back this reply:

<<After looking into this, we found that your robots.txt file is set to:

User-agent: *
Disallow:

The asterisk in the User-agent area effectively blocks the crawlers from
indexing any crawled information. A suggestion would be to replace the
asterisk with the user-agent name or names of the robots you are trying
to block. As long as FAST is not on your robots.txt user-agent list,
then we would be able to index the information we crawl.

We hope that is helpful to you.>>

Now I believe that shouldn't be the case...or am I missing something?

10:52 pm on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They probably read your email too quickly (err too Fast).

This should allow all robots:

User-agent: *
Disallow:

This should disallow all robots:

User-agent: *
Disallow: /

I recommend writing them back for confirmation. I think they erred in their response.
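
For anyone who wants to check the two files above rather than take it on trust, here is a minimal sketch using Python's standard urllib.robotparser. The user-agent string "FAST-WebCrawler" and the example.com URL are assumptions for illustration only; the parser's verdict reflects the robots.txt convention as Python implements it, not necessarily what Fast's own crawler does.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants quoted in the post above.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "FAST-WebCrawler"                   # illustrative user-agent name
    page = "http://www.example.com/index.html"  # placeholder URL
    print("empty Disallow:", can_crawl(ALLOW_ALL, agent, page))
    print("Disallow: /   :", can_crawl(BLOCK_ALL, agent, page))
```

An empty Disallow value matches nothing, so every path stays open; a single slash matches every path on the site.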

10:58 pm on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member heini is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yup, what Mayor says... ;)

12:07 am on Nov 23, 2002 (gmt 0)

10+ Year Member



Thanks,

Yep, I checked the tutorial to be sure I wasn't being stuuuupid and replied to them. They replied straight away saying that they would manually force the crawler to crawl my site. :) Hoping for traffic now...