
Deprecated - Altavista, Alltheweb.com Forum

    
Trouble with Fast spidering my site
help!
CuriousWeb




msg:220422
 6:40 pm on Nov 11, 2002 (gmt 0)

OK, I have a site that is about 6 months old. Fast's spider has been visiting it quite regularly (for the last 3 months at least), but it only requests robots.txt and then leaves. So far it hasn't even taken the index page, let alone crawled deeper into the site.

I've had a good read around this forum but can't see what I'm missing. I have some good incoming links (including DMOZ), and meta tags and titles are in place.

Could there be something obvious that I'm missing?

TIA

 

WebManager




msg:220423
 6:47 pm on Nov 11, 2002 (gmt 0)

Do you actually have a robots.txt?

If your robots.txt is not set up for any particular purpose, you might be better off deleting it.

<major edit>

[edited by: WebManager at 6:51 pm (utc) on Nov. 11, 2002]

CuriousWeb




msg:220424
 6:50 pm on Nov 11, 2002 (gmt 0)

WebManager - yes, sorry, I meant to say that I don't have a robots.txt, so I assume that this is OK. Google has no problems spidering the site...

WebManager




msg:220425
 6:57 pm on Nov 11, 2002 (gmt 0)

CuriousWeb,

Unfortunately it may be that the site isn't considered relevant / important enough due to their algo. I had a site that was visited again and again by Google - and it sometimes showed up in their minty fresh updates for a week or so - and then disappeared again.

It was highly relevant to a human reader, but I had to make some basic changes to the index page to convince the spider.

Are your titles, keywords and content such that what your site claims to be about is clear to a semi-intelligent spider?

NFFC




msg:220426
 7:01 pm on Nov 11, 2002 (gmt 0)

Curious [I was too ;)], it would be worth looking at how your server is set up to deal with 404s; I think that is where you may find the problem. Are you on a RaQ?

CuriousWeb




msg:220427
 7:20 pm on Nov 11, 2002 (gmt 0)

WebManager - It is a recruitment site, so I figure it should be fairly relevant (and obvious).

NFFC - yes, it's a RaQ, with a custom 404 page. Not really my area, so any major do's and don't I'd appreciate...

NFFC




msg:220428
 7:44 pm on Nov 11, 2002 (gmt 0)

>do's and don't I'd appreciate...

In some cases I get a prompt for a cert in Linux/Mozilla on the 404 pages of RaQs; maybe it's just that an image is being called from the server root. If you don't have access to the server then it may be wise to upload a valid robots.txt and avoid that particular problem.

CuriousWeb




msg:220429
 8:09 pm on Nov 11, 2002 (gmt 0)

>If you don't have access to the server then it may be wise to upload a valid robots.txt and avoid that particular problem.

Think I'll do that and see if it changes anything. Thanks

NFFC




msg:220430
 8:17 pm on Nov 11, 2002 (gmt 0)

Have a look here first: [searchengineworld.com...]
For what should be a very simple file, it can be hard to get right [don't I know it!].

On a side note, Fast seem to be very responsive to email. If you still have a problem after this, I would mail them; I'd be surprised if you didn't get a helpful reply.

CuriousWeb




msg:220431
 8:31 pm on Nov 11, 2002 (gmt 0)

Thanks NFFC...

Just added

User-agent: *
Disallow:

for the time being, so that requests for robots.txt don't 404.
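One way to double-check a robots.txt like this, rather than waiting on the next crawl, is to run it through Python's standard-library `urllib.robotparser`. This is a minimal sketch: the user-agent strings and example.com URLs are hypothetical placeholders (the exact string Fast's crawler sends is an assumption here), and Python's parser only approximates how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above: applies to every robot, disallows nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and also marks the file as loaded;
    # without that, can_fetch() answers False for every URL.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical agents and URLs; "FAST-WebCrawler/3.6" is an assumed UA string.
    agents = ["FAST-WebCrawler/3.6", "Googlebot/2.1", "SomeOtherBot/1.0"]
    urls = ["http://www.example.com/", "http://www.example.com/jobs/list.html"]
    for agent in agents:
        for url in urls:
            print(f"{agent:22} {url:40} allowed={can_crawl(ALLOW_ALL, agent, url)}")
```

Every line should report `allowed=True`, which is what an empty `Disallow:` is meant to express.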

If nothing changes I'll get onto Fast to see if they can do anything...

CuriousWeb




msg:220432
 6:10 pm on Nov 22, 2002 (gmt 0)

Well I emailed Fast and just got back this reply:

<<After looking into this, we found that your robots.txt file is set to:

User-agent: *
Disallow:

The asterisk in the User-agent area effectively blocks the crawlers from
indexing any crawled information. A suggestion would be to replace the
asterisk with the user-agent name or names of the robots you are trying
to block. As long as FAST is not on your robots.txt user-agent list,
then we would be able to index the information we crawl.

We hope that is helpful to you.>>

Now, I believe that shouldn't be the case... or am I missing something?

mayor




msg:220433
 10:52 pm on Nov 22, 2002 (gmt 0)

They probably read your email too quickly (err, too Fast).

This should allow all robots:

User-agent: *
Disallow:

This should disallow all robots:

User-agent: *
Disallow: /

I recommend writing them back for confirmation. I think they erred in their response.
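Mayor's two examples, plus the alternative Fast's reply describes (naming only the robots you want to block), can be compared side by side with the same standard-library parser. A minimal sketch: "BadBot" and the other user-agent strings are hypothetical, and real crawlers may differ from Python's parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

FILES = {
    # Empty Disallow: nothing is off-limits, so every robot may crawl.
    "allow all": "User-agent: *\nDisallow:\n",
    # "Disallow: /" matches every path, so every robot is shut out.
    "disallow all": "User-agent: *\nDisallow: /\n",
    # Block one named robot (hypothetical "BadBot"); all others are unaffected.
    "block one": "User-agent: BadBot\nDisallow: /\n",
}


def allowed(robots_txt: str, user_agent: str,
            url: str = "http://www.example.com/index.html") -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agents = ["FAST-WebCrawler/3.6", "BadBot/1.0"]  # hypothetical UA strings
    for name, text in FILES.items():
        verdicts = ", ".join(f"{a}={allowed(text, a)}" for a in agents)
        print(f"{name:13} {verdicts}")
```

The asterisk only says which robots a record applies to; what gets blocked is decided by the `Disallow` value, which is why the first file permits everything.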

heini




msg:220434
 10:58 pm on Nov 22, 2002 (gmt 0)

Yup, what Mayor says... ;)

CuriousWeb




msg:220435
 12:07 am on Nov 23, 2002 (gmt 0)

Thanks,

Yep, I checked the tutorial to be sure I wasn't being stuuuupid and replied to them. They replied straight away, saying that they would manually force the crawler to crawl my site. :) Hoping for traffic now...


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved