
Deprecated - Altavista, Alltheweb.com Forum

    
ATW reads robots.txt then leaves
Fizzy

10+ Year Member



 
Msg#: 972 posted 9:45 pm on Nov 13, 2003 (gmt 0)

Hi all,

I have been getting regular visits from ATW, but for some reason it comes in, reads my robots.txt and then immediately leaves, like so:

66.77.73.89 [06/Nov/2003:20:30:27] GET / HTTP/1.0 200 21224 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [07/Nov/2003:21:12:45] GET / HTTP/1.0 200 21039 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [08/Nov/2003:15:30:23] GET /robots.txt HTTP/1.0 200 332 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [09/Nov/2003:09:56:00] GET / HTTP/1.0 200 21039 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [10/Nov/2003:11:05:59] GET / HTTP/1.0 200 21039 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [10/Nov/2003:15:55:12] GET /robots.txt HTTP/1.0 200 370 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [11/Nov/2003:11:45:15] GET / HTTP/1.0 200 21039 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [12/Nov/2003:12:59:16] GET / HTTP/1.0 200 21039 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [12/Nov/2003:16:42:49] GET /robots.txt HTTP/1.0 200 370 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)

All I have in my robots.txt file is the following:

#
# This restricts access to only known and registered robots.
#
User-agent: *
Disallow: /cgi-bin/

User-agent: TurnitinBot
Disallow: /yabbse/

User-agent: NPBot
Disallow: /

User-agent: Zao
Disallow: /yabbse/

User-agent: ia_archiver
Disallow: /

User-agent: baiduspider
Disallow: /

Am I doing something wrong?
I have been waiting for ATW for months now but it just isn't picking up.
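
For anyone wanting to rule out the robots.txt itself, here is a minimal sketch using Python's standard `urllib.robotparser` to ask whether the file above would block the FAST crawler. It only shows how one standards-following parser reads these rules; how FAST's own crawler interprets them is not something this can confirm. The paths `/index.shtml` and `/cgi-bin/test.cgi` are hypothetical examples, not taken from the site.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, reproduced verbatim.
ROBOTS_TXT = """\
#
# This restricts access to only known and registered robots.
#
User-agent: *
Disallow: /cgi-bin/

User-agent: TurnitinBot
Disallow: /yabbse/

User-agent: NPBot
Disallow: /

User-agent: Zao
Disallow: /yabbse/

User-agent: ia_archiver
Disallow: /

User-agent: baiduspider
Disallow: /
"""

# User-agent string as it appears in the log excerpt.
FAST_AGENT = "FAST-WebCrawler/3.7/FirstPage"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# No section names the FAST crawler, so it falls under "User-agent: *",
# which only disallows /cgi-bin/.
print(parser.can_fetch(FAST_AGENT, "/"))                  # True
print(parser.can_fetch(FAST_AGENT, "/index.shtml"))       # True (hypothetical path)
print(parser.can_fetch(FAST_AGENT, "/cgi-bin/test.cgi"))  # False (hypothetical path)

# For comparison, a robot that is explicitly shut out.
print(parser.can_fetch("NPBot", "/"))                     # False
```

Read this way, the file restricts only /cgi-bin/ for unnamed robots, so the rules as written should not stop the crawler from fetching ordinary pages.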

 

Macguru

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 972 posted 9:50 pm on Nov 13, 2003 (gmt 0)

Hi Fizzy,

The bot is teasing you. ;)

Get more inbound links, else pull the plastic.

Fizzy

10+ Year Member



 
Msg#: 972 posted 11:04 pm on Nov 13, 2003 (gmt 0)

Hi Mac,

Well, I already have over 400 inbound links according to ATW, so I'm not sure how many I would need for them to decide to spider the site.

I have waited patiently for many months now and have yet to see them go through the site. The pages are .shtml, so I would have thought they were spider fodder, but it's not even trying to spider them.
Could this be because ATW is coming in on the inbound links and isn't bothering to actually visit my site to spider it? Could it just be "passing through"?

aus_dave

10+ Year Member



 
Msg#: 972 posted 11:48 pm on Nov 13, 2003 (gmt 0)

Fizzy, from what I have seen of ATW it is very slow. On a mid-sized site of mine it is generally about 5-6 months behind in indexing the content when compared to Google.

I see lots of requests for robots.txt and then nothing else, similar to what you describe. Some indexing must occur occasionally though, just not very often.

I like the ATW interface but my log files tell me it sends me very few visitors.

heini

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 972 posted 11:01 am on Nov 14, 2003 (gmt 0)

From what I see in my logs, the FirstPage crawler has been in charge of index page checking. Deep indexing was from FAST-WebCrawler/3.8.
That one, WebCrawler/3.8/Fresh, really lives up to its name: it comes daily.

Fizzy, there have been several reports lately of sites where only the robots.txt gets checked but nothing is indexed, even for well-linked sites.
Frankly, I don't know what the problem is. It might have to do with the behind-the-scenes work at OV.
We have heard the frontend serving Altavista and ATW has been merged. The bigger question is what about the backend?
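
One way to see which crawler variant is doing what in a log like Fizzy's is to tally requests by user agent and by whether they hit robots.txt or a page. A rough Python sketch follows. It assumes a space-separated log layout (real formats depend on server configuration), and the names `PATTERN` and `tally` are made up for illustration. The sample lines are taken from the log excerpt earlier in the thread.

```python
import re
from collections import Counter

# Sample lines from the excerpt earlier in the thread, in an assumed
# space-separated layout (an assumption; adjust PATTERN for your server).
SAMPLE_LOG = """\
66.77.73.89 [08/Nov/2003:15:30:23] GET /robots.txt HTTP/1.0 200 332 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [09/Nov/2003:09:56:00] GET / HTTP/1.0 200 21039 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
66.77.73.89 [10/Nov/2003:11:05:59] GET / HTTP/1.0 200 21039 - FAST-WebCrawler/3.7/FirstPage (atw-crawler at fast dot no;http://fast.no/support/crawler.asp)
"""

# Capture the requested path and the FAST user-agent token (name/version/variant).
PATTERN = re.compile(r"GET (\S+) HTTP/\d\.\d\s.*?(FAST-WebCrawler/\S+)")


def tally(lines):
    """Count requests per (user agent, kind), where kind is 'robots.txt' or 'page'."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if not match:
            continue  # not a FAST crawler request in the assumed layout
        path, agent = match.groups()
        kind = "robots.txt" if path == "/robots.txt" else "page"
        counts[(agent, kind)] += 1
    return counts


counts = tally(SAMPLE_LOG.splitlines())
for (agent, kind), n in sorted(counts.items()):
    print(f"{agent:35s} {kind:10s} {n}")
```

Run over a full log, a table like this would show at a glance whether only the FirstPage variant is visiting or whether a deep-indexing 3.8 crawler has turned up as well.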

aus_dave

10+ Year Member



 
Msg#: 972 posted 1:12 pm on Nov 14, 2003 (gmt 0)

heini, sounds like you are getting some nice regular visits :).

I contacted the ATW support people by email when I first noticed this robots.txt thing, and they asked for a log file snippet so they could see what was happening. Never heard anything back though.

Fizzy

10+ Year Member



 
Msg#: 972 posted 10:25 am on Nov 29, 2003 (gmt 0)

Thanks for the replies everybody.

I'll keep watching to see if anything changes; nothing has so far, though :(

Fizzy

10+ Year Member



 
Msg#: 972 posted 11:25 pm on Dec 7, 2003 (gmt 0)

Hi all,

I promised an update if I got spidered, as I was worried about being left out.

FAST-WebCrawler/3.8 - atw-crawler at fast dot no (117 pages and counting)

Thanks again to you all for your continued kind and helpful advice.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved