From the WebmasterWorld glossary, it seems that crawlers are more important than spiders, because crawling means that the links have been followed.
Spiders appear to be the precursor to the arrival of the crawler. Is that why Google will spider one page, then leave?
A blog page was added in the past week, and I've noticed a jump in traffic (nothing in your league, at least not yet). Is it likely that the traffic increase is because the blog is from Google's blogger.com? Also, spiders are returning almost daily now. I'm also assuming that is because of the blog. What do you think?
AWStats is my log analyzer of choice. In the spider section, it notes how each spider was identified, for example: 'identified by... robot, or crawl, or spider, or hit on robots.txt.' I understand spider and crawl, but what do they mean by 'hit on robots.txt'? Is that only telling me that an unidentified source requested the robots.txt file? Is this something I should be concerned about?
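AWStats' actual detection code isn't shown in this thread, so the following is only a minimal Python sketch of how a log analyzer could produce labels like the ones quoted above. It assumes the checks run in this order: a list of known robot signatures first, then the generic keywords "robot", "crawl" and "spider" in the user-agent string, and finally a fallback that flags any visitor that requested robots.txt. The signature table and the classify_visitor function are hypothetical.

```python
# Hypothetical signature table; a real analyzer such as AWStats ships a much longer list.
KNOWN_ROBOTS = {
    "googlebot": "Googlebot",
    "msnbot": "MSNBot",
    "slurp": "Yahoo Slurp",
}

# Generic keywords matching the "robot, crawl, spider" labels quoted from the report.
GENERIC_KEYWORDS = ("robot", "crawl", "spider")


def classify_visitor(user_agent: str, requested_robots_txt: bool) -> str:
    """Return a label explaining why a visitor is (or is not) treated as a robot."""
    ua = user_agent.lower()
    # 1. Exact knowledge: the user agent matches a known robot signature.
    for signature, name in KNOWN_ROBOTS.items():
        if signature in ua:
            return f"known robot: {name}"
    # 2. Heuristic: the user agent contains a generic robot-like word.
    for keyword in GENERIC_KEYWORDS:
        if keyword in ua:
            return f"unknown robot identified by '{keyword}'"
    # 3. Behavioural fallback: browsers rarely fetch robots.txt, so treat it as a bot.
    if requested_robots_txt:
        return "unknown robot identified by hit on robots.txt"
    return "not a robot"


print(classify_visitor("SomeFetcher/1.0", True))
```

Under these assumptions, "hit on robots.txt" would mean the user agent matched nothing else, but the visitor still fetched robots.txt, which ordinary browsers rarely do.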
One last question. The log shows how many pages each spider looked at during its visit. One listed 0+28, another 13+4, and yet another 3+3; there are others, but you get the idea. AWStats says, 'Numbers after + are successful hits on "robots.txt" files.' Does that mean how many times the spider/crawler has read the robots.txt file? It isn't clear to me what the number after the '+' means.
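A minimal Python sketch of how such a "pages+robots.txt" pair could be tallied from a raw access log. It assumes Apache's Combined Log Format, treats any 2xx status as "successful", and assumes the number before the + counts that agent's other requests. LOG_PATTERN and robot_hits are illustrative names, not part of AWStats.

```python
import re
from collections import defaultdict

# Assumed Combined Log Format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robot_hits(lines):
    """Return {user_agent: "other+robots"} counted over successful (2xx) requests."""
    counts = defaultdict(lambda: [0, 0])  # agent -> [other hits, robots.txt hits]
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None or not match.group("status").startswith("2"):
            continue  # skip unparseable lines and unsuccessful responses
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        slot = 1 if path == "/robots.txt" else 0
        counts[match.group("agent")][slot] += 1
    return {agent: f"{other}+{robots}" for agent, (other, robots) in counts.items()}


sample = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET /robots.txt HTTP/1.1" 200 120 "-" "Googlebot/2.1"',
    '192.0.2.1 - - [10/Oct/2005:13:55:40 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]
print(robot_hits(sample))
```

Read this way, 0+28 would be a bot that fetched robots.txt 28 times without successfully retrieving anything else.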
Even if you can shed light on just one of my questions, it would be appreciated.
None of these terms suggests either old or new technology; they all just refer to the process of retrieving a webpage and/or discovering links from that page to add to the crawl queue.
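The retrieve-then-queue process described above can be sketched as a breadth-first crawl. This is a minimal Python sketch, not how any particular search engine works: pages come from a hypothetical in-memory SITE dictionary instead of the network, and crawl, LinkExtractor and max_depth are illustrative names. With max_depth=0 it retrieves only the starting page without following any links; larger values follow links further down the site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of <a> tags in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(fetch, start_url, max_depth):
    """Breadth-first crawl; fetch(url) returns HTML or None. Returns URLs in visit order."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # the crawl queue: (url, link depth from start)
    visited = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to record or follow
        visited.append(url)
        if depth >= max_depth:
            continue  # retrieve this page, but do not follow its links
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            link = urljoin(url, href).split("#", 1)[0]  # absolute URL, fragment dropped
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical four-page site served from memory.
SITE = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="/blog/">Blog</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/blog/": '<a href="post1.html">Post</a>',
    "http://example.com/blog/post1.html": "<p>No links here</p>",
}

print(crawl(SITE.get, "http://example.com/", 0))
print(crawl(SITE.get, "http://example.com/", 5))
```

The first call returns only the home page; the second walks all four pages, visiting them level by level.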
Is it likely that the traffic increase is because the blog is from Google's blogger.com? Also, spiders are returning almost daily now. I'm also assuming that is because of the blog. What do you think?
Only your logs can tell you where the traffic increase actually came from - sometimes you get surges of traffic from older links, other times you get a burst of traffic from something new.
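As a rough way to check where new visitors came from, here is a minimal Python sketch that tallies referring hosts from access-log lines. It assumes Combined Log Format (where the referer is the second-to-last quoted field); referring_hosts and the example hostnames are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed Combined Log Format tail: "request" status bytes "referer" "user-agent"
REFERER_PATTERN = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')


def referring_hosts(lines, own_host):
    """Return [(host, count), ...] for external referrers, most frequent first."""
    counts = Counter()
    for line in lines:
        match = REFERER_PATTERN.search(line.rstrip("\n"))
        if match is None:
            continue  # line not in the assumed format
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # direct visits or stripped referrers
        host = urlparse(referer).hostname
        if host and host != own_host:
            counts[host] += 1  # ignore internal navigation within your own site
    return counts.most_common()


sample = [
    '192.0.2.9 - - [10/Oct/2005:13:55:36 -0700] "GET / HTTP/1.1" 200 512 '
    '"http://someblog.example/post.html" "Mozilla/5.0"',
]
print(referring_hosts(sample, "www.mysite.example"))
```

Comparing this tally for the week before and after the blog went up would show whether the new visitors actually arrived via the blog.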
The fact that Google owns Blogger is really neither here nor there. In my own experience, Blogger pages don't seem to be treated preferentially over other webpages (at least for links, etc.).
The big question would be "how active is this blog?" An active blog with your link worked nicely into the daily copy will often send you a nice chunk of traffic, and if it's a more popular blog, the residual referrals are often interesting.
That's my 2c; I don't really know a whole lot about your choice of stats package.
- Tony
The question about spiders versus crawlers came up because of the definitions given in the WebmasterWorld glossary.
CRAWLER:
Crawling refers to the fact that the spider will look for links in the pages it downloads and then walk or crawl down through a web site.
SPIDER:
The main program used by search engines to retrieve web pages to include in their database. see robot.
They could very well be synonymous. From the logs, it didn't appear that spiders necessarily went through the entire site.
What supports the idea that they are different is that other threads mention spiders coming to a site and spidering only the main page, with the site crawled later and all of its links followed.
I get the impression that the spider is like an army scout checking out the activity in an area. When something of interest turns up, the troops are sent in to do a thorough inspection.