Forum Moderators: bakedjake

GigaBlast Part 3

11:17 am on Mar 18, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 10, 2001
votes: 10

Continued from: [webmasterworld.com...]

Looks impressive so far indeed. I'm really curious about any increase/decrease in relevance, once there's a significant number of sites indexed.

A few things to note, most of which you probably know already:

  • Always respect robots.txt for all pages.

  • The spider needs to do some load balancing, so that it doesn't fetch too many pages from the same site in a short time. The recommended rate is about one page per minute per site (see http://www.robotstxt.org/wc/robots.html).

  • Make sure that the images on your site are served with headers for modification date, size, and expiry date (Last-Modified, Content-Length, Expires), so that the client can cache them. This will noticeably reduce the bandwidth requirements on your own system.

  • List only one of www.example.com/ and www.example.com/index.html (or home, default; .htm, .asp, .php, etc.), at least if they contain the same text.

  • Cluster the results, so that one site can't dominate the SERPs for any keyword combination.

  • I'm sure there's a lot more work waiting for you... ;)
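
To make three of the points above a bit more concrete (per-site load balancing, listing only one of / and /index.html, and clustering), here is a rough Python sketch. It is only an illustration under my own assumptions: the names `PolitenessScheduler`, `canonical_url` and `cluster_by_host`, the 60-second delay, the list of default-document names and the two-results-per-host limit are all made up for the example and say nothing about how Gigablast actually works. robots.txt handling is left out; Python's standard library has `urllib.robotparser` for that.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Assumption: one fetch per host every 60 s ("one page per minute per site").
CRAWL_DELAY_SECONDS = 60.0

# Hypothetical set of default-document names treated as equivalent to "/".
_DEFAULT_DOC = re.compile(r"/(?:index|home|default)\.(?:html?|asp|php)$", re.IGNORECASE)


class PolitenessScheduler:
    """Spaces out fetches so each host is hit at most once per delay period."""

    def __init__(self, delay=CRAWL_DELAY_SECONDS):
        self.delay = delay
        self._next_allowed = {}  # host -> earliest time of the next fetch

    def reserve(self, host, now):
        """Return the earliest allowed start time for host and book that slot."""
        start = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = start + self.delay
        return start


def canonical_url(url):
    """Map /index.html, /default.asp, etc. onto the bare directory URL.

    A real spider should also confirm that both URLs return the same text
    before merging them; this sketch only looks at the name.
    """
    parts = urlsplit(url)
    path = _DEFAULT_DOC.sub("/", parts.path) or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def cluster_by_host(ranked_urls, per_host=2):
    """Keep at most per_host results from each host, preserving rank order."""
    seen = defaultdict(int)
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()
        if seen[host] < per_host:
            seen[host] += 1
            kept.append(url)
    return kept
```

With this, `canonical_url("http://www.example.com/index.html")` and `canonical_url("http://www.example.com/")` come out as the same string, so the index only needs one entry for the pair, and `cluster_by_host` stops a single site from filling a whole results page.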
    5:09 pm on Mar 29, 2002 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member 10+ Year Member

    joined:June 18, 2001
    votes: 0

    It will probably reset many more times. It is just in pre-beta testing. Nothing is permanent at this stage. I'm sure we will all need to check it and resubmit when it comes out of testing.
    10:39 pm on Mar 30, 2002 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member 10+ Year Member

    joined:July 7, 2001
    votes: 0

    Hey, Gigablast has been 404 all afternoon!


    8:43 pm on Apr 1, 2002 (gmt 0)

    New User

    10+ Year Member

    joined:Oct 12, 2001
    votes: 0

    pyst - While clustering improves matters a lot, it can be very useful to spider deeply. Some sites don't have every topic on the site detailed on the home page, and sometimes a deeper page is really more relevant to a search.

    Not spidering sites deeply, and paying attention only to the home pages, just encourages people to get a different site for each product. Certainly, the more topics your home page covers, the less likely you are to rank well for the specific topics customers will look up. This is exactly what happens in Yahoo and Looksmart, which only pay attention to the homepage. You end up with whole categories full of one-off sites that are obvious domain spam.

    6:34 pm on Apr 2, 2002 (gmt 0)

    Senior Member

    WebmasterWorld Senior Member 10+ Year Member

    joined:Mar 6, 2002
    votes: 0

    The site has been down for me for a couple of days now. Matt, are you still reading posts in the forum? I was just wondering whether there is some kind of major problem, or are you just doing some more fine-tuning?
    12:09 am on Apr 3, 2002 (gmt 0)

    Administrator from US 

    WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    joined:Sept 21, 1999
    votes: 16

    Let's go ahead and wrap this one up. Gigablast looks like a good new project. We wish you well.

    When you get it all tweaked and ready for a roll out - feel free to let us know and we'll have another go at it.


    This 65 message thread spans 3 pages.