
Forum Moderators: bakedjake


GigaBlast Part 3



11:17 am on Mar 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Continued from: [webmasterworld.com...]

Looks impressive so far indeed. I'm really curious about any increase/decrease in relevance, once there's a significant number of sites indexed.

A few things to note, most of which you probably know already:

  • Always respect robots.txt for all pages.

  • The spider needs to do some load balancing, so that it doesn't fetch too many pages from the same site in a short time. The recommended rate is about one page per minute per site (http://www.robotstxt.org/wc/robots.html).

  • Make sure that the images on your site are served with headers for creation date, size, and expiry date, so that the client can cache them. This will noticeably reduce the bandwidth requirements on your own system.

  • Only list one of www.example.com/ and www.example.com/index.html (home|default.htm|asp|php, etc.), at least if they contain the same text.

  • Cluster the results, so that one site can't dominate the SERPs for any keyword combination.

  • I'm sure there's a lot more work waiting for you... ;)
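
The per-site rate limit from the list above could be sketched roughly like this. It is only an illustration, not how Gigablast's spider actually works: the class name `PolitenessScheduler`, its methods, and the 60-second default are assumptions taken from the "one page per minute per site" guideline.

```python
from urllib.parse import urlsplit


class PolitenessScheduler:
    """Tracks, per host, the earliest time the next fetch is allowed.

    Hypothetical helper for illustration; not part of any real crawler.
    """

    def __init__(self, min_delay_seconds=60.0):
        self.min_delay = min_delay_seconds
        self._next_allowed = {}  # host -> earliest timestamp for the next fetch

    @staticmethod
    def _host(url):
        # Group URLs by host name, ignoring case.
        return urlsplit(url).netloc.lower()

    def wait_time(self, url, now):
        """Seconds to wait before fetching `url` at time `now` (0.0 if allowed)."""
        return max(0.0, self._next_allowed.get(self._host(url), 0.0) - now)

    def record_fetch(self, url, now):
        """Note that `url` was fetched at `now`, blocking its host for a while."""
        self._next_allowed[self._host(url)] = now + self.min_delay


scheduler = PolitenessScheduler()
scheduler.record_fetch("http://www.example.com/a.html", now=0.0)
# Same host ten seconds later: still has to wait.
print(scheduler.wait_time("http://www.example.com/b.html", now=10.0))
# A different host is not affected.
print(scheduler.wait_time("http://other.example.org/", now=10.0))
```

Timestamps are passed in explicitly so the logic is easy to test; a real spider would use a clock and a queue of pending URLs per host.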

    mattdwells

    6:17 pm on Mar 21, 2002 (gmt 0)

    10+ Year Member


    If the spider wasn't obeying your site's robots.txt rules, there was a bug, but it should work now.

    I've put together a page of logos and page designs that I'd like everyone to view and if you feel so moved as to give positive or constructive feedback, please don't hesitate!

    http://www.gigablast.com/designs/designs.html

    And thanks for all the comments so far. I've really found and fixed a lot of bugs!

    truly yours,


    6:50 pm on Mar 21, 2002 (gmt 0)

    I guess I like the first one the best though I don't understand the purpose of reversing the first a in gig a blast.

    I think the logo should be relatively small.

    Two cents,



    9:02 pm on Mar 21, 2002 (gmt 0)

    I hope to get the time to make one and send it across.
    I really didn't like any of those.

    BTW: I think your current page is decent. That's not to say the poster's page design wasn't good. It was pretty too, better actually, but I still feel you should stick with the fast-loading one.


    11:28 pm on Mar 22, 2002 (gmt 0)

    10+ Year Member

    You're doing a great job. Congratulations.

    One question to better understand the numbers:
    You say:

    the current hardware i have should hold somewhere between 200-250 million web page

    On the Gigablast about page I read:

    scales to 200 billion full pages

    Is my assumption right that 200 billion is what the software can scale to? To reach it, you'd need "a little bit" more hardware, right?

    cheers klaus


    1:14 am on Mar 23, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member

    re: logo designs, I really think the "lightning bolt for an L" technique is seriously overused (maybe you should try a lightning bolt with a swoosh behind it!). And every time I see that in a gigablast logo I think of Jolt cola. And I've never even tasted Jolt.

    Maybe it's for that reason that the only design on or linked from that page that I like is the second one; the red lettering with the gray and black in the background.


    1:30 am on Mar 23, 2002 (gmt 0)

    10+ Year Member

    Of those submitted so far I actually like the Sticky Sauce version. It feels google-like, but a little cooler... dress it up by switching the black for your favorite color. Keep it simple...


    7:57 am on Mar 23, 2002 (gmt 0)

    10+ Year Member

    I liked the second one. It's got that trendy feel while looking clean and sophisticated.

    The first one is cool too, except the backwards "a" looks like another "b" that got blown up. If you left the "a" going the right direction, but tilted and lowered it slightly, it would probably look a lot better.

    The lightning bolt ones look kinda cheesy, like they might appear on the box of a store brand knock-off cereal. (That was the first thing to come to mind when I saw them.)

    I hope that was helpful :)

    - D.G.


    12:53 pm on Mar 23, 2002 (gmt 0)

    10+ Year Member

    Matt - good job!


    12:30 am on Mar 24, 2002 (gmt 0)

    10+ Year Member

    thanks for the comments, guys.

    and, yes, gigablast does scale to 200 billion pages (200,000,000,000).
    and, yes, i would need more hardware.
    my current setup only goes to about 200-250 million, so i'd need about 1,000 times the machines i have, which is actually very doable.
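
As a rough check of that figure, here is the arithmetic spelled out. It assumes capacity grows linearly with the number of machines, which is a simplification for illustration and not something stated about Gigablast itself; the variable names are made up for the example.

```python
# Figures quoted in the thread.
target_pages = 200_000_000_000   # "scales to 200 billion full pages"
current_low = 200_000_000        # low end of the current hardware's capacity
current_high = 250_000_000       # high end of the current hardware's capacity

# How many times the current hardware would be needed, assuming linear scaling.
factor_if_low = target_pages // current_low    # current setup holds 200 million
factor_if_high = target_pages // current_high  # current setup holds 250 million

print(factor_if_high, "to", factor_if_low, "times the current hardware")
```

So the "1,000 times" number matches the 200 million end of the range; at 250 million pages per setup the multiplier would be 800.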


    click watcher

    12:40 am on Mar 24, 2002 (gmt 0)

    hi matt great work

    re logos...

    stickysauce is best because it's the slickest/quickest looking

    the lightning is a no no for me, nice work but doesn't suit a search engine.

    the neon layout is good, but green = kiss of death i think (despite dmoz)
    most shades of green are not appealing, plus the lightning bolt doesn't quite work.


    9:11 am on Mar 24, 2002 (gmt 0)

    10+ Year Member

    Add URL shows that it is temporarily unavailable.

    Any idea when it's going to be up and running again?


    9:14 am on Mar 24, 2002 (gmt 0)

    10+ Year Member

    Ooopppsss ... goofed on my first post.

    Add URL is back up and running.


    12:33 pm on Mar 24, 2002 (gmt 0)

    10+ Year Member

    2,295,520 pages. quick, seems to follow links well from the site I submitted, interesting.


    10:38 pm on Mar 24, 2002 (gmt 0)

    Gigablast looks good apart from;
    a lousy looking logo graphic
    an awful background colour

    I don't mean to be nasty - just critical of something wrong that may seem like nothing but I think makes a difference.

    Yep, it is only MY opinion. Take it or leave it. I still like the engine, and the simplicity, just like Google's, gets a thumbs up from me.

    Why is it so necessary for spiders to go past the index page? I can't think of many sites that need to be spidered so 'deeply' - all they do is clutter up engines with a multitude of pages, making it more difficult to find the others. I clicked on one of the recent searches and was presented with an entire page of links to different pages for ONE site - very UNimpressive.

    Why don't spiders take notice of metatags? It really peeves me to make an effort to describe my sites and have it all ignored and something quite irrelevant (or less meaningful) placed in the description area.

    Is there such a thing as a human edited search engine, as opposed to a human edited *laugh* directory like dmoz used to be?

    Maybe this board needs a discussion on what a search engine should be. Maybe someone will read it and build a new engine that people will actually enjoy using 100%.


    10:55 pm on Mar 24, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member

    pyst, you may have missed some of the earlier discussion in this thread and on the previous pages.

    >>I clicked on one of the recent searches and was presented with an entire page of links to different pages for ONE site

    Matt has mentioned that clustering is not yet being done, but it will be.
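
For anyone wondering what clustering means in practice, here is a minimal sketch. It is not Matt's implementation (which hasn't been described here); the function name `cluster_by_site` and the limit of two results per host are assumptions chosen for the example.

```python
from urllib.parse import urlsplit


def cluster_by_site(ranked_urls, max_per_site=2):
    """Keep at most `max_per_site` results per host, preserving rank order."""
    shown = []
    seen_per_host = {}  # host -> number of results encountered so far
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()
        count = seen_per_host.get(host, 0)
        if count < max_per_site:
            shown.append(url)
        seen_per_host[host] = count + 1
    return shown


results = [
    "http://www.example.com/page1.html",
    "http://www.example.com/page2.html",
    "http://www.example.com/page3.html",
    "http://www.example.org/index.html",
]
# The third example.com page is dropped so another site can appear.
print(cluster_by_site(results))
```

A real engine would more likely offer a "more results from this site" link than drop the extra pages entirely.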

    Regarding the logo and colors, you'll also see that he's asked for and received several suggested designs; likely that will be changed.

    Remember that the site is still in the early development stages.

    >>Why don't spiders take notice of metatags - it really peeves me to make an effort to describe my sites

    Because while you may use them to accurately describe your sites, many other people have used them inaccurately to spam search engines.


    11:07 pm on Mar 24, 2002 (gmt 0)

    I think you should add the advanced search option on all of the result pages instead of just on the main page.

    My reasoning is that often after you make your first search you realize that you need to refine it somewhat...

    2 more cents,


    11:22 pm on Mar 24, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member

    Hmmm... Add a URL is down again...


    11:23 pm on Mar 24, 2002 (gmt 0)

    10+ Year Member


    Very nice work and great job.

    Just want you to know this is the kind of ingenuity that is going to make the search engine world a whole new place to be.



    3:52 am on Mar 25, 2002 (gmt 0)

    ADD URL is up again.

    I just did a search which made me wonder about the relevance of gigablast searches because I was given as many irrelevant results as valid ones. On closer inspection I noticed my search term seemed to be giving me results for another term.
    The terms are 'gay pics' and 'tin cans'.
    Any reason this might be happening? Apart from that, the search results were ok.

    Something missing from the Gigablast results - the ability to pick a page 'deeper' in the pack like google has - a dozen or more pages you can pick from instead of just 'next' or 'previous'.


    3:50 pm on Mar 25, 2002 (gmt 0)

    Just found this thread... So I thought I would give it a try.

    I added the URL of my site. Within seconds it apparently had spidered my site (200 pages or so...) as well as a few hundred other related sites I have listed in my directory, judging from the "date spidered".

    Pretty impressive!


    10:13 pm on Mar 25, 2002 (gmt 0)

    10+ Year Member

    Strange... I added my url a couple of days ago after reading this thread. And now my site seems to have vanished from the SE. What's going on??

    brotherhood of LAN

    10:25 pm on Mar 25, 2002 (gmt 0)

    WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

    pgsbs, he has been starting the database from scratch. He'd done it 3 times last I heard, probably many more times by now.


    8:44 am on Mar 26, 2002 (gmt 0)

    10+ Year Member

    So I guess all the hope for a new good SE has vanished now?? Well Matt shouldn't give up on it. We need an extra, good working SE. Even with Google up and about.


    6:52 pm on Mar 26, 2002 (gmt 0)

    10+ Year Member

    hope this really becomes the next google. need some serious options out there along with google :-)

    the logos/designs, i really wasn't impressed by any of them. i'm sure some better submissions will come in soon


    7:32 pm on Mar 26, 2002 (gmt 0)

    10+ Year Member

    Man, this sure brings back memories!


    9:27 am on Mar 27, 2002 (gmt 0)

    10+ Year Member

    I discovered that too. I think it's caused by an "or" connection between the search terms. That means the query "gay pics" will first deliver all hits containing both "gay" and "pics", and then start delivering all pages containing either "gay" or "pics".
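
To make that guess concrete, here is a small sketch of "AND matches first, then OR matches". It only illustrates the behaviour described above and is an assumption about what the engine does, not its actual code; `rank_and_then_or` and the sample documents are invented for the example.

```python
def rank_and_then_or(query, documents):
    """Return doc ids matching ALL query terms first, then those matching ANY.

    `documents` maps a doc id to its text; matching is on whole lowercase words.
    """
    terms = query.lower().split()
    if not terms:
        return []
    all_hits, some_hits = [], []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matched = sum(1 for term in terms if term in words)
        if matched == len(terms):
            all_hits.append(doc_id)    # every term present: top group
        elif matched > 0:
            some_hits.append(doc_id)   # only some terms present: fallback group
    return all_hits + some_hits


docs = {
    "a": "recycling tin cans at home",
    "b": "tin roof repair",
    "c": "garden tools",
    "d": "cans of paint",
}
# "a" has both words; "b" and "d" have only one each; "c" has neither.
print(rank_and_then_or("tin cans", docs))
```

With a small index the top group runs out quickly, so the partial matches show up early and the results look half irrelevant.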


    9:56 pm on Mar 27, 2002 (gmt 0)

    10+ Year Member

    Looks like the database is corrupted again. This time I found my site description leading to a porn site.

    Hmmmm if it was the other way around traffic would have gone up.

    Hope it's corrected before the site owner has a stroke.


    10:52 pm on Mar 27, 2002 (gmt 0)

    5+ Year Member

    Hi, I tried to submit my site but it says that it is temporarily down. Does anyone know when it will be back up?


    12:40 am on Mar 28, 2002 (gmt 0)

    What is going on with Gigablast?


    1:44 pm on Mar 29, 2002 (gmt 0)

    The database appears to have been reset again...

    This 65 message thread spans 3 pages.
