
Forum Library, Charter, Moderators: bakedjake

Alternative Search Engines Forum

This 65 message thread spans 3 pages (this is page 2).
GigaBlast Part 3
bird




msg:465485
 11:17 am on Mar 18, 2002 (gmt 0)

Continued from: [webmasterworld.com...]


Looks impressive so far indeed. I'm really curious about any increase/decrease in relevance, once there's a significant number of sites indexed.

A few things to note, most of which you probably know already:

  • Always respect robots.txt for all pages.

  • The spider needs to do some load balancing, so that it doesn't fetch too many pages from the same site in a short time. The recommended rate is about one page per minute per site (http://www.robotstxt.org/wc/robots.html).

  • Make sure that the images on your site are served with headers for creation date, size, and expiry date, so that the client can cache them. This will noticeably reduce the bandwidth requirements on your own system.

  • Only list one of www.example.com/ and www.example.com/index.html (home¦default.htm¦asp¦php, etc.), at least if they contain the same text.

  • Cluster the results, so that one site can't dominate the SERPs for any keyword combination.

  • I'm sure there's a lot more work waiting for you... ;)
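The robots.txt, load-balancing, and duplicate-index-page points in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Gigablast's spider works: `ExampleBot`, `MIN_DELAY_SECONDS`, `INDEX_NAMES`, `HostThrottle`, and the function names are all made up here, and collapsing index pages is only safe when both URLs really serve the same text, which a real crawler would have to confirm by comparing content.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration; none of them come from Gigablast.
USER_AGENT = "ExampleBot"
MIN_DELAY_SECONDS = 60.0  # roughly one page per minute per site

# Assumed set of "index page" file names (home/default/index with common extensions).
INDEX_NAMES = {
    f"{stem}.{ext}"
    for stem in ("index", "home", "default")
    for ext in ("html", "htm", "asp", "php")
}


def allowed_by_robots(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets USER_AGENT fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)


class HostThrottle:
    """Remembers the last fetch time per host and reports how long to wait."""

    def __init__(self, min_delay: float = MIN_DELAY_SECONDS) -> None:
        self.min_delay = min_delay
        self._last_fetch = {}  # host name -> time of the last fetch

    def wait_time(self, url: str, now: float) -> float:
        """Seconds to wait before url's host may be fetched again."""
        host = urlsplit(url).netloc.lower()
        last = self._last_fetch.get(host)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (now - last))

    def record_fetch(self, url: str, now: float) -> None:
        """Note that url's host was fetched at time now."""
        self._last_fetch[urlsplit(url).netloc.lower()] = now


def canonical_url(url: str) -> str:
    """Map /index.html-style URLs to their directory URL so both share one key."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    if filename.lower() in INDEX_NAMES:
        path = directory + "/"
    else:
        path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))
```

A crawler loop would call `allowed_by_robots` before each request, sleep for `wait_time(url, time.monotonic())` seconds, fetch, and then call `record_fetch`. The clock is passed in as `now` rather than read inside the class so the throttle can be exercised without real delays.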

    mattdwells




    msg:465515
     6:17 pm on Mar 21, 2002 (gmt 0)

    greetings,

    If robots.txt wasn't obeying your site's rules, there was a bug, but it should work now.

    I've put together a page of logos and page designs that I'd like everyone to view and if you feel so moved as to give positive or constructive feedback, please don't hesitate!

    http://www.gigablast.com/designs/designs.html

    And thanks for all the comments so far, i've really found and fixed a lot of bugs!

    truly yours,
    matt

    greektomi




    msg:465516
     6:50 pm on Mar 21, 2002 (gmt 0)

    I guess I like the first one the best though I don't understand the purpose of reversing the first a in gig a blast.

    I think the logo should be relatively small.

    Two cents,

    Greektomi

    ceo




    msg:465517
     9:02 pm on Mar 21, 2002 (gmt 0)

    I hope to get the time to make one and send it across.
    I really didn't like any of those.

    BTW: I think your current page is decent. That's not to say the poster's page design wasn't good; it was pretty too, better actually, but I still feel you should stick with the fast-loading one.
    Cheers,
    RR

    klaus




    msg:465518
     11:28 pm on Mar 22, 2002 (gmt 0)

    Matt,
    you're doing a great job. Congratulations.

    One question to better understand the numbers:
    You say:
    the current hardware i have should hold somewhere between 200-250 million web pages

    On the Gigablast about-page i read
    scales to 200 billion full pages

    Is my assumption right that 200 billion is what the software itself can handle? To reach it, you'd need "a little bit" more hardware, right?

    cheers klaus

    JayC




    msg:465519
     1:14 am on Mar 23, 2002 (gmt 0)

    re: logo designs, I really think the "lightning bolt for an L" technique is seriously overused (maybe you should try a lightning bolt with a swoosh behind it!). And every time I see that in a gigablast logo I think of Jolt cola. And I've never even tasted Jolt.

    Maybe it's for that reason that the only design on or linked from that page that I like is the second one; the red lettering with the gray and black in the background.

    Craig_F




    msg:465520
     1:30 am on Mar 23, 2002 (gmt 0)

    Of those submitted so far I actually like the Sticky Sauce version. It feels google-like, but a little cooler...dress it up by switching the black for your favorite color. Keep it simple...

    DGBrown




    msg:465521
     7:57 am on Mar 23, 2002 (gmt 0)

    I liked the second one. It's got that trendy feel while looking clean and sophisticated.

    The first one is cool too, except the backwards "a" looks like another "b" that got blown up. If you left the "a" going the right direction, but tilted and lowered it slightly, it would probably look a lot better.

    The lightning bolt ones look kinda cheesy, like they might appear on the box of a store brand knock-off cereal. (that was the first thing to come to mind when I saw them)

    I hope that was helpful :)

    - D.G.

    FreeBee




    msg:465522
     12:53 pm on Mar 23, 2002 (gmt 0)

    Matt - good job!

    mattdwells




    msg:465523
     12:30 am on Mar 24, 2002 (gmt 0)

    thanks for the comments, guys.

    and, yes, gigablast does scale to 200 billion pages (200,000,000,000).
    and, yes, i would need more hardware.
    my current setup only goes to about 200-250 million, so i'd need about 1,000 times the machines i have, which is actually very doable.

    matt
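The arithmetic behind the "1,000 times" figure can be checked quickly. The sketch below assumes capacity grows linearly with the number of machines, which the post implies but does not state; the function name is made up for illustration.

```python
# Figures quoted in the thread.
TARGET_PAGES = 200_000_000_000       # 200 billion pages
CURRENT_CAPACITY_LOW = 200_000_000   # 200 million pages
CURRENT_CAPACITY_HIGH = 250_000_000  # 250 million pages


def scale_factor(target_pages: int, current_capacity: int) -> float:
    """How many times the current setup is needed, assuming linear scaling."""
    return target_pages / current_capacity


worst_case = scale_factor(TARGET_PAGES, CURRENT_CAPACITY_LOW)
best_case = scale_factor(TARGET_PAGES, CURRENT_CAPACITY_HIGH)
print(f"{best_case:.0f}x to {worst_case:.0f}x the current hardware")
```

Under that assumption the multiplier falls between 800 and 1,000, so "1,000 times" corresponds to the lower capacity estimate of 200 million pages.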

    click watcher




    msg:465524
     12:40 am on Mar 24, 2002 (gmt 0)

    hi matt great work

    re logos...

    stickysauce is best because it's the slickest/quickest looking

    the lightning is a no no for me, nice work but doesn't suit a search engine.

    the neon layout is good, but green = kiss of death i think (despite dmoz)
    most shades of green are not appealing, plus the lightning bolt doesn't quite work.

    BikeMan




    msg:465525
     9:11 am on Mar 24, 2002 (gmt 0)

    Add URL shows that it is temporarily unavailable.

    Any idea when it's going to be up and running again?

    BikeMan




    msg:465526
     9:14 am on Mar 24, 2002 (gmt 0)

    Ooopppsss ... goofed on my first post.

    Add URL is back up and running.

    SmallTime




    msg:465527
     12:33 pm on Mar 24, 2002 (gmt 0)

    2,295,520 pages. quick, seems to follow links well from the site I submitted, interesting.

    pyst




    msg:465528
     10:38 pm on Mar 24, 2002 (gmt 0)

    Gigablast looks good apart from;
    a lousy looking logo graphic
    an awful background colour

    I don't mean to be nasty - just critical of something wrong that may seem like nothing but I think makes a difference.

    Yep, it is only MY opinion. Take it or leave it. I still like the engine and its simplicity, just like Google's, and I give it a thumbs up.

    Why is it so necessary for spiders to go past the index page? I can't think of many sites that need to be spidered so 'deeply' - all they do is clutter up engines with a multitude of pages, making it more difficult to find the others. I clicked on one of the recent searches and was presented with an entire page of links to different pages for ONE site - very UNimpressive.

    Why don't spiders take notice of metatags - it really peeves me to make an effort to describe my sites and have it all ignored and something quite irrelevant (or less meaningful) placed in the description area.

    Is there such a thing as a human edited search engine, as opposed to a human edited *laugh* directory like dmoz used to be?

    Maybe this board needs a discussion on what a search engine should be. Maybe someone will read it and build a new engine that people will actually enjoy using 100%.

    JayC




    msg:465529
     10:55 pm on Mar 24, 2002 (gmt 0)

    pyst, you may have missed some of the earlier discussion in this thread and on the previous pages.

    >>I clicked on one of the recent searches and was presented with an entire page of links to different pages for ONE site

    Matt has mentioned that clustering is not yet being done, but it will be.

    Regarding the logo and colors, you'll also see that he's asked for and received several suggested designs; likely that will be changed.

    Remember that the site is still in the early development stages.

    >>Why don't spiders take notice of metatags - it really peeves me to make an effort to describe my sites

    Because while you may use them to accurately describe your sites, many other people have used them inaccurately to spam search engines.
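The clustering mentioned in this reply (capping how many results a single site can take up on one page) could look something like the sketch below. It is a generic illustration, not Gigablast's planned implementation; `cluster_by_site` and the default cap of two results per site are assumptions made for the example.

```python
from urllib.parse import urlsplit


def cluster_by_site(ranked_urls, max_per_site=2):
    """Keep at most max_per_site results per host, preserving rank order."""
    kept = []
    seen = {}  # host name -> number of results seen so far
    for url in ranked_urls:
        host = urlsplit(url).netloc.lower()
        count = seen.get(host, 0)
        if count < max_per_site:
            kept.append(url)
        seen[host] = count + 1
    return kept


results = [
    "http://www.example.com/a.html",
    "http://www.example.com/b.html",
    "http://www.example.com/c.html",
    "http://www.example.org/",
]
print(cluster_by_site(results))
```

With a cap of two, the third `www.example.com` page is dropped and the `www.example.org` result moves up. A fuller version would probably show a "more results from this site" link instead of discarding the extra pages outright.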

    greektomi




    msg:465530
     11:07 pm on Mar 24, 2002 (gmt 0)

    I think you should add the advanced search option on all of the result pages instead of just on the main page.

    My reasoning is that often after you make your first search you realize that you need to refine it somewhat...

    2 more cents,
    Greektomi

    Tapolyai




    msg:465531
     11:22 pm on Mar 24, 2002 (gmt 0)

    Hmmm... Add a URL is down again...

    JBoss008




    msg:465532
     11:23 pm on Mar 24, 2002 (gmt 0)

    Matt,

    Very nice work and great job.

    Just want you to know this is the kind of ingenuity that is going to make the search engine world a whole new place to be.

    Jason

    pyst




    msg:465533
     3:52 am on Mar 25, 2002 (gmt 0)

    ADD URL is up again.

    I just did a search which made me wonder about the relevance of gigablast searches because I was given as many irrelevant results as valid ones. On closer inspection I noticed my search term seemed to be giving me results for another term.
    The terms are 'gay pics' and 'tin cans'.
    Any reason this might be happening? However, apart from that the search results were ok.

    Something missing from the Gigablast results - the ability to pick a page 'deeper' in the pack like google has - a dozen or more pages you can pick from instead of just 'next' or 'previous'.

    raceboat




    msg:465534
     3:50 pm on Mar 25, 2002 (gmt 0)

    Just found this thread... So I thought I would give it a try.

    I added the URL of my site. Within seconds it apparently had spidered my site (200 pages or so...) as well as a few hundred other related sites I have listed in my directory, judging from the "date spidered".

    Pretty impressive!

    pgsbs




    msg:465535
     10:13 pm on Mar 25, 2002 (gmt 0)

    Strange... I added my url a couple of days ago after reading this thread. And now my site seems to have vanished from the SE. What's going on??

    brotherhood of LAN




    msg:465536
     10:25 pm on Mar 25, 2002 (gmt 0)

    pgsbs, he has been restarting the database from scratch. He had done it 3 times last I heard, probably many more times by now.

    pgsbs




    msg:465537
     8:44 am on Mar 26, 2002 (gmt 0)

    So I guess all the hope for a new good SE has vanished now?? Well Matt shouldn't give up on it. We need an extra, good working SE. Even with Google up and about.

    top5jamaica




    msg:465538
     6:52 pm on Mar 26, 2002 (gmt 0)

    hope this really becomes the next google. need some serious options out there along with google :-)

    the logos/designs, i really wasn't impressed by any of them. i'm sure some better submissions will come in soon

    Eathan




    msg:465539
     7:32 pm on Mar 26, 2002 (gmt 0)

    Man, this sure brings back memories!

    Ica




    msg:465540
     9:27 am on Mar 27, 2002 (gmt 0)

    pyst:
    I discovered that too. I think it's caused by an "or" connection between the search terms. That means the query "gay pics" will deliver all hits containing both "gay" and "pics" first, and then start delivering pages containing either "gay" or "pics".
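The behaviour guessed at here (pages matching every term first, then pages matching only some of them) can be sketched as follows. This only illustrates the hypothesis in the post and is not confirmed to be what Gigablast does; `rank_and_then_or` and the toy page data are made up for the example.

```python
def rank_and_then_or(query, pages):
    """Return page ids matching all query terms, then ids matching only some.

    pages maps a page id to its text. Matching is on whole lowercase words.
    """
    terms = query.lower().split()
    if not terms:
        return []
    all_terms = []   # pages containing every query term
    some_terms = []  # pages containing at least one term, but not all
    for page_id, text in pages.items():
        words = set(text.lower().split())
        matched = sum(1 for term in terms if term in words)
        if matched == len(terms):
            all_terms.append(page_id)
        elif matched > 0:
            some_terms.append(page_id)
    return all_terms + some_terms


pages = {
    "a": "recycling tin cans at home",
    "b": "tin roof repair",
    "c": "bicycle parts",
}
print(rank_and_then_or("tin cans", pages))
```

Page "b" still shows up for "tin cans" even though it only mentions "tin", which is the kind of loosely related result described above once the full matches run out.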

    BikeMan




    msg:465541
     9:56 pm on Mar 27, 2002 (gmt 0)

    Looks like the database is corrupted again. This time I found my site description leading to a porn site.

    Hmmmm if it was the other way around traffic would have gone up.

    Hope it's corrected before the site owner has a stroke.

    nima




    msg:465542
     10:52 pm on Mar 27, 2002 (gmt 0)

    Hi, I tried to submit my site but it says that it is temporarily down. Does anyone know when it will be back up?

    Jacquie




    msg:465543
     12:40 am on Mar 28, 2002 (gmt 0)

    What is going on with Gigablast?

    raceboat




    msg:465544
     1:44 pm on Mar 29, 2002 (gmt 0)

    The database appears to have been reset again...

    Brad




    msg:465545
     5:09 pm on Mar 29, 2002 (gmt 0)

    It will probably reset many more times. It is just in pre-beta testing. Nothing is permanent at this stage. I'm sure we will all need to check it and resubmit when it comes out of testing.


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved