Forum Moderators: bakedjake
On top of that, my point to you is that, whether you use spidering or any other method, it's the quality of the sites that matters, not the quantity.
Dmoz is a good example. I guess the more a category has to do with making money, the more likely it is that sites will try all sorts of methods to cheat the system.
For example, changing a page after it has been listed in a category. Most likely this will happen with my index in the future too, but, as with Dmoz, those individuals can be banned.
I would prefer to take a long time to get 10,000 sites listed rather than spider crap. I don't really care about the money; like a lot of you people, I hope to make the web a better place.
I used to like the Yahoo index because of its accuracy and results. Even though the Google model finds a lot of relevant results, I believe a directory model will be far better in accuracy, especially in the long run.
I've lost count of how many times they have changed their algorithm, so while some sites go up, others go down.
Changing how your page looks just so Google, Inktomi, HotBot, MSN, etc. will spider it is really sad, and I know for a fact that many people do it. That's what annoys me, and that's why I believe directories like Dmoz and Yahoo will remain popular in the future (hopefully mine goes OK too).
That's my two cents. Any advice on my own SE would be appreciated.
Spidering your directory sites can always be added at a later date after you feel comfortable with the quality of the sites you've listed. Until then it might be a good idea to be fairly generous (not overly so) with keyword phrases when writing site descriptions so users can easily find what they need when using the directory search function as opposed to drilling down through the categories.
You're going to have to make some bucks off this so be sure to give early thought to some sort of 'featured sites' or 'sponsored sites' listings and other revenue generating possibilities. Even if you don't incorporate it starting out be sure to make plans for it now.
Good luck. Keep us posted on how it goes.
Jim
Indexing hundreds of thousands of documents in hours, not days, is critical - because if you can't index a million docs in a day, it will never scale well.
Decide on scope - how big? Millions, or thousands? Billions?
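To put rough numbers on the "million docs in a day" rule of thumb above, here is a small back-of-the-envelope sketch. The function names and the sample scopes are made up for illustration; the only real input is the arithmetic of 86,400 seconds in a day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_docs_per_second(total_docs, days=1.0):
    """Sustained indexing rate needed to finish total_docs within the given days."""
    return total_docs / (days * SECONDS_PER_DAY)


def days_to_index(total_docs, docs_per_second):
    """How many days a given sustained rate takes to index total_docs."""
    return total_docs / (docs_per_second * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Hypothetical scopes: thousands, millions, billions.
    for scope in (10_000, 1_000_000, 1_000_000_000):
        rate = required_docs_per_second(scope)
        print(f"{scope:>13,} docs in one day needs about {rate:,.1f} docs/sec")
```

A million documents a day works out to somewhere between 11 and 12 documents per second, sustained, which is one way to sanity-check an indexer before deciding on scope.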
Niche is good. Niche is nice.
I have a niche engine myself that does reasonably well, all things considered (and the project was offline for longer than it has been online!).
If you build something that is truly useful, and provide the right user experience, it doesn't matter if your site gets one visitor today or a hundred. Eventually, it will spread, and at some point you will start to realize, "Hey! My pet project is taking off!"
That is usually the point where people who want to make the world a better place, and make the web a better place for surfers, figure out that it takes lots of time to publish quality.
So, as jimbeetle said, keep the revenue in mind. Even if the site isn't going to 'make millions', it will need to support itself and/or justify your time in putting it together. :)
For XML backfill, I'm wondering whether to use Dmoz or AltaVista?
Eventually, who knows, someone might want my information to supplement theirs.
Is backfill a good idea, or should I have multiple sources like HotBot?
I also run a directory in my spare time and have had some success (it has only been up and running for a couple of months).
I would suggest that if you are taking a long while to build up the database, then you will definitely need a backfill. (People search on the strangest terms! Also, I found that running a directory/search service gives you a good sample of search terms, and you can build your results/pages around these terms.)
I currently use Searchhippo as my backfill; it is free and easy to use :)
A backfill is also a good idea. Why not use Google? That way you could give users what they would probably go and do anyway, and you get to keep them on site. You could probably use the free searchbox and direct them to a page with the query already put in.
If your directory sites are listed first, then people will be able to judge their quality directly against what is considered the most accurate search engine. That way, if you impress people, they will want to come back more and more :)
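A minimal sketch of the "directory first, backfill after" idea, assuming both sources hand back simple (url, title) pairs. The `Result` class, the `merge_results` function, and the crude URL normalization are all hypothetical, not any engine's actual feed format.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    title: str
    source: str  # "directory" or "backfill", so the page can label each hit


def merge_results(directory_hits, backfill_hits, limit=10):
    """Put hand-reviewed directory hits first, then fill the rest from backfill."""
    merged = []
    seen = set()
    for source, hits in (("directory", directory_hits), ("backfill", backfill_hits)):
        for url, title in hits:
            # Crude normalization (an assumption): ignore case and a trailing slash.
            key = url.rstrip("/").lower()
            if key in seen:
                continue  # the directory copy wins over a duplicate backfill hit
            seen.add(key)
            merged.append(Result(url, title, source))
    return merged[:limit]
```

Keeping the `source` field on every result also makes it easy to show users which hits came from the reviewed directory and which came from the backfill.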
I would also say that you should encourage your early visitors to give as much feedback as possible. About the results, the UI, everything.
I have a directory I'm almost ready to put online for <keyword>. Can someone explain to me what a backfill is? I'm a newbie at directories. Is this somehow getting ads set up so the directory doesn't look empty to the first few subscribers?
Lori
[edited by: jeremy_goodrich at 1:44 am (utc) on June 3, 2003]
[edit reason] keep it generic - thanks! [/edit]
Or there's the API [google.com], which you can do pretty much anything with, although you only get a limited quota of queries, which isn't great.
I'm not sure of the terms and conditions for either. Hmm, maybe getting Google results is harder than I thought ;)
If you were thinking of revenue generation, maybe you would be better off with a pay-per-click search? Also, Searchhippo, mentioned above, looked OK, although I've never used it.
Then maybe one day I can drop their results.
For results, I would want the category listings relevant to the links, about four or five at the top, followed by the links themselves (similar to Dmoz).
Then, next to each link, the corresponding category it comes from. What do you think?
Is that the best solution for displaying results, or do you think there is a better way?
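One way to picture the layout described above, as a rough plain-text sketch: the top few matching categories first, then each link with its category beside it. The `render_results` function and the (title, url, category) tuple shape are assumptions for illustration, not how Dmoz itself does it.

```python
from collections import Counter


def render_results(hits, max_categories=5):
    """hits: list of (title, url, category_path) tuples from the directory search."""
    # Count how many matching sites fall into each category.
    counts = Counter(category for _, _, category in hits)
    lines = ["Matching categories:"]
    for category, n in counts.most_common(max_categories):
        lines.append(f"  {category} ({n})")
    lines.append("")
    lines.append("Matching sites:")
    for title, url, category in hits:
        # Show the category each link comes from next to the link.
        lines.append(f"  {title} - {url}  [{category}]")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("Guitar Lessons", "http://guitar.example", "Arts/Music"),
        ("Film Reviews", "http://film.example", "Arts/Film"),
        ("Piano Tabs", "http://piano.example", "Arts/Music"),
    ]
    print(render_results(sample))
```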
As somebody above said, people will search for the strangest things: keyword searches are getting longer and many people will type in navigational searches. For these a spidered backfill works better.
Mark your backfill as such so people know.
I use Searchhippo because they are free and by and large decent enough.
Dmoz would also be free, but it is a huge amount of data to download. If you wanted to pay, Gigablast has an XML feed.
IMO I would stay away from the PPC backfills if you want quality search results. Also, some of these tiny PPC engines seem to think their feed is made of gold and only want to deal with high traffic players.
If you do not use backfill, then at least have a "Continue your search on one of these engines..." plugin (like Dmoz has) at the bottom of your SERPs.
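A "continue your search" footer can be as simple as passing the user's query through to other engines' search URLs. This is only a sketch: the engine names, hosts, and query parameter names below are placeholders, so check each real engine's URL format and terms before linking.

```python
from urllib.parse import urlencode

# Hypothetical engines: (display name, search URL, query parameter name).
ENGINES = [
    ("Engine One", "https://engine-one.example/search", "q"),
    ("Engine Two", "https://engine-two.example/find", "query"),
]


def continue_search_links(query):
    """Return (name, url) pairs that carry the user's query to each engine."""
    return [
        (name, f"{base}?{urlencode({param: query})}")
        for name, base, param in ENGINES
    ]


if __name__ == "__main__":
    for name, url in continue_search_links("blue widgets"):
        print(f"{name}: {url}")
```

Using `urlencode` rather than pasting the raw query into the URL keeps spaces and punctuation in long keyword searches from breaking the link.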
I advise against using Google for backfill. Last I heard they charge in the mid 6 figures to start and they do enforce against unauthorized use.