A Site SE That Works?

Can anyone make a recommendation?

markd

7:16 pm on Feb 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello all

I am tearing my hair out at the moment trying to find a good, effective on-site search engine which will list HTML, .asp, PDFs and Word docs. Can anyone help me with a recommendation, via sticky if need be?

The solution would need to work on a Windows server, and I would like the SE to be hosted within the web space, i.e. it won't be remotely hosted. Ideally, I would like the solution to spider the site at regular intervals to index new pages/additions - am I asking too much here? It would need to be able to index dynamic .asp pages generated from an MS SQL database.

I only have about $600 - 1000 to spend.

Thanks for your help

Mark

jatar_k

7:38 pm on Feb 5, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Maybe try looking through SourceForge; I am not overly familiar with solutions for Windows.

sourceforge.net/softwaremap/trove_list.php?form_cat=93

martinibuster

7:45 pm on Feb 5, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Atomz is free, doesn't show ads, and I'm 99% sure it can do what you want. Free version is for under 500 page sites. Unfortunately, the next step up is an enterprise solution at 15k dollars.

markd

10:15 pm on Feb 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Martin

I did consider Atomz but I think that it is a remotely hosted solution.

I am really looking for something which can reside on the web space/server I will be using.

martinibuster

10:52 pm on Feb 5, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hmm... maybe you know something that I don't know.

What's the downside of a remotely hosted search?

I implemented it for a client, and using includes it calls out for the information (search results), then populates it in a web page template that matches the rest of the web site. The url never changes to anything outside of my client's domain. The url always shows www.myclientsdomain.com.
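A minimal sketch of the include approach described above, not the actual client setup: the server fetches a results fragment from the remote service and drops it into a local page template, so the visitor's URL stays on the site's own domain. The endpoint `search.example.com`, the `q` parameter, and the template are all assumptions made for illustration.

```python
from string import Template
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint; a real remotely hosted service would supply its own.
REMOTE_SEARCH_URL = "https://search.example.com/query"

# Stand-in for the site's own page template.
PAGE_TEMPLATE = Template(
    "<html><body><h1>Search results</h1>$results</body></html>"
)


def fetch_remote(query, timeout=5):
    """Fetch an HTML results fragment from the (assumed) remote service."""
    url = REMOTE_SEARCH_URL + "?" + urlencode({"q": query})
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def render_results_page(query, fetch=fetch_remote):
    """Build the local page; the visitor only ever requests the local URL."""
    try:
        fragment = fetch(query)
    except OSError:
        # Remote service down or slow: show a fallback instead of an error page.
        fragment = "<p>Search is temporarily unavailable.</p>"
    return PAGE_TEMPLATE.substitute(results=fragment)
```

Note that the query and the crawled content still pass through the remote provider's servers; only the presentation is local.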

If there's a downside to this I'm interested in knowing it.
Thanks!

:)

[edited by: martinibuster at 10:57 pm (utc) on Feb. 5, 2003]

linkshark

10:52 pm on Feb 5, 2003 (gmt 0)

10+ Year Member



Try [xav.com...]. It runs on Unix, Linux, Windows NT, Windows 2000 and Win95/98/ME. It can index HTML, ASP, PDF and CGI files. I don't know about Word docs.

markd

2:02 pm on Feb 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks all

Martin - the main downside for this particular project is that I am bound by a very tight 'Data protection' clause in the client's contract with me.

Because the web site will actually contain contact from third party companies (it's a bit like a 'business club' project) I am not allowed under this contract to pass this data to a 'third party' - such as a provider of a remotely hosted search. Obviously, a visitor to the site could simply copy the data direct from the web page, but if they do this I am not actually knowingly passing the data to someone else!

I even had to get clearance from the client regarding the web hosting of the project, simply because this data would reside on a web site hosted by a company other than my own.

It's a bit of a restrictive arrangement, but hence the reason why I cannot really have data collected from the site and held elsewhere.

Mark

BlobFisk

2:18 pm on Feb 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Web Wiz Guide have a nice asp search engine [webwizguide.info]. There is a free version that places the Web Wiz Guide logo on the results page, but I think that you can buy a version that removes this.

sun818

5:44 am on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, I can't provide any positive input. The asp search engine BlobFisk mentions is a brute force search of all file types and directories you allow. It works, but there is no indexing of any kind. It does not scale well.
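To illustrate the scaling point, here is a rough sketch contrasting the two approaches. It is not the code of any product mentioned in this thread; `docs` is a hypothetical in-memory mapping of file name to extracted text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def brute_force_search(docs, term):
    """Scan every document on every query: cost grows with total site size."""
    term = term.lower()
    return sorted(name for name, text in docs.items() if term in tokenize(text))


def build_index(docs):
    """One pass over the site up front: word -> set of document names."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def indexed_search(index, term):
    """A query is a single dictionary lookup, whatever the site size."""
    return sorted(index.get(term.lower(), set()))
```

The brute-force version re-reads everything per query, while the indexed version pays the scanning cost once at build time, which is why the index has to be rebuilt when pages change.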

markd

9:25 am on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks all...

Linkshark - how successful have you been indexing .asp with the script you mention?

I had a quick look at the web site and couldn't see if this could be achieved.

Thanks

BlobFisk

11:33 am on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



sun818 is completely correct, the product I mentioned isn't really what you were looking for - more a monster that chews its way through every page. Having reread your post this morning (with coffee!) I see that.

Apologies!

4eyes

11:50 am on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try ksearch - it's free and has the ability to search PDFs.

I am using it on a 1600 page site and it works fine, although the index gets a little large.

linkshark

2:19 am on Feb 8, 2003 (gmt 0)

10+ Year Member



markd,

I use it for many different sites and have many asp pages indexed as well as many .asp affiliate links on another site. No problem.

It tops out at 10k documents according to the documentation. It has a built-in spider that you could run from cron to re-index the site (or sites) at regular intervals.
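As a rough idea of what a scheduled re-index run has to work out, the sketch below compares two snapshots of file modification times and reports what to index again and what to drop. It is an assumption-laden illustration, not how the script above works internally, and it only covers static files: dynamic .asp pages built from a database would need to be crawled over HTTP instead.

```python
import os


def snapshot(root):
    """Map each file under root to its last-modified time."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            state[path] = os.path.getmtime(path)
    return state


def changes(previous, current):
    """Return (to_index, to_drop) between two snapshots."""
    # New files and files whose modification time differs need indexing.
    to_index = sorted(p for p, m in current.items() if previous.get(p) != m)
    # Files that disappeared should be removed from the index.
    to_drop = sorted(p for p in previous if p not in current)
    return to_index, to_drop
```

A scheduler such as cron would run a script built on this at whatever interval suits the site, saving the latest snapshot for the next run.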

Brett_Tabke

11:49 am on Feb 8, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A good place to start is Avi's site at [searchtools.com...]

chiyo

12:55 pm on Feb 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



martinibuster, the downsides of remotely hosted solutions are that you have less control: you are at the mercy of their server (downtimes, slowdowns etc.), their presentation and relevancy options, and their changes of policy (advertising, usage limits etc.).

Of course, I agree there are upsides too.

aspdaddy

1:18 pm on Feb 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Linkshark,

This one looked really promising but has the following problems: it needs CGI/Perl, and many Windows hosts just won't install it.

Also, I can't get it to work on any reseller accounts. The error is: could not verify that web address [mydomain.com...] maps to ftp://www.mydomain.com

I guess that's because it's a shared/virtual server?

I have been looking for a good search for a while. For Windows it seems to be either FP extensions (crap search), Site Server (expensive), writing your own using the filesystem (brute force search) or remotely hosted.

linkshark

4:07 pm on Feb 8, 2003 (gmt 0)

10+ Year Member



>This one looked really promising but has the following probs - needs cgi/perl, many windows hosts just won't install it.

Yeah, I am running Unix, so I don't know about windowz.

>Also I can't get it to work on any reseller accounts, error could not verify that web address [mydomain.com...] maps to ftp://www.mydomain.com

I use it on several child accounts (reseller). I assume you are talking about the automatic install. I had no probs w/ auto install on shared server.

Try the support forums over there, the author and his followers are totally cool, and usually very helpful w/ all kinds of requests. They can probably help w/ windows application.

jimbeetle

11:15 pm on Feb 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



After reading this thread I decided to check my remotely-hosted freebie site search. Been using it for a couple of years and it's been pretty good, good integration, not bad reports.

Did a couple of searches and found the service had finally (for them) gotten their "search sponsors" listings act together. What had in the past been three very general blurbs before my results that I didn't mind are now three very highly-targeted (based on search keywords) results. Wonder what that's been costing me by way of traffic and sales. (If you use a free service, when was the last time you searched your site?)

Immediately checked out linkshark's recommendation. Couldn't do an automatic install on a site that shares an IP (where I was just going to test it), but installed it automatically on a site with a dedicated IP in about 30 seconds.

Very impressive. Lots of configuration options. The only glitch was my robots.txt keeping it out (a few general mozilla-compatible entries), but it was simple to change the user agent (watch out for Calamity Jane galloping by).
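For anyone hitting the same glitch, here is a small sketch of why the user agent matters, using Python's standard `urllib.robotparser`. The robots.txt content and the agent name `SiteSearchBot` are made up for illustration; they are not the actual file or the script's real agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out anything identifying as Mozilla.
ROBOTS_TXT = """\
User-agent: Mozilla
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler announcing itself as Mozilla-compatible is blocked everywhere,
# while one with its own agent name only falls under the catch-all rules.
print(allowed(ROBOTS_TXT, "Mozilla/4.0 (compatible)", "http://www.example.com/index.asp"))
print(allowed(ROBOTS_TXT, "SiteSearchBot/1.0", "http://www.example.com/index.asp"))
```

Giving the site's own crawler a distinct agent name also makes it possible to add a dedicated `User-agent` section for it later.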

Just a few more tweaks here and there and it's all set.

Thanks linkshark,

Jim

linkshark

12:17 am on Feb 9, 2003 (gmt 0)

10+ Year Member



Glad it worked for you. I think you can actually set the indexer to ignore robots.txt files when crawling. As well as create custom user agent for log spamming ;-)

It also supports its own robots noindex meta tag, which you can add to pages you do NOT want indexed when crawling.
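A sketch of how a crawler might honour such a tag, using Python's standard `html.parser`. The standard `robots` meta name is checked by default; the script-specific tag name is not given in this thread, so `mysearch-robots` in the usage line is purely a placeholder.

```python
from html.parser import HTMLParser


class NoindexDetector(HTMLParser):
    """Flags a page whose meta tag (for the given names) contains 'noindex'."""

    def __init__(self, meta_names=("robots",)):
        super().__init__()
        self.meta_names = {n.lower() for n in meta_names}
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {k: (v or "") for k, v in attrs}
        name = attrs.get("name", "").lower()
        directives = {d.strip().lower() for d in attrs.get("content", "").split(",")}
        if name in self.meta_names and "noindex" in directives:
            self.noindex = True


def should_index(html, meta_names=("robots",)):
    """Return False if the page asks not to be indexed."""
    detector = NoindexDetector(meta_names)
    detector.feed(html)
    return not detector.noindex


# Placeholder custom tag name, checked alongside the standard one.
print(should_index('<meta name="mysearch-robots" content="noindex">',
                   meta_names=("robots", "mysearch-robots")))
```

Using a search-specific tag name lets a page stay out of the site search without also telling the public search engines to drop it.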

* I think Brett uses a modified version of that script for this site.

daisho

6:06 pm on Feb 10, 2003 (gmt 0)

10+ Year Member



I use [mnogosearch.org...] SE for my site. It works very well and it's very customizable.

I am running on Linux but they also claim to have a windows version.

daisho