Corpus for text analysis

Any good sources?

brotherhood of LAN

12:44 am on Mar 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm playing around with various algos and methods of compression. I'm "read up" to the eyeballs ;)

Now that some of the theory is out of the way, does anyone know of a good source of text that's available... say, at least 5 MB of pure text? Something that has oodles of plain text (preferably all English). Don't say "the web" :) I'd rather not waste people's bandwidth, and downloading their HTML is surplus to the cause.

Any pointers -> appreciated :)

digitalghost

12:49 am on Mar 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ayup, when I need text to play with I go here [ibiblio.org] and download a few text files.

brotherhood of LAN

12:57 am on Mar 14, 2003 (gmt 0)

Sound, that's the DB's, will do great for my DB!

Guess I'd better give them a donation as well... a cent a megabyte sounds fair :)

digitalghost

1:03 am on Mar 14, 2003 (gmt 0)

>>cent a megabyte sounds fair

Or you could transcribe a book for them. :) I downloaded all the e-texts. Nothing like grabbing a classic for free. Haven't paid much attention to the music effort but it's nice that they're trying.

brotherhood of LAN

1:10 am on Mar 14, 2003 (gmt 0)

>transcribe

Hmm, sounds like work ;) I was thinking about buying a handheld scanner and paying a visit to the library... sounds like work though.

I'd rather Huffman-encode them and send them back in a ZIP :)
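For anyone following along, here is a minimal sketch of what Huffman-encoding a chunk of text could look like in Python. It is only an illustration, not the script used in this thread: the function names (`huffman_codes`, `encode`, `decode`) are made up for the example, and the output is a string of "0"/"1" characters rather than packed bytes, so it shows the code lengths but is not a real file format.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each character in text to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreaker, {char: code_so_far}). The integer
    # tiebreaker is unique, so Python never has to compare the dicts.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the bit string of every character in text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Invert encode(); this works because Huffman codes are prefix-free."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    sample = "oodles of plain text, preferably all english"
    codes = huffman_codes(sample)
    bits = encode(sample, codes)
    assert decode(bits, codes) == sample
    # Compare against 8 bits per character for a rough compression ratio.
    print(len(bits), "bits vs", 8 * len(sample), "bits uncompressed")
```

On a multi-megabyte corpus the same idea applies, though you would want to pack the bits into bytes and store the code table alongside them. ZIP's usual DEFLATE method already pairs LZ77 matching with Huffman coding, so zipping an already Huffman-coded file is unlikely to shrink it much further.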

When you say you downloaded them all... do you mean them all?

digitalghost

1:17 am on Mar 14, 2003 (gmt 0)

>>do you mean them all?

Yep, every one. Popped them into a local DB and now I have my own mini e-library. At the main site they have them neatly zipped up in alphabetical order.

brotherhood of LAN

1:22 am on Mar 14, 2003 (gmt 0)

>zipped up

Ah, got it, under "This list is generated automatically and is primarily for search engines"

I have a script that hopefully will save me trawling it - great :)

Cheers digital, I don't think there's a better place for what I'm after.