Splitting a txt file into smaller text files.

I am trying to split text files of the format below into 17 new text files using Perl,
based on headings such as {Abstract, Inventors, Current US Class, Current International Class, Appl. No., Filed, Field of Search, US Patent Documents, Claims, Description}.

Below is a sample of the file. I have added *CUT* to mark where each new file should begin.
*******************************************************************
[EDIT: deleted several hundred lines of data dump including specifics]
**********************************************************************

I could code the first few fields since they appear on exactly the same line every time. For the rest, I don't understand how to do the pattern matching and write to the files at the same time. I would greatly appreciate some help.

while(($intext=<FILE>))
{
    $count++;
    #print "$count\n";
    #print $intext;

    if ($count==17)
    {
        open(country, ">>Country.txt");
        print country "$Text\t\t$intext\n";
        close(country);
    }
    if ($count==19)
    {
        open(patent, ">>PatentNo.txt");
        print patent "$Text\t\t$intext\n";
        close(patent);
    }
    if ($count==21)
    {
        open(patentee, ">>Patentee.txt");
        print patentee "$Text\t\t$intext\n";
        close(patentee);
    }
    if ($count==23)
    {
        open(date, ">>Date.txt");
        print date "$Text\t\t$intext\n";
        close(date);
    }
    if($intext =~ /0 patents/)
    {
        print "No Patent Found \n You may want to delete the $Text.txt file that has been Generated\n";
    }
}
close(FILE);

[edited by: phranque at 9:33 am (utc) on Feb. 8, 2009]
[edit reason] massive data dump [/edit]
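
The "pattern matching and writing at the same time" part can be reduced to one idea: remember which heading was seen last, and send every following line to that heading's bucket until the next heading appears. Below is a minimal sketch of that idea, not the script from this thread. The name split_sections and the in-memory hash (instead of output files) are assumptions made for illustration, and headings are escaped with quotemeta so the dots in "Appl. No." match literally.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines under the most recent heading seen.
# Returns a hash ref mapping heading => array ref of the lines below it.
sub split_sections {
    my ($lines, $headings) = @_;

    # Longest headings first, so a longer heading wins over a shorter prefix.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @$headings;
    my $heading_re = qr/^($alternation):?\s*(.*)$/;

    my (%sections, $current);
    for my $line (@$lines) {
        if ($line =~ $heading_re) {
            $current = $1;                                      # switch bucket
            push @{ $sections{$current} }, $2 if length $2;     # text after the heading
            next;
        }
        # Lines before the first heading are ignored in this sketch.
        push @{ $sections{$current} }, $line if defined $current;
    }
    return \%sections;
}

# Usage with made-up sample lines:
my @sample = ('Preamble', 'Abstract', 'A widget.', 'Appl. No.: 123',
              'Claims', '1. A widget.', '2. Another widget.');
my $sections = split_sections(\@sample, ['Abstract', 'Appl. No.', 'Claims']);
print "$_ => @{ $sections->{$_} }\n" for sort keys %$sections;
```

Writing each bucket to its own file afterwards is then a separate, simple loop over the hash.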

use strict;
use warnings;

my %headings = (
    17 => 'Country',
    19 => 'PatentNo',
    21 => 'Patentee',
    23 => 'Date',
    27 => 'Title',
);
my @headings = (
    'Abstract',
    'Inventors',
    'Current U.S. Class',
    'Current International Class',
    'Appl. No.',
    'Filed',
    'Field of Search',
    'U.S. Patent Documents',
    'Claims',
    'Description',
    'Primary Examiner',
    'Assistant Examiner',
    'Attorney, Agent or Firm',
);
my $Text = '123,456,789';
my $isopen;
open(FILE, '<', 'c:/perl_test/patent.txt') or die "$!";

OUTTERLOOP:
while (chomp(my $intext = <FILE>)){
    next OUTTERLOOP if ($intext =~ /^[ -]*$/);
    if ($. == 17 || $. == 19 || $. == 21 || $. == 23 || $. == 27) {
        static_output($.,$intext);
        next OUTTERLOOP;
    }
    INNERLOOP:
    while (chomp(my $intext = <FILE>)){
        foreach my $heading (@headings) {
            if ($intext =~ /^$heading:?/) {
                $isopen = 0 if (close OUT);
                print ">>>>> $heading\n";
                (my $filename = $heading) =~ tr/ /_/;
                $isopen = open(OUT, ">>", "c:/perl_test/dump/$filename.txt") or die "$!";
                print OUT "$Text\t\t$intext\n";
                last;
                next INNERLOOP;
            }
        }
        print OUT "$intext\n" if $isopen;
    }
}
print "++++finished++++\n";

sub static_output {
    my ($heading, $intext) = @_;
    open(my $OUT, ">>", "c:/perl_test/dump/$headings{$heading}.txt") or die "$!";
    print $OUT "$Text\t\t$intext\n";
    if ($heading == 27) {
        chomp(my $next_line = <FILE>);
        if ($next_line =~ /\S/) {
            print $OUT "$Text\t\t$next_line\n";
        }
    }
    return(0);
}

use strict;
use warnings;

# The fixed headings
my %headings = (
    17 => 'Country',
    19 => 'PatentNo',
    21 => 'Patentee',
    23 => 'Date',
    27 => 'Title',
);

# The non-fixed headings
my @headings = (
    'Assignee',
    'Abstract',
    'Inventors',
    'Current U.S. Class',
    'Current International Class',
    'Appl. No.',
    'Filed',
    'Field of Search',
    'U.S. Patent Documents',
    'Claims',
    'Description',
    'Primary Examiner',
    'Assistant Examiner',
    'Attorney, Agent or Firm',
);

# Just for testing the script
my $Text = '123,456,789';

# A binary flag to determine if a file is opened or closed
my $isopen;

# Open the input file
open(FILE, '<', 'c:/perl_test/patent.txt') or die "$!";

####################################################
# OUTTERLOOP gets the sections of the file
# (%headings) that are always on the same line.
####################################################

OUTTERLOOP:
while (my $intext = <FILE>){
    chomp $intext;
    next OUTTERLOOP if ($intext =~ /^[ -]*$/);# skip blank lines and lines with only dashes
    if ($. == 17 || $. == 19 || $. == 21 || $. == 23 || $. == 27) {
        static_output($.,$intext);
    }
    next OUTTERLOOP if ($. < 28);
    ################################################
    # INNERLOOP gets the sections (@headings)
    # that might occur on different lines of
    # the file and maybe of varying numbers of lines.
    ################################################
    INNERLOOP:
    while (my $intext = <FILE>){
        chomp $intext;
        foreach my $heading (@headings) {
            if ($intext =~ /^$heading:?/) {
                $isopen = 0 if (close OUT);
                # Uncomment next line for debugging
                #print ">>>>> $heading\n";
                (my $filename = $heading) =~ tr/ /_/;
                $isopen = open(OUT, ">>", "c:/perl_test/dump/$filename.txt") or die "$!";
                print OUT "$Text\t\t";
                last;
                next INNERLOOP;
            }
        }
        print OUT "$intext\n" if $isopen;
    }
}
print "++++finished++++\n";

#######################################
# sub static_output prints the fixed
# sections to a file
#######################################
sub static_output {
    my ($heading, $intext) = @_;
    # Uncomment next line for debugging
    #print "++++++$headings{$heading}\n";
    open(my $OUT, ">>", "c:/perl_test/dump/$headings{$heading}.txt") or die "$!";
    print $OUT "$Text\t\t$intext\n";
    if ($heading == 27) {
        chomp(my $next_line = <FILE>);
        if ($next_line =~ /\S/) {#appears to have more data
            print $OUT "$Text\t\t$next_line\n";
        }
    }
    return(0);
}

sub static_output {
    my ($heading, $intext) = @_;
    # Uncomment next line for debugging
    #print "++++++$headings{$heading}\n";
    open(my $OUT, ">>", "c:/perl_test/dump/$headings{$heading}.txt") or die "$!";
    print $OUT "$Text\t\t$intext\n";
    if ($heading == 27) {
        chomp(my $next_line = <FILE>);
        if ($next_line =~ /\S/) {#appears to have more data
            print $OUT "$Text\t\t$next_line\n";
        }
    }
    close (FILE);
    return 0;
}
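
The scripts above reopen the output file in append mode each time a heading is matched and share one bareword OUT handle. A possible variation, sketched here under assumptions rather than taken from the thread, keeps one lexical filehandle per heading in a hash and closes them all at the end. The names out_handle and close_all are hypothetical, and the directory is passed in instead of being hard-coded.

```perl
use strict;
use warnings;
use File::Spec;

my %handle_for;   # heading => open filehandle, filled on first use

# Hypothetical helper: return an append-mode handle for a heading,
# opening <Heading_Name>.txt under $dir the first time it is requested.
sub out_handle {
    my ($dir, $heading) = @_;
    return $handle_for{$heading} if $handle_for{$heading};

    (my $filename = $heading) =~ tr/ /_/;   # same renaming as the scripts above
    my $path = File::Spec->catfile($dir, "$filename.txt");
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    return $handle_for{$heading} = $fh;
}

# Close every cached handle once the input has been processed.
sub close_all {
    for my $heading (keys %handle_for) {
        close($handle_for{$heading})
            or die "Cannot close file for $heading: $!";
    }
    %handle_for = ();
}

# Intended use inside the reading loop (directory taken from the thread):
#   print { out_handle('c:/perl_test/dump', $heading) } "$Text\t\t$intext\n";
#   ...
#   close_all();
```

The trade-off is that up to one handle per heading stays open while the input is read, which is modest for the 14 or so headings listed here.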

Posters in this thread: brlinga, krugs, callivert
