Below is a sample of the file. I have added *CUT* to indicate where a new file starts.
*******************************************************************
[EDIT: deleted several hundred lines of data dump including specifics]
**********************************************************************
I could code the first few fields since they appear on exactly the same line every time. For the rest, I don't understand how to do pattern matching and write to files at the same time. I would greatly appreciate some help.
while(($intext=<FILE>))
{
    $count++;
    #print "$count\n";
    #print $intext;
    if ($count==17)
    {
        open(country, ">>Country.txt");
        print country "$Text\t\t$intext\n";
        close(country);
    }
    if ($count==19)
    {
        open(patent, ">>PatentNo.txt");
        print patent "$Text\t\t$intext\n";
        close(patent);
    }
    if ($count==21)
    {
        open(patentee, ">>Patentee.txt");
        print patentee "$Text\t\t$intext\n";
        close(patentee);
    }
    if ($count==23)
    {
        open(date, ">>Date.txt");
        print date "$Text\t\t$intext\n";
        close(date);
    }
    if($intext =~ /0 patents/)
    {
        print "No Patent Found \n You may want to delete the $Text.txt file that has been Generated\n";
    }
}
close(FILE);
[edited by: phranque at 9:33 am (utc) on Feb. 8, 2009]
[edit reason] massive data dump [/edit]
[US Patent & Trademark Office, Patent Full Text and Image Database]
[Home] [Boolean Search] [Manual Search]
[Number Search] [Help]
[Bottom]
[View Shopping Cart] [Add to Shopping Cart]
[Image]
( 1 of 1 )
--------------------------------------------------
United States Patent
[EDIT: massive data dump with specifics]
The present disclosure includes that contained in
the appended claims as well as that of the
foregoing description. Although this invention has
been described in its preferred form with a certain
degree of particularity, it is understood that the
present disclosure of the preferred form has been
made only by way of example and that numerous
changes in the details of construction and the
combination and arrangement of parts may be
resorted to without departing from the spirit and
scope of the invention.
* * * * *
--------------------------------------------------
[Image]
[View Shopping Cart] [Add to Shopping Cart]
[Top]
[Home] [Boolean Search] [Manual Search]
[Number Search] [Help]
[edited by: phranque at 9:38 am (utc) on Feb. 8, 2009]
[edit reason] removed specifics [/edit]
$Text is the patent number.
I am not trying to strip the file of its identifying information in any way. It is just for some research, to gather some statistics.
Assignee: (this heading is in one file but not the other)
Foreign Patent Documents
Primary Examiner:
Assistant Examiner:
Attorney, Agent or Firm:
You did not list them in the headings:
The rest are to be split based on headings like {Abstract, Inventors, Current US Class, Current International Class, Appl. No., Filed, Field of Search, US Patent Documents, Claims, Description}.
And the line with Sweatband is always exactly there, but it can sometimes run to two lines.
And as you mentioned, Foreign Patent Documents, Primary Examiner, Assistant Examiner, and Attorney, Agent or Firm are each to be printed to a separate file.
I greatly appreciate your time and interest. Thanks.
use strict;
use warnings;

my %headings = (
    17 => 'Country',
    19 => 'PatentNo',
    21 => 'Patentee',
    23 => 'Date',
    27 => 'Title',
);

my @headings = (
    'Abstract',
    'Inventors',
    'Current U.S. Class',
    'Current International Class',
    'Appl. No.',
    'Filed',
    'Field of Search',
    'U.S. Patent Documents',
    'Claims',
    'Description',
    'Primary Examiner',
    'Assistant Examiner',
    'Attorney, Agent or Firm',
);

my $Text = '123,456,789';
my $isopen;

open(FILE, '<', 'c:/perl_test/patent.txt') or die "$!";

OUTTERLOOP:
while (chomp(my $intext = <FILE>)){
    next OUTTERLOOP if ($intext =~ /^[ -]*$/);
    if ($. == 17 || $. == 19 || $. == 21 || $. == 23 || $. == 27) {
        static_output($.,$intext);
        next OUTTERLOOP;
    }
    INNERLOOP:
    while (chomp(my $intext = <FILE>)){
        foreach my $heading (@headings) {
            if ($intext =~ /^$heading:?/) {
                $isopen = 0 if (close OUT);
                print ">>>>> $heading\n";
                (my $filename = $heading) =~ tr/ /_/;
                $isopen = open(OUT, ">>", "c:/perl_test/dump/$filename.txt") or die "$!";
                print OUT "$Text\t\t$intext\n";
                last;
                next INNERLOOP;
            }
        }
        print OUT "$intext\n" if $isopen;
    }
}
print "++++finished++++\n";

sub static_output {
    my ($heading, $intext) = @_;
    open(my $OUT, ">>", "c:/perl_test/dump/$headings{$heading}.txt") or die "$!";
    print $OUT "$Text\t\t$intext\n";
    if ($heading == 27) {
        chomp(my $next_line = <FILE>);
        if ($next_line =~ /\S/) {
            print $OUT "$Text\t\t$next_line\n";
        }
    }
    return(0);
}
Change the file paths before trying it. Make sure to try it on some test files and note any problems. I will check back after getting some sleep.
***** If the || operators in the code show up as broken bars (¦¦), change them back to pipes. For some odd reason this forum alters them. This forum also does not format code well, making it hard to read.
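As a side note, the chained || test on $. can also be written as a lookup in the %headings hash, which avoids the pipe characters altogether. This is only a minimal sketch, assuming the same fixed line numbers as in the script; is_fixed_line is a hypothetical helper name, not part of the posted code.

```perl
use strict;
use warnings;

# Same fixed line numbers as in the posted script.
my %headings = (
    17 => 'Country',
    19 => 'PatentNo',
    21 => 'Patentee',
    23 => 'Date',
    27 => 'Title',
);

# Hypothetical helper: returns 1 when the line number is one of the
# fixed heading lines, 0 otherwise.
sub is_fixed_line {
    my ($line_no) = @_;
    return exists $headings{$line_no} ? 1 : 0;
}

# Try a few line numbers to show the lookup in action.
for my $n (17, 18, 27) {
    printf "%d => %s\n", $n, is_fixed_line($n) ? $headings{$n} : 'not fixed';
}
```

Inside the read loop, the same idea would be a test of the form if (exists $headings{$.}), so adding a new fixed line only means adding one entry to the hash.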
[edited by: phranque at 10:23 am (utc) on Feb. 8, 2009]
[edit reason] disabled graphic smileys ;) [/edit]
use strict;
use warnings;

# The fixed headings
my %headings = (
    17 => 'Country',
    19 => 'PatentNo',
    21 => 'Patentee',
    23 => 'Date',
    27 => 'Title',
);

# The non-fixed headings
my @headings = (
    'Assignee',
    'Abstract',
    'Inventors',
    'Current U.S. Class',
    'Current International Class',
    'Appl. No.',
    'Filed',
    'Field of Search',
    'U.S. Patent Documents',
    'Claims',
    'Description',
    'Primary Examiner',
    'Assistant Examiner',
    'Attorney, Agent or Firm',
);

# Just for testing the script
my $Text = '123,456,789';

# A binary flag to determine if a file is opened or closed
my $isopen;

# Open the input file
open(FILE, '<', 'c:/perl_test/patent.txt') or die "$!";

####################################################
# OUTTERLOOP gets the sections of the file
# (%headings) that are always on the same line.
####################################################
OUTTERLOOP:
while (my $intext = <FILE>){
    chomp $intext;
    next OUTTERLOOP if ($intext =~ /^[ -]*$/); # skip blank lines and lines with only dashes
    if ($. == 17 || $. == 19 || $. == 21 || $. == 23 || $. == 27) {
        static_output($.,$intext);
    }
    next OUTTERLOOP if ($. < 28);
    ################################################
    # INNERLOOP gets the sections (@headings)
    # that might occur on different lines of
    # the file and maybe of varying numbers of lines.
    ################################################
    INNERLOOP:
    while (my $intext = <FILE>){
        chomp $intext;
        foreach my $heading (@headings) {
            if ($intext =~ /^$heading:?/) {
                $isopen = 0 if (close OUT);
                # Uncomment next line for debugging
                #print ">>>>> $heading\n";
                (my $filename = $heading) =~ tr/ /_/;
                $isopen = open(OUT, ">>", "c:/perl_test/dump/$filename.txt") or die "$!";
                print OUT "$Text\t\t";
                last;
                next INNERLOOP;
            }
        }
        print OUT "$intext\n" if $isopen;
    }
}
print "++++finished++++\n";

#######################################
# sub static_output prints the fixed
# sections to a file
#######################################
sub static_output {
    my ($heading, $intext) = @_;
    # Uncomment next line for debugging
    #print "++++++$headings{$heading}\n";
    open(my $OUT, ">>", "c:/perl_test/dump/$headings{$heading}.txt") or die "$!";
    print $OUT "$Text\t\t$intext\n";
    if ($heading == 27) {
        chomp(my $next_line = <FILE>);
        if ($next_line =~ /\S/) { # appears to have more data
            print $OUT "$Text\t\t$next_line\n";
        }
    }
    return(0);
}
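One thing to watch in the heading match: headings such as 'Appl. No.' and 'Current U.S. Class' contain dots, and a dot interpolated into a regex matches any character. The sketch below shows one way to quote them with \Q...\E. It is only an illustration: match_heading is a hypothetical helper, and the sample lines are made up rather than taken from a real patent file.

```perl
use strict;
use warnings;

# A few of the headings from the script that contain dots.
my @headings = ('Appl. No.', 'Filed', 'Current U.S. Class');

# Hypothetical helper: returns the first heading that starts the line,
# or undef when no heading matches.
sub match_heading {
    my ($line) = @_;
    for my $heading (@headings) {
        # \Q...\E makes the dots in the heading match literal dots only.
        return $heading if $line =~ /^\Q$heading\E:?/;
    }
    return;
}

# Made-up sample lines, for illustration only.
for my $line ('Appl. No.: 123', 'ApplX NoX: 123', 'Filed: May 1') {
    my $hit = match_heading($line);
    print "$line => ", (defined $hit ? $hit : 'no match'), "\n";
}
```

Without the quoting, the second sample line would also be treated as an 'Appl. No.' heading, because each dot would accept the X.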
[edited by: phranque at 7:32 pm (utc) on Feb. 8, 2009]
[edit reason] disabled graphic smileys ;) [/edit]
This is the error that's popping up:
readline() on closed filehandle FILE at C:\Perl\bin\upgrade.pl line 63.
[edited by: phranque at 5:58 am (utc) on Feb. 9, 2009]
[edit reason] specifics [/edit]
sub static_output {
    my ($heading, $intext) = @_;
    # Uncomment next line for debugging
    #print "++++++$headings{$heading}\n";
    open(my $OUT, ">>", "c:/perl_test/dump/$headings{$heading}.txt") or die "$!";
    print $OUT "$Text\t\t$intext\n";
    if ($heading == 27) {
        chomp(my $next_line = <FILE>);
        if ($next_line =~ /\S/) { # appears to have more data
            print $OUT "$Text\t\t$next_line\n";
        }
    }
    close (FILE);
    return 0;
}
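This line:
close (FILE);
needs to be changed to:
close ($OUT);
FILE is the input handle that the main loop is still reading from, so closing it inside the sub is what triggers the readline() error. $OUT is the output handle opened inside the sub, and that is the one to close there.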
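The difference between the two handles can be sketched with a small standalone script. It is only an illustration under some assumptions: the input is an in-memory string instead of the patent file, the output goes to a temporary file created with File::Temp, and write_record is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# In-memory input stands in for the patent file (illustration only).
my $input = "line one\nline two\nline three\n";
open(my $in, '<', \$input) or die "$!";

# Temporary output file so the sketch does not touch real data.
my ($tmp_fh, $out_path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "$!";

# Hypothetical helper: append one record, then close its OWN handle.
sub write_record {
    my ($text) = @_;
    open(my $out, '>>', $out_path) or die "$!";
    print $out "$text\n";
    close($out) or die "$!";    # close the output handle, never the input
    return 0;
}

my $count = 0;
while (my $line = <$in>) {
    chomp $line;
    write_record($line);
    $count++;
}
close($in) or die "$!";

# Read the output back to confirm every line was written.
open(my $check, '<', $out_path) or die "$!";
my $written = do { local $/; <$check> };
close($check) or die "$!";

print "read $count lines\n";
print $written;
```

Because the sub only ever closes the handle it opened, the while loop keeps reading until the input is exhausted. The input handle is closed once, after the loop.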
Assignee.txt --> Patent No. Assignee
Country.txt --> Patent No. Country
Description.txt --> Patent No. the description...
And did you check out the full code? Did you like it?
And apologies for being so lame. I am just a beginner.