Proxy server to make WebmasterWorld standards compliant

         

andreasfriedrich

8:27 pm on Jan 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is a proxy server that will do some on the fly rewriting of the html code that passes through. What to rewrite is specified in an external file using Perl regular expressions.

#!/usr/bin/perl -Tw 
#
use strict;
$ENV{PATH} = join ":", qw(/bin /usr/bin);
$|++;
#
my $VERSION_ID = q$Id: pfproxy,v 1.01 2000/xx/xx xx:xx:xx merlyn Exp $;
my $VERSION = (qw$Revision: 1.01 $ )[-1];
#
## Copyright (c) 1996, 1998 by Randal L. Schwartz
## This program is free software; you can redistribute it
## and/or modify it under the same terms as Perl itself.
#
## Fixed some errors and modified by af
#
sub prefix {
my $now = localtime;
#
join "", map { "[$now] [${$}] $_\n" } split /\n/, join "", @_;
}
$SIG{__WARN__} = sub { warn prefix @_ };
$SIG{__DIE__} = sub { die prefix @_ };
&setup_signals();
#
### logging flags
my $LOG_PROC = 0; # begin/end of processes
my $LOG_TRAN = 0; # begin/end of each transaction
#
my $m;
my $file = '/var/www/html/root/WebmasterWorld_convert';
&create_pattern();
#
### configuration
my $HOST = '192.168.0.10';
my $PORT = 11111;
my $SLAVE_COUNT = 8; # how many slaves to fork
my $MAX_PER_SLAVE = 20; # how many transactions per slave
#
### main
warn("running version ", $VERSION);
#
&main();
exit 0;
#
### subs
sub main { # return void
use HTTP::Daemon;
my %kids;
#
my $master = HTTP::Daemon->new(LocalPort => $PORT, LocalAddr => $HOST)
or die "Cannot create master: $!";
warn("master is ", $master->url);
## fork the right number of children
for (1..$SLAVE_COUNT) {
$kids{&fork_a_slave($master)} = "slave";
}
{ # forever:
my $pid = wait;
my $was = delete ($kids{$pid}) || "?unknown?";
warn("child $pid ($was) terminated status $?") if $LOG_PROC;
if ($was eq "slave") { # oops, lost a slave
sleep 1; # don't replace it right away (avoid thrash)
$kids{&fork_a_slave($master)} = "slave";
}
} continue { redo }; # semicolon for cperl-mode
}
#
sub setup_signals { # return void
#
setpgrp; # I *am* the leader.
$SIG{HUP} = $SIG{INT} = $SIG{TERM} = sub {
my $sig = shift;
$SIG{$sig} = 'IGNORE';
kill $sig, 0; # death to all-comers
die "killed by $sig";
};
}
#
sub fork_a_slave { # return int (pid)
my $master = shift; # HTTP::Daemon
#
my $pid;
defined ($pid = fork) or die "Cannot fork: $!";
&child_does($master) unless $pid;
$pid;
}
#
sub child_does { # return void
my $master = shift; # HTTP::Daemon
#
my $did = 0; # processed count
#
warn("child started") if $LOG_PROC;
{
flock($master, 2); # LOCK_EX
warn("child has lock") if $LOG_TRAN;
my $slave = $master->accept or die "accept: $!";
warn("child releasing lock") if $LOG_TRAN;
flock($master, 8); # LOCK_UN
my @start_times = (times, time);
$slave->autoflush(1);
warn("connect from ", $slave->peerhost) if $LOG_TRAN;
&handle_one_connection($slave); # closes $slave at right time
if ($LOG_TRAN) {
my @finish_times = (times, time);
for (@finish_times) {
$_ -= shift @start_times; # crude, but effective
}
warn(sprintf "times: %.2f %.2f %.2f %.2f %d\n", @finish_times);
}
#
} continue { redo if ++$did < $MAX_PER_SLAVE };
warn("child terminating") if $LOG_PROC;
exit 0;
}
#
sub handle_one_connection { # return void
use HTTP::Request;
my $handle = shift; # HTTP::Daemon::ClientConn
#
my $request = $handle->get_request;
defined($request) or die "bad request"; # XXX
#
my $response = &fetch_request($request);
#
$handle->send_response($response);
close $handle;
}
#
sub fetch_request { # return HTTP::Response
use HTTP::Response;
my $request = shift; # HTTP::Request
#
my $url = $request->uri;
#
if ($request->uri->scheme !~ /^(https?|gopher|ftp)$/) {
my $res = HTTP::Response->new(403, "Forbidden");
$res->content("bad scheme: @{[$request->uri->scheme]}\n");
$res;
} elsif (not $url->authority) {
my $res = HTTP::Response->new(403, "Forbidden");
$res->content("relative URL not permitted\n");
$res;
} else {
## validated request, get it!
warn("processing url is $url") if $LOG_TRAN;
&fetch_validated_request($request);
}
}
#
BEGIN { # local static block
my $agent; # LWP::UserAgent
#
sub fetch_validated_request { # return HTTP::Response
my $request = shift; # HTTP::Request
#
$agent ||= do {
use LWP::UserAgent;
my $agent = LWP::UserAgent->new;
$agent->agent("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)");
$agent->env_proxy;
$agent;
};
#
my $response = $agent->simple_request($request);
#
## rewrite first, while the content is still plain text
if ($response->content_type =~ m!text/html!) {
local $_ = $response->content;
&$m;
$response->content($_);
$response->content_length(length $_); # length changes after rewriting
}
#
## then gzip the (possibly rewritten) content if the client accepts it
if ($response->is_success and
$response->content_type =~ /text\/(plain|html)/ and
not ($response->content_encoding || "") =~ /\S/ and
($request->header("accept-encoding") || "") =~ /gzip/) {
require Compress::Zlib;
my $content = $response->content;
my $new_content = Compress::Zlib::memGzip($content);
if (defined $new_content) {
$response->content($new_content);
$response->content_length(length $new_content);
$response->content_encoding("gzip");
warn("gzipping content from ".
(length $content)." to ".
(length $new_content)) if $LOG_TRAN;
}
}
#
$response;
}
}
#
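sub create_pattern {
open('IN', "<$file") or die "Couldn't open $file: $!";
local $/ = "\n--\n"; # records are separated by a line containing only --
warn "building pattern";
my @rules;
while (<IN>) {
chomp;
my ($pat, $rep, $mod) = split(/\n/);
$mod = '' unless defined $mod; # the modifier line is optional
push @rules, join('', 's{', $pat, '}{', $rep, '}', $mod, ';');
}
close('IN');
my $rules = join '', @rules;
$rules =~ /(.*)/; # untaint the rules so eval works under -T
$m = eval "sub {$1}";
die $@ if $@;
}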

Here's the conversion file:

<font\s*[^>]+>|</font\s*>
__EMPTY_LINE__
g
--
(<tr bgcolor="#000077")
$1 class='headerrec'
gi
--
(<tr bgcolor="#000060")
$1 class='header'
gi
--
(Welcome\s*<b>.*?recent posts</a>)
<div class="tm">$1</div>
--
<!--\s*post\s*\d+\s*-->(.*?)<!--\s*/post\s*-->
<div class="post">$1</div>
g
--
(<td bgcolor="#f2f2f2")
$1 class='cite'
gi
--
<p><!-- jpgr -->
__EMPTY_LINE__
i

The format of this file is as follows:

pattern 
replacement
[modifier]
[--
...]
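
As a rough illustration of that format, the following sketch reads two records from an in-memory copy of a conversion file and builds the substitution statements the same way create_pattern does. The helper name parse_rules and the sample text are made up for this example, and treating a missing replacement or modifier line as an empty string is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the proxy): turn conversion-file text
# into a list of s{pattern}{replacement}modifiers; statements.
sub parse_rules {
    my ($text) = @_;
    my @rules;
    for my $record (split /\n--\n/, $text) {
        my ($pat, $rep, $mod) = split /\n/, $record;
        $rep = '' unless defined $rep;   # assumption: missing line means empty
        $mod = '' unless defined $mod;   # assumption: modifier is optional
        push @rules, join('', 's{', $pat, '}{', $rep, '}', $mod, ';');
    }
    return @rules;
}

# Two records taken from the conversion file above, separated by "--".
my $sample = join "\n",
    '(<tr bgcolor="#000077")',
    q{$1 class='headerrec'},
    'gi',
    '--',
    '(Welcome\s*<b>.*?recent posts</a>)',
    '<div class="tm">$1</div>',
    '';

print "$_\n" for parse_rules($sample);
```

Each printed line is one statement; the proxy joins all of them into the body of a single anonymous sub and runs that sub against every HTML response.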

Have fun messing around with WebmasterWorld's UI.

Andreas

pendanticist

11:41 pm on Jan 30, 2003 (gmt 0)




Ooookay andreas, for the sake of the village idiots (like me), uhm, what did you just say? <he asked with bewildered look on face> :o

Pendanticist.

andreasfriedrich

7:45 am on Jan 31, 2003 (gmt 0)




The idea behind this post was to provide a way to remove all the font tags and mark up the HTML source of WebmasterWorld properly so that I could apply a nice style sheet.

To achieve that I used the proxy server above, which sits between your browser and the web. It fetches the pages you request and passes them on to your browser, but first it changes the HTML source according to the regular expressions given in the conversion file.
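
To make the "sits between" part a little more concrete: a browser that is configured to use an HTTP proxy sends the full URL in its request line, which is why the script refuses relative URLs. The sketch below only builds such a request as a string; the helper name proxy_request is hypothetical and nothing is sent over the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the kind of request a browser sends to an
# HTTP proxy. Note the absolute URL in the request line.
sub proxy_request {
    my ($url) = @_;
    my ($host) = $url =~ m{^http://([^/]+)}
        or die "not an absolute http URL: $url";
    return join "\r\n",
        "GET $url HTTP/1.0",
        "Host: $host",
        "", "";            # blank line ends the header block
}

print proxy_request('http://www.webmasterworld.com/');
```

In practice you would not write this by hand; you would point the browser's proxy setting at the host and port from the configuration section (192.168.0.10 and 11111 in the listing).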

Among other things the example conversion file will put the top menu into a div element with a class attribute of tm. Then you can apply a style sheet and style this div element any way you want.
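
Here is a small sketch of what that top menu rule does to a page. The input string is a made-up stand-in for the real menu markup, so treat it as an assumption; the substitution itself is the one from the conversion file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up stand-in for the top menu markup; the real page source differs.
my $html = 'Welcome <b>andreas</b> | <a href="/recent">recent posts</a>';

# Same rule as in the conversion file (it has no modifiers).
(my $rewritten = $html)
    =~ s{(Welcome\s*<b>.*?recent posts</a>)}{<div class="tm">$1</div>};

print "$rewritten\n";
```

Once the menu is wrapped like this, a style sheet can address it with a selector such as div.tm.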

Andreas

Brett_Tabke

1:33 am on Feb 9, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month


Valid HTML 4.01!

fyi, WebmasterWorld is 100% w3c compliant.

I will not specify character sets because of backward incompatibilities with browsers.

Nor will I do style sheets at this time. To do it right would take 2 solid months of work and would cost us 10-20% of our daily members. You can argue about it all you want, but what I believe to be true based on my experience is that css does not work in an environment like this at this time. It is always better to stay on the falling edge than the leading edge when it comes to standards.

I have provided members all the tools they can use in editing their skin - it is not perfect, I know - but it is pretty good.

andreasfriedrich

11:29 am on Feb 9, 2003 (gmt 0)




I guess it really is ;). But Transitional... ;)

Brett, this posting and others to the same effect were in no way meant to criticize you. WebmasterWorld is an excellent community with excellent forum software. You should be able to tell by the fact that you see me here quite often. ;) Before I came here I was quite sceptical about the whole internet community concept. WebmasterWorld convinced me that something like an online community was indeed possible.

I do appreciate your reasons for not using CSS [w3.org] and for going with Transitional HTML4 [w3.org]. Just understand that there are some people who like to tweak things to their liking. Since I don't own a car I have to play around with WebmasterWorld ;). And I thought doing that using a proxy server that will rewrite the code to use CSS [w3.org] was quite cool and a good way to test a concept and some code I'll be using for a project.

Andreas