Forum Moderators: DixonJones


logs not showing full url of a cgi request

         

Frank_Rizzo

11:14 pm on Sep 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a searchable database, thus, the log files show lines similar to

[mysite.com...]

but recently my log files are full of hundreds of:

[mysite.com...]
[mysite.com...]
[mysite.com...]

etc.

This goes on for about half an hour, with hundreds of lines like that. I suspect the person is running a script to pull all the records in a batch.

But if he was doing this, wouldn't I see:

[mysite.com...]
[mysite.com...]
.
.
.
[mysite.com...]

Can the client somehow cloak the environment variables?

I also notice he's using .NET; the UA shows .NET CLR 1.0.3705. But that's probably nothing significant.

andreasfriedrich

9:59 pm on Sep 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Can the client somehow cloak the environment variables?

Not without hacking into your computer.

What does your script do when no meaningful parameters are passed?

If you use CGI.pm,

$value = $q->param('name');

will return the value of the parameter 'name' regardless of whether it was POSTed or GETed to the script. If you want to make sure you only get parameters set in the URL, you need to use

$value = $q->url_param('name');

Have a look at the section on Mixing POST and URL Parameters in the CGI.pm documentation [stein.cshl.org].
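The difference between the two calls can be seen in a small standalone sketch (the simulated environment variables stand in for a real web server, and the parameter names are just illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Simulate a GET request so the sketch runs from the command line;
# in a real CGI environment the web server sets these variables.
$ENV{REQUEST_METHOD} = 'GET';
$ENV{QUERY_STRING}   = 'record=1&sport=football';

my $q = CGI->new;

# param() sees a parameter whether it arrived by GET or POST;
# url_param() only sees parameters that are part of the URL itself.
# For this simulated GET both calls return the same value, but for
# a POSTed 'record' only param() would find it.
print 'param:     ', scalar $q->param('record'), "\n";
print 'url_param: ', scalar $q->url_param('record'), "\n";
```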

c3oc3o

11:35 pm on Sep 17, 2002 (gmt 0)

10+ Year Member



Well, if the data is POSTed (<form method="POST">), it doesn't show up in the log files, correct?
Of course you can't hide the variables from the script itself (in which case they would be useless anyway), but they wouldn't show up in the log file.

andreasfriedrich

11:56 pm on Sep 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, if the data is POSTed (<form method="POST">), it doesn't show up in the log files, correct?

Yes.

As I understand the original post, the problem was that a certain script was requested repeatedly, apparently to get all the records. But, as pointed out, that wouldn't work, because there are no parameters recorded in the logs.

My point was to show how somebody could have achieved the suspected aim of retrieving the records without having the parameters show up in the logs.

I do not understand why one would want to hide the variables from the script itself, or what you are getting at with that sentence.

I neither said nor implied that the parameter would be hidden from the script. In fact, I pointed out how such a parameter, "hidden" (in the logs, to be perfectly clear), could still be used.

Frank_Rizzo

8:27 am on Sep 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm passing variables with the href method:

---------------
choose a sportsperson

<a href="/cgi-bin/searchdata.pl?record=1&sport=football">Joe Montana</a>
<a href="/cgi-bin/searchdata.pl?record=2&sport=basketball">Magic Johnson</a>
----------------

The searchdata.pl script:

use CGI qw(param);

$record = param('record');
$sport  = param('sport');

In the logs I usually see the parameters, but not from the guy who must be trawling the data. I'm wondering if he has a script on his own site, or offline, which is not sending the variables in the URL.

andreasfriedrich

12:29 pm on Sep 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From the code snippet you posted it becomes quite clear that POSTing the parameter values will work as well as GETing them.

If you have perl installed you can try it by running the following script:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# POST the parameters instead of putting them in the URL,
# so they never appear in the query string.
my $response = $ua->post(
    "http://www.domain.tld/cgi-bin/searchdata.pl",
    {
        record => 1,
        sport  => 'football',
    },
);
print $response->content;

Frank_Rizzo

11:15 am on Sep 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I tried that script, and it does pull results from the database without the parameters showing in the logs.

I then changed param to url_param, and that seems to have done the trick.

The above script now generates a 500 error, but genuine users of the site are unaffected. So this is good.

It will be interesting to see what happens on Monday when the guy tries to crawl the complete database again.

andreasfriedrich

12:50 pm on Sep 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The above script now generates a 500 error

It might be a better idea to fail gracefully. Test whether url_param returns a valid value; if not, print a short error message and exit your script.
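A minimal sketch of that graceful failure, assuming a numeric record parameter (the simulated request and the error wording are illustrative, not the site's actual code):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Simulated request so the sketch runs standalone; a real web
# server would set these. 'record' matches the thread's example.
$ENV{REQUEST_METHOD} = 'GET';
$ENV{QUERY_STRING}   = 'record=7';

my $q = CGI->new;

# Validate before touching the database: url_param() ignores POSTed
# values, and the pattern rejects anything that is not a plain number.
my $record = $q->url_param('record');

if ( defined $record && $record =~ /^\d+$/ ) {
    print "looking up record $record\n";    # normal processing here
}
else {
    # Fail gracefully instead of letting the script die with a 500.
    print $q->header( -status => '400 Bad Request' );
    print "<p>Sorry, that request could not be processed.</p>\n";
    exit;
}
```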

Andreas

Frank_Rizzo

2:05 pm on Sep 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's good stuff.

Many thanks.

Frank_Rizzo

9:43 am on Nov 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey guys,

I'm still getting some hassle with this.

Recently my databases were crawled again. The log files are full of

[mysite.com...]

but no 500 errors are generated.

I have no way of knowing what response the guy is getting, i.e. whether he is receiving valid data with his requests or whether the measures I introduced are stopping him. How can I tell?

I have implemented the url_param() function and set up a checksum-type function, so that the crawler needs to know both the recno and a key value which is generated from the recno.
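The checksum code itself isn't posted, but a scheme of the kind described might look like the following (the secret, the digest choice, and the key length here are assumptions, not the actual implementation):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical secret known only to the server.
my $secret = 'change-me';

# Derive a short key from the record number; without $secret a
# crawler cannot compute valid keys for other recnos.
sub record_key {
    my ($recno) = @_;
    return substr( md5_hex( $secret . ':' . $recno ), 0, 8 );
}

# Link-generation side:
my $recno = 42;
print "/cgi-bin/searchdata.pl?record=$recno&key=", record_key($recno), "\n";

# Receiving side: reject any request whose key does not match.
sub key_is_valid {
    my ( $rec, $key ) = @_;
    return defined $key && $key eq record_key($rec);
}

print key_is_valid( 42, record_key(42) ) ? "accepted\n" : "rejected\n";
print key_is_valid( 43, record_key(42) ) ? "accepted\n" : "rejected\n";
```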

A while ago I also tried this:

$mydomain = "http://www.mysite.com";
$runningfrom = $ENV{'HTTP_REFERER'};
unless ($runningfrom =~ m#\Q$mydomain\E#) {
print "<p> Sorry, you are trying to run this script from an unauthorised location</p>";
exit;
}

This worked, but some genuine users were getting the error message above. I do not know why some passed and some failed the check. I assumed it was due to some users running Norton or something similar that blocks the referer.

I was going to get around to converting all the scripts to PHP, which should sort this out. Guess I'm going to be busy this weekend.