Forum Moderators: coopster

Postnuke CMS problem with se spiders

         

karraskal

1:29 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



Hi.

I have been running a PostNuke (PHP) based website for some time, and I am having some problems with search engine spiders.

I'll try to explain the problem as best I can.

First I tried the "Search Engine Spider Simulator" from searchengineworld.com, pointing it at my home page. The results are fine: the spider gets the page text and a bunch of links.

The problems come when I try to spider a link that goes to a PostNuke module (for example, the News module).

The url looks like this:
[mysite.com...]

In this case the spider cannot find any text or links. I've already tried similar URLs on other PostNuke sites, and they worked fine, but I can't find the error on my site.

Currently I am using Apache web server 1.3.23 with PHP 4.1.1, running under Windows 2000 Server.

The Postnuke version is 0.7.2.1 phoenix.

The log lines in the Apache logs are as follows for the spider:
[26/Jun/2003:14:48:32 +0200] "GET /modules.php?op=modload&name=News&file=index HTTP/1.1" 200 5

This is a standard browser request for the same page (also from the Apache log file):
"GET /modules.php?op=modload&name=News&file=index HTTP/1.1" 200 26298

I am getting a warning in the PHP log file:
PHP Warning: Failed opening 'modules//.php' for inclusion (include_path='.;e:\php\includes') in e:\inetpub\postnuke\html\modules.php on line 14

And this is line 14 in the modules.php file:
include 'modules/' . pnVarPrepForOS($name) . '/' . pnVarPrepForOS($file) . '.php';

Clearly, in the spider request the variables $name and $file are empty, but this file works perfectly with a browser and in other postnuke sites. Why?

Is it my PostNuke version? My PHP configuration?

Any help would be appreciated. Am I a nerd or something? xD

Thanks in advance, and sorry for my English.

vincevincevince

3:22 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



pnVarPrepForOS($name)
i'm *guessing* this does browser detection and changes the value? in that case the spider (google's or any other) may well not be detected. i suggest fishing for the value of $name in both cases, and checking what pnVarPrepForOS does :)
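
One way to do the "fishing" suggested above is to log the raw query string next to the values the script actually receives, then compare the line written for the browser request with the one written for the spider request. This is only a sketch: `describe_request` is a made-up helper, not a PostNuke function, and it assumes the parameters arrive in `$_GET` and that Apache fills in `$_SERVER['QUERY_STRING']`.

```php
<?php
// Hypothetical debugging helper, not part of PostNuke.
// Builds one log line showing the raw query string and the values
// of "name" and "file" as the script received them.
function describe_request($queryString, $params)
{
    $name = isset($params['name']) ? $params['name'] : '(missing)';
    $file = isset($params['file']) ? $params['file'] : '(missing)';
    return 'query=' . $queryString . ' name=' . $name . ' file=' . $file;
}

// Assumed usage near the top of modules.php, before the include on line 14:
// error_log(describe_request($_SERVER['QUERY_STRING'], $_GET));
```

If the spider's line shows `(missing)` for both values while the browser's line shows `News` and `index`, the difference is in the request itself rather than in pnVarPrepForOS.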

karraskal

3:52 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



i was thinking about that, but looking closely i've realized that there are differences in the apache web log:

"GET /modules.php?op=modload&name=News&file=index HTTP/1.1" 200 26298 <-- with browser
"GET /modules.php?op=modload&amp;name=News&amp;file=index HTTP/1.1" 200 5 <-- with spider

i think the problem is in the "&" signs in the url.
the spider does not send "&"; it sends "&amp;" instead.

it could be resolved by using a function to clean up the url before it is processed in the php file.

has anybody run into this kind of problem before?
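
A minimal sketch of such a function, assuming the spider really does send a literal `&amp;` in the query string. `parse_escaped_query` is a hypothetical name, not part of PostNuke, and it relies only on `str_replace` and `parse_str`, which should both be available in PHP 4.1. Keep in mind that `&amp;` is the correct way to write `&` inside an HTML link and that browsers decode it before making the request, so this is a workaround for clients that do not.

```php
<?php
// Hypothetical helper, not part of PostNuke.
// Undoes HTML escaping of "&" in a raw query string, then parses the
// result into an associative array of parameters.
function parse_escaped_query($queryString)
{
    // Turn the literal "&amp;" back into the plain separator "&".
    $clean = str_replace('&amp;', '&', $queryString);

    // parse_str() fills the array passed as its second argument.
    $params = array();
    parse_str($clean, $params);
    return $params;
}

// The query string from the spider's log line:
$params = parse_escaped_query('op=modload&amp;name=News&amp;file=index');
// $params['name'] is now 'News' and $params['file'] is 'index'.
```

Without the replacement step, PHP splits on "&" alone and the parameters end up under the keys `amp;name` and `amp;file`, which would explain why $name and $file are empty for the spider request.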

jatar_k

3:55 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld karraskal,

firstly, "Am i a nerd or something?"
Well if you are programming php it is probably geek but thanks for a great laugh first thing in the morning. ;)

Since it looks like pnVarPrepForOS is a function, I would look at how it is assigning and returning these values. It looks like, as vincevincevince said, some browser detection. It may not have a default value to return, or there may be some other issue of that kind.

Timotheos

4:28 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



firstly Am i a nerd or something?
Well if you are programming php it is probably geek but thanks for a great laugh first thing in the morning. ;)

Maybe this is a subject for 'foo' but what's the difference between a "nerd" and a "geek"?

karraskal

4:30 pm on Jun 26, 2003 (gmt 0)

10+ Year Member



well, first of all, i am a total newbie at dealing with spiders, so maybe my first question should have been:

would a spider follow a link with variables in the url?

jatar_k

4:32 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



try this one

Variables in the url [webmasterworld.com]