How Google spiders a PHP file

         

scorpion

7:34 am on Nov 11, 2002 (gmt 0)

10+ Year Member



Suppose a domain has an index file (index.php) whose purpose is to dynamically generate an entire HTML page (header tag, body tag and all). Will Google crawl and index this "page" properly? Namely, will it see "through" the script and crawl the generated page? And how can it get an accurate listing if the page keeps changing? Beyond that, does Google have a preference for an HTML file over a PHP file? Can it even "see" any difference?

percentages

7:48 am on Nov 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the PHP file requires a parameter to generate one of many thousands of HTML pages, how is Google supposed to know what that parameter might be?

Google seems to crawl PHP files as well as any other type of file, but it can't do so if you are relying on a parameter being passed ;)

Google is smart but it doesn't possess ESP.....not yet anyway!
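To make the point about parameters concrete: a crawler generally learns parameter values only from links it finds in pages it has already fetched. Below is a minimal Python sketch using the standard-library `html.parser`. The HTML snippet and the `page.php` URLs are made up for illustration, and this is not a description of how Googlebot is actually implemented.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, as a simple crawler might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_links(html):
    """Return the URLs a crawler could learn about from this page."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical output of index.php: two records are linked explicitly,
# everything else is reachable only by typing an id into the form.
SAMPLE_PAGE = """
<html><body>
  <a href="page.php?id=1">Record 1</a>
  <a href="page.php?id=2">Record 2</a>
  <form action="page.php" method="get"><input name="id"></form>
</body></html>
"""

if __name__ == "__main__":
    print(discover_links(SAMPLE_PAGE))
```

Only the two linked URLs are found. Any id that is reachable solely through the form stays invisible to a crawler that follows links and does not fill in forms.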

Nick_W

8:31 am on Nov 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



PHP is Server-Side so Google will see whatever the output is...

Nick

rincey

11:00 am on Nov 11, 2002 (gmt 0)

10+ Year Member



Since this is my first posting, hi to all and have mercy with a newbie :)

Now to the reply:
I assume you make sure that your index.php puts itself in a defined state by setting some default parameters if no parameters are passed?

While watching some of my domains that use the .php extension and passed parameters, I got the impression that files with few parameters (one), like index.php?id=1, are spidered without problems, while pages with many parameters, like page.php?id=234&ref=23&prodid=4567, are unlikely to appear in Google.

In general I tend to use (if possible) Apache's mod_rewrite module to mask PHP files and parameters behind a .html filename. So "page.php?id=123&client=27" becomes "page_123_27.html" or something like that.
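The mapping itself is just a pattern match in both directions. Here is a rough Python sketch of the idea, not of mod_rewrite itself; in Apache the same thing would be expressed as a RewriteRule with a similar regular expression. The function names and the exact filename pattern are assumptions made for illustration.

```python
import re

# Assumed filename scheme: page_<id>_<client>.html
PRETTY_URL = re.compile(r"^page_(\d+)_(\d+)\.html$")


def to_internal(path):
    """Map a static-looking filename to the real script URL.

    Returns None when the path does not follow the assumed scheme.
    """
    match = PRETTY_URL.match(path)
    if match is None:
        return None
    page_id, client = match.groups()
    return f"page.php?id={page_id}&client={client}"


def to_pretty(page_id, client):
    """Build the static-looking filename to use in links on the site."""
    return f"page_{page_id}_{client}.html"


if __name__ == "__main__":
    print(to_pretty(123, 27))
    print(to_internal("page_123_27.html"))
```

The links on the site would use the `to_pretty` form, and the server translates each request back to the parameterised script, so a spider never sees a query string.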

There are also some rumors about other se's existing besides google and some of them seem to prefer non dynamic-filenames :)

Ad

ukgimp

11:08 am on Nov 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google and FAST are cool with dynamic content, provided you don't have loads of URL parameters. A lot of people do use the rewrite, but I find no problems with one unique parameter to call the correct record. Experiences will vary though.

Think about your navigation to the real records. One page with the ability to access thousands of records is not that logical. Split the resources up. Do a search for Themes, up at the top of this page, to see what I mean.

Cheers

kcartlidge

2:27 pm on Nov 17, 2002 (gmt 0)

10+ Year Member



My assumption (based on how other sites appear) is that if the dynamic PHP file returns static info then it can be spidered. If, however, it returns dynamic info, then the spidering will be defective, if performed at all, as the spider has no way of knowing all potential CGI parameters. The same goes for POSTed entries.

The conclusion? If you are using an index.php purely for dynamic links to other content pages, then that should not be a problem, as the output from your web server will be static. If you are using it to pull 'random' stuff from a database, for instance, then only one occurrence per pass will probably be registered by the spider, depending upon the values in use by the PHP script at that time.

andreasfriedrich

3:35 pm on Nov 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld [webmasterworld.com] kcartlidge.

You wrote:

PHP file returns static info then it can be spidered. If, however, it returns dynamic info then the spidering will be defective

Could you please explain the distinction between static and dynamic info returned by the PHP script?

If a PHP script returns a Content-Type header of 'text/html', then the body of the HTTP response should contain valid HTML. As far as I know there is only HTML, and no such thing as static HTML or dynamic HTML. The spider does not care at all how the server's response is created, as long as it does get created and sent to the spider within the spider's timeout limit.
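A small Python sketch of that point, with everything in it made up for illustration: one function stands in for an .html file on disk, the other for a script that assembles the same markup at request time. Once both are wrapped in an HTTP response, the bytes on the wire are identical, so the client has nothing to tell them apart by.

```python
def static_page():
    """Stands in for an .html file read straight from disk."""
    return "<html><body><h1>Hello</h1><p>Year: 2002</p></body></html>"


def generated_page(year):
    """Stands in for a script that assembles the same markup per request."""
    parts = [
        "<html><body>",
        "<h1>Hello</h1>",
        f"<p>Year: {year}</p>",
        "</body></html>",
    ]
    return "".join(parts)


def http_response(body):
    """Build the raw bytes a client receives: status line, headers, body."""
    payload = body.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + payload


if __name__ == "__main__":
    same = http_response(static_page()) == http_response(generated_page(2002))
    print("identical on the wire:", same)
```

The only visible hint left is the URL itself (the .php extension or a query string), which is a property of the address, not of the response.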

Andreas

kcartlidge

5:52 pm on Nov 17, 2002 (gmt 0)

10+ Year Member



Sorry, I should have clarified.

I am not intending to use 'static' or 'dynamic' as technical terms to identify differing types of PHP script [I refer to static/dynamic INFO, not HTML]. Rather, I used it as a shorthand (lazy) way of referring to two types of coding methodology.

By 'static' I mean that every time the page is accessed it will return the same data (to a very high degree).
By 'dynamic' I mean that the page contents vary according to the context in which the page is accessed and so, with the spider routine not necessarily knowing what values the page will be called with, there is no guarantee that what the spider sees is what an actual user would see.
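A hypothetical sketch of what I mean by 'dynamic', in Python rather than PHP for brevity. The `render_page` function and its `id` and `ref` parameters are invented for the example; the fallback mirrors the default-state idea mentioned earlier in the thread.

```python
def render_page(params):
    """Hypothetical page script whose output depends on query parameters."""
    # Fall back to a defined default state when no parameters are passed.
    record_id = params.get("id", "1")
    ref = params.get("ref")
    greeting = f"Welcome, visitor from {ref}!" if ref else "Welcome!"
    return (
        f"<html><body><p>{greeting}</p>"
        f"<p>Record {record_id}</p></body></html>"
    )


# The spider requests the bare URL and gets the default state.
spider_view = render_page({})

# A user arrives with parameters and gets something else.
user_view = render_page({"id": "234", "ref": "23"})

if __name__ == "__main__":
    print(spider_view)
    print(user_view)
```

Both calls return perfectly ordinary HTML. The issue is only that the spider indexes the first variant while the user may be looking at the second.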

Apologies for the lax terminology.