PHP & google

how do spiders "see" my website?

         

dwidmer

11:40 am on Jul 27, 2003 (gmt 0)

10+ Year Member



Hi,

I've recently redone our website. I went from static HTML to a site written entirely in PHP with a MySQL database.

After a few days I realized that Google had already visited the site and updated its index. Unfortunately, only 2 pages were indexed (the start page and our disclaimer).

This has led me to the question: how do spiders actually see PHP websites?

Is it possible that a spider doesn't follow links because they contain PHP-specific attributes (like [exampledomain.com...])?

The pages that were indexed have "plain" URLs (www.exampledomain.com/index.php).

thx

Dan

wruk999

11:47 am on Jul 27, 2003 (gmt 0)

10+ Year Member



Hi Dan,

At the moment, Google indexes pages that contain one variable in the URL and don't include session IDs.

Try to limit your URL variables to one:

e.g. [example.com...]

Under normal circumstances, this should be indexed fine.
Some searching around here should throw up quite a few threads about Google indexing dynamic pages.
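
A rough way to audit your own links against this rule of thumb is sketched below in Python. The function name `audit_url`, the list of session-like parameter names, and the one-variable limit are assumptions for illustration, taken from the advice in this thread rather than from any documented Google behaviour.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of parameter names that look like session IDs.
# PHPSESSID is PHP's default session name; the others are guesses.
SESSION_LIKE = {"phpsessid", "sid", "sessionid", "session_id"}


def audit_url(url, max_vars=1):
    """Return a list of warnings for a URL, using the thread's rule of thumb."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    warnings = []
    if len(params) > max_vars:
        warnings.append(f"{len(params)} query variables (assumed limit: {max_vars})")
    for name, _ in params:
        if name.lower() in SESSION_LIKE:
            warnings.append(f"'{name}' looks like a session ID")
    return warnings


if __name__ == "__main__":
    for link in [
        "http://www.example.com/index.php",
        "http://www.example.com/index.php?page=3",
        "http://www.example.com/index.php?page=3&lang=en&PHPSESSID=abc123",
    ]:
        print(link, "->", audit_url(link) or "looks fine")
```

Running it over the links in your menu should show quickly which URLs carry more than one variable or a session ID.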

wruk999.

[edited by: wruk999 at 11:52 am (utc) on July 27, 2003]

brotherhood of LAN

11:48 am on Jul 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>Is it possible that a spider doesn't follow links, because they contain php specific attributes?

Yes, it has been mentioned in a few past threads here on the boards; maybe someone has the URLs to examples offhand.

The "id" variable seems to be frowned upon by Google. I think GoogleGuy mentioned that it's because it could be a session ID, which is also undesirable for the bot.

URLs with query strings do get spidered, though the consensus is that too many variables (or low PageRank) will result in these pages NOT getting spidered.

I'd check out some of the great .htaccess tutorials and threads in the Website Technologies and related forums; if your pages contain lots of juicy content then they deserve a static URL.

//
You are too fast, wruk, even on a Sunday ;)

[edited by: brotherhood_of_LAN at 11:58 am (utc) on July 27, 2003]

wruk999

11:51 am on Jul 27, 2003 (gmt 0)

10+ Year Member



Dan,
This thread from the end of April contains GoogleGuy commenting on the spidering of dynamic URLs:

Google is now better at spidering dynamic sites.
[webmasterworld.com]

wruk999

dwidmer

11:59 am on Jul 27, 2003 (gmt 0)

10+ Year Member



Thanks,

I might have chosen the wrong keywords in my search, because I didn't find a single post about this topic.

;-)

brotherhood of LAN

2:41 pm on Jul 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Which ones are you after, dwidmer: .htaccess or info about dynamic URLs?

Some .htaccess threads:
A Close to perfect .htaccess ban list [webmasterworld.com]
An introduction to Redirecting URLs on an Apache Server [webmasterworld.com]

The Webmaster General and Website Technology forum libraries also have loads of similar info; they're a good place to start the hunt ;-)

MTKilpatrick

1:20 pm on Jul 29, 2003 (gmt 0)

10+ Year Member



I reckon I can confirm that Google doesn't like the "id" variable in dynamic URLs. I've got an HTTP_USER_AGENT watcher on my site so that I can see who or what is visiting. I downloaded the log file today and looked for the Googlebot entries. It requested some of the index.php?score=HGLL type URLs but did not seem to request the main pages linked from my menu, which have the form index.php?id=1. I've just changed my site to use index.php?page=1 instead of id!
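
For anyone who wants to run the same kind of check, here is a minimal Python sketch that counts which query variables Googlebot actually requested. The log format (timestamp, user agent and requested URL separated by tabs), the function name `googlebot_params` and the sample lines are all made up for illustration; adapt the parsing to whatever your own watcher writes.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def googlebot_params(log_lines):
    """Count the query variable names seen in requests made by Googlebot.

    Assumes each line is 'timestamp<TAB>user agent<TAB>requested URL',
    a hypothetical format; change the split to match your own log.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        _, agent, url = parts
        if "googlebot" not in agent.lower():
            continue  # only interested in Googlebot requests
        names = [name for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)]
        counts.update(names or ["(no query string)"])
    return counts


# Made-up sample data in the assumed format.
sample = [
    "2003-07-29 09:01\tGooglebot/2.1 (+http://www.googlebot.com/bot.html)\t/index.php?score=HGLL",
    "2003-07-29 09:02\tMozilla/4.0 (compatible; MSIE 6.0)\t/index.php?id=1",
    "2003-07-29 09:03\tGooglebot/2.1 (+http://www.googlebot.com/bot.html)\t/index.php",
]

if __name__ == "__main__":
    print(googlebot_params(sample))
```

If a variable name such as `id` never shows up in the counts even though your menu links use it, that matches the pattern described above, though a few days of logs is thin evidence either way.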

Michael

mat

1:23 pm on Jul 29, 2003 (gmt 0)

10+ Year Member



For a Google-eye view of a page, do a search for Poodle Predictor - honest.

daisho

5:27 pm on Jul 29, 2003 (gmt 0)

10+ Year Member



Might I suggest jumping right into mod_rewrite? That way the URLs look static and are easier to index, but you still run PHP and keep everything dynamic.
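
To make the idea concrete, here is a small Python sketch of the mapping such a rewrite performs. The URL scheme `/page/<number>` and the `page` variable are assumptions for illustration (the variable name is borrowed from Michael's post above). A real setup would express the same pattern as a RewriteRule in .htaccess, not in Python.

```python
import re

# Hypothetical rule: a static-looking /page/<number> is served by
# index.php?page=<number>. Apache's mod_rewrite would do this on the
# server; the regex below only emulates the mapping.
PATTERN = re.compile(r"^/page/([0-9]+)/?$")
TARGET = r"/index.php?page=\1"


def rewrite(path):
    """Map a static-looking path to the dynamic URL, or return it unchanged."""
    return PATTERN.sub(TARGET, path, count=1)


if __name__ == "__main__":
    for path in ["/page/12", "/page/12/", "/about.html"]:
        print(path, "->", rewrite(path))
```

The spider only ever sees the left-hand form, so there is no query string for it to trip over, while PHP still receives the variable as usual.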