Thanks
If your cart has long URLs - especially with lots of variables - then some bots will choke on them or refuse to follow them.
You might want to read this:
[webmasterworld.com...]
The idea is that a long dynamic URL would end up like this:
site.com/23/1/234
I believe there is an old article on it over at sitepoint.com if you try their site search for "search engine friendly urls" or similar...
Welcome to WebmasterWorld!
Nick
Let me see if I understand, because I've seen the same with Google. It does spider my PHP pages from the static HTML pages, but it doesn't follow any links from the dynamic pages. It indexes the first dynamic page, but then it stops :(
So instead of having:
mysite.com/myprog.php?cat=235&startitem=8 (for the next page)
You should have a link like:
mysite.com/myprog/235/8
And then it will be indexed. Because I do have a lot of pages which still haven't been visited. The only thing I have to wonder about is how to pull it off from a programmer's point of view...
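For what it's worth, here is one common way to pull it off - just a sketch, assuming Apache with mod_rewrite enabled; the names myprog.php, cat and startitem are taken from the URL above:

```
# .htaccess - map the friendly URL back to the real script:
RewriteEngine On
RewriteRule ^myprog/([0-9]+)/([0-9]+)$ myprog.php?cat=$1&startitem=$2 [L]
```

If you can't use mod_rewrite, most servers will also serve myprog.php/235/8 and hand the trailing part to PHP, so you can parse it yourself at the top of the script:

```php
<?php
// For a request like /myprog.php/235/8, PATH_INFO is "/235/8".
$path  = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
$parts = explode('/', trim($path, '/'));

// Cast to int so bad input can't leak into queries.
$cat       = isset($parts[0]) ? (int) $parts[0] : 0;
$startitem = isset($parts[1]) ? (int) $parts[1] : 0;
?>
```

Either way, the spider only ever sees the clean URL in your links, so nothing about the rest of the script has to change.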
I always thought it didn't recurse the dynamic page to avoid picking up double content..
It's pretty and very, very simple. The only thing not mentioned was that I needed to add a base href=mysite.com in the head tags, because I got broken image links, but it works splendidly :) :) Let's see if it will be crawled... (Can't wait to check the logs tomorrow...)
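For anyone else hitting the broken-image problem: with a URL like /myprog/235/8, relative paths such as images/logo.gif resolve against the fake /myprog/235/ directory. A base tag in the head fixes that (mysite.com here is just a placeholder for your own domain):

```
<head>
  <base href="http://mysite.com/">
  <!-- now images/logo.gif resolves to http://mysite.com/images/logo.gif
       no matter how deep the friendly URL looks -->
</head>
```

The alternative is to use absolute paths like /images/logo.gif everywhere, which avoids the base tag entirely.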
Thanks a lot :)
I don't use PHP etc., but I remember reading somewhere here that Google only crawls deep into a dynamic site with long URL strings if the PR of the index page is PR5+.
However, they have been trying to crawl more dynamic pages recently, and certainly the SERPs are showing more of these in the top 10.
In the meantime the above advice is definitely the way forward and works.
However, there are hardly any calls to PHP programs from the index page; just static pages. Those pages do have links to PHP for further browsing. Maybe it doesn't go deeper because those pages are usually only PR3-PR5.
Anyway, it doesn't matter anymore, because I do like this idea. It makes for easy-to-read URLs, and that's important as well.
With regards to the base URL: I (still and always will) use Netscape Composer to create the pages, because it's easy, works on all browsers (unlike some products...) and generates clean code which can easily be processed in program files.
Googlebot and Scooter did quite a lot of crawling last night, but didn't go to the PHP pages yet, because I've only implemented it on a couple of pages, so let's wait and see...