PHP, url only, and no PR on internal pages

         

Spine

4:39 pm on Jan 20, 2005 (gmt 0)

10+ Year Member



I'm a bit stumped.

My brother-in-law has me looking at his site to suggest how he could improve it.

I took a look with a site: command, and it seems that many of his pages are listed with the URL only. Even worse, none of the pages have PR, just his index page.
The site actually has decent incoming links, decent PR on the index page, and has been around for a while.

I'm thinking his PHP has something to do with the problem, but I'm an HTML guy, so I don't know quite how or where it would cause this.

The problem pages have URLs like:

www.domain.com/directory/example.php?exampleid=1111&cat=11&page=2

Any ideas why these internal pages have no PR at all from his index page?

Spine

9:43 am on Jan 22, 2005 (gmt 0)

10+ Year Member



No ideas?

I'm not the webmaster for his site, and I mostly deal with HTML.

Without being able to look around 'inside', and with PHP I'm kinda guessing, but I'd like to help the guy.

Could it be that a part of the PHP script is in a directory where googlebot can't go? Some of his pages like domain.com/item.php have PR, but ALL of the pages with URLs like I listed above in a directory are URL only.

Something has to be stopping googlebot from getting to these pages. It's almost like one of the tricks that some folks use for bogus recip links, but it's affecting their own internal pages and causing a URL-only listing to show up.

His site also has a lot of drop down menus for linking, but they seem to be in the proper <a href=" format.

JuniorOptimizer

12:40 pm on Jan 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Almost every site: listing of a PHP site has URL-only listings. This includes sites that specifically exclude such pages from being crawled.

The reason, as I know it, is that Google has followed a link from "somewhere" and knows about those pages. Even though they don't plan on including them, they "know" about them and list them in the site: command.

inbound

2:08 pm on Jan 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



PHP pages are crawled and added to the main index from my site even though the robots.txt instructs crawlers not to.

I can't imagine anyone finding these php url-only listings but they are in the index as I can find every one of mine by simply searching for filename.php

Maybe you have to exclude EVERY PHP URL in the robots file for Google to take notice. I will try a few and report back if anything changes.

JuniorOptimizer

2:12 pm on Jan 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please do.

disallow: /*?

That is supposed to disallow all dynamic pages, according to Google. I might try the same thing as you and just exclude the filenames.
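
To see which URLs a rule like that would catch, here is a rough Python sketch. It assumes the wildcard behaviour Google documents for its own crawler ("*" matches any run of characters, a trailing "$" anchors the end of the URL). The function names are made up for illustration, and other crawlers may ignore wildcards entirely.

```python
import re

def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(path_and_query, pattern="/*?"):
    """Return True if the path (with query string) matches the pattern."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None

# The dynamic URL from this thread matches; a plain .php page does not.
print(is_disallowed("/directory/example.php?exampleid=1111&cat=11&page=2"))
print(is_disallowed("/item.php"))
```

Under that assumption the rule blocks anything with a "?" in it, which is why it covers dynamic pages without listing each filename.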

Oliver Henniges

5:29 pm on Jan 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It should be quite clear that this sort of page generation produces an enormous number of near-identical URLs, is therefore quite close to spam, and so we cannot expect googlebot to index all these pages.

I recall a competitor who, a few years ago, designed a separate page for every single product, with very little unique text. He was thrown out of the index after about two weeks. So maybe your script has triggered a spam filter.

Nevertheless, in many cases googlebot does index such pages, so here is my unqualified advice:

1) Look at how your subpages are designed: there should be considerable lexical and, if possible, structural differences between all the URLs in question. Think of this forum, for example.

2) It might also be helpful to generate a static HTML sitemap listing all the URLs you want to get indexed. Don't forget to put a link to that sitemap on your main page.

3) Make sure the URLs your PHP script produces
a) are syntactically correct, and thus parseable
b) lead to pages with a unique title and at least the elementary meta tags, like <meta name="robots" content="index">
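
The static sitemap from point 2 could be generated with something like the Python sketch below. The function name and the URL list are hypothetical, modeled on the example URL in this thread; a real site would pull its URLs from its own database.

```python
from html import escape

def build_sitemap_html(urls, title="Site map"):
    """Return a static HTML page with one plain <a href> link per URL."""
    items = "\n".join(
        # escape() turns '&' into '&amp;' so the query strings stay valid HTML
        f'    <li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in urls
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"  <title>{escape(title)}</title>\n"
        "</head>\n<body>\n  <ul>\n"
        f"{items}\n"
        "  </ul>\n</body>\n</html>\n"
    )

# Hypothetical URLs modeled on the example in this thread.
urls = [
    "http://www.domain.com/directory/example.php?exampleid=1111&cat=11&page=1",
    "http://www.domain.com/directory/example.php?exampleid=1111&cat=11&page=2",
]
sitemap_html = build_sitemap_html(urls)
print(sitemap_html)
```

Saving the output as a plain .html file and linking it from the main page gives the crawler ordinary static links to follow.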

Hope this helps

P.S.: Many dmoz-dupes and link-farms have adopted a structure along the lines you indicated. Google has an eye on such experiments; it is considered spam and won't be indexed. So make sure all your pages have unique and valuable content for your visitor, like most posts in here. Very valuable indeed...

BigDave

8:38 pm on Jan 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The last I heard, Google will only go out to two parameters after the "?".
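
For anyone unsure what counts as a parameter here, a quick Python sketch using the standard urllib.parse module counts the name=value pairs after the "?". The function name is made up, and the http:// prefix is added only to make the thread's example a complete URL.

```python
from urllib.parse import urlsplit, parse_qsl

def count_query_parameters(url):
    """Count the name=value pairs in the query string of a URL."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))

# The example URL from the first post, with an assumed http:// scheme.
url = "http://www.domain.com/directory/example.php?exampleid=1111&cat=11&page=2"
print(count_query_parameters(url))  # prints 3
```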

Spine

10:26 pm on Jan 22, 2005 (gmt 0)

10+ Year Member



Ahhh! That makes some sense BigDave, I have a feeling you are right, thanks.

I'm assuming you mean each =1234 is a parameter.

Like I said before, the pages with simpler URLs like domain.com/example.php seem to be fine, with title, and descriptive snippet, and PR.

I'm just looking at this as a freebie favor for my brother-in-law, who I don't see that often. The SEO I know is for my own sites, rather than me being in the biz of doing this for other people.
Getting his web people to realize the issue and fix it would be his problem, but the more evidence I have the better.

Now to figure out what this kind of SEO advice is worth, and start dropping hints for x-mas :D

doc_z

11:56 am on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Even worse, none of the pages have PR, just his index page.

The PR which is shown for dynamic pages with parameters is incorrect.

The last I heard, Google will only go out to two parameters after the "?".

This has improved in recent years. I have seen URIs with at least five parameters that were crawled and indexed correctly.

A sitemap and additional deep links (external and internal) might solve the problem.

Oliver Henniges

5:10 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In my experience, external links are not necessary to get indexed as long as your main page contains fewer than 100 links.

All my subpages linked to from the main page inherited PR-1 no matter whether they had external links themselves, because they had sufficient unique content and the mainpage has a static a-href sequence towards them.

But maybe this is different if you run an index.php instead of an index.html-mainpage.

bloke in a box

11:27 am on Jan 24, 2005 (gmt 0)

10+ Year Member



I would strongly suggest having a look at mod_rewrite for apache to turn those ugly looking site urls into something more search engine spider friendly.

Do a search through these forums for more information on how to go about it.

johnnie

11:48 am on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You were probably not crawled. Check robots.txt and your robots meta tag, and make sure there's no 404/403 happening somewhere. Monitor your access logs for googlebot.
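
A minimal Python sketch of that log check follows. It assumes the common Apache access log layout (quoted request line followed by the status code) and simply looks for "Googlebot" in the line. The function name and the sample lines are invented for illustration, not taken from a real log.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g.  "GET /page.php?x=1 HTTP/1.1" 200
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def googlebot_status_counts(lines):
    """Count (path, status) pairs for requests whose line mentions Googlebot."""
    counts = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = LOG_LINE.search(line)
        if match:
            counts[(match.group("path"), match.group("status"))] += 1
    return counts

# Invented sample lines; a real check would read the server's access log.
sample = [
    '192.0.2.1 - - [24/Jan/2005:11:48:00 +0000] "GET /directory/example.php?exampleid=1111&cat=11&page=2 HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [24/Jan/2005:11:48:05 +0000] "GET /directory/missing.php HTTP/1.1" 404 312 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [24/Jan/2005:11:49:00 +0000] "GET /item.php HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
]
for (path, status), hits in sorted(googlebot_status_counts(sample).items()):
    print(status, hits, path)
```

If the problem URLs never show up in the output at all, the bot is not reaching them. If they show up with 403 or 404, the server is turning it away.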

Spine

8:35 am on Jan 26, 2005 (gmt 0)

10+ Year Member



Update:

site:domain.com
shows a mess of url only links

site:domain.com common keyword
shows all the pages with their title, description and all that good stuff. The keyword in question appears on almost all their pages.

Either this is a PHP oddity that I'm not used to (as an html guy) or it's some other weirdness.

There is still no PR showing on most of the pages, but they may have redesigned recently. I'm going to find out about that.

Also, they have a couple of pages of SERPs, then the 'repeat the search with the omitted results included.' message pops up. So I'm seeing about 4 pages out of 22 with the site command.

Could that be because they have identical meta keyword and description tags on all the pages, including ones where they aren't relevant?

Thanks for all the ideas so far people!

Airportibo

9:10 am on Jan 26, 2005 (gmt 0)

10+ Year Member



Hi,
first of all: PHP works fine for SEO. The only reason you could have problems is if you write a PHP script so complex that it takes too long to load. But this wouldn't result in the kind of problems you described.

To cover the basics you should look into mod_rewrite like bloke-in-a-box proposed. Instead of having URLs like
www.domain.com/directory/example.php?exampleid=1111&cat=11&page=2 you can make them look like
www.domain.com/directory/example_1111_c11_p2.html
If you cannot use rewrite rules, try not to use common parameter names like "id" or "cat", since Google tries to identify session IDs in a URL and you don't want it to misinterpret the parameters.
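
To show the mapping a rewrite rule would have to perform, here is a rough Python sketch of both directions. It is only an illustration of the idea, not mod_rewrite itself. The naming scheme (id, then a c-prefixed category, then a p-prefixed page) follows the example above, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs

def to_static_path(url):
    """Build the static-looking path for a dynamic example.php URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    base = parts.path.rsplit(".", 1)[0]  # drop the ".php" extension
    return (
        f"{base}_{params['exampleid'][0]}"
        f"_c{params['cat'][0]}_p{params['page'][0]}.html"
    )

STATIC_PATH = re.compile(
    r"^(?P<base>.+)_(?P<exampleid>\d+)_c(?P<cat>\d+)_p(?P<page>\d+)\.html$"
)

def to_dynamic_path(path):
    """Reverse mapping: what the server would do internally on each request."""
    match = STATIC_PATH.match(path)
    if match is None:
        return None
    return (
        f"{match['base']}.php?exampleid={match['exampleid']}"
        f"&cat={match['cat']}&page={match['page']}"
    )

print(to_static_path(
    "http://www.domain.com/directory/example.php?exampleid=1111&cat=11&page=2"
))
print(to_dynamic_path("/directory/example_1111_c11_p2.html"))
```

The links on the site would use the static form, and the server would translate each request back to the real script, so the PHP itself does not need to change.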
Another reason for the phenomenon you are describing could be that Google considers the pages duplicate content and therefore doesn't index them completely. Compare the pages that are indexed with only their URL. If they are very similar, you may have to add more and different content to your website.
~Airportibo
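
One rough way to compare two pages is sketched below in Python, using the standard difflib module on the visible text split into words. This says nothing about how Google actually detects duplicates; it is only a quick way to put a number on "very similar". The function name and sample texts are invented.

```python
from difflib import SequenceMatcher

def text_similarity(text_a, text_b):
    """Return a 0.0 to 1.0 similarity ratio based on word sequences."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()

# Invented page texts: same template, only the product name differs.
page_1 = "Welcome to our shop. Buy the red widget today. Contact us for details."
page_2 = "Welcome to our shop. Buy the blue widget today. Contact us for details."
page_3 = "A long hands-on review of the blue widget with photos and test results."

print(round(text_similarity(page_1, page_2), 2))
print(round(text_similarity(page_1, page_3), 2))
```

Pairs that score close to 1.0 are the ones most worth rewriting with more distinct content.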