I developed a templating system for building websites that always uses the same PHP file but behaves differently depending on four variables.
The content of these pages differs greatly from one page to another. I built it this way to make the code very easy to maintain and use.
I wonder, though, whether the pages will be indexed by Google, since the file is always the same. A URL might look like this: main.php?ID=2&l=e&dv=3&t=tprof, so you can see the URL is not very long and never will be. But it is always main.php, which is the script that loads the templates and modules.
Does anyone have any hints on what I could do to make sure my pages are included in Google's index?
Thank you
Mart
AFAIK Google doesn't even index pages with more than two URL parameters. (If one parameter is named e.g. "id" or "sessionid", that one alone is sufficient to keep you out of the search engines.)
And even if it does, your pages would be pretty useless, because (as a rule of thumb) every "plain link" to an inner page costs you about 1 PR-Unit (an internal link from a PR 5 page most often results in a PR 4 page if there are not too many links on the page). Every URL-Parameter means another PR-Unit off, so a link like ....mypage.php?p1=1&p3=3 from a PR5 page will result in a PR2 page.
Search webmasterworld for "rewriting" - that's what you need if you want to use just one parameter-driven template. This helps you keep your site structure and please the search engines as well.
Google is getting better at spidering dynamic pages. However, as already mentioned, 4 parameters might be too many. If you're using Apache, mod_rewrite might be a solution, i.e. using URLs like '/main/2/e/3/tprof.html'.
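To make the idea concrete, here is a minimal sketch (in Python, purely for illustration; it is not a mod_rewrite rule) of the mapping such a rewrite would perform: a static-looking path is translated back into the query string that main.php expects. The path layout /main/ID/l/dv/t.html and the function name pretty_to_query are assumptions taken from the example URL above, not something any poster confirmed.

```python
import re

# Assumed layout: /main/<ID>/<l>/<dv>/<t>.html (hypothetical, based on the example URL)
PRETTY_URL = re.compile(
    r"/main/(?P<ID>\d+)/(?P<l>[a-z])/(?P<dv>\d+)/(?P<t>\w+)\.html"
)

def pretty_to_query(path):
    """Translate a static-looking path into the main.php query string.

    Returns None when the path does not follow the assumed layout.
    """
    match = PRETTY_URL.fullmatch(path)
    if match is None:
        return None
    return "main.php?ID={ID}&l={l}&dv={dv}&t={t}".format(**match.groupdict())
```

In a real setup the web server would do this translation internally, so visitors and spiders only ever see the static-looking address.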
"Every URL-Parameter means another PR-Unit off, so a link like ....mypage.php?p1=1&p3=3 from a PR5 page will result in a PR2 page."
PR is just based on the linking structure and nothing else. Therefore, it doesn't matter if the pages are static, dynamic with one parameter or dynamic with 4 parameters. However, there is a problem with the PR displayed in the toolbar for dynamic pages with a '?'.
Write your script to create a page title based on the variables. I'm not sure how you do this in PHP; in ASP I just do something like this:
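<%
' SCRIPT_NAME holds the virtual path (e.g. "/main.asp"), so compare only the file name
SCRIPT_NAME = Request.ServerVariables("SCRIPT_NAME")
If LCase(Right(SCRIPT_NAME, 8)) = "main.asp" Then
    ' HTML-encode query string values before writing them into the page
    PageTitle = "Widgets " & Server.HTMLEncode(Request.QueryString("catid")) & " @ Bob Jones Widgets"
    Keyword_Info = Server.HTMLEncode(Request.QueryString("cat")) & " "
End If
%>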
<html>
<head>
<title><%= PageTitle %></title>
<meta name="description" content="<%= Keyword_Info %> Widget and Widget Supplies ">
In my case CAT is the category name. I even take this further by pulling the item description out of the database when I go to a detail page.
Search Google for: site:webmasterworld.com GoogleGuy dynamic
I guess I should change the name of the variable ID.
Maybe I could even combine all the vars in a single string to make Google think there is only one var, and then extract the values with a regular expression. But I think it is stupid to have to do it... even to change the names of the pages with mod_rewrite. I mean, the pages are all the same, and there are just as many of them even if they do not have vars in their addresses. I'm just giving myself trouble to make Google think that my website isn't dynamic.
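As a rough sketch of that single-string idea (in Python for illustration; the same logic would port to PHP), the four values from the example URL could be joined with a separator and pulled apart again with one regular expression. The dash-separated format and the names pack and unpack are assumptions made up for this example.

```python
import re

# Assumed packed format: <ID>-<l>-<dv>-<t>, e.g. "2-e-3-tprof" (hypothetical)
PACKED = re.compile(r"(\d+)-([a-z])-(\d+)-(\w+)")

def pack(record_id, lang, division, table):
    """Join the four values into one URL-safe string."""
    return f"{record_id}-{lang}-{division}-{table}"

def unpack(packed):
    """Split a packed string back into the four values, or None if malformed."""
    match = PACKED.fullmatch(packed)
    if match is None:
        return None
    record_id, lang, division, table = match.groups()
    return {"ID": int(record_id), "l": lang, "dv": int(division), "t": table}
```

Rejecting anything that does not match the pattern also works as a first line of input validation, since malformed values never reach the template code.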
The vars in my templating sys are as follows :
l is for language ( I live in a bilingual country!)
dv is for division (some businesses have many divisions with different visuals and menus and all. This var keeps everything well organised in a single database)
t is for the table in which the record is
ID is for the id of the record.
So basically I have a website that is very well organised, with 3 levels of submenus. The pages it outputs can be completely different from one another: all the contents change, the title changes, the meta tags, etc. Pages from the same table but with different IDs can have a different template too; main.php displays each record in its own way if necessary.
By the way, all these addresses that I want indexed are present as is in the website's menu <a> tags and are not parsed through javascript.
Ahhhh.... man. I understand everything you guys tell me, and I thank you for your replies. I just hate to have to conform to the requirements of Google :-) They give me extra work...
Believe me, you're better off using mod_rewrite from the start. I lost a lot of time (over a year) of near-total absence from Google because I didn't rewrite URLs at first.
Now that I use mod_rewrite, all of the site has been indexed.
However. They only carry one or two parameters, most often one:
index.php?ID=443
This has worked perfectly for us. However, when a glitch in my publishing system made me lose a lot of ID numbers and I ended up with IDs more than 6 digits long, the new pages never made it anywhere in the SERPs. After backtracking to 4-digit IDs, the SERPs improved greatly, and the same pages now rank within the top five.
A very important step here is to make sure to use descriptive title tags. Don't make the mistake of showing the same title on lots of pages, especially not if you use ID= in URLs.
And try to lose at least two of those params while you're at it.
[edited by: Nikke at 11:41 pm (utc) on April 14, 2004]
Rich
I agree, the ID var is a problem. I had a site with 2K+ pages, most of which used one id var (e.g. display.php?id=3). Never indexed. Changed the var on a whim one day and bam, hundreds of pages began being indexed in Google...
Be extra careful if you use a template page that takes the query string to call in your page content. These types of vars are easily identifiable as SQL variables and can be very easily hacked if you're not careful. Be certain to escape your vars in every possible way. (Sure you've thought of this already - sorry, can't help my security background kicking in!) Good luck!
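One way to act on that advice, sketched here in Python with sqlite3 purely for illustration: check the table name against a fixed whitelist (table names cannot be bound as query parameters), force the ID to an integer, and pass it as a bound parameter instead of pasting it into the SQL string. The table names, the id and title columns, and the function fetch_record are all hypothetical.

```python
import sqlite3

# Hypothetical whitelist of tables the template script may read from
ALLOWED_TABLES = {"tprof", "tnews"}

def fetch_record(conn, table, record_id):
    """Fetch one record, refusing any table or ID that fails validation."""
    if table not in ALLOWED_TABLES:
        raise ValueError("unknown table: %r" % (table,))
    # int() raises ValueError for anything that is not a plain number
    numeric_id = int(record_id)
    # The table name is safe to interpolate only because it passed the whitelist;
    # the ID is passed as a bound parameter, never concatenated into the SQL.
    cursor = conn.execute(
        f"SELECT id, title FROM {table} WHERE id = ?", (numeric_id,)
    )
    return cursor.fetchone()
```

A request such as t=tprof&ID=2 would go through, while a tampered table name or a non-numeric ID is rejected before any query runs.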
As for session IDs, I have a solution of sorts. In my templating system, all (and I mean all!) the links are created with a single class to which I pass parameters (basically the variables to pass in the GET, but also the protocol, etc.).
I know that all the main pages of my templating system depend on 4 variables. Some pages depend on more vars, but those are mainly used to display information that is not part of the main content of the website. For example, they could be parameters passed to a search function, so I don't need these pages to be indexed.
Also, I don't need session-dependent pages to be indexed, since they depend on user input (for example, a user profile). The navigation of the website never depends on the session.
Now, my class that builds the links checks the parameters and builds links accordingly: if a session is open, or if there are more than the 4 base vars (or if I don't want the addresses to be transformed at all, which I control through a config variable), the link will be standard PHP with GET vars.
On the other hand, if there is no session and there are only the 4 vars, the link will be rewritten to something else (tmain23e2.htm for example). Since this link creation occurs in only one place in the whole program, if I have to change my address pattern, I do it only once. It is then easier to keep the links 'synchronized' with the htaccess file.
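For readers who want to picture the rule, here is a minimal Python sketch of such a link builder; it is not the poster's actual class. The function name build_link, its arguments, and the rewritten pattern /main/ID/l/dv/t.html (borrowed from the path suggested earlier in the thread rather than the tmain23e2.htm format, whose layout is not spelled out) are all assumptions.

```python
from urllib.parse import urlencode

BASE_VARS = ("ID", "l", "dv", "t")  # the four base variables from the thread

def build_link(params, session_open=False, rewrite_enabled=True):
    """Return a rewritten address when possible, else a plain query-string link.

    Rewriting applies only when no session is open, rewriting is switched on,
    and the link carries exactly the four base variables.
    """
    only_base_vars = set(params) == set(BASE_VARS)
    if rewrite_enabled and not session_open and only_base_vars:
        # Hypothetical static-looking pattern; must match the server's rewrite rule
        return "/main/{ID}/{l}/{dv}/{t}.html".format(**params)
    return "main.php?" + urlencode(params)
```

Because every link goes through this one function, changing the address pattern means editing a single format string and the matching rewrite rule.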
Mart