Forum Moderators: coopster
Whether I reset the field to NULL or NOT NULL (in phpMyAdmin) makes no difference - the empty cell score is displayed either way. Seems odd to me, so I am trying to get a better understanding of when to specify NULL and NOT NULL and how to construct a query in such a way as to exclude cells with no data. Technically, if I want to exclude results where a cell contains no data, what combination should I be using?
Currently my query is
$query = "SELECT user_browser AS browser, COUNT(*) AS stats FROM userdata GROUP BY browser ASC";// Run the query.
$result = mysql_query($query);// Display
while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
echo '<p>' . $row['browser'] . ' ' . $row['stats'] . '</p>';
}
Excuse me if this starts with the basics..
NULL and NOT NULL when used in column definitions simply stipulate whether or not a particular column must have data. If you stipulate NULL, it means it is acceptable leave that column with no data. Thus, if you have six columns and you add a row and stipulate values for only four columns and don't have value for two other columns that are defined to allow null values, there will not be an error. If column 5 is defined to disallow null values and you only give data for the first four columns, mysql will throw an error and refuse to insert or update the row.
A column defined as NOT NULL must take a value, though that value can be one that might be interpreted as "no data" such as an empty string or a zero integer, but this is nevertheless a defined value and different from NULL. If I say
WHERE col_a = 0
this will match a column whose value is zero, but not one which is null. Any expression with a null evaluates to false except, assuming that col_a is null, the expressions
WHERE col_a IS NULL
and
WHERE col_a <=> NULL
So first you need to decide whether that column can allow null values or not. For example, if I were doing an employee database, I would not allow "lastname" to be null since presumably all of my employees will have last names unless I'm a record label who has signed Prince, Moby and Madonna. On the other hand, I would let "mobile_phone" take a null value since many people do not have mobile phones.
Now, if you want to filter out rows where a certain column has NULL or zero values, you need WHERE clauses for both conditions:
WHERE col_a IS NOT NULL AND col_a!= 0
Tom
[edited by: ergophobe at 2:49 am (utc) on Dec. 13, 2004]
WHERE user_browser LIKE '%_%'
but I understand that using LIKE is slower than IS or =.
Also I've now noticed (having just altered some columns from NOT NULL to NULL) that where there is no data from the input query the database records NULL in text form. I suppose this may be adding a few unnecessary bytes per record and that perhaps, for the sake of efficiency, I should be using NOT NULL but in such a way that if there is no data from the input query, the record contains an actual zero value which I can then filter out in my output query.
The WHERE clause I gave at the end of the last post should do it (or some variation).
WHERE col_a IS NOT NULL AND col_a!= '' AND col_a!= 0
would filter out any rows where col_a was null or set to an empty string or a zero.
As for the DB have a text value of 'NULL', actually it doesn't. Whatever client you are using is representing the null value with the text string 'NULL' but it is not stored this way in the DB.
I'm not sure how mysql differentiates null from zero or the empty string Sybase uses a zero-length column for null values and I would suppose MySQL uses a similar method. In the case of Sybase, this means that any column that allows NULLs is effecitively a variable-length column, which could affect performance (in MySQL some operations on a table will be much faster if it has no variable-length columns. Once you have one variable-length column, there's no reall performance hit for a second and third). I don't know whether MySQL's method might lead to a similar performance hit or not.
At least in MySQL, declaring a column NOT NULL is more efficient than NULL because MySQL doesn't have to spend the time to check a column's values during a query to see whether or not they are NULL. It also saves a bit per row in storage. Plus, declaring as NOT NULL can make your queries easier to construct because you don't have to account for cases of NULL. The only times when I would declare a column as NULL would be when I needed to be able to select data based on the validity of certain information. For example, if I were to make a table for data about birds, one column might be to record flight speed. In the cases of birds that cannot fly, however, there is no sort of valid data that could be included--surely not zero--and even an empty string would be inappropriate-- especially if the database were a work in progress. Perhaps you know that eagles can fly, but you don't yet know how fast, so you set the value as an empty string. For penguins and ostriches, however, you know that there can never be any valid data, so you SET flight_speed=NULL. Then, if you wanted to make queries like, SELECT "birds that can't fly," having the column declared as NULL would be useful.
Another example is some sort of genealogical table that is used to record births and deaths. Of course, everyone in the table would have been born, but many might still be alive, and to be forced to enter any sort of value for their deaths would make it slightly more complex when querying the data to find out who is still alive. And, like the birds example, what do you insert for dead people whose dates you don't know, and how do you differentiate them from people who are still alive?
My general rule is to never declare a column as NULL unless I have a very good reason for doing so. Those reasons depend largely on how you intend to use the data. Ergophobe's example of declaring a cellphone column as NULL would be important if, say, you ran a volunteer organization and had some projects that required volunteers to have cellphones--and you wanted to query which people had that qualification. Those who had empty strings, you might follow up on and ask, "Hey, we don't have a record of your cellphone. Do you have one?" If they confirm that they don't, then you might set their record as NULL. On the other hand, if you were simply making a directory of members' phone numbers, and cellphones were some extra data that you would allow for if available (but not select by) there would be no need to make the column NULL.
I guess that all means that I'm just sort of a NOT NULL kind of a guy;)
When I first changed the three fields to NULL, phpMyAdmin automatically set the default value to NULL and I could see "NULL" appearing in the database when there was no data going in for a particular cell. I'm not sure whether this means a text string "NULL" is being recorded or whether NULL in this case is just a MySQL setting. So I set the default to the integer 0 thinking this would save on performance.
For some reason, none of the following work:
"SELECT engine FROM userdata WHERE engine IS NOT NULL AND engine!= 0""SELECT engine FROM userdata WHERE engine > 0"
(Both return zero rows even when there is data)"SELECT engine FROM userdata WHERE engine IS NOT NULL OR engine!= 0"
(Returns all rows, even where there is no recorded data for the cell in question)
So the output query I'm now using is
"SELECT engine FROM userdata WHERE engine > '0'"
... using single speechmarks around the integer 0. This does the job of returning results for all rows that contain data in NULL fields and no results for rows where the cell is blank, though I don't understand why cells with no data don't show 0 when the default value is now set to 0.
I'm still not sure if I'm doing this the correct and most efficient way, or if I've properly understood when to specify a field as NULL. In my example the PHP script is constructed to gather various pieces of user data each time a page is visited (such as the page, user IP, host, agent, referrer, query string, and some but not all search engines the user came from) and the input query assumes data is present in some cases where it might not be, depending on how successfully the script gathers the data. So, some cells remain empty. I suppose I could reconstruct the script with additional conditionals that enter a default value (eg NULL) where there is no data, and to set all database fields as NOT NULL - I think this is what Salsa is suggesting. In this way, the output query could test for the presence of NULL before returning a result. It's all a question of which would be the most efficient, taking everything into consideration.
Thanks for the answers.
Patrick
If column 5 is defined to disallow null values and you only give data for the first four columns, mysql will throw an error and refuse to insert or update the row.
I've only recently changed some of my columns from NOT NULL to NULL, and my database shows that for several months before I did that, rows were still inserted with no data in some cells and some data in other cells. By no data, I mean there is nothing in those cells - they're just blank alongside the ones that do contain data.
For example one of my fields holds the referrer ($refer). There isn't a referrer in many cases, and yet the query:
"INSERT INTO userdata (etc, refer, etc) VALUES ('$etc', '$refer', '$etc')";
successfully entered data each time but leaving the referrer cell empty.
A book I refer to quite often accords with the advice, saying that if no value is specified for a column when adding a record, the default will be used if set (I didn't set one) or an error will occur if the column is defined as NOT NULL. However, this is not what seems to have happened.
I don't doubt the accuracy of the advice - I just don't understand what I'm seeing. Have I misunderstood?
In my (simplified) example I have two fields. One, named user, is set to NOT NULL and the other, named referrer, is set to NULL with a default value of NULL. Looking at the database in phpMyAdmin I see lots of rows where there is a visible value for user but not for referrer. Originally I couldn't see why a query like:
"SELECT COUNT(referrer) FROM mytable WHERE referrer IS NOT NULL"
... returned all the rows including the ones with no visible value for referrer. I assumed no visible value meant there was no data, but I've tested various queries and come to the conclusion that when my input query was:
"INSERT INTO mytable (user, referrer) VALUES ('$user', '$referrer')"
and where there was no referrer data to enter, the query nonetheless entered an empty string into the row. An error would only have resulted if the query had been:
"INSERT INTO mytable (user) VALUES ('$user')"
I hope I've now got this right. This query:
"SELECT * FROM mytable WHERE referrer!= ''"
does what I want it to do... to return results only where there is a visible value in the database, but it seems an empty string is actually a value, so IS NOT NULL wouldn't do - none of the rows is actually NULL. This is the only explanation that accounts for what I've seen.
Autoincrementing id fields from a post form... [webmasterworld.com]