Database:Adding constraints to prevent garbage data
- 1 Meta
- 2 MariaDB difference warning!
- 3 Constraints
- 3.1 Introduction
- 3.2 Foreign key constraints
- 3.3 Using a check constraint for validating dates in SQLite3
- 3.4 Force integer values to INTEGER columns
- 3.5 Certain text formats using GLOB
- 3.6 Naming a constraint for better error messages
- 3.7 Not accepting NULL
- 3.8 Force a column to have unique values in the table
- 4 Videos
- 5 Links
- 6 Work in progress
Expand using link to the right to see the meta section with instructor's notes.
This topic deals with adding constraints to columns when creating a table. The constraints restrict the allowed values for the column, in order to prevent us from inserting junk data into the table.
Expect the main lecture on constraints to take 25-30 minutes. As usual, we recommend that you read the online text, see the video lecture and review the exercises before giving this lecture.
If you go for also the optional extra lecture, expect it to take around 10 minutes but prepare some tables as a proof of concept and show it live after the lecture.
Topics addressed or used in the online exercises
- FOREIGN KEY constraints
- CHECK constraints (including checks for well-formed dates)
- The functions STRFTIME and DATETIME
- Constraint failure error messages
- Using the IN operator for explicit enum-like constraints
- The optional extra lecture shows an alternative approach
- NOT NULL constraints
- UNIQUE constraints
- Globbing using the GLOB function
- Also basic globbing in bash, which uses the same syntax
- Check-your-progress questions with answers on a different page
- This lecture uses syntax and constructs that are specific to SQLite3 - Other DBMSes have a different syntax and often more types of constraints
- Explain that a key can be a foreign key without a constraint (as we have seen in the lecture about JOIN). The FOREIGN KEY constraint is only a tool to ensure consistency so that we can guarantee that a foreign key refers to an existing row in the foreign table
- Globbing can be done either using the GLOB function or the GLOB infix operator
- The order of the glob expression and the string to be tested are different between the function and operator - this is confusing to students (and the authors!)
- A colum with a PRIMARY KEY modifier gets the UNIQUE constraint implicitly
- This does not mean that PRIMARY KEY is a constraint - it is a modifier which includes the UNIQUE constraint
- There are many use cases for UNIQUE constraints for columns which are not primary keys
- Using a CHECK constraint for testing date well-formedness is rather compicated in SQLite compared to other DBMSes that are more strongly typed. SQLite accepts basically anything as the value for a column of type DATETIME. Make sure you read up on how to do the check using the strftime and datetime functions, so that you can explain this to the students (even if we think they will learn it hands-on in the online exercises)
- One way to create a kind of ENUM-like construct in SQLite is shown in the optional extra lecture. Prepare some tables which demonstrate this in action and show it live, if you opt to give also the extra lecture.
MariaDB difference warning!
The syntax in this chapter differs a lot from the syntax used in MariaDB. The syntax in this chapter works for SQLite3. If you want to see how you can do constraints in MariaDB, please see This page
Unless otherwise specified, this chapter uses syntax and functionality of SQLite3. This chapter starts with some theory and examples. Then we link the lecture (as videos and PDF). After this chapter, move on to the next with exercises.
This chapter deals with adding constraints to columns when creating a table. The constraints restricts the allowed values for the column, in order to prevent us from inserting junk data into the table. See the videos, read the PDF, check out the external links for further readin and head on to the exercises in the next chapter.
Foreign key constraints
The purpose of foreign key constraints is to ensure referential integrity, which is to say that a foreign key in a table can only contain values that really exist in the referred table.
Consider the following tables:
CREATE TABLE artist(a_id integer primary key, a_name text); CREATE TABLE track(t_id integer primary key, t_name text, a_id integer);
The idea is to have the track table refer to the artist table by connecting rows on the
a_id column, so that a track in the
track table has a name and refers to an artist of the
artist table using the
For instance, we can have the following state of the database:
sqlite> SELECT t_name, a_name FROM track NATURAL JOIN artist; t_name a_name -------------------- -------------------- space oddity bowie cortez the killer neil young sqlite> SELECT * FROM artist; a_id a_name -------------------- -------------------- 1 bowie 2 neil young sqlite> SELECT * FROM track; t_id t_name a_id -------------------- -------------------- ---------- 1 space oddity 1 2 cortez the killer 2 sqlite>
As you can see, the track table uses an
a_id for representing the track's artist, and that column obviously refers to the
artist table, so that a track with
a_id of 2, refers to "neil young" which is the row in
artist with an
a_id of 2.
But what happens if we INSERT a new row in
track with an
a_id of 3? There is no artist with such an
a_id in the
artist table. We will end up with an inconsistency. Referential integrity is lost.
The solution is to use a foreign key constraint which ensures that only such
a_id values are allowed in the
artist (referring) table that are present in the
artist. In other words, we don't want to allow a track with
a_id of 3, because there is no artist in the
artist table that has
This is the syntax for the create table syntax for the
tracktable with a foreign key constraint:
CREATE TABLE track(t_id INTEGER PRIMARY KEY, t_name TEXT, a_id REFERENCES ARTIST);
However, in SQLite3, foreign key constraints are turned off (not enforced) by default, so we also need to turn them on (for every connection or session to the database):
sqlite> PRAGMA FOREIGN_KEYS=ON;
Now, if we try to insert a track with an invalid
a_id, we will get an error (which is what we want rather than creating an inconsistency):
sqlite> INSERT INTO track(t_name, a_id) VALUES('monkey stole my face', 3); Error: FOREIGN KEY constraint failed
Foreign key constraints - advanced
In SQLite (and most other database mangement systems), there are ways to defer the enforcing of foreign key constraints.
Let's say we really want to add a track with name 'monkey stole my face' (which is a brilliant song, by the way) with an
a_idof 3, before we add the artist with
a_id3. It is not a consistency as long as we remember to add the artist with
a_id3, we are just doing it in the "wrong" order.
We can use transactions in combination with another pragma for this. If we start a new transaction, set foreign keys to deferred and insert the track before the artist, this can be done. Deferring the foreign key constraints means to wait with the validation until we commit the transaction. Here's what it would look like:
sqlite> BEGIN TRANSACTION; sqlite> PRAGMA DEFER_FOREIGN_KEYS=ON; sqlite> INSERT INTO track(t_name, a_id) VALUES('monkey stole my face', 3); sqlite> INSERT INTO artist(a_name) VALUES('mighty boosh'); sqlite> COMMIT;
If you want foreign keys to work like the above (being deferrable by default), they you could have created the
tracktable as folows:
CREATE TABLE track(t_id INTEGER PRIMARY KEY, t_name TEXT, a_id REFERENCES artist DEFERRABLE INITIALLY DEFERRED);
Using deferrable foreign keys only works in a transaction. This is because doing two inserts as an atomic action requires a transaction.
Note: Deferring is an advance concept we include here for completeness. It's quite advanced for an introductory database course. Also, this section is specific to SQLite3. Look up the documentation if you are using a DBMS other than SQLite, in order to learn the rules and syntax for that DBMS.
Using a check constraint for validating dates in SQLite3
SQLite3 is very liberal when it comes to types. In fact, you can even put text (strings) into a column of type INTEGER (if it is not a primary key). So what about dates in columns of type DATETIME? It pretty much also suffer from this loose typing. But you can use the
CHECKfunction together with a date formatting function and the
DATETIMEfunction. This is what such a constraint looks like in SQLite3:
CREATE TABLE authors( authorid INTEGER PRIMARY KEY, name TEXT, born DATETIME check( born IS STRFTIME( '%Y-%m-%d', DATETIME(born) ) ) );
The way that works, is that we specify a dateformat for our dates. The format
%Y-%m-%dmeans YYYY-MM-DD as in for instance 2018-05-17. The function
STRFTIMEtakes a format string and date as arguments. We use the
DATETIMEfunction takes a date string and tries to parse it to a date.
If we'd try to insert, e.g. "9/11/10", the constraint would do the following check:
- take 9/11/10 and convert it to a date (fails and returns null)
- convert the result of the above (NULL) to a string, using the format string '%Y-%m-%d' (fails - evaluates to NULL)
- compare the result to the string to be inserted "9/11/10" which also fails
The constraint let's us specify the exact input format of dates. Internally, the dates inserted (if valid according to our pattern) will be stored as dates, but only date strings compliant to our date format will be accepted. Another example:
We'll try to insert the date string '2018-05-17 07:30:00' (which is a string SQLite3 can parse into a date). However, the string doesn't match our specified date format (we didn't want the hour, minute and seconds part!). This will happen:
- Take "2018-05-17 07:30:00" and convert it do a date (works - we'll get the date 2018-05-17 07:30:00 back!)
- convert 2018-05-17 07:30:00 to a string, using our date format '%Y-%m-%d' - which returns '2018-05-17'
- compare the string '2018-05-17' to the original '2018-05-17 07:30:00' - which returns false
The check fails, since we got false back from the last step.
Force integer values to INTEGER columns
As we mentioned before, SQLite3 is very loosely typed and even accepts strings on columns whose type is INTEGER (if the column is not a primary key).
If we really want to only accept actual integer values in SQLite3 for columns whose type is
INTEGER, we need a constraint also for this.
This is what such a constraint could look like:
CREATE TABLE example ( column_name INTEGER CHECK(TYPEOF(column_name) = 'integer') );
The result of trying to insert a string value to the
column_namecolumn would result in the following error:
sqlite> INSERT INTO example VALUES(3); sqlite> INSERT INTO example VALUES('Some text'); Error: CHECK constraint failed: example
Certain text formats using GLOB
You can use the
GLOBfunction (or operator) to create a constraint for checking that a text follows a GLOB pattern.
You can learn about the GLOB function and globbing in the exercises in the next chapter (follow the next-link at the end of this page).
Here's an example for a (not very complete) GLOB constraint for Swedish personal identification numbers:
CREATE TABLE person(id INTEGER PRIMARY KEY, id_num TEXT CHECK( GLOB('[0-9][0-9][0-1][0-9][0-3][0-9]-[0-9][0-9][0-9][0-9]', id_num) ));
The above requires six numbers followed by a hyphen followed by four numbers. In addition, the third number can only be 0 or 1 (the first number of the month needs to be 0 or 1 - there are no months whose number start with 2 (23 is not a valid month), the fifth number must be between 0 and 3 (the day number cannot start with e.g. 4 - day 48 doesn't exist).
The expression isn't complete since it would accept e.g.
001339-1234(month 13, day 39), but you get the picture.
The brackets is the syntax for expressing one single character and all the valid characters as a list. That means
[abc]means "one character from the list 'a', 'b', 'c'". In order be more expressive, intervals of characters are allowed in the list (as long as they represent the orders of characters in the ascii table. So we could also say
[a-c]to express the same thing. We can also combine intervals:
[a-cf-g]meaning "one character from the list 'a', 'b', 'c', 'f', 'g'".
Naming a constraint for better error messages
Using the GLOB example above, we'd get an error message like this:
CREATE TABLE person( id INTEGER PRIMARY KEY, id_num TEXT check(GLOB('[0-9][0-9][0-1][0-9][0-3][0-9]-[0-9][0-9][0-9][0-9]', id_num))); insert into person(id_num) values ('711144-0000'); Error: CHECK constraint failed: person
We could name the constraint for at better error message:
CREATE TABLE person( id INTEGER PRIMARY KEY, id_num TEXT constraint valid_id_num CHECK(GLOB('[0-9][0-9][0-1][0-9][0-3][0-9]-[0-9][0-9][0-9][0-9]', id_num)) ); INSERT INTO person(id_num) VALUES('711144-0000'); Error: CHECK constraint failed: valid_id_num
Not accepting NULL
If you insert data into a table and leave out a column and its value, the default behavior is that the row gets NULL as the value for the left out columns.
If you don't want to accept NULL for a column (meaning that the row always must have an actual value for a column), you just add a
sqlite> CREATE TABLE beer(beer_id INTEGER PRIMARY KEY, name TEXT NOT NULL); sqlite> INSERT INTO beer(beer_id) VALUES(1); Error: NOT NULL constraint failed: beer.name
Force a column to have unique values in the table
If you don't want to allow duplicate values for a column in your table, you can add a
UNIQUEconstraint. Note that columns that are
PRIMARY KEYget this implicitly - primary keys are always unique in order to make sense.
sqlite> CREATE TABLE users(user_id INTEGER PRIMARY KEY, email TEXT UNIQUE, user_name TEXT UNIQUE NOT NULL); sqlite> INSERT INTO users(email, user_name) VALUES('email@example.com', 'xrazor'); sqlite> INSERT INTO users(email, user_name) VALUES('firstname.lastname@example.org', 'xhesa'); --email must be UNIQUE Error: UNIQUE constraint failed: users.email sqlite> INSERT INTO users(email, user_name) VALUES('email@example.com', 'xrazor'); --name must be UNIQUE Error: UNIQUE constraint failed: users.user_name -- What if we leave out user_name? Then it will be set to NULL sqlite> INSERT INTO users(email) VALUES('firstname.lastname@example.org'); --email must be UNIQUE and NOT NULL Error: NOT NULL constraint failed: users.user_name
We have one lecture (in two videos) in Swedish:
We encourage you to also see the English video lecture on "simulating enums" below.
At the moment, we only have a separate video on "enums" in English:
The main lecture is only available in Swedish (above) but the slides for that one are in English.
Work in progress
Remove this section when the page is production ready.
- Record English version of the videos (medium priority)
Create PDF and video for the MariaDB triggers alternative (low priority)DONE!