2024-02-07: Exercises Needed

This tutorial needs (at least) 40 exercises.
If you would like to propose one,
please submit an issue
with the question you would ask and the SQL you would want the learner to write.
It doesn’t have to use the databases in the db folder,
but that would make learners’ lives easier.
All contributions will be acknowledged in the lesson.

Note: I’m asking for proposals partly because I’m lazy,
but also because I know from experience that
I often focus on a handful of things that I personally found challenging
and overlook other topics.
Asking for help is one way to compensate for those blind spots.

☆ what this is

notes and working examples that instructors can use to perform a lesson

do not expect novices with no prior SQL experience to be able to learn from them

musical analogy

this is the chord changes and melody
we expect instructors to create an arrangement and/or improvise while delivering
see Teaching Tech Together for background

☆ scope

intended audience

Rachel has a master’s degree in cell biology
and now works in a research hospital doing cell assays.
She learned a bit of R in an undergrad biostatistics course
and has been through the Carpentries lesson on the Unix shell.
Rachel is thinking about becoming a data scientist
and would like to understand how data is stored and managed.
Her work schedule is unpredictable and highly variable,
so she needs to be able to learn a bit at a time.

prerequisites

basic Unix command line: cd, ls, * wildcard
basic tabular data analysis: filtering rows, aggregating within groups

learning outcomes

Explain the difference between a database and a database manager.
Write SQL to select, filter, sort, group, and aggregate data.
Define tables and insert, update, and delete records.
Describe different types of join and write queries that use them to combine data.
Use windowing functions to operate on adjacent rows.
Explain what transactions are and write queries that roll back when constraints are violated.
Explain what triggers are and write SQL to create them.
Manipulate JSON data using SQL.
Interact with a database using Python directly, from a Jupyter notebook, and via an ORM.

☆ setup

Download the latest release
Unzip the file in a temporary directory to create:

./db/*.db: the SQLite databases used in the examples
./src/*.*: SQL queries, Python scripts, and other source code
./out/*.*: expected output for examples

☆ background concepts

A database is a collection of data that can be searched and retrieved
A database management system (DBMS) is a program that manages a particular kind of database
Each DBMS stores data in its own way

SQLite stores each database in a single file
PostgreSQL spreads information across many files for higher performance

DBMS can be a library embedded in other programs (SQLite) or a server (PostgreSQL)
A relational database management system (RDBMS) stores data in tables and uses SQL for queries

Unfortunately, every RDBMS has its own dialect of SQL

There are also NoSQL databases like MongoDB that don’t use tables
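
As a small illustration of dialect differences (a sketch, not one of the lesson's files), even string concatenation is spelled differently across systems:

select 'penguin' || 's';            -- SQLite, PostgreSQL, and standard SQL
-- select concat('penguin', 's');   -- MySQL (and recent SQL Server) use a function instead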

☆ connect to database

src/connect_penguins.sh

sqlite3 db/penguins.db

Not actually a query
But we have to do it before we can do anything else

1: select constant

src/select_1.sql

select 1;

out/select_1.out

1

select is a keyword
Normally used to select data from table…
…but if all we want is a constant value, we don’t need to specify one
Semi-colon terminator is required

2: select all values from table

src/select_star.sql

select * from little_penguins;

out/select_star.out

Gentoo|Biscoe|51.3|14.2|218.0|5300.0|MALE
Adelie|Dream|35.7|18.0|202.0|3550.0|FEMALE
Adelie|Torgersen|36.6|17.8|185.0|3700.0|FEMALE
Chinstrap|Dream|55.8|19.8|207.0|4000.0|MALE
Adelie|Dream|38.1|18.6|190.0|3700.0|FEMALE
Adelie|Dream|36.2|17.3|187.0|3300.0|FEMALE
Adelie|Dream|39.5|17.8|188.0|3300.0|FEMALE
Gentoo|Biscoe|42.6|13.7|213.0|4950.0|FEMALE
Gentoo|Biscoe|52.1|17.0|230.0|5550.0|MALE
Adelie|Torgersen|36.7|18.8|187.0|3800.0|FEMALE

An actual query
Use * to mean “all columns”
Use from tablename to specify table
Output format is not particularly readable

☆ administrative commands

src/admin_commands.sql

.headers on
.mode markdown
select * from little_penguins;

out/admin_commands.out

| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|———–|———–|—————-|—————|——————-|————-|——–|
| Gentoo | Biscoe | 51.3 | 14.2 | 218.0 | 5300.0 | MALE |
| Adelie | Dream | 35.7 | 18.0 | 202.0 | 3550.0 | FEMALE |
| Adelie | Torgersen | 36.6 | 17.8 | 185.0 | 3700.0 | FEMALE |
| Chinstrap | Dream | 55.8 | 19.8 | 207.0 | 4000.0 | MALE |
| Adelie | Dream | 38.1 | 18.6 | 190.0 | 3700.0 | FEMALE |
| Adelie | Dream | 36.2 | 17.3 | 187.0 | 3300.0 | FEMALE |
| Adelie | Dream | 39.5 | 17.8 | 188.0 | 3300.0 | FEMALE |
| Gentoo | Biscoe | 42.6 | 13.7 | 213.0 | 4950.0 | FEMALE |
| Gentoo | Biscoe | 52.1 | 17.0 | 230.0 | 5550.0 | MALE |
| Adelie | Torgersen | 36.7 | 18.8 | 187.0 | 3800.0 | FEMALE |

SQLite administrative commands start with . and aren’t part of the SQL standard

PostgreSQL’s special commands start with \ (a backslash)

Use .help for a complete list
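
A few other dot commands that are handy when exploring (a sketch, not part of the lesson's files; .mode box needs a reasonably recent SQLite):

.tables              -- list the tables in the current database
.schema penguins     -- show the SQL used to create a table
.mode box            -- another human-readable output format
.quit                -- leave the SQLite shell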

3: specify columns

src/specify_columns.sql

select
species,
island,
sex
from little_penguins;

out/specify_columns.out

| species | island | sex |
|———–|———–|——–|
| Gentoo | Biscoe | MALE |
| Adelie | Dream | FEMALE |
| Adelie | Torgersen | FEMALE |
| Chinstrap | Dream | MALE |
| Adelie | Dream | FEMALE |
| Adelie | Dream | FEMALE |
| Adelie | Dream | FEMALE |
| Gentoo | Biscoe | FEMALE |
| Gentoo | Biscoe | MALE |
| Adelie | Torgersen | FEMALE |

Specify column names separated by commas

In any order
Duplicates allowed

Line breaks are allowed, and encouraged for readability
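
For example, columns can be reordered and repeated freely (a sketch, not one of the lesson's files):

select
sex,
island,
sex
from little_penguins;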

4: sort

src/sort.sql

select
species,
sex,
island
from little_penguins
order by island asc, sex desc;

out/sort.out

| species | sex | island |
|———–|——–|———–|
| Gentoo | MALE | Biscoe |
| Gentoo | MALE | Biscoe |
| Gentoo | FEMALE | Biscoe |
| Chinstrap | MALE | Dream |
| Adelie | FEMALE | Dream |
| Adelie | FEMALE | Dream |
| Adelie | FEMALE | Dream |
| Adelie | FEMALE | Dream |
| Adelie | FEMALE | Torgersen |
| Adelie | FEMALE | Torgersen |

order by must follow from (which must follow select)
asc is ascending, desc is descending

Default is ascending, but please specify

5: limit output

Full dataset has 344 rows

src/limit.sql

select
species,
sex,
island
from penguins
order by species, sex, island
limit 10;

out/limit.out

| species | sex | island |
|———|——–|———–|
| Adelie | | Dream |
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |

Comments start with -- and run to the end of the line
limit N specifies maximum number of rows returned by query
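
For example, a commented version of the query above (a sketch, not one of the lesson's files):

-- Show a manageable subset of the 344-row table.
select
species,
sex,
island
from penguins
order by species, sex, island
limit 10;   -- a comment can also follow code on the same line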

6: page output

src/page.sql

select
species,
sex,
island
from penguins
order by species, sex, island
limit 10 offset 3;

out/page.out

| species | sex | island |
|———|——–|———–|
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |
| Adelie | FEMALE | Biscoe |

offset N must follow limit
Specifies number of rows to skip from the start of the selection
So this query skips the first 3 and shows the next 10
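
To page through the whole table, keep the limit fixed and advance the offset (a sketch, not one of the lesson's files):

-- Next page of ten rows: skip the ten already shown plus the original offset of 3.
select
species,
sex,
island
from penguins
order by species, sex, island
limit 10 offset 13;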

7: remove duplicates

src/distinct.sql

select distinct
species,
sex,
island
from penguins;

out/distinct.out

| species | sex | island |
|———–|——–|———–|
| Adelie | MALE | Torgersen |
| Adelie | FEMALE | Torgersen |
| Adelie | | Torgersen |
| Adelie | FEMALE | Biscoe |
| Adelie | MALE | Biscoe |
| Adelie | FEMALE | Dream |
| Adelie | MALE | Dream |
| Adelie | | Dream |
| Chinstrap | FEMALE | Dream |
| Chinstrap | MALE | Dream |
| Gentoo | FEMALE | Biscoe |
| Gentoo | MALE | Biscoe |
| Gentoo | | Biscoe |

distinct keyword must appear right after select

SQL was supposed to read like English

Shows distinct combinations
Blanks in sex column show missing data

We’ll talk about this in a bit

8: filter results

src/filter.sql

select distinct
species,
sex,
island
from penguins
where island = 'Biscoe';

out/filter.out

| species | sex | island |
|———|——–|——–|
| Adelie | FEMALE | Biscoe |
| Adelie | MALE | Biscoe |
| Gentoo | FEMALE | Biscoe |
| Gentoo | MALE | Biscoe |
| Gentoo | | Biscoe |

where condition filters the rows produced by selection
Condition is evaluated independently for each row
Only rows that pass the test appear in results
Use single quotes for 'text data' and double quotes for "weird column names"

SQLite will accept double-quoted text data
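
A sketch of both kinds of quoting (the renamed column is made up for illustration and is not in the lesson's files):

select
island as "where the penguin was seen"
from penguins
where island = 'Biscoe'
limit 1;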

9: filter with more complex conditions

src/filter_and.sql

select distinct
species,
sex,
island
from penguins
where island = 'Biscoe' and sex != 'MALE';

out/filter_and.out

| species | sex | island |
|———|——–|——–|
| Adelie | FEMALE | Biscoe |
| Gentoo | FEMALE | Biscoe |

and: both sub-conditions must be true
or: either or both parts must be true (see the sketch below)
Notice that the row for Gentoo penguins on Biscoe island with unknown (empty) sex didn’t pass the test

We’ll talk about this in a bit
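
The query above only uses and; here is a sketch with or (not one of the lesson's files):

select distinct
species,
sex,
island
from penguins
where island = 'Torgersen' or sex = 'MALE';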

10: do calculations

src/calculations.sql

select
flipper_length_mm / 10.0,
body_mass_g / 1000.0
from penguins
limit 3;

out/calculations.out

| flipper_length_mm / 10.0 | body_mass_g / 1000.0 |
|————————–|———————-|
| 18.1 | 3.75 |
| 18.6 | 3.8 |
| 19.5 | 3.25 |

Can do the usual kinds of arithmetic on individual values

Calculation done for each row independently

Column name shows the calculation done

11: rename columns

src/rename_columns.sql

select
flipper_length_mm / 10.0 as flipper_cm,
body_mass_g / 1000.0 as weight_kg,
island as where_found
from penguins
limit 3;

out/rename_columns.out

| flipper_cm | weight_kg | where_found |
|————|———–|————-|
| 18.1 | 3.75 | Torgersen |
| 18.6 | 3.8 | Torgersen |
| 19.5 | 3.25 | Torgersen |

Use expression as name to rename
Give result of calculation a meaningful name
Can also rename a column without modifying its values

☆ check your understanding

12: calculate with missing values

src/show_missing_values.sql

select
flipper_length_mm / 10.0 as flipper_cm,
body_mass_g / 1000.0 as weight_kg,
island as where_found
from penguins
limit 5;

out/show_missing_values.out

| flipper_cm | weight_kg | where_found |
|————|———–|————-|
| 18.1 | 3.75 | Torgersen |
| 18.6 | 3.8 | Torgersen |
| 19.5 | 3.25 | Torgersen |
| | | Torgersen |
| 19.3 | 3.45 | Torgersen |

SQL uses a special value null to represent missing data

Not 0 or empty string, but “I don’t know”

Flipper length and body weight not known for one of the first five penguins
“I don’t know” divided by 10 or 1000 is “I don’t know”
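
A quick way to see this propagation for yourself (a sketch, not one of the lesson's files; both results are null):

select
null + 1 as addition,
null * 2 as multiplication;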

13: null equality

Repeated from above so it doesn’t count against our query limit

src/filter.sql

select distinct
species,
sex,
island
from penguins
where island = 'Biscoe';

out/filter.out

| species | sex | island |
|———|——–|——–|
| Adelie | FEMALE | Biscoe |
| Adelie | MALE | Biscoe |
| Gentoo | FEMALE | Biscoe |
| Gentoo | MALE | Biscoe |
| Gentoo | | Biscoe |

If we ask for female penguins the row with the missing sex drops out

src/null_equality.sql

select distinct
species,
sex,
island
from penguins
where island = 'Biscoe' and sex = 'FEMALE';

out/null_equality.out

| species | sex | island |
|———|——–|——–|
| Adelie | FEMALE | Biscoe |
| Gentoo | FEMALE | Biscoe |

14: null inequality

But if we ask for penguins that aren’t female it drops out as well

src/null_inequality.sql

select distinct
species,
sex,
island
from penguins
where island = 'Biscoe' and sex != 'FEMALE';

out/null_inequality.out

| species | sex | island |
|———|——|——–|
| Adelie | MALE | Biscoe |
| Gentoo | MALE | Biscoe |

15: ternary logic

src/ternary_logic.sql

select null = null;

out/ternary_logic.out

| null = null |
|————-|
| |

If we don’t know the left and right values, we don’t know if they’re equal or not
So the result is null
Get the same answer for null != null
Ternary logic

equality:

| =    | X     | Y     | null |
|------|-------|-------|------|
| X    | true  | false | null |
| Y    | false | true  | null |
| null | null  | null  | null |

16: handle null safely

src/safe_null_equality.sql

select
species,
sex,
island
from penguins
where sex is null;

out/safe_null_equality.out

| species | sex | island |
|———|—–|———–|
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | | Torgersen |
| Adelie | | Dream |
| Gentoo | | Biscoe |
| Gentoo | | Biscoe |
| Gentoo | | Biscoe |
| Gentoo | | Biscoe |
| Gentoo | | Biscoe |

Use is null and is not null to handle null safely
Other parts of SQL handle nulls specially
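
is not null works the same way; for example, this sketch (not one of the lesson's files) keeps only penguins whose sex was recorded:

select
species,
sex,
island
from penguins
where sex is not null
limit 5;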

☆ check your understanding

17: aggregate

src/simple_sum.sql

select sum(body_mass_g) as total_mass
from penguins;

out/simple_sum.out

| total_mass |
|————|
| 1437000.0 |

Aggregation combines many values to produce one
sum is an aggregation function
Combines corresponding values from multiple rows

18: common aggregation functions

src/common_aggregations.sql

select
max(bill_length_mm) as longest_bill,
min(flipper_length_mm) as shortest_flipper,
avg(bill_length_mm) / avg(bill_depth_mm) as weird_ratio
from penguins;

out/common_aggregations.out

| longest_bill | shortest_flipper | weird_ratio |
|————–|——————|——————|
| 59.6 | 172.0 | 2.56087082530644 |

This actually shouldn’t work:
can’t calculate maximum or average if any values are null
SQL does the useful thing instead of the right one
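
More precisely, aggregate functions silently skip nulls, so the two expressions below give the same result (a sketch, not one of the lesson's files):

select
avg(body_mass_g) as built_in_average,
sum(body_mass_g) / count(body_mass_g) as manual_average
from penguins;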

19: counting

src/count_behavior.sql

select
count(*) as count_star,
count(sex) as count_specific,
count(distinct sex) as count_distinct
from penguins;

out/count_behavior.out

| count_star | count_specific | count_distinct |
|————|—————-|—————-|
| 344 | 333 | 2 |

count(*) counts rows
count(column) counts non-null entries in column
count(distinct column) counts distinct non-null entries

20: group

src/simple_group.sql

select avg(body_mass_g) as average_mass_g
from penguins
group by sex;

out/simple_group.out

| average_mass_g |
|——————|
| 4005.55555555556 |
| 3862.27272727273 |
| 4545.68452380952 |

Put rows in groups based on distinct combinations of values in columns specified with group by
Then perform aggregation separately for each group
But which is which?

21: behavior of unaggregated columns

src/unaggregated_columns.sql

select
sex,
avg(body_mass_g) as average_mass_g
from penguins
group by sex;

out/unaggregated_columns.out

| sex | average_mass_g |
|——–|——————|
| | 4005.55555555556 |
| FEMALE | 3862.27272727273 |
| MALE | 4545.68452380952 |

All rows in each group have the same value for sex, so no need to aggregate

22: arbitrary choice in aggregation

src/arbitrary_in_aggregation.sql

select
sex,
body_mass_g
from penguins
group by sex;

out/arbitrary_in_aggregation.out

| sex | body_mass_g |
|——–|————-|
| | |
| FEMALE | 3800.0 |
| MALE | 3750.0 |

If we don’t specify how to aggregate a column,
SQLite chooses any arbitrary value from the group

All penguins in each group have the same sex because we grouped by that, so we get the right answer
The body mass values are in the data but unpredictable
A common mistake

Other database managers don’t do this

E.g., PostgreSQL complains that column must be used in an aggregation function
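
A safer version aggregates the column explicitly instead of relying on SQLite's arbitrary choice (a sketch, not one of the lesson's files):

select
sex,
max(body_mass_g) as heaviest_g
from penguins
group by sex;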

23: filter aggregated values

src/filter_aggregation.sql

select
sex,
avg(body_mass_g) as average_mass_g
from penguins
group by sex
having average_mass_g > 4000.0;

out/filter_aggregation.out

| sex | average_mass_g |
|——|——————|
| | 4005.55555555556 |
| MALE | 4545.68452380952 |

Use having condition instead of where condition to filter aggregated values
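
The two can be combined: where filters rows before grouping, while having filters the groups afterward (a sketch, not one of the lesson's files):

select
species,
avg(body_mass_g) as average_mass_g
from penguins
where sex = 'FEMALE'
group by species
having average_mass_g > 3500.0;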

24: readable output

src/readable_aggregation.sql

select
sex,
round(avg(body_mass_g), 1) as average_mass_g
from penguins
group by sex
having average_mass_g > 4000.0;

out/readable_aggregation.out

| sex | average_mass_g |
|——|—————-|
| | 4005.6 |
| MALE | 4545.7 |

Use round(value, decimals) to round off a number

25: filter aggregate inputs

src/filter_aggregate_inputs.sql

select
sex,
round(
avg(body_mass_g) filter (where body_mass_g < 4000.0),
1
) as average_mass_g
from penguins
group by sex;

filter (where condition) restricts which rows feed a single aggregation

44: compare individual values to aggregates

src/compare_individual_aggregate.sql

select body_mass_g
from penguins
where body_mass_g > (
select avg(body_mass_g)
from penguins
)
limit 5;

out/compare_individual_aggregate.out

| body_mass_g |
|————-|
| 4675.0 |
| 4250.0 |
| 4400.0 |
| 4500.0 |
| 4650.0 |

Get average body mass in subquery
Compare each row against that
Requires two scans of the data, but there’s no way to avoid that
Null values aren’t included in the average or in the final results

45: compare individual values to aggregates within groups

src/compare_within_groups.sql

select
penguins.species,
penguins.body_mass_g,
round(averaged.avg_mass_g, 1) as avg_mass_g
from penguins inner join (
select
species,
avg(body_mass_g) as avg_mass_g
from penguins
group by species
) as averaged
on penguins.species = averaged.species
where penguins.body_mass_g > averaged.avg_mass_g
limit 5;

out/compare_within_groups.out

| species | body_mass_g | avg_mass_g |
|———|————-|————|
| Adelie | 3750.0 | 3700.7 |
| Adelie | 3800.0 | 3700.7 |
| Adelie | 4675.0 | 3700.7 |
| Adelie | 4250.0 | 3700.7 |
| Adelie | 3800.0 | 3700.7 |

Subquery runs first to create temporary table averaged with average mass per species
Join that with penguins
Filter to find penguins heavier than average within their species

46: common table expressions

src/common_table_expressions.sql

with grouped as (
select
species,
avg(body_mass_g) as avg_mass_g
from penguins
group by species
)

select
penguins.species,
penguins.body_mass_g,
round(grouped.avg_mass_g, 1) as avg_mass_g
from penguins inner join grouped
on penguins.species = grouped.species
where penguins.body_mass_g > grouped.avg_mass_g
limit 5;

out/common_table_expressions.out

| species | body_mass_g | avg_mass_g |
|———|————-|————|
| Adelie | 3750.0 | 3700.7 |
| Adelie | 3800.0 | 3700.7 |
| Adelie | 4675.0 | 3700.7 |
| Adelie | 4250.0 | 3700.7 |
| Adelie | 3800.0 | 3700.7 |

Use common table expression (CTE) to make queries clearer

Nested subqueries quickly become difficult to understand

Database decides how to optimize

☆ explain query plan

src/explain_query_plan.sql

explain query plan
select
species,
avg(body_mass_g)
from penguins
group by species;

out/explain_query_plan.out

QUERY PLAN
|--SCAN penguins
`--USE TEMP B-TREE FOR GROUP BY

SQLite plans to scan every row of the table
It will build a temporary B-tree data structure to group rows
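
The same command works on any query; for example, this sketch (not one of the lesson's files) asks for the plan of a filtered query, which is still reported as a full scan because the table has no index on species:

explain query plan
select count(*)
from penguins
where species = 'Adelie';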

47: enumerate rows

Every table has a special column called rowid

src/rowid.sql

select
rowid,
species,
island
from penguins
limit 5;

out/rowid.out

| rowid | species | island |
|——-|———|———–|
| 1 | Adelie | Torgersen |
| 2 | Adelie | Torgersen |
| 3 | Adelie | Torgersen |
| 4 | Adelie | Torgersen |
| 5 | Adelie | Torgersen |

rowid is persistent within a session

I.e., if we delete the first 5 rows we now have row IDs 6…N

Do not rely on row ID

In particular, do not use it as a key

48: if-else function

src/if_else.sql

with sized_penguins as (
select
species,
iif(
body_mass_g < 3500.0,
'small',
'large'
) as size
from penguins
)

select
species,
size,
count(*) as num
from sized_penguins
group by species, size;

iif(condition, result_if_true, result_if_false) picks one of two values for each row

79: transaction

src/transaction.sql

create table job (
name text not null,
billable real not null,
check (billable > 0.0)
);

insert into job values ('calibrate', 1.5);

begin transaction;
insert into job values ('clean', 0.5);
rollback;

select * from job;

out/transaction.out

| name | billable |
|———–|———-|
| calibrate | 1.5 |

Statements outside transaction execute and are committed immediately
Statement(s) inside transaction don’t take effect until:

end transaction (success)
rollback (undo)

Can have any number of statements inside a transaction
But cannot nest transactions in SQLite

Other databases support this
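
SQLite does provide savepoints, which give a similar effect to nested transactions; a sketch using the job table above (not one of the lesson's files):

begin transaction;
insert into job values ('document', 0.25);
savepoint cleanup;
insert into job values ('clean', 0.5);
rollback to cleanup;   -- undoes only the insert made after the savepoint
commit;                -- 'document' is kept, 'clean' is not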

80: rollback in constraint

src/rollback_constraint.sql

create table job (
name text not null,
billable real not null,
check (billable > 0.0) on conflict rollback
);

insert into job values
('calibrate', 1.5);
insert into job values
('clean', 0.5),
('reset', -0.5);

select * from job;

out/rollback_constraint.out

Runtime error near line 11: CHECK constraint failed: billable > 0.0 (19)
| name | billable |
|———–|———-|
| calibrate | 1.5 |

All of second insert rolled back as soon as error occurred
But first insert took effect

81: rollback in statement

src/rollback_statement.sql

create table job (
name text not null,
billable real not null,
check (billable > 0.0)
);

insert or rollback into job values
('calibrate', 1.5);
insert or rollback into job values
('clean', 0.5),
('reset', -0.5);

select * from job;

out/rollback_statement.out

Runtime error near line 11: CHECK constraint failed: billable > 0.0 (19)
| name | billable |
|———–|———-|
| calibrate | 1.5 |

Constraint is in table definition
Action is in statement

82: upsert

src/upsert.sql

create table jobs_done (
person text unique,
num integer default 0
);

insert into jobs_done values
('zia', 1);
.print 'after first'
select * from jobs_done;
.print

insert into jobs_done values
('zia', 1);
.print 'after failed'
select * from jobs_done;

insert into jobs_done values
('zia', 1)
on conflict(person) do update set num = num + 1;
.print 'after upsert'
select * from jobs_done;

out/upsert.out

after first
| person | num |
|——–|—–|
| zia | 1 |

Runtime error near line 15: UNIQUE constraint failed: jobs_done.person (19)
after failed
| person | num |
|——–|—–|
| zia | 1 |
after upsert
| person | num |
|——–|—–|
| zia | 2 |

upsert stands for “update or insert”

Create if record doesn’t exist
Update if it does

Not standard SQL but widely implemented
Example also shows use of SQLite .print command
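
A related option (also non-standard but widely supported) is insert or ignore, which simply skips rows that would violate the constraint instead of updating them; a sketch, not one of the lesson's files:

insert or ignore into jobs_done values
('zia', 1);    -- silently does nothing because 'zia' already exists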

☆ normalization

First normal form (1NF):
every field of every record contains one indivisible value.

Second normal form (2NF) and third normal form (3NF):
every value in a record that isn’t a key depends solely on the key,
not on other values.

Denormalization: explicitly store values that could be calculated on the fly

To simplify queries and/or make processing faster
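
For example, a hypothetical table like this one (not from the lesson) violates 1NF because one field packs several values together:

-- Not 1NF: the islands column holds a delimited list rather than one value per field.
create table species_sightings (
species text not null,
islands text not null    -- e.g. 'Biscoe;Dream;Torgersen'
);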

83: create trigger

A trigger automatically runs before or after a specified operation
Can have side effects (e.g., update some other table)
And/or implement checks (e.g., make sure other records exist)
Add processing overhead…
…but data is either cheap or correct, never both
Inside trigger, refer to old and new versions of record
as old.column and new.column

src/trigger_setup.sql

-- Track hours of lab work.
create table job (
person text not null,
reported real not null check (reported >= 0.0)
);

-- Explicitly store per-person total rather than using sum().
create table total (
person text unique not null,
hours real
);

-- Initialize totals.
insert into total values
('gene', 0.0),
('august', 0.0);

-- Define a trigger.
create trigger total_trigger
before insert on job
begin
-- Check that the person exists.
select case
when not exists (select 1 from total where person = new.person)
then raise(rollback, 'Unknown person ')
end;
-- Update their total hours (or fail if non-negative constraint violated).
update total
set hours = hours + new.reported
where total.person = new.person;
end;

src/trigger_successful.sql

insert into job values
('gene', 1.5),
('august', 0.5),
('gene', 1.0);

out/trigger_successful.out

| person | reported |
|——–|———-|
| gene | 1.5 |
| august | 0.5 |
| gene | 1.0 |

| person | hours |
|——–|——-|
| gene | 2.5 |
| august | 0.5 |

src/trigger_firing.sql

insert into job values
('gene', 1.0),
('august', -1.0);

out/trigger_firing.out

Runtime error near line 6: CHECK constraint failed: reported >= 0.0 (19)

| person | hours |
|——–|——-|
| gene | 0.0 |
| august | 0.0 |

☆ represent graphs

src/lineage_setup.sql

create table lineage (
parent text not null,
child text not null
);
insert into lineage values
('Arturo', 'Clemente'),
('Darío', 'Clemente'),
('Clemente', 'Homero'),
('Clemente', 'Ivonne'),
('Ivonne', 'Lourdes'),
('Soledad', 'Lourdes'),
('Lourdes', 'Santiago');

src/represent_graph.sql

select * from lineage;

out/represent_graph.out

| parent | child |
|———-|———-|
| Arturo | Clemente |
| Darío | Clemente |
| Clemente | Homero |
| Clemente | Ivonne |
| Ivonne | Lourdes |
| Soledad | Lourdes |
| Lourdes | Santiago |

84: recursive query

src/recursive_lineage.sql

with recursive descendent as (
select
'Clemente' as person,
0 as generations
union all
select
lineage.child as person,
descendent.generations + 1 as generations
from descendent inner join lineage
on descendent.person = lineage.parent
)

select
person,
generations
from descendent;

out/recursive_lineage.out

| person | generations |
|———-|————-|
| Clemente | 0 |
| Homero | 1 |
| Ivonne | 1 |
| Lourdes | 2 |
| Santiago | 3 |

Use a recursive CTE to create a temporary table (descendent)
Base case seeds this table
Recursive case relies on value(s) already in that table and external table(s)
union all to combine rows

Can use union but that has lower performance (must check uniqueness each time)

Stops when the recursive case yields an empty row set (nothing new to add)
Then select the desired values from the CTE
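
The same pattern works in the other direction; a sketch (not one of the lesson's files) that walks up the lineage table to find Santiago's ancestors:

with recursive ancestor as (
select
'Santiago' as person,
0 as generations
union all
select
lineage.parent as person,
ancestor.generations + 1 as generations
from ancestor inner join lineage
on ancestor.person = lineage.child
)

select
person,
generations
from ancestor;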

src/contact_person.sql

out/contact_person.out

| ident | name |
|——-|———————–|
| 1 | Juana Baeza |
| 2 | Agustín Rodríquez |
| 3 | Ariadna Caraballo |
| 4 | Micaela Laboy |
| 5 | Verónica Altamirano |
| 6 | Reina Rivero |
| 7 | Elias Merino |
| 8