Product Managers Get Ready for Advance SQL

This is part 3 of my post on learning SQL. I wrote about how to set up Postgres and PgAdmin in my post 1 and covered a few basic SQL queries in post 2. In this post, I want to discuss the aggregate functions, use of GROUP BY, JOINS ( I know...the most dreadful topic), and UNIONs

Aggregate Functions

The main role of aggregate functions is to take multiple inputs and return a single output. Here are the most common aggregate functions.

AVG ()- returns average value
COUNT()- returns the number of values
MAX ()- returns the maximum value
MIN ()- returns the minimum value
SUM ()- returns the sum of all values

The aggregate functions are used in the SELECT clause. Here are a few examples:

1) Find the average, minimum, and maximum rental duration from the film table

SELECT AVG (rental_duration), MIN (rental_duration), MAX (rental_duration) FROM film

2) Find the total amount from the payment table

SELECT SUM (amount) FROM payment

Few key things to note are that aggregate functions return only 1 result and hence we cannot combine them to be displayed with other columns unless we are using GROUP BY .

GROUP BY and HAVING

This is a very powerful function that product managers used extensively. What we do basically is to use an aggregate function on a category. For example, you want to know who are your top paying customers. If you think about this, you will need to do a SUM on the amount paid for all the transactions per customer and then sort them in descending order. Let's look at it by an example. Let's explore the payment table.

If you will notice, you will see the amount column and customer id column in this table. In order to find the top-paying customer, we need the SUM of the amount on all transactions per customer.

SELECT customer_id, SUM(amount) FROM payment

GROUP BY customer_id

ORDER BY SUM(amount) DESC

You will notice that customer_id '148' is our highest paying customer. We can additionally use LIMIT to identify the top 3 paying customers. We can also use COUNT instead of SUM to find out the total number of transactions per customer. We can use these results to identify customer segments for various promotions and whatnot. If you want to use the

filter on aggregate function like only display customers whose SUM >100, we can use HAVING

SELECT customer_id, SUM(amount) FROM payment

GROUP BY customer_id

HAVING SUM(amount) >100

ORDER BY SUM(amount) DESC

JOINS

Until now, all the queries that I have written are executed on a single table. But more than often, we find ourselves running queries that will require us to connect two or more tables. And that's where the JOINS come into play. I think this concept will be best understood with the help of the Venn diagram.

Consider there are 2 tables, Table A and Table B. Now, depending on our requirements we may find ourselves in one of 4 situations

Common records between 2 tables - Can be retrieved using Inner Join
Records that are unique to both Table A and Table B. Can be retrieved using FULL OUTER Join
Records that are only exclusive to Table A- Can be retrieved using LEFT OUTER Join
Records that are only exclusive to Table B- Can be retrieved using RIGHT OUTER Join

In order to join the tables, we need Table B to have the primary key of Table A present as a foreign key ( Remember Primary Key and Foreign Keys from post 1 of this series).

Let's look at each join and their syntaxes

Inner Join- The Syntax of the query is

SELECT * FROM Table A

INNER JOIN Table B

ON Table A.col_match = Table B.col_match

Here is an example. Let's fetch payment_id, customer_id and first_name by joining payment and customer table

SELECT payment_id, payment.customer_id,customer.first_name FROM payment

INNER JOIN customer ON

payment.customer_id = customer.customer_id

FULL OUTER Join- The Syntax of the query is

SELECT * FROM Table A

FULL OUTER JOIN Table B

ON Table A.col_match = Table B.col_match

WHERE Table A.id IS NULL OR Table B.ID IS NULL

Here is an example

SELECT * FROM payment

FULL OUTER JOIN customer ON

payment.customer_id = customer.customer_id

WHERE payment.payment_id IS NULL OR customer.customer_id IS NULL

LEFT OUTER Join

The Syntax of the query is

SELECT * FROM Table A

LEFT JOIN Table B

ON Table A.col_match = Table B.col_match

WHERE Table B.ID IS NULL

Here is an example to find out films that are not in inventory

SELECT film.film_id, title, inventory_id

FROM film

LEFT JOIN inventory ON

inventory.film_id= film.film_id

WHERE inventory.film_id IS NULL

RIGHT OUTER Join

This is similar to LEFT JOIN except that the tables are changed

The Syntax of the query is

SELECT * FROM Table A

RIGHT JOIN Table B

ON Table A.col_match = Table B.col_match

WHERE Table A.ID IS NULL

Here is an example

SELECT film.film_id, title, inventory_id

FROM film

RIGHT JOIN inventory ON

inventory.film_id= film.film_id

WHERE film.film_id IS NULL

UNION

The UNION operator is used to combine the result-set of two or more SELECT statements.

The Syntax of the query is

SELECT column_name(s) FROM Table A

UNION

SELECT column_name(s) FROM Table B

That's all I wanted to cover in this post. There is so much more in SQL like the use of various date operators, operators for conditions such as CASE, COALESCE, operator for converting data type such as CAST, etc but the ones I mentioned in my 3 posts will give us sufficient knowledge to run the basic queries all by ourselves. In my next posts, I want to cover some basic DevOps concepts like CI/CD pipelines, Jenkins, Dockers, and Ansible.

Stay Tuned !!

Product Management for Dummies

Product Managers Get Ready for Advance SQL - Part 3

Recent Posts

Comments