This is part 3 of my post on learning SQL. I wrote about how to set up Postgres and PgAdmin in my post 1 and covered a few basic SQL queries in post 2. In this post, I want to discuss the aggregate functions, use of GROUP BY, JOINS ( I know...the most dreadful topic), and UNIONs
Aggregate Functions
The main role of aggregate functions is to take multiple inputs and return a single output. Here are the most common aggregate functions.
AVG ()- returns average value
COUNT()- returns the number of values
MAX ()- returns the maximum value
MIN ()- returns the minimum value
SUM ()- returns the sum of all values
The aggregate functions are used in the SELECT clause. Here are a few examples:
1) Find the average, minimum, and maximum rental duration from the film table
SELECT AVG (rental_duration), MIN (rental_duration), MAX (rental_duration) FROM film
2) Find the total amount from the payment table
SELECT SUM (amount) FROM payment
Few key things to note are that aggregate functions return only 1 result and hence we cannot combine them to be displayed with other columns unless we are using GROUP BY .
GROUP BY and HAVING
This is a very powerful function that product managers used extensively. What we do basically is to use an aggregate function on a category. For example, you want to know who are your top paying customers. If you think about this, you will need to do a SUM on the amount paid for all the transactions per customer and then sort them in descending order. Let's look at it by an example. Let's explore the payment table.
If you will notice, you will see the amount column and customer id column in this table. In order to find the top-paying customer, we need the SUM of the amount on all transactions per customer.
SELECT customer_id, SUM(amount) FROM payment
GROUP BY customer_id
ORDER BY SUM(amount) DESC
You will notice that customer_id '148' is our highest paying customer. We can additionally use LIMIT to identify the top 3 paying customers. We can also use COUNT instead of SUM to find out the total number of transactions per customer. We can use these results to identify customer segments for various promotions and whatnot. If you want to use the
filter on aggregate function like only display customers whose SUM >100, we can use HAVING
SELECT customer_id, SUM(amount) FROM payment
GROUP BY customer_id
HAVING SUM(amount) >100
ORDER BY SUM(amount) DESC
JOINS
Until now, all the queries that I have written are executed on a single table. But more than often, we find ourselves running queries that will require us to connect two or more tables. And that's where the JOINS come into play. I think this concept will be best understood with the help of the Venn diagram.
Consider there are 2 tables, Table A and Table B. Now, depending on our requirements we may find ourselves in one of 4 situations
Common records between 2 tables - Can be retrieved using Inner Join
Records that are unique to both Table A and Table B. Can be retrieved using FULL OUTER Join
Records that are only exclusive to Table A- Can be retrieved using LEFT OUTER Join
Records that are only exclusive to Table B- Can be retrieved using RIGHT OUTER Join
In order to join the tables, we need Table B to have the primary key of Table A present as a foreign key ( Remember Primary Key and Foreign Keys from post 1 of this series).
Let's look at each join and their syntaxes
Inner Join- The Syntax of the query is
SELECT * FROM Table A
INNER JOIN Table B
ON Table A.col_match = Table B.col_match
Here is an example. Let's fetch payment_id, customer_id and first_name by joining payment and customer table
SELECT payment_id, payment.customer_id,customer.first_name FROM payment
INNER JOIN customer ON
payment.customer_id = customer.customer_id
FULL OUTER Join- The Syntax of the query is
SELECT * FROM Table A
FULL OUTER JOIN Table B
ON Table A.col_match = Table B.col_match
WHERE Table A.id IS NULL OR Table B.ID IS NULL
Here is an example
SELECT * FROM payment
FULL OUTER JOIN customer ON
payment.customer_id = customer.customer_id
WHERE payment.payment_id IS NULL OR customer.customer_id IS NULL
LEFT OUTER Join
The Syntax of the query is
SELECT * FROM Table A
LEFT JOIN Table B
ON Table A.col_match = Table B.col_match
WHERE Table B.ID IS NULL
Here is an example to find out films that are not in inventory
SELECT film.film_id, title, inventory_id
FROM film
LEFT JOIN inventory ON
inventory.film_id= film.film_id
WHERE inventory.film_id IS NULL
RIGHT OUTER Join
This is similar to LEFT JOIN except that the tables are changed
The Syntax of the query is
SELECT * FROM Table A
RIGHT JOIN Table B
ON Table A.col_match = Table B.col_match
WHERE Table A.ID IS NULL
Here is an example
SELECT film.film_id, title, inventory_id
FROM film
RIGHT JOIN inventory ON
inventory.film_id= film.film_id
WHERE film.film_id IS NULL
UNION
The UNION operator is used to combine the result-set of two or more SELECT statements.
The Syntax of the query is
SELECT column_name(s) FROM Table A
UNION
SELECT column_name(s) FROM Table B
That's all I wanted to cover in this post. There is so much more in SQL like the use of various date operators, operators for conditions such as CASE, COALESCE, operator for converting data type such as CAST, etc but the ones I mentioned in my 3 posts will give us sufficient knowledge to run the basic queries all by ourselves. In my next posts, I want to cover some basic DevOps concepts like CI/CD pipelines, Jenkins, Dockers, and Ansible.
Stay Tuned !!
Comments