Every product manager needs to look at historical data to validate assumptions, find patterns, and get some sense of what has happened in the past. This helps in making informed decisions, adjusting the plans for the future, and be better prepared. Most of the time, we end up running SQL (Structured Query Language) queries on various tables to get our answers. These are things that we should be able to do on our own without being dependent on our technical team.
We can use queries to answer a variety of questions such as
Who are the top buyers in each state and how much did they expend?
How much is the average expense for people who left a negative review? What about positive reviews?
How much, in percentage, the sales had increased or decreased over the months in 2020
A list of customers eligible for a new promotion based on a criteria
Top 3 reasons for customers have unsubscribed from our service
For this reason, I firmly believe we as product manager should have a good handle on running queries. One doesn't need to be a database expert but one should know the bare minimum.
In this 3 post series, I will first cover a few basics of databases and then run basic SQL queries on my sample Database that I have setup.
Before we go any further, there are a few basic things I should mention:-
Types of databases
Next time, you are building a new product and your architects and engineers are talking about using MongoDB vs Cassandra DB for databases you should at least understand why they are considering them, and for that, you need to understand different types of databases.
There are two main types of databases-SQL or NoSQL databases.
SQL (Structured Query Language) is a programming language that is used to manage data in relational databases. Relational databases use relations, which are typically called tables, to store data and then match that data by using common characteristics within the data set. SQL databases examples include MySQL, Oracle, PostgreSQL, and Microsoft SQL Server.
NoSQL databases can be document-based, key-value pairs, graph databases. SQL databases have a predefined schema whereas NoSQL databases use dynamic schema for unstructured data. These databases are highly recognized for their ease-of-use, scalable performance, strong resilience, and wide availability. NoSQL database examples include MongoDB, BigTable, RavenDB Cassandra, HBase, Neo4j and CouchDB. There are reasons why one will consider MongoDB over Cassandra and understanding the basics about these databases will help you alot.
Database Schema
We need to understand more about table structure before we start running queries. This is where database schema comes into the picture. A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among them are associated. It formulates all the constraints that are to be applied to the data. It is essential to understand the schema so that you know how tables are related to each other and how you can query them.
Concepts of Primary Key vs Foreign Key
While we want to query our tables, there are few basic concepts that one must understand such as the concept of Primary and Foreign key. The primary key is used to ensure data in the specific column is unique. A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It uniquely identifies a record in the relational database table. Here is an example
Here are two tables: one for the customer and the other for payment.
Customer Table
Note the Customer_id has [PK] Primary Key mentioned in the customer table
Payment Table
Note the payment_id has [PK] Primary Key mentioned in the customer table and customer_id will act as a foreign key for this table. In case we need to connect both the customer table and the payment table, we will be using this customer_id column.
(Note- I have used POSTGRES as my DB and PgAdmin as my user interface)
ACID Properties
A transaction is a collection of instructions. To maintain the integrity of a database, all transactions must obey ACID properties. ACID is an acronym for-atomicity, consistency, isolation, and durability.
1. Atomicity A transaction is an atomic unit; hence, all the instructions within a transaction will successfully execute, or none of them will execute.
2. Consistency A database is initially in a consistent state, and it should remain consistent after every transaction.
3. Isolation If the multiple transactions are running concurrently, they should not be affected by each other.
4. Durability Changes that have been committed to the database should remain even in the case of software and hardware failure.
Getting Ready to Install
What I have realized is the best way to learn about SQL is to download the database engine, a UI for running queries and creating or downloading a sample database with few tables. I will suggest looking into an Opensource DB engine such as PostGres, PgAdmin as the user interface and there are various free sample databases that can be downloaded. One such example is DVD rental DB. Next, create a Database using the PgAdmin and restore the dvdrental db.
In my next post, let's take this to next level and start executing queries. Until then HAPPY LEARNING!!
Comments