notes

What is a RDBMS?

Think of a Relational Database Management System (RDBMS) as a vast collection of Excel workbooks. Each workbook (in database terms, we call these “tables”) contains different sheets, and on each sheet, you have rows and columns of data. The sheets are organized in a way that the data can be quickly searched, updated, inserted, or deleted.

A relational database consists of multiple tables that relate to each other. The relation between tables is formed in the sense of shared columns.

ACID

A transaction is a sequential group of statements or queries to perform single or multiple tasks in a database. Transactions may consist of a single read, write, delete, or update operations or a combination of these.

ACID – zbiór właściwości gwarantujących poprawne przetwarzanie transakcji w bazach danych

Schemas

Database schema is the structure of a database, refers to the organization of data as a blueprint that demonstrates how the database is constructed. So, a database schema describes how the data may relate to other tables or other data models. However, the schema does not actually contain data.

Database instance is a snap shot of data in a database at a single moment in time.

Types of keys and their functions.

Keys help you to uniquely identify row of data in a table despite the size of the database, which might contains thousand of records. It establishes and identify the relationship between tables.

Database Relationships

Database normalization

Normalization rules divide large tables into smaller tables and join them using relationships. The purpose of normalization in SQL is:

SQL

SQL stands for “Structured Query Language”. It’s used to manage & query data stored in a relational database management system (RDBMS).

What’s the difference between SQL, SQLite, MySQL, PostgreSQL, and SQL Server?

People casually use “SQL” interchangeably with “MySQL” and “PostgreSQL”. While that’s not technically correct, in most cases for beginners in the field, it doesn’t matter too much unless you want to be pedantic.

SQL is general, high-level language for querying and manipulating relational databases (RDBMS). MySQL, Postgres, SQLite, and SQL Server are all RDBMS. You use varying flavors of SQL syntax to query each unique RDBMS. For example, to query Postgres, you write PostgreSQL.

Here we use PostgreSQL because it is 2nd most popular, open source and extensible.

PostgreSQL

Basics

Aggregate functions

GROUP BY

GROUP BY command tells the database to separate your data into groups, which can then be aggregated independently.

SELECT 
	ticker, 
    AVG(open), 
    MAX(close)
FROM stock_prices
GROUP BY ticker
ORDER BY ticker;

You can GROUP BY two columns, even multiple columns, in SQL. To group by multiple categories, just separate column names with commas

HAVING

HAVING allows you to filter data based on values from aggregate functions (as opposed to WHERE clause which filters BEFORE grouping).

SELECT ticker, AVG(open)
FROM stock_prices
GROUP BY ticker
HAVING AVG(open) > 200;

DISTINCT

The DISTINCT SQL command is used in conjunction with the SELECT statement to return only distinct (different) values.

SELECT DISTINCT manufacturer
FROM pharmacy_sales;

You can use DISTINCT with aggregate functions – the most common one being COUNT

SELECT COUNT(DISTINCT user_id) 
FROM trades;

Math functions

CAST

In SQL, integer division discards the remainder from the output, providing only the integer (whole number) part of the result. You can still make SQL achieve decimal output with a few tricks such as:

NULL

In SQL, NULL indicates the absence of a value. Unlike an empty string or zero, NULL doesn’t represent a specific value, but rather, a missing or unknown piece of information.

Now, to correctly identify NULL values, we can utilize the IS NULL or IS NOT NULL condition:

SELECT *
FROM goodreads
WHERE book_title IS NULL;
SELECT COALESCE(book_rating, 0)
FROM goodreads;

If column_name is NULL, it returns the specified ‘expression’. Otherwise, it returns the value of column_name.

IFNULL() function: Handles two arguments, returning the second if the first is null; else, it returns the first.

SELECT book_title, IFNULL(book_rating, 0) AS rated_books
FROM goodreads;

CASE

The CASE statement evaluates the specified conditions sequentially and returns a value when it encounters the first true condition. If none of the conditions are met, the ELSE clause provides a default value

JOINS

SELECT *
FROM artists
JOIN songs
  ON artists.artist_id = songs.artist_id;

JOIN keyword, which is short for an INNER JOIN. There are actually 4 different types of JOINS in SQL.

Type of Joins Description
INNER JOIN Returns only the rows with matching values from both tables.
LEFT JOIN Returns all the rows from the left table and the matching rows from the right table.
RIGHT JOIN Returns all the rows from the right table and the matching rows from the left table.
FULL OUTER JOIN Returns all rows when there is a match in either the left or the right table. If there is no match, NULL values are returned for columns from the table without a match.

Date-time functions

Common Table Expression (CTE) and Subqueries

Common Table Expression, is like a query within a query. With the WITH statement, you can create temporary tables to store results, making complex queries more readable and maintainable. These temporary tables exist only for the duration of the main query, streamlining your analysis process.

-- Start of a CTE
WITH genre_revenue_cte AS (
  SELECT
    genre,
    SUM(concert_revenue) AS total_revenue
  FROM concerts
  GROUP BY genre
)
-- End of a CTE

SELECT
  g.genre,
  g.total_revenue,
  AVG(c.concert_revenue) AS avg_concert_revenue
FROM genre_revenue_cte AS g
INNER JOIN concerts AS c 
  ON g.genre = c.genre
WHERE c.concert_revenue > g.total_revenue * 0.5
GROUP BY g.genre, g.total_revenue;

By nesting queries within parentheses, you can generate temporary tables to perform calculations and filter data within the main query.

SELECT artist_name
FROM concerts
WHERE concert_revenue > (
  SELECT AVG(concert_revenue) FROM concerts);

Window functions

In simple terms, they are functions that operate by creating virtual “windows” within the dataset.

SELECT
  spend,
   SUM(spend) OVER (
     PARTITION BY product
     ORDER BY transaction_date) AS running_total
  FROM product_spend;

Here’s what each SQL command in the window function is doing:

SQL Order of Execution

Clause Order Description
FROM 1 The query begins with the FROM clause, where the database identifies the tables involved and accesses the necessary data.
WHERE 2 The database applies the conditions specified in the WHERE clause to filter the data retrieved from the tables in the FROM clause.
GROUP BY 3 If a GROUP BY clause is present, the data is grouped based on the specified columns, and aggregation functions (such as SUM(), AVG(), COUNT()) are applied to each group.
HAVING 4 The HAVING clause filters the aggregated data based on specified conditions.
SELECT 5 The SELECT clause defines the columns to be included in the final result set.
ORDER BY 6 If an ORDER BY clause is used, the result set is sorted according to the specified columns.
LIMIT/OFFSET 7 If LIMIT or OFFSET clause is present, the result set is restricted to the specified number of rows and optionally offset by a certain number of rows.

Other