SQL Tutorial

Introduction

SQL (Structured Query Language) is a specialized programming language developed in the 1970s by IBM to interact with relational databases. It became the standard language for managing structured data in relational database systems, and over time, it was adopted and enhanced by major database vendors. SQL is now governed by ANSI and ISO standards.

What is a Database?

A database is an organized collection of data stored electronically. It allows efficient data storage, retrieval, and manipulation. Databases are the backbone of modern software applications—from websites and mobile apps to enterprise-level systems.

What SQL Can Do

With SQL, you can:

  • Query and retrieve data using SELECT
  • Insert new records using INSERT
  • Update existing records with UPDATE
  • Delete records using DELETE
  • Create and manage database objects (tables, views, indexes, etc.)

Common Database Systems

There are many relational database systems that support SQL. While the core SQL syntax remains consistent across them, some commands, functions, and features may differ slightly from one system to another.

Database System Description
SQL Server Developed by Microsoft; widely used in enterprise environments.
MySQL Open-source and popular for web applications; backed by Oracle.
PostgreSQL Open-source, feature-rich, and highly extensible.
Oracle Database A powerful enterprise-grade system with advanced features.
SQLite Lightweight, file-based database commonly used in mobile apps.
MariaDB A community-driven fork of MySQL with additional features and enhancements.

💡 Note: SQL fundamentals remain the same across databases, but syntax, data types, and built-in functions may vary. For example, date functions or string manipulation can look a bit different depending on the database system you're using.

🛠️ In This Tutorial

This tutorial is designed from a data engineering perspective, with a focus on SQL Server, Microsoft’s relational database management system (RDBMS). While most concepts and syntax are applicable across other databases, you may encounter minor differences in areas such as:

  • Data types (DATETIME vs TIMESTAMP)
  • Function names (GETDATE() vs NOW())
  • Syntax conventions (e.g., TOP vs LIMIT)

SQL Commands

SQL is categorized into different types of commands, primarily into DDL, DML, and DCL. These commands are used to define, manipulate, and control access to the data in a database.

DDL (Data Definition Language)

DDL is a subset of SQL that deals with the structure or schema of a database. It allows you to define, modify, and delete database objects like tables, indexes, views, and schemas.

  • CREATE: Used to create new database objects like tables, views, and indexes.
    Ex: Creates a Customers table with three columns:
    CREATE TABLE Customers (CustomerID INT, Name VARCHAR(100), Age INT);
  • ALTER: Used to modify an existing database object.
    Ex: Adds an Email column to Customers:
    ALTER TABLE Customers ADD Email VARCHAR(100);
  • DROP: Used to delete an existing database object.
    Ex: Deletes the Customers table:
    DROP TABLE Customers;
  • TRUNCATE: Removes all records from a table but keeps the table structure intact.
    Ex: Clears all rows from Customers:
    TRUNCATE TABLE Customers;
  • RENAME: Used to rename a database object.
    Ex: Renames Customers to Clients:
    RENAME TABLE Customers TO Clients;

DML (Data Manipulation Language)

DML is a subset of SQL that deals with the manipulation of data in the database. It allows you to retrieve, insert, update, and delete data from tables.

  • SELECT: Used to retrieve data from one or more tables.
    Ex: Retrieves all columns and rows from the Customers table:
    SELECT * FROM Customers;
  • INSERT: Used to add new records (rows) to a table.
    Ex: Adds a new row to the Customers table with CustomerID as 1, Name as 'John Doe', and Age as 30:
    INSERT INTO Customers (CustomerID, Name, Age) VALUES (1, 'John Doe', 30);
  • UPDATE: Used to modify existing data in a table.
    Ex: Updates the Age of the customer with CustomerID 1 to 31:
    UPDATE Customers SET Age = 31 WHERE CustomerID = 1;
  • DELETE: Used to remove records from a table.
    Ex: Deletes the row where CustomerID is 1 from the Customers table:
    DELETE FROM Customers WHERE CustomerID = 1;

DCL (Data Control Language)

DCL is a subset of SQL used to control access to data in a database. It deals with permissions and security, allowing you to grant or revoke access rights to users or roles.

  • GRANT: Used to give specific permissions to a user or role on a database object.
    Ex: Grants the SELECT and INSERT permissions on the Customers table to User1:
    GRANT SELECT, INSERT ON Customers TO User1;
  • REVOKE: Used to remove specific permissions previously granted to a user or role.
    Ex: Revokes the SELECT and INSERT permissions from User1 on the Customers table:
    REVOKE SELECT, INSERT ON Customers FROM User1;

Transaction Controls

A transaction is a group of SQL operations (like INSERT, UPDATE, or DELETE) that are executed together. Transactions ensure data consistency by treating these operations as a single unit of work. If any operation fails, the entire transaction can be rolled back to maintain data integrity.

COMMIT

Saves the changes made in the current transaction to the database permanently. Once committed, the changes cannot be undone.

Example: The changes to account balances are saved permanently:

COMMIT;

ROLLBACK

Reverts the database to its previous state by undoing all changes made in the current transaction.

Example: All changes are undone, and the database is restored to its original state:

ROLLBACK;

SAVEPOINT

Creates a checkpoint in the transaction. You can rollback to a specific SAVEPOINT instead of undoing the entire transaction.

Example: Only the changes made after SAVEPOINT sp1 are undone. Changes before sp1 are retained:

SAVEPOINT sp1;
-- Perform some operations
ROLLBACK TO sp1;

DELETE vs TRUNCATE vs DROP

  • DELETE Command: The DELETE command is a Data Manipulation Language (DML) operation used to remove specific rows from a table based on a condition. It is slower in performance compared to TRUNCATE due to the generation of log records for each deleted row, but it supports rollback, allowing recovery of deleted rows if needed.
    Ex: Deletes rows from the employee table where the employee number is 10:
    DELETE FROM employee WHERE eno=10;
  • TRUNCATE Command: The TRUNCATE command is a Data Definition Language (DDL) operation that quickly removes all rows from a table. It is faster than DELETE because it does not generate individual log records for each row. However, TRUNCATE does not support rollback, meaning the data cannot be recovered once deleted.
    Ex: Removes all rows from the employee table:
    TRUNCATE TABLE employee;
  • DROP Command: The DROP command is used to remove entire database objects, such as databases, tables, stored procedures, and views, along with all their associated data and structures. Once executed, the operation cannot be rolled back, and all associated data is permanently lost.
    Ex: Deletes the entire employee table, including all its data and structure:
    DROP TABLE employee;

Order of SQL Execution

The sequence in which SQL statements are executed is crucial for understanding how queries are processed. Below is the correct order:

  • FROM: Specifies the source table(s) for the query.
  • JOIN: Combines rows from multiple tables based on a related column.
  • WHERE: Filters rows based on specified conditions.
  • GROUP BY: Groups rows sharing a property into summary rows.
  • HAVING: Filters groups based on specified conditions.
  • SELECT: Chooses which columns to include in the result set.
  • ORDER BY: Sorts the result set in ascending or descending order.

Constraints in SQL

Constraints in SQL are rules or restrictions applied to columns in a table to ensure the accuracy and reliability of the data in the database. These constraints help enforce data integrity by imposing conditions on the data stored in the table.

NOT NULL

Ensures that a column cannot have a NULL value.

Example: Ensures that the EmployeeID and LastName columns cannot have NULL values:

CREATE TABLE Employees (
  EmployeeID int NOT NULL,
  LastName varchar(255) NOT NULL,
  FirstName varchar(255)
);

DEFAULT

Provides a default value for a column when none is specified.

Example: Sets the default value of OrderDate to the current date if not provided:

CREATE TABLE Orders (
  OrderID int NOT NULL,
  OrderDate date DEFAULT GETDATE()
);

CHECK

Ensures that all values in a column satisfy a specific condition.

Example: Ensures that the Price column cannot have a negative value:

CREATE TABLE Products (
  ProductID int NOT NULL,
  ProductName varchar(255) NOT NULL,
  Price decimal CHECK (Price >= 0)
);

PRIMARY KEY

A column or set of columns that uniquely identifies each record within a table. It must have unique values and cannot contain NULL values. A table can have only one primary key, but it can consist of multiple columns (composite primary key).

Example: Sets the CustomerID as the primary key of the Customers table:

CREATE TABLE Customers (
  CustomerID int NOT NULL,
  LastName varchar(255) NOT NULL,
  FirstName varchar(255),
  PRIMARY KEY (CustomerID)
);

UNIQUE KEY

Ensures that all values in a column are unique.

Example: Ensures that the Username column has unique values:

CREATE TABLE Users (
  UserID int NOT NULL,
  Username varchar(255) NOT NULL UNIQUE,
  Email varchar(255)
);

FOREIGN KEY

Establishes a relationship between columns in two tables. A foreign key refers to a value in the primary key column of another table.

Example: Establishes a relationship between Orders and Customers through the CustomerID:

CREATE TABLE Orders (
  OrderID int NOT NULL,
  OrderDate date,
  CustomerID int,
  PRIMARY KEY (OrderID),
  FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

Aggregate Functions

Aggregate functions in SQL are used to perform calculations on multiple rows of data and return a single summary value. These functions are essential for generating insights from datasets.

SUM

Adds up all the values in a numeric column.

Example: Adds up all values in the column:

SELECT SUM(column_name) FROM table_name WHERE condition;

COUNT

Counts the number of rows that match a specific condition.

Example: Counts the rows where a condition is met:

SELECT COUNT(column_name) FROM table_name WHERE condition;

AVG (Average)

Calculates the average of a numeric column.

Example: Calculates the average of values in the column:

SELECT AVG(column_name) FROM table_name WHERE condition;

MIN (Minimum)

Returns the smallest value in a column.

Example: Retrieves the smallest value from the column:

SELECT MIN(column_name) FROM table_name WHERE condition;

MAX (Maximum)

Returns the largest value in a column.

Example: Retrieves the largest value from the column:

SELECT MAX(column_name) FROM table_name WHERE condition;

SQL Clauses and Functions

This section covers common SQL clauses and functions used for grouping, partitioning, filtering, and performing calculations in SQL queries.

WHERE

Filters records before any grouping or aggregation. It is used to filter rows based on a condition.

Example: Selects all customers older than 25:

SELECT * FROM Customers WHERE Age > 25;

GROUP BY

Used to group rows that have the same values into summary rows, typically with aggregate functions (e.g., COUNT, SUM, AVG).

Example: Groups Customers by their Country and counts the number of customers in each country:

SELECT Country, COUNT(*) FROM Customers GROUP BY Country;

HAVING

Similar to WHERE, but used to filter groups after an aggregation is performed. It works with GROUP BY.

Example: Filters countries with more than 10 customers:

SELECT Country, COUNT(*) FROM Customers GROUP BY Country HAVING COUNT(*) > 10;

PARTITION BY

Used in window functions (such as ROW_NUMBER(), RANK(), DENSE_RANK()) to divide the result set into partitions to which the window function is applied.

Example: Ranks employees by Salary within each Department:


SELECT EmployeeID, Salary, 
       RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS Rank 
FROM Employees;

DISTINCT

Removes duplicate records from the result set.

Example: Removes duplicates from the result:

SELECT DISTINCT Country FROM Customers;

CEIL() / CEILING()

Returns the smallest integer greater than or equal to a given number.

Example: Rounds 5.2 up to the nearest integer:

SELECT CEIL(5.2);

Result: 6

FLOOR()

Returns the largest integer less than or equal to a given number.

Example: Rounds 5.8 down to the nearest integer:

SELECT FLOOR(5.8);

Result: 5

CASE

A conditional expression that works like an IF-ELSE statement in SQL.

Example: Categorizes customers into Age Groups:


SELECT Name, Age,
       CASE
         WHEN Age < 18 THEN 'Minor'
         WHEN Age >= 18 AND Age <= 65 THEN 'Adult'
         ELSE 'Senior'
       END AS AgeGroup
FROM Customers;

Joins

JOINS in SQL are used to combine rows from two or more tables based on a related column between them. When you need to retrieve data from multiple tables in a relational database, you use joins to link the tables and return meaningful results.

INNER JOIN

The most common type of join. It returns only the rows that have matching values in both tables. If no match is found, the row is not included in the result.

Example: This will return only the orders that have a corresponding customer:


SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

LEFT JOIN (or LEFT OUTER JOIN)

Returns all rows from the left table (table1) and the matching rows from the right table (table2). If there is no match, NULL values are returned for columns of the right table.

Example: This will return all customers, and their corresponding orders (if any). If a customer doesn't have an order, the OrderID will be NULL:


SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

RIGHT JOIN (or RIGHT OUTER JOIN)

Similar to LEFT JOIN, but it returns all rows from the right table (table2) and the matching rows from the left table (table1). If there is no match, NULL values are returned for columns of the left table.

Example: This will return all orders, and the corresponding customer information (if available). If an order does not have a corresponding customer, the CustomerName will be NULL:


SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
RIGHT JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

FULL JOIN (or FULL OUTER JOIN)

Returns all rows when there is a match in either the left (table1) or right (table2) table. If there is no match, NULL values are returned for the missing side.

Example: This will return all customers and all orders. If a customer doesn't have an order, OrderID will be NULL, and if an order doesn't have a customer, CustomerName will be NULL:


SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

CROSS JOIN

Returns the Cartesian product of two tables. It combines each row from the first table with each row from the second table. This join doesn't require a condition.

Example: This will return every combination of customer and order (i.e., all possible pairs of customers and orders):


SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
CROSS JOIN Orders;

SELF JOIN

A self join is a join of a table with itself. It is useful when you need to compare rows within the same table.

Example: This query returns employees and their managers, using the same Employees table for both the employee and the manager:


SELECT A.EmployeeName AS Employee, B.EmployeeName AS Manager
FROM Employees A
JOIN Employees B ON A.ManagerID = B.EmployeeID;

NULL Functions

SQL provides various functions to handle NULL values in tables effectively. These functions are particularly useful in ensuring data completeness and handling conditional scenarios in queries.

ISNULL()

Replaces NULL with a specified value.

Example: This query replaces NULL in column1 with 'default_value':


SELECT ISNULL(column1, 'default_value') AS column1_with_default
FROM table_name;

NULLIF()

Returns NULL if two expressions are equal; otherwise, returns the first expression.

Example: This query returns NULL if column1 and column2 are equal:


SELECT NULLIF(column1, column2) AS result
FROM table_name;

COALESCE()

Returns the first non-NULL expression in a list of arguments.

Example: This query returns the first non-NULL value from column1, column2, or 'default_value':


SELECT COALESCE(column1, column2, 'default_value') AS first_non_null
FROM table_name;

Window Functions

Window functions perform calculations across a set of table rows that are related to the current row.

ROW_NUMBER()

A window function that assigns a unique row number to each row in a result set.

Example: This query assigns a unique row number to each student based on their score:


SELECT Name, Score, ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNum
FROM Students;

RANK()

A window function that assigns a rank to each row within a partition, with gaps in ranking for tied values.

Example: This query assigns a rank to students based on their score:


SELECT Name, Score, RANK() OVER (ORDER BY Score DESC) AS Rank
FROM Students;

DENSE_RANK()

Similar to RANK(), but it does not leave gaps in ranking for tied values.

Example: This query assigns ranks without gaps for students based on their score:


SELECT Name, Score, DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRank
FROM Students;

LEAD()

Accesses data from the next row.

Example: This query accesses the next salary in the result set:


SELECT EmployeeName, Salary, LEAD(Salary, 1, 0) OVER (ORDER BY Salary) AS NextSalary
FROM Employees;

LAG()

Accesses data from the previous row.

Example: This query accesses the previous salary in the result set:


SELECT EmployeeName, Salary, LAG(Salary, 1, 0) OVER (ORDER BY Salary) AS PreviousSalary
FROM Employees;

Date Functions

Working with dates and times is a big part of SQL Server development. Whether you're generating reports, filtering data, or scheduling tasks — these date functions are your go-to tools.

Below is a quick guide to the most commonly used date functions in SQL Server, complete with usage examples.

GETDATE()

Returns the current system date and time.

SELECT GETDATE();  
-- Output: 2025-04-20 14:30:00.000

GETUTCDATE()

Returns the current UTC date and time.

SELECT GETUTCDATE();

DATEADD()

Adds or subtracts a specified date part (like days, months, or years).

SELECT DATEADD(DAY, 7, GETDATE());       -- Adds 7 days
SELECT DATEADD(MONTH, -1, GETDATE());    -- Subtracts 1 month

DATEDIFF()

Returns the difference between two dates, in units you specify (e.g., days, months, years).

SELECT DATEDIFF(DAY, '2024-01-01', GETDATE());  
-- Days since Jan 1, 2024

EOMONTH()

Returns the last day of the month, optionally offset by months.

SELECT EOMONTH(GETDATE());  
-- Output: 2025-04-30

DATEPART()

Extracts a specific part of a date, such as year, month, or day — returned as an integer.

SELECT DATEPART(YEAR, GETDATE());  
-- Output: 2025

YEAR(), MONTH(), DAY()

Shortcut functions to extract basic parts of a date.

SELECT YEAR(GETDATE()), MONTH(GETDATE()), DAY(GETDATE());  
-- Output: 2025, 4, 20

DATENAME()

Returns the name (string) of the specified date part.

SELECT DATENAME(WEEKDAY, GETDATE());  
-- Output: 'Sunday'

FORMAT() (for display only)

Formats a date/time into a custom string format. Slower, but useful for presentation.

SELECT FORMAT(GETDATE(), 'yyyy-MM-dd');  
-- Output: '2025-04-20'

CONVERT() (to format or strip time)

Converts a date/time to a specific format or data type.

SELECT CONVERT(DATE, GETDATE());  
-- Output: '2025-04-20'

To customize format:

SELECT CONVERT(VARCHAR, GETDATE(), 23);  
-- Output: '2025-04-20' (style 23 = yyyy-mm-dd)

CAST() (convert data types)

The ANSI SQL standard way to convert between types — useful for trimming time or changing date formats.

✅ Common Uses:

-- Remove time
SELECT CAST(GETDATE() AS DATE);  
-- Output: 2025-04-20

-- Get only time
SELECT CAST(GETDATE() AS TIME);  
-- Output: 14:35:21.0000000

-- Convert string to date
SELECT CAST('2025-04-20' AS DATE);

-- Convert date to string
SELECT CAST(GETDATE() AS VARCHAR(20));  
-- Output: 'Apr 20 2025  2:35PM'

CAST() vs CONVERT() — What’s the Difference?

FeatureCAST()CONVERT()
PortabilityANSI SQL compliantSQL Server-specific
FormattingLimitedSupports format styles
SimplicityClean syntaxMore powerful for date conversions

ISDATE()

Checks if a string is a valid date format.

SELECT ISDATE('2025-04-20');  -- Returns 1 (valid)
SELECT ISDATE('invalid date'); -- Returns 0

SYSDATETIME(), SYSUTCDATETIME()

These return the current date/time with higher precision than GETDATE().

SELECT SYSDATETIME();       -- datetime2
SELECT SYSUTCDATETIME();    -- datetime2 in UTC

SWITCHOFFSET() (for time zone offsets)

Changes the time zone offset of a datetimeoffset value.

SELECT SWITCHOFFSET(SYSDATETIMEOFFSET(), '-05:00');

TODATETIMEOFFSET()

Converts a datetime or datetime2 value to a datetimeoffset by adding a time zone.

SELECT TODATETIMEOFFSET(SYSDATETIME(), '+05:30');

DATETRUNC() (SQL Server 2022+)

Truncates a datetime value to a specified precision like year, month, day.

SELECT DATETRUNC(MONTH, GETDATE());  
-- Output: 2025-04-01 00:00:00.000

DATEFROMPARTS(), DATETIMEFROMPARTS()

Constructs a date/datetime from individual values.

SELECT DATEFROMPARTS(2025, 04, 20);  
-- Output: 2025-04-20

SELECT DATETIMEFROMPARTS(2025, 04, 20, 14, 30, 0, 0);  
-- Output: 2025-04-20 14:30:00.000

TIMEFROMPARTS() (less common, but helpful in time data)

SELECT TIMEFROMPARTS(14, 45, 30, 0, 0);  
-- Output: 14:45:30.0000000

SQL Set Operations

SQL set operations are used to combine the results of two or more queries into a single result set. These operations include UNION, INTERSECT, EXCEPT, and MINUS (in some databases). Each operation has specific rules for how the result sets are combined.

UNION:

Combines the results of two or more queries and removes duplicate records. Both queries must have the same number of columns with the same or compatible data types.

SELECT Name FROM Customers
UNION
SELECT Name FROM Employees;

This retrieves all unique names from both Customers and Employees tables.

UNION ALL:

Similar to UNION, but does not remove duplicates. It combines all results, including duplicate rows.

SELECT Name FROM Customers
UNION ALL
SELECT Name FROM Employees;

This retrieves all names from Customers and Employees, including duplicates.

INTERSECT:

Returns only the rows that appear in both queries (common rows). Both queries must have the same number of columns and compatible data types.

SELECT Name FROM Customers
INTERSECT
SELECT Name FROM Employees;

This retrieves the names that exist in both Customers and Employees tables.

EXCEPT (or MINUS in some databases):

Returns the rows from the first query that do not appear in the second query. Both queries must have the same number of columns and compatible data types.

SELECT Name FROM Customers
EXCEPT
SELECT Name FROM Employees;

This retrieves the names that are in the Customers table but not in the Employees table.

System Databases

Master Database:

The Master database is crucial in SQL Server. It contains system-level information such as login accounts, configuration settings, endpoints, and linked servers.

SELECT name FROM master.sys.databases;

Model Database:

The Model database serves as a template for creating new user databases. Any objects or settings defined in the model database are automatically copied when a new database is created.

MSDB Database:

The MSDB database is used by SQL Server Agent for scheduling jobs, managing alerts, and automating tasks. It also stores backup and restore operation histories.

SELECT name, enabled FROM msdb.dbo.sysjobs;

TempDB Database:

The TempDB database is used to store temporary data such as temporary tables, stored procedures, and other objects generated during query processing in SQL Server.

Note: User databases are created to store application-specific data and tables. These databases are distinct from system databases and are used exclusively to manage the application's data.

Recovery Models

Simple Recovery Model:

The Simple Recovery Model is used for basic data protection. The transaction log is automatically truncated after each checkpoint, allowing space to be reused, but it does not support transaction log backups. As a result, point-in-time recovery is not possible. This model is ideal for development or test databases where occasional data loss between backups is acceptable.

Full Recovery Model:

The Full Recovery Model provides complete data protection by recording every transaction in the transaction log. It supports point-in-time recovery and maintains transaction logs until they are backed up or manually truncated. This model is ideal for databases requiring high data availability and minimal data loss, supporting full, differential, and transaction log backups.

Bulk-Logged Recovery Model:

The Bulk-Logged Recovery Model is designed for handling bulk operations efficiently. It minimally logs bulk operations to reduce transaction log size and improve performance. While point-in-time recovery is not supported during bulk operations, it is possible under specific conditions. This model is ideal for databases with frequent bulk operations where performance is prioritized but some loss of granularity in recovery is

Additional Concepts

Types of Indexes

  • Clustered Index:
    • One per table, sorts data.
  • Non-Clustered Index:
    • Multiple per table, does not sort data.

Clustered Index:

A clustered index sorts the physically stored data rows in the table and creates an index key. Since the data is physically sorted, only one clustered index can be created per table. It is best suited for columns that are frequently used in range queries (e.g., date ranges) or for frequently accessed data. By default, the primary key has a clustered index.

The clustered index can be created on a single column or multiple columns. When multiple columns are used, the index is known as a composite clustered index.

Syntax:

CREATE CLUSTERED INDEX IX_ClusteredIndexName 
ON TableName (Column1, Column2, ...);

Non-Clustered Index:

A non-clustered index does not alter the physical order of the table and creates a separate structure to store sorted index keys along with pointers to the actual data rows. Multiple non-clustered indexes can be created on a single table. It is ideal for columns that are frequently used in search conditions (e.g., WHERE clauses) rather than range queries.

Syntax:

CREATE NONCLUSTERED INDEX IX_NonClusteredIndexName 
ON TableName (Column1, Column2, ...);

NOTE:
Regularly rebuild or reorganize indexes to ensure they remain effective, especially if there are many updates or deletions.

Common Table Expression (CTE)

A Common Table Expression (CTE) in SQL Server is a temporary result set, which can be referred within the execution scope of a SELECT, INSERT, UPDATE, or DELETE statement. It simplifies complex queries and supports recursive operations. Below are the use cases.

  • CTEs are ideal for breaking down complex queries into smaller, more manageable parts.
  • CTEs support recursive operations, where a query can refer to itself.

Basic CTE:
This CTE selects all employees from the HR department:

WITH HR_Employees AS
(
  SELECT EmployeeID, EmployeeName
  FROM Employees
  WHERE Department = 'HR'
)
SELECT EmployeeID, EmployeeName
FROM HR_Employees;

Recursive CTE to Generate Dates:
This recursive CTE generates a list of dates between the specified start and end dates:

DECLARE @StartDate DATE = '2003-01-01';
DECLARE @EndDate DATE = '2023-01-10';

WITH cte AS (
  SELECT @StartDate AS CurrentDate
  UNION ALL
  SELECT DATEADD(DAY, 1, CurrentDate)
  FROM cte
  WHERE CurrentDate <= @EndDate
)
SELECT
  CurrentDate,
  YEAR(CurrentDate) AS Yearname,
  MONTH(CurrentDate) AS Month,
  DATENAME(MM, CurrentDate) AS MonthName,
  DATENAME(DW, CurrentDate) AS Days
FROM cte
OPTION (MAXRECURSION 0);

Views

View: A SQL view is a virtual table that stores a query definition. It does not store data physically but provides a way to present data as a result of a SELECT statement.

CREATE VIEW priyaview AS
SELECT * FROM emp;

Materialized View: A Materialized View physically stores the data and needs to be refreshed to get the latest data. It can become outdated between refresh intervals.

CREATE MATERIALIZED VIEW emp_view AS
SELECT * FROM employees
WITH DATA;

Table Variables

Table variables store datasets in memory during execution. They are ideal for small datasets, as they reduce disk I/O operations. However, if the dataset grows large, it spills over to disk, impacting performance.

Table variables are scoped to the batch, stored procedure, or function in which they are declared.

DECLARE @EmployeeTable TABLE (
    Name VARCHAR(100),
    ID INT
);

INSERT INTO @EmployeeTable (Name, ID)
VALUES ('Rajesh', 1);

SELECT * FROM @EmployeeTable;

Temporary Tables

Temporary tables are used for creating temporary storage during complex queries or DML operations. These tables are stored in the tempdb database and can handle large datasets. There are two types of temporary tables:

  • Local Temporary Tables: Scoped to the user's session and dropped automatically at the end of the session.
  • Global Temporary Tables: Accessible by all sessions and dropped when the last session using them is closed.
CREATE TABLE #TempTable (ID INT, Name VARCHAR(50));
INSERT INTO #TempTable (ID, Name)
VALUES (1, 'Alice'), (2, 'Bob');

SELECT * FROM #TempTable;
DROP TABLE #TempTable;

You can insert data from a global temporary table into a local temporary table for further processing:

INSERT INTO #localemp
SELECT * FROM ##globalemp;

Subqueries

subquery: A subquery is a query nested inside another query. It is often used in clauses like SELECT, WHERE, or FROM. It helps filter data based on the result of another query.

Example:

SELECT EmployeeName, DepartmentID
FROM Employees
WHERE DepartmentID IN (SELECT DepartmentID FROM Departments WHERE Budget > 50000);

In this example, we retrieve employees who work in departments with a budget greater than $50,000 using a subquery.

corelated subquery: A correlated subquery refers to a subquery that depends on the outer query. The subquery is evaluated once for each row processed by the outer query. It typically uses values from the outer query in its execution.

Example:

SELECT EmployeeName, Salary, DepartmentID
FROM Employees e1
WHERE Salary > (SELECT AVG(Salary) FROM Employees e2 WHERE e1.DepartmentID = e2.DepartmentID);

This query selects employees whose salary is greater than the average salary in their department. The subquery compares each employee's salary to the average salary within the same department, making it a correlated subquery.

Outliers in SQL

Outliers in SQL refer to data points that are significantly different from most of the other values in a dataset. These unusual values can skew analysis results or indicate errors or anomalies.

For example, in a table of customer purchases, if most purchase amounts are between $10 and $100, but one is $10,000, that $10,000 value would be an outlier.

To identify outliers, you can use methods like:

  • Using statistical methods: For example, finding values that are outside a certain number of standard deviations from the mean.
  • Using percentiles: For example, identifying values in the top or bottom 1% of the dataset.

Merge Join

A Merge Join is a join technique used when both tables being joined are sorted on the join column. The database compares rows from both tables and merges them based on the join condition. It is more efficient when the data is pre-sorted or when using large datasets.

For example, to join two tables efficiently using a merge join:

SELECT * FROM table1 AS t1
JOIN table2 AS t2 ON t1.column = t2.column;

Triggers:

A trigger is a special type of SQL procedure that automatically runs in response to certain events in a database, such as inserting, updating, or deleting data.

Types of Triggers:

  • AFTER Trigger: Executes after a change (such as an INSERT, UPDATE, or DELETE) is made to a table. It is useful for tasks like logging, auditing, or updating related tables.
  • BEFORE Trigger: Executes before a change is made to a table, allowing for validation, dependency checks, or the prevention of invalid operations.
  • INSTEAD OF Trigger: Replaces the default operation with custom logic, allowing you to control what happens during the data manipulation task (such as blocking invalid data or manipulating data in a different way).

Syntax:

CREATE TRIGGER trigger_name
AFTER | BEFORE | INSTEAD OF INSERT | UPDATE | DELETE ON table_name
BEGIN
  -- Action to be performed when the event occurs
END;

Cursors

Cursors are used to retrieve and process rows from a result set sequentially, one row at a time. Cursors enable you to iterate through each row, apply logic, and perform actions on individual records.

Types of Cursors:

  • Implicit Cursor: This type of cursor is automatically created by the database management system (DBMS) when a SQL statement is executed. It's commonly used for single-row queries or when processing queries that return a single result set.
  • Explicit Cursor: An explicit cursor is defined and controlled by the user through SQL commands. It gives more control over fetching and processing data row by row. Explicit cursors are typically used for handling complex queries or when you need to process multiple result sets sequentially.

Data Modeling Techniques

Data Warehousing (DWH)

A Data Warehouse (DWH) is a centralized repository designed to store large volumes of structured data from various sources. It is primarily used for ETL (Extract, Transform, Load) processes and analytical purposes. A Data Warehouse consolidates historical data from multiple sources, storing it for long-term analysis. This data is typically used for trend analysis and business reporting.

Data Warehousing integrates data marts into a central repository, enabling efficient data management, seamless access to large datasets, and streamlined querying across an organization. It supports advanced analytics and decision-making processes by providing a unified platform for data access and reporting.

Data Mart

A Data Mart is a subset of a data warehouse focused on specific business functions or departments, such as sales, marketing, or finance. It simplifies data management and analysis for that particular area, providing a more specialized and efficient way of working with data. Data marts are often created to meet the unique needs of business units or teams, ensuring that they have access to the data most relevant to their operations.

Dimensional Table

A Dimensional Table contains detailed descriptive information about the dimensions of the business, such as time, geography, products, or customers. These tables aid in analysis and reporting by providing descriptive attributes for the data stored in the fact tables. Examples of dimensional attributes include product names, time periods, or customer names.

Types of Dimensions

  • Role-playing: A dimension table used multiple times with different meanings in the same schema. For example, a "Date" dimension could be used for both "Order Date" and "Ship Date".
  • Conformed: A dimension table that maintains relationships with multiple fact tables across the schema. These dimensions are standardized to be used across various business processes.
  • Junk: Contains non-essential attributes that are not critical for analysis, often used to reduce the size of the fact table by grouping these attributes together.
  • Degenerated: Dimensional attributes that are stored directly in the fact table rather than in a separate dimension table. Typically used for transactional data where no additional descriptive details are needed.
  • Slowly Changing Dimension (SCD): Techniques used to manage changes in dimension attributes over time. These techniques help in tracking historical changes in dimension attributes such as customer addresses or product categories.

Fact Table

A Fact Table stores numerical data that can be aggregated for analysis, such as sales amounts, transaction counts, or revenue figures. It contains foreign keys that reference dimensional tables, allowing for the data to be analyzed across different dimensions. Fact tables are typically used for performing calculations and aggregations in data warehousing and business intelligence applications.

Types of Fact Table Measures

  • Additive: Measures that can be summed across all dimensions, such as sales revenue or transaction count.
  • Semi-additive: Measures that can be summed across some dimensions but not all. For example, account balance can be summed across products but not across time.
  • Non-additive: Measures that cannot be summed across any dimension, such as ratios or percentages.

Star Schema

The Star Schema consists of a central fact table surrounded by dimensional tables, forming a star-like structure. This schema simplifies queries and enhances query performance, making it easier to retrieve and aggregate data from multiple dimensions. It is a common schema type in data warehousing.

Snowflake Schema

The Snowflake Schema is an extension of the star schema where the dimensional tables are normalized to reduce redundancy. While this improves data integrity and reduces storage requirements, it can increase complexity by requiring more joins in queries.

Surrogate Key

A Surrogate Key is a unique identifier for each row in a table, often used in place of natural business keys. It enhances performance and data integrity by providing a consistent key that is independent of business logic changes, like changes in names or codes.

Master Table

The Master Table contains key data about primary business entities. Examples include:

  • Customer Table: Stores customer information like ID, name, and contact details.
  • Product Table: Stores product details like ID, name, price, and description.

Transaction Table

The Transaction Table records transactions or events within a business. Examples include:

  • Sales Table: Captures details of sales transactions, such as customer ID, product ID, amount, and quantity.

OLTP (Online Transaction Processing)

OLTP systems are designed to manage day-to-day transactional data. These systems are used in operational environments such as e-commerce, banking, and order processing. OLTP databases prioritize fast insert, update, and delete operations.

OLAP (Online Analytical Processing)

OLAP systems are designed for complex querying and data analysis. They support multi-dimensional queries, trend analysis, and decision-making processes. OLAP databases are used in business intelligence systems to analyze large volumes of historical data.

Slowly Changing Dimension (SCD)

A Slowly Changing Dimension (SCD) refers to methods used to manage changes in dimension data over time. These methods are categorized as follows:

  • SCD1: Overwrites old data with new data, without maintaining any historical data.
  • SCD2: Creates new records for each change in dimension data, preserving historical data for trend analysis.
  • SCD3: Tracks limited historical data by adding new columns to the existing record, preserving a few versions of changes.

ER Modeling

The Entity-Relationship (ER) Model is a data modeling technique used to visually represent the structure of a database. It consists of entities, attributes, and the relationships between them. ER models help in designing databases by mapping the real-world entities and their relationships.

Share This :
Scroll to Top