第一讲Introduction
Lecture Objectives
- Describe what Database Management Systems (DBMSs) are.
- Recognize the problem of Data Redundancy.
- Explain the basic Relational Data Model and some basic Relational Data Model concepts.
- Relate SQL basic concepts.
Outline
- DBMS
- Data Redundancy
- File System vs Databases
- Relational Model
- RDBMS
- SQL
- Questions and Answers
Storing & Managing Information
- Relational database models focus on storing data efficiently.
- They enable retrieval of specific information and obtaining new insights from stored data.
What is a Database and DBMS?
- A database is a shared integrated computer structure that stores a collection of end-user data and metadata.
- Metadata provides a description of the data characteristics and links (relationships) between sets of data.
- A database is an organized electronic filing cabinet designed to define, manipulate, retrieve, and manage data.
- A software called the Database Management System (DBMS) helps manage the database’s contents.
Role and Advantages of DBMSs
- Improved data sharing.
- Better data integration.
- Minimized data inconsistency (storing different versions of the same data).
- Improved data access (quick answers to ad hoc queries).
- Improved decision making.
Why Database Design is Important?
- Focus on transactional databases designed to support a company’s day-to-day operations.
- Database design defines the structure of the database to create and maintain data consistency and improve operational speed.
- A well-designed database facilitates data management through the generation of accurate and valuable information.
- A poorly designed database may lead to errors and bad decision-making, potentially causing organizational failure.
Historical Roots: Manual File Systems
- Managers in small organizations kept manual file systems to track important data.
- Manual file systems are slow and cumbersome for querying.
- Example questions:
- What products did I sell last month, quarter, or year?
- How does my sales volume this year compare to last year?
Historical Roots: Electronic File Systems
- Initially similar to manual file systems but automated.
- Created file structures and software to manage these structures and generate reports automatically.
- File systems grew rapidly, leading to challenges in data management and the need for more robust computer systems.
- Data redundancy and inconsistency were common due to the dispersion of data across different locations.
Data Redundancy: Islands of Information
- Data redundancy leads to data inconsistency, with different versions of the same data existing in different places.
- Example scenario: Sales agent updates address in HR but not in the sales department, leading to inconsistency.
Data Redundancy: Data Anomalies
- Update Anomalies: If a location changes, many rows need to be updated.
- Insertion Anomalies: Creating a department without employees results in many NULL values.
- Deletion Anomalies: Deleting the last employee in a department can result in the loss of the department.
Database Systems vs File Systems
- File Systems: Data spread across different locations, leading to redundancy and inconsistency.
- Database Systems: Centralized data management reduces redundancy and ensures consistency.
Overview of the Relational Model
- First proposed by British data scientist E.F. Codd in 1970.
- Based on mathematical concepts of sets, tuples, and relations.
- Set: Collection of items of the same type with no order or duplicates.
- Tuple: Finite ordered list of elements (n-tuple is a sequence of n elements).
- Relation: Set of n-tuples with no order on n-tuples.
Relational Model Components
- Data stored within relations (tables).
- Relation consists of attributes (columns) and tuples (rows).
- Attributes have associated domains (set of all possible values).
Advantages of Relational DBMS (RDBMS)
- Designed specifically for relational databases.
- Based on sound mathematical theory.
- Implemented with a declarative language for querying.
- Provides a secure, consistent repository with scalable capability.
- No unnecessary repetition of data.
- Data mapped logically, independent of physical location/storage.
RDBMS Architecture
- External Level: Customized user views of the data.
- Conceptual Level: Community view of all available data, independent of storage details.
- Internal Level: Describes physical storage of data.
Disadvantages of RDBMS
- Requires substantial memory and processing power.
- Needs rigorous data modeling.
- Data redundancy and anomalies can occur with an ill-defined database model.
- Best suited for environments where data structures and relationships are well-known.
Structured Query Language (SQL)
- Standard language for relational databases accepted by ANSI and ISO/IEC.
- Provides statements for querying, inserting, updating, deleting rows, and managing database objects and access.
SQL Sublanguages
- Data Definition Language (DDL): Defines database structure.
- Data Manipulation Language (DML): Manipulates data.
- Data Control Language (DCL): Controls access to the database.
- Transaction Control Language (TCL): Manages transactions.
Summary
- The module will teach relational databases.
- Historical roots lie in file systems, which suffer from data redundancy and anomalies.
- Relational databases, based on solid mathematical principles, address many file system issues.
- Adequate data modeling techniques are essential to prevent data anomalies in relational databases.
This overview provides a comprehensive understanding of databases, DBMS, and their historical context, highlighting the importance of well-designed relational databases and the role of SQL in managing them.
第二讲Data Modelling I
Lecture Objectives
- Identify the three phases of database design.
- Model data by creating an Entity-Relationship (ER) model.
Outline
- Three Phases of Database Design
- Conceptual
- Logical
- Physical
- Creating the Conceptual Model
- Entity-Relationship (ER) Model
- Questions and Answers
Criteria to Produce an Optimal Data Model
- Structural validity: Consistency with the way the enterprise defines and organizes information.
- Simplicity: Ease of understanding by Information Systems professionals and non-technical users.
- Expressiveness: Ability to distinguish between different data, relationships between data, and constraints.
- Non-redundancy: Exclusion of extraneous information; representation of any piece of information exactly once.
- Shareability: Not specific to any particular application or technology, usable by many.
- Integrity: Consistency with the way the enterprise uses and manages information.
- Extensibility: Ability to evolve to support new requirements with minimal effect on existing users.
- Representation: Ability to represent a model using an easily understood diagrammatic notation.
Three Phases of Design
- Conceptual Design
- Independent of implementation considerations.
- Built using user requirements specifications.
- Feeds into the logical design phase.
- Logical Design
- Enterprise model based on a specific data model (e.g., relational).
- Independent of DBMS and physical storage.
- Used by software applications.
- Conceptual model mapped to a logical data model.
- Ensures no information is lost.
- Includes normalization.
- Physical Design
- How the data is stored in databases.
- Describes file organizations, indexes, and distribution for efficient data access.
- Focuses on security and efficient implementation of constraints/integrity rules.
- Tailored to a specific DBMS.
Conceptual Model
- Entity-Relationship Model (ER)
- Described by Chen in 1976.
- Based on strong mathematical foundations (Set Theory, Mathematical Relations, Modern Algebra, Logic).
- Graphical and easy to understand.
- Several variations, including Chen and UML representations, and cardinality specifications (Crowfeet, SSADM, Min-Max).
Requirements Analysis for ER Model
- Understand what the organization does.
- Identify roles within the organization.
- Determine information requirements for each type of user.
- Identify business rules or constraints.
- Iteratively refine and test the model.
- Clarify assumptions as soon as possible.
- Continuously verify with users.
ER Model Concepts
- Entity Type: Group of objects with the same properties, identified as having independent existence.
- Entity Occurrence: Uniquely identifiable instance of an entity type.
- Example: “Doctor” is an entity type; “Dr. Hibbert” and “Dr. Nick” are instances.
- Attributes: Relevant properties (e.g., adjectives) about the entity.
- At least one identifying attribute or combination of attributes for each entity occurrence.
- Example attributes for a “Patient” entity might include hair color, date of birth, and religion.
- Attribute Domain: The possible set of values for an attribute.
- Examples:
- Hair_Color: brown, black, blonde, white, grey, brunette
- Date_of_Birth: date
- Student_ID: integer
- Examples:
- Attribute Types:
- Simple Attribute: Single component with an independent existence (e.g., surname, salary).
- Composite Attribute: Multiple components (e.g., address, name).
- Single-Valued Attribute: Holds a single value for each entity instance (e.g., student number).
- Multi-Valued Attribute: Holds multiple values for each entity instance (e.g., company phone numbers).
- Derived Attribute: Value derivable from related attributes or another entity type (e.g., total salary = salary + bonus).
ER Model Example
- Relationships: How entities are related.
- Use verbs to denote relationships (e.g., doctors treat patients).
- Each entity should be related to at least one other entity.
- Relationship Types: Set of meaningful associations among entity types.
- Relationship Occurrences: Unique associations including one instance from each participating entity type.
- Relationship Representation: Lines between entity boxes denote relationships; relationships should be named.
Constraints and Assumptions
- Constraints: Rules ensuring the validity of data (e.g., “Every patient must be registered with a doctor”).
- Assumptions: Represent expectations about reality, to be clarified during the modeling process (e.g., “Patients will only have one appointment per day”).
Conceptual Model Summary
- Steps to Create an ER Model:
- Identify entity types.
- Determine attributes.
- Identify relationship types.
- Associate attributes with entity or relationship types.
- Add additional constraints and assumptions as necessary.
Summary
- Database design consists of three stages: Conceptual, Logical, and Physical.
- Good database design is challenging, and skilled practitioners are rare.
- Database design is not an exact science; multiple solutions can exist for the same problem.
- The main output of the conceptual stage is the ER model, which includes entities, attributes, relationships, and constraints.
第三讲Data Modelling II
Lecture Objectives
- Implement the remaining phases of the database design: Logical and Physical.
- Understand relational database data integrity rules.
- Describe and implement the Date-Codd notation.
Outline
- Recap
- Main Entity Relationship Model (ER): Entity, Attributes, Relationships, Constraints
- Three Phases of Database Design: Logical/Physical
- Logical Design
- Physical Design
- Attributes domains
- Keys and Foreign Keys
- Integrity rules
- Date-Codd Notation
- Questions and Answers
Recap – Data Modelling
- Main Entity Relationship Model (ER)
- Entity: Represents real-world objects or concepts.
- Attributes: Properties or details about the entities.
- Relationships: How entities relate to each other.
- Constraints: Rules that ensure data integrity.
Three Phases of Database Design
Conceptual Design
- Independent of implementation considerations.
- Based on user requirements specifications.
- Provides input to the logical design phase.
Logical Design
- Enterprise model based on a specific data model (e.g., relational).
- Independent of DBMS and physical storage.
- Used by software applications.
- Ensures no information is lost.
- Includes normalization.
Physical Design
- Describes how data is stored in databases.
- Focuses on file organizations, indexes, and distribution for efficient data access.
- Implements security and integrity rules.
- Tailored to a specific DBMS.
Properties of a Table (Relation)
- 2D structure comprising rows and columns.
- Each row (tuple) describes a single entity of the relation.
- No duplicate rows allowed.
- Each column represents an attribute with a distinct name.
- Each cell contains a single atomic value.
- All values in a column must have the same data format.
- Each attribute has a specified range of values (attribute domain).
- The order of rows and columns is insignificant to the DBMS.
- Each table must have a combination of attributes that uniquely identify each row.
Attributes Domains
- Definition: The possible set of values for an attribute.
- Examples:
- Hair_Color: brown, black, blonde, white, grey, brunette
- Date_of_Birth: date
- Student_ID: integer
Benefits of Establishing Data Types
- Storage Efficiency: Optimizes space on disk.
- Predictions: Allows the DBMS to manage memory allocation efficiently.
- Data Consistency: Ensures only valid data is accepted.
Common Data Type Categories
- Text or Character:
char(5)
: Always 5 characters in length.varchar(100)
: Variable length up to 100 characters.varchar
: Variable length string up to 65,535 bytes.
- Integer:
SMALLINT
: Range -32,768 to 32,767, 2-byte size.INT
: Range -2,147,483,648 to 2,147,483,647, 4-byte size.SERIAL
: Automatically generated integers.
- Floating-Point Number:
float(n)
: Floating-point numbers withn
precision, up to 8 bytes.float8
orreal
: 4-byte floating-point numbers.
- Temporal Data Types:
DATE
: Stores dates.TIME
: Stores time of day.TIMESTAMP
: Stores date and time.TIMESTAMPTZ
: Timezone-aware timestamp.INTERVAL
: Stores periods of time.
Relational Database Keys
- Primary Key: A candidate key selected to uniquely identify each occurrence of an entity type.
- Composite Key: A candidate key that consists of two or more attributes.
- Natural Key: Identifying key derived from the entity’s existing attributes (e.g., NHS number for a patient).
- Surrogate Key: An artificial key introduced for efficiency, usually an integer (e.g., patient number).
Controlled Redundancy
- Ensures that the relationships in the database work properly.
- Common attributes link tables together.
- Necessary duplication of values is not considered redundant.
- Unnecessary duplication is considered redundancy.
Foreign Keys
- Attributes whose values match values of a candidate key in another relation.
- Used to combine information from multiple relations (tables).
Integrity Rules
- Entity Integrity: Ensures the primary key has no null values and all entries are unique.
- Referential Integrity: Ensures that foreign keys properly reference primary keys in another table.
Relational Model – Date-Codd
- Codd-Date Definition:
- No diagrams, tables referred to as relations.
- Uses predicate logic and extensive mathematics.
- Based on Codd-Dates rules.
- Uses textual notation to express relations.
Date-Codd Steps
- Name the Relation:
- Example:
Employee
- Example:
- List Attribute Domains:
- Example:
Empno: int, EName: char(50), Salary: float, DeptNo: int
- Example:
- Specify Keys and Attributes:
- Example:
primary key Empno NOT NULL, foreign key DeptNo references Department NOT NULL
- Example:
- Add Constraints:
- Example:
An employee can be assigned to a single department at most.
- Example:
Example: Employee and Department Relations
Employee Relation
Attribute | Domain | Key |
---|---|---|
Empno | int | Primary Key, NOT NULL |
EName | char(50) | |
Salary | float | |
DeptNo | int | Foreign Key, NOT NULL |
Department Relation
Attribute | Domain | Key |
---|---|---|
DeptNo | int | Primary Key, NOT NULL |
DeptName | char(50) | |
Loc | varchar(13) |
Example with Constraints
Employee Relation
Attribute | Domain | Key and Constraints |
---|---|---|
Empno | int | Primary Key, NOT NULL |
EName | char(50) | |
Salary | float | |
DeptNo | int | Foreign Key references Department, NOT NULL |
Constraint: An employee can be assigned to a single department at most. |
Department Relation
Attribute | Domain | Key and Constraints |
---|---|---|
DeptNo | int | Primary Key, NOT NULL |
DeptName | char(50) | |
Loc | varchar(13) | |
Constraint: A department can have many or no employees assigned. |
Summary
- Concepts such as keys, foreign keys, and domains need to be defined in the logical/physical database design phases.
- The model must follow entity and referential integrity rules.
- The Codd-Date notation provides an alternative way to represent a database design.
第四讲Normalisation
Lecture Objectives
- Identify and describe the steps of Normalisation.
- Analyse data to define a strategy to implement the normalisation process to remove data anomalies, unnecessary and uncontrolled data redundancy, and ensure data quality.
Outline
- Recap
- Normalisation
- Dependencies
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Summary
- Q&A
Recap – Data Modelling
- Main Entity Relationship Model (ER):
- Entity
- Attributes
- Relationships
- Constraints
- Domains
- Keys
- Foreign Keys
Table (Relation) Design Steps:
- Collect User Requirements
- Construct ER Diagram
- Transform into Relations
- Normalise
What is Normalisation?
- Method of stripping down each relation to its essential attributes, avoiding repetition and data loss.
- Technique usually results in more relations/tables, each with fewer attributes.
- A well-designed database should comply with three levels of normalisation (1NF, 2NF, 3NF).
- Much of normalisation theory is common sense but needs to be understood.
Functional Dependency of Attributes/Columns
- Applies to attributes within a single relation/table.
- Attribute B is functionally dependent on attribute A if for every value of A, there is only one possible value of B.
- Functional dependency can also involve combinations of attributes (e.g., A,B,C → D).
Functional Dependency Example
Ename
is functionally dependent onEmpno
:- Empno → Ename
- For every value of
Empno
, there is only one possible value ofEname
.
Full Functional Dependency
- Functional dependencies are full if the dependent attribute depends on all the determinant attributes, not on a subset.
Normalisation – First Normal Form (1NF)
- A relation is in 1NF if:
- There are no repeating groups in the table.
- There are no multi-valued attributes.
- All non-primary-key attributes are functionally dependent on the primary key.
- Properly defined relations are already in 1NF.
1NF Example
Initial Table:
Invoice(Inv_no, Cust_no, Cust_address, Inv_date, Item_code, Item_desc, Item_price, Quantity, Price)
1 10 XY1 WZ3 10/09/14 123 Bed 600 2 1200
2 15 WX4 QA9 11/09/14 124,127 Desk, Chair 150, 50 3, 3 600
Steps to achieve 1NF:
- Split attributes with repeated values.
- Move non-repeating and repeating attributes to separate relations.
- Identify primary keys and establish relationships with foreign keys.
Normalisation – Second Normal Form (2NF)
- A relation is in 2NF if:
- It is in 1NF.
- Every non-primary-key attribute is fully functionally dependent on the primary key.
- Non-loss decomposition is used to achieve 2NF, ensuring no data loss.
2NF Example
Original Table:
Invoice(Inv_no, Cust_no, Cust_address, Inv_date)
1 10 XY1 WZ3 10/09/14
2 15 WX4 QA9 11/09/14
Invoice Detail:
Invoice_Detail(Inv_no, Item_code, Item_desc, Item_price, Quantity, Price)
1 123 Bed 600 2 1200
2 124 Desk 150 3 450
2 127 Chair 50 3 150
Steps to achieve 2NF:
- Move attributes not fully dependent on the primary key to a new relation with the determining key.
- Update original and new relations accordingly.
Normalisation – Third Normal Form (3NF)
- A relation is in 3NF if:
- It is in 2NF.
- Every non-primary-key attribute depends only on the primary key (no transitive dependencies).
3NF Example
Steps to achieve 3NF:
- Identify non-primary-key attributes that depend on other non-primary-key attributes.
- Move these attributes to a new relation with the correct primary key.
- Maintain relationships by preserving foreign keys.
Summary of Normalised Model
- 1NF: Remove repeating groups and define a primary key.
- 2NF: Ensure full functional dependency for non-primary-key attributes.
- 3NF: Remove transitive dependencies, ensuring non-primary-key attributes depend only on the primary key.
Higher Forms of Normalisation
- 4NF: Eliminates multiple multi-valued dependencies.
- 5NF: Eliminates all join dependencies not implied by candidate keys.
- While 3NF is sufficient for most practical purposes, higher normal forms are available for specific scenarios.
In the Real World
- Some denormalisation may be needed for performance reasons (e.g., summary information in a data warehouse or copies of tables at both ends of a network).
- Always normalise first, then denormalise if necessary, knowing the reasons for doing so.
Summary
- Normalisation is the process of structuring a relational database to reduce redundancy and improve data integrity.
- A well-designed database should ideally comply with the third normal form (3NF) or higher.
- Normalisation ensures the data is free from anomalies (insertion, update, and deletion anomalies).
第五讲Modelling III
Lecture Objectives
- Identify advanced modelling concepts and structures.
- Recognize possible solutions to unravel issues related to advanced data modelling structures.
Outline
- Many-to-Many Relationships
- More than One Relationship Between Two Entities
- Recursive Relationships
- Binary/Ternary Relationships
- Weak Entities
- ER Connection Traps
- Fan Traps
- Chasm Traps
Many-to-Many Relationships
- Many-to-Many relationships should be resolved before transforming the ER model into a relational model.
- Example: A student is registered in many modules, and each module has many students.
- Can’t use foreign keys, as multiple values would be needed.
More than One Relationship Between Two Entities
- It’s good practice for an entity to have at least one relationship with another entity in the diagram.
- An entity may have more than one relationship with another entity.
Recursive Relationships
- An entity may have a relationship with itself, sometimes called a ‘Pig’s ear’ relationship.
- Example: An employee may supervise other employees.
Binary/Ternary Relationships
- A relationship can involve more than two entities.
- Example: An employee uses a skill on a project.
- Ternary Relationship Example: A sale involving a car, customer, and salesperson.
Weak Entities
- Weak entities are entities whose very existence depends on another entity.
- They inherit at least part of their primary key from the entity to which they are related.
- Example: Employees’ dependents, salary history.
ER Connection Traps
- Problems may arise when designing a conceptual data model called connection traps.
- Often due to a misinterpretation of the meaning of certain relationships.
- Two main types of connection traps:
- Fan Traps
- Chasm Traps
Fan Traps
- Occurs when a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous.
- Example: Identifying which branch an employee works in when there are multiple paths between entities.
Fan Traps Example
- Problem: Which branch does employee ‘SA9’ work in?
- Solution: Clearly define the relationship between staff, division, and branch.
Chasm Traps
- Occurs when a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences.
- Example: Determining which surgery a doctor works in when the relationship is not explicitly defined.
Chasm Traps Example
- Problem: Which surgery does doctor ‘Doc007’ work in?
- Solution: Explicitly define the relationship between doctor, surgery, and office.
Summary of Advanced Modelling
- Advanced modelling addresses potential inconsistencies during the implementation of relational models.
- Many-to-Many relationships need a “composite entity” for implementation.
- Recursive relationships can be solved using aliases during database queries.
- Ternary and higher-order relationships are needed for more complex relationships.
- There are two common problems when designing a conceptual model using ER diagrams: Fan Traps and Chasm Traps.
第六讲course work
Lecture Objectives
- Produce the Conceptual Model in a Data Model Design example.
- Verify the 3rd Normal Form of the model.
Outline
- Conceptual Model Review
- Normalization Review
- Exercise
3 Phases of Design – Conceptual
- The conceptual model used in an enterprise is independent of all implementation considerations.
- The data model is built using user’s requirements specifications.
- It feeds into the logical design phase.
Conceptual Model
ER Model – Getting Started
- Requirements Analysis:
- What does the organization do?
- What are the roles within it?
- What are the information requirements for each kind of user?
- What are the ‘business rules’ or constraints?
- Modelling is iterative; the model needs to be refined and tested.
- Assumptions may be required during development but should be clarified ASAP.
- Keep checking with the users.
Normalization – First Normal Form (1NF)
- A relation is in 1NF if:
- There are no repeating groups in the table.
- There are no multi-valued attributes.
- All non-primary-key attributes are functionally dependent on the primary key.
- Properly defined relations must have a unique primary key as a unique identifier.
- A relation not in 1NF might contain duplicate rows.
Normalization – Second Normal Form (2NF)
- A relation is in 2NF if:
- It is in 1NF.
- Every non-primary-key attribute is fully functionally dependent on the primary key.
- A relation with a single-attribute primary key must be in 2NF by definition.
- We reduce data to 2NF by non-loss decomposition (usually need more relations but can’t afford to lose data).
Normalization – Third Normal Form (3NF)
- A relation is in 3NF if:
- It is in 2NF.
- Every non-primary-key attribute depends only on the primary key (no transitive dependencies).
- We reduce data to 3NF by non-loss decomposition (usually need more relations but can’t afford to lose data).
Exercise Scenario
A university department offers courses. Each course has a name, such as “Computer Science,” and a degree, such as “BSc” or “MSc.” A course is made up of several modules. A module consists of a name, module code, year of study (i.e., 1, 2, 3, or 4), and the term in which it is taught (spring or autumn). Students are enrolled in a course and take modules of that course. A student has a name, a registration number, and is enrolled in a course. Each module is taught by a lecturer. A lecturer has a name and works in a department. Also, each lecturer is a tutor of a group of students. Students can make appointments with a lecturer. An appointment has a date and location between a lecturer and a student.
Entities and Attributes
Entities:
- Module
- Attributes: Module_Code (PK), Module_Name, Year, Term
- Appointment
- Attributes: Lecture_No (PK), Student_Reg (PK), Date (PK), Location
- Course
- Attributes: Course_Name (PK), Degree
- Student
- Attributes: Student_Reg (PK), Student_Name
- Department
- Attributes: Dep_Code (PK), Dep_Name
- Lecturer
- Attributes: Lecturer_No (PK), Lecturer_Name
Relationships:
- Attributes: Lecturer_No (PK), Lecturer_Name
- Courses contain Modules.
- Students enroll in Courses.
- Students take Modules.
- Modules are taught by Lecturers.
- Lecturers belong to Departments.
- Lecturers tutor Students.
- Students can make Appointments with Lecturers.
Conceptual Model
ER Model Concepts:
- Entity: Group of objects with the same properties, identified by the enterprise as having an independent existence.
- Attributes: Relevant property (information/adjectives) about the entity.
- Relationship: How entities are related, often denoted by verbs. Relationships should be named and their degree shown.
Example: - Doctor treats Patients.
- ER Diagram to illustrate entities, attributes, and relationships.
Normalization Process
1NF to 2NF:
- Ensure no repeating groups or multi-valued attributes.
- Ensure all non-primary-key attributes are fully functionally dependent on the primary key.
2NF to 3NF: - Ensure no transitive dependencies.
- Decompose relations as necessary without losing data.
Summary
- Advanced modeling is an iterative process.
- Many-to-Many relationships require a “composite entity” for implementation.
- Fan Traps and Chasm Traps need to be verified at the conceptual level.
- 3rd Normal Form verification is better achieved in the relational model by adding some dummy data.
Questions
- Q&A session to address any queries.
Through the above content, students will understand the process of creating a conceptual model, verifying normalization to the 3rd normal form, and applying these concepts to practical exercises.
第七讲
Lecture Objectives
- Define database tables and constraints in PostgreSQL.
- Manipulate existing database tables in PostgreSQL.
Outline
- DBMS
- SQL Recap
- SQL - Data Definition Language (DDL)
- Create Tables, Domains, Data Types, Default Values
- Database Integrity
- Constraints
- SQL - Data Manipulation Language (DML)
- Summary
- Q&A
Database Management System (DBMS)
- Common DBMS view: Create multiple databases, e.g., Content Management, Sales, Payroll.
- SQL statements to create a database, e.g.,
CREATE DATABASE ContentManagement
.
SQL Language
- Data Definition Language (DDL)
- Data Manipulation Language (DML)
- Data Control Language (DCL)
- Transaction Control Language (TCL)
Usual Practice
- Use a file of SQL commands (.SQL) as a script to create the database.
- Constraints can be added in a separate file.
- Scripts are rerunnable and can have comments.
SQL CREATE TABLE Statement
- Used to create a new table in a database.
- Syntax example:
CREATE TABLE EMP ( EMPNO INTEGER, ENAME VARCHAR(10), JOB VARCHAR(9), MGR INT, HIREDATE DATE, SAL INTEGER, COMM INTEGER, DEPTNO INTEGER );
- Drop an existing table before creating it:
DROP TABLE IF EXISTS emp;
Creating Tables by Copying
- Create a table as a copy of another:
CREATE TABLE new_table AS SELECT * FROM old_table;
- Copy the structure without rows:
CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=2;
Common Data Types
- Text or Character:
char(5)
: Fixed length of 5 characters.varchar(100)
: Variable length up to 100 characters.varchar
: Variable length string up to 65,535 bytes (PostgreSQL).
- Integer:
SMALLINT
: Range -32,768 to 32,767, size 2 bytes.INT
: Range -2,147,483,648 to 2,147,483,647, size 4 bytes.SERIAL
: Automatically generated integers.
- Floating-point Number:
float(n)
: Floating-point numbers with n precision, up to 8 bytes.float8
orreal
: 4-byte floating-point numbers.
- Temporal Data Types:
DATE
: Stores dates.TIME
: Stores time of day values.TIMESTAMP
: Stores date and time values.TIMESTAMPTZ
: Timezone-aware timestamp.INTERVAL
: Periods of time.
PostgreSQL Dates
- DATE: Stores year, month, day, default format
YYYY-MM-DD
. - TIME: Stores hour, minute, second, millisecond, default format
HH:MM:S.MS
. - TIMESTAMP: Stores year, month, day, hour, minute, second, millisecond, default format
YYYY-MM-DD HH:MM:S.MS
.
Creating Domains
CREATE DOMAIN
creates a new domain, essentially a data type with optional constraints.CREATE DOMAIN validStaffNo AS INT CHECK (VALUE BETWEEN 0 AND 2000);
- Use domain to create a new table:
CREATE TABLE staff4 ( staffno validStaffNo, name VARCHAR(20), job VARCHAR(20) );
Default Values
- Default value assigned by the database if no specific value is given.
CREATE TABLE staff( staffno NUMERIC(4), name VARCHAR(20), job VARCHAR(20) DEFAULT 'Assistant' NOT NULL );
Modifying Existing Tables
- Add columns:
ALTER TABLE tablename ADD columnname datatype;
- Drop columns:
ALTER TABLE tablename DROP COLUMN columnname;
Auto Generating Primary Key
- PostgreSQL provides
SERIAL
andSEQUENCE
facilities.- Using
SERIAL
:CREATE TABLE table_name(id SERIAL);
- Using
SEQUENCE
:CREATE SEQUENCE table_name_id_seq; CREATE TABLE table_name ( id integer NOT NULL DEFAULT nextval('table_name_id_seq') );
- Using
Database Integrity
- Data integrity enforced by a series of constraints or rules.
- Entity Integrity: No duplicate rows.
- Domain Integrity: Valid attribute values, e.g., age 0-200.
- Referential Integrity: Rows cannot be deleted if referenced elsewhere, e.g., department with employees.
- User-Defined Integrity: Business rules, e.g., salary > commission.
Database Constraints
- PRIMARY KEY: Uniquely identifies each record.
- FOREIGN KEY: Ensures referenced value exists in the parent table.
- CHECK: Enforces conditions.
- NOT NULL: Ensures values are not null.
- UNIQUE: Ensures unique values.
SQL Data Manipulation Language (DML)
- INSERT: Insert data into tables.
INSERT INTO tablename(col_1,.., col_n) VALUES ('x_1',.., 'x_n');
- UPDATE: Update existing rows.
UPDATE emp SET sal = 3000 WHERE empno = 7788;
- DELETE: Delete rows.
DELETE FROM emp WHERE deptno = (SELECT deptno FROM dept WHERE dname = 'SALES');
Summary
- SQL statements CREATE, ALTER, DROP are DDL; INSERT, UPDATE, DELETE, SELECT are DML.
- Various PostgreSQL data types can be used, such as NUMERIC, VARCHAR, INTEGER.
- Database integrity ensures business rules consistency.
- Constraints ensure database integrity: PRIMARY KEY, FOREIGN KEY, CHECK, NOT NULL, UNIQUE.
第八讲
Lecture Objectives
- Create transactions to perform multiple changes to the data as a logical unit of work.
- Explain the ‘ACID’ model for implementing transactions.
- Relate to the different database data control commands.
- Identify how to implement transactions, grant, and revoke privileges in PostgreSQL.
Outline
- Transactions
- Definition
- Boundaries
- Transaction States
- Data Control Language (DCL)
- ACID Model
- Summary
- Q&A
SQL and Its Four Sub-languages
- Data Definition Language (DDL)
- Data Manipulation Language (DML)
- Data Control Language (DCL)
- Transaction Control Language (TCL)
Transactions
- A transaction is an action, or a series of actions, carried out as a logical unit of work by a user or an application that reads or updates the database contents.
- It transforms the database from one consistent state into another and is used as a mechanism for concurrency and recovery control in the database.
- It consists of one or more changes (INSERT, UPDATE, DELETE) associated with DML commands.
Transaction Example
Suppose you want to transfer $5 from Account A to Account B. This action can be broken down into the following steps:
- Create a record to transfer $5 from Account A to Account B. This is typically the beginning of a database transaction.
- Read the balance from Account A.
- Subtract $5 from the balance of Account A.
- Read the balance from Account B.
- Add $5 to the balance of Account B.
- The database can run these operations as a single logical unit — either all operations are executed, or none are.
Transaction Boundaries
- A transaction begins:
- On login
- At the end of the last transaction
- In some RDBMSs, with:
BEGIN TRANSACTION
(e.g., PostgreSQL)
- A transaction ends with:
COMMIT
/ROLLBACK
- Any DDL statement (e.g., CREATE, DELETE, TRUNCATE, etc.)
- Logout
- But not with an ungraceful exit, process kill, or interrupt
COMMIT
Command
- Marks the successful end of a transaction.
- Means that changes are now permanent within the database.
AUTOCOMMIT
mode is the default in PostgreSQL, meaning that every DML command is immediately made permanent.
Rolling Back Transactions
- Use the SQL
ROLLBACK
command to revert actions, sometimes called ‘undo’. - All RDBMSs must support
ROLLBACK
. - Saving changes is expensive (consider a multi-million-row delete), so transactions should be as short as possible.
Savepoints
- Not ANSI standard.
- You can set up multiple savepoints within transactions:
SAVEPOINT s1
: DML before the savepoint is saved but not committed.ROLLBACK TO SAVEPOINT s1
: Whole transaction can be committed or rolled back.
- Most useful when SQL is embedded in a procedural language with conditional constructs.
Transaction States
- Active: The transaction is active during its execution; it is the initial state.
- Partially Committed: After the final statement has been executed, the transaction is partially committed.
- Failed: Once it is discovered that the transaction cannot be executed normally, its state is failed.
- Aborted: The transaction is aborted once it is rolled back and the database is restored to its state before the transaction’s execution.
- Committed: The transaction is committed after it is completed successfully.
Data Control Language (DCL)
- To allow other database users to view or modify the contents of any data in a database, you must grant them the privileges to do so.
- Grant privileges syntax:
GRANT privileges ON object TO user;
Example of Granting Privileges
- Grant some privileges (e.g., SELECT, INSERT, UPDATE, DELETE) on a table named
products
to a user nameduser111
:GRANT SELECT, INSERT, UPDATE, DELETE ON products TO user111;
- Grant all permissions:
GRANT ALL ON products TO user111;
- Grant only SELECT access (read-only):
GRANT SELECT ON products TO user111;
Example of Revoking Privileges
- Revoke some privileges (e.g., UPDATE, DELETE) on a table named
products
from a user nameduser111
:REVOKE UPDATE, DELETE ON products FROM user111;
- Revoke all permissions:
REVOKE ALL ON products FROM user111;
- Revoke only SELECT access:
REVOKE SELECT ON products FROM user111;
ACID Transactions
- Atomic: Either all changes (operations) of the transaction are reflected in the database, or none are.
- Consistent: The database remains in a consistent state as transactions are executed in isolation.
- Isolated: Transaction A cannot see Transaction B’s changes until B has finished (committed), although they are allowed to execute concurrently. Intermediate transaction results must be hidden from other concurrently executed transactions.
- Durable: Committed changes of transactions must persist even in a system crash/failure.
Implementation of Atomicity and Durability
- Supported and implemented in the recovery management component of the database system.
- Example: Shadow-database (copy) scheme:
- All updates are made on the shadow copy of the database.
- The db_pointer points to the updated shadow copy after the transaction reaches a partial commit and updated pages have been flushed to disk.
- If the transaction fails, the consistent copy to which db_pointer refers is used, and the shadow copy is deleted.
- Assumptions: No hard disk failures, only one transaction active at a time, no handling of concurrent transactions needed.
Summary
- A transaction is a logical unit of work. It consists of one or more changes to the database using one or more commands: INSERT, UPDATE, DELETE, etc.
- Transactions must conform to the ‘ACID’ model.
- Transactions are managed using the TCL commands, i.e.,
COMMIT
andROLLBACK
. - Privileges to access and modify tables in the database can be set with DCL commands:
GRANT
,REVOKE
.
第九讲Views & Stored Procedures
Objectives
- Explain the meaning of Views in a database.
- Explain what a Stored Procedure is and their purpose.
- Identify the different database manipulations possible with Views.
- Relate to the different steps needed to create a Stored Procedure.
- Describe the different SQL commands to implement Views and Stored Procedures.
Outline
- Views
- Definition
- Advantages
- Create & Query Views
- Stored Procedures
- Definition
- Structure
- Function Arguments
- Variable Declaration
- Variable Assignments
- Summary
- Q&A
Views
Definition:
- A view is a virtual table based on a SELECT statement.
- The SELECT statement can obtain columns, computed columns, aliases, and aggregate functions from one or several tables.
- A view is similar to a real table; the fields are obtained from the so-called base tables.
Syntax:
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;
Advantages of Views over Tables:
- Security: Users can be given permission to query a view but not the underlying base tables, thus limiting data exposure.
- Simplicity: Views can simplify queries by combining data from multiple tables into a single view.
- Efficiency: Views consume very little storage space since only the definition is stored, and the view is updated dynamically.
- Up-to-date Information: Views are recreated on demand each time they are invoked, reflecting new data additions.
Creating Views in PostgreSQL:
- Example: Create a view with employee number, name, and salary of all managers.
CREATE VIEW managers AS SELECT empno, ename, sal FROM emp WHERE job = 'MANAGER';
Querying Views:
- Views can be used anywhere a table name is expected in SQL statements.
- They can be used inside SELECT statements, stored procedures, or within other views.
- Example:
This query delivers the same result as:SELECT empno FROM managers;
SELECT empno FROM emp WHERE job = 'MANAGER';
Renaming Attributes in a View in PostgreSQL:
- You can give attributes different names in the view.
CREATE VIEW managers_view (id, name, location) AS SELECT emp.empno, emp.ename, dept.loc FROM emp, dept WHERE emp.deptno = dept.deptno AND emp.job = 'MANAGER';
Modifying a View:
- Typically, views cannot be modified directly, but the underlying tables can be modified through the view.
- SQL rules for updating views:
- The SELECT clause must list enough attributes to insert tuples into the view and the base tables. Remaining attributes will be filled with NULL or default values.
Examples:
- The SELECT clause must list enough attributes to insert tuples into the view and the base tables. Remaining attributes will be filled with NULL or default values.
- INSERT:
CREATE VIEW salesmen_view (id, name, salary, job) AS SELECT empno, ename, sal, job FROM emp WHERE job = 'SALESMAN'; INSERT INTO salesmen_view VALUES (8000, 'STAHL', 9000, 'SALESMAN');
- DELETE:
DELETE FROM salesmen_view WHERE salary < 1500;
- UPDATE:
UPDATE salesmen_view SET salary = 3000 WHERE name = 'STAHL';
- DROP:
DROP VIEW salesmen_view;
Stored Procedures
Definition:
- A stored procedure is a named collection of computations and SQL statements stored in the database.
- They encapsulate and represent business transactions, allowing multiple SQL statements to be executed by calling the stored procedure.
Advantages:
- Reduction of Network Traffic: Stored procedures are stored on the database server, eliminating the need to transmit SQL statements over the network.
- Code Reusability and Isolation: They reduce code duplication by allowing code sharing and isolation, minimizing errors and development costs.
What is PL/pgSQL?
- PL/pgSQL is a block-structured procedural language used to implement stored procedures in PostgreSQL.
- It contains a declaration section and an executable section and is closely coupled with SQL.
Anatomy of a Stored Procedure: - Basic structure:
CREATE FUNCTION function_name (arguments) RETURNS return_type AS $$ BEGIN -- statements END; $$ LANGUAGE plpgsql;
Example: Simple Standalone Procedure
- Increment function:
CREATE FUNCTION increment(IN val integer) RETURNS integer AS $$ BEGIN RETURN val + 1; END; $$ LANGUAGE plpgsql; SELECT increment(10);
Function Arguments:
- Example:
CREATE OR REPLACE FUNCTION divisionIntegers(a integer, b integer) RETURNS float AS $$ BEGIN RETURN a / b::float; END; $$ LANGUAGE plpgsql;
Variable Declarations:
- Variables can be declared within the DECLARE section of the function.
- Example:
CREATE OR REPLACE FUNCTION circleArea(radius integer) RETURNS float AS $$ DECLARE pi float; BEGIN pi := 3.14; RETURN radius * radius * pi::float; END; $$ LANGUAGE plpgsql;
Assignments of Composite Variables:
- Assign values using SELECT statement:
SELECT expression INTO target;
Example:
CREATE OR REPLACE FUNCTION testCompositeDeclarations() RETURNS text AS $$
DECLARE
department dept%rowtype;
BEGIN
SELECT 50, 'PURCHASE', 'READING' INTO department.deptno, department.dname, department.loc;
RETURN department;
END; $$
LANGUAGE plpgsql;
Summary to Stored Procedures
- Stored Procedures are collections of computations and SQL statements stored in the database.
- They are block-structured and composed of a declaration section (for variables) and an executable section (for assignments and calculations).
- Assignments in Stored Procedures can be done using the SQL SELECT command.
第十讲Views & Stored Procedures
Lecture Objectives
- Understand the ‘I’ (Isolated) in ACID transactions
- Define concurrency
- Describe the problems of concurrent access
- Outline the RDBMS mechanisms to control concurrency
- Identify ANSI transaction isolation levels
- Define locking control and deadlocks
Outline
- Transactions and ACID review
- Concurrency
- Concurrency Problems
- ANSI/SQL Isolation Levels
- Locking
Transactions and ACID Review
Transactions
- A transaction is an action, or a series of actions carried out as a logical unit of work, by a user or an application that reads or updates the database contents.
- It transforms the database from one consistent state into another and is used as a mechanism for concurrency and recovery control in the database.
- It consists of one or more changes (INSERT, UPDATE, DELETE) associated with DML commands.
ACID Transactions - Atomic: Either all changes (operations) of the transaction are reflected in the database, or none are.
- Consistent: Preserves the database in a consistent state as transactions are executed in isolation.
- Isolated: Transaction A cannot see Transaction B’s changes until B has finished (committed), although they are allowed to execute concurrently. Intermediate transaction results must be hidden from other concurrently executed transactions.
- Durable: Committed changes of transactions must persist even in a system crash/failure.
Concurrency
Role and Advantages of DBMSs
- Improved data sharing.
- Better data integration.
- Minimized data inconsistency (storing different versions of the same data).
- Improved data access (quick answers to ad hoc queries).
- Improved decision making.
Concurrent Database Use - Concurrency allows many users to work on the same data.
- Maximizes data access.
- Minimizes wait time.
- Ensures data is always in a correct state with no violation of integrity constraints.
- Concurrency is a major issue for every RDBMS.
- Queries are less problematic than transactions because a transaction is a set of logically related changes.
Concurrency Problems
- Dirty Read
- Occurs when a transaction reads a row that has been modified by another transaction but not yet committed.
- Example: User U1 is updating data, while User U2 is reporting data. If U1 rolls back, U2’s report is incorrect.
- Lost Update
- Occurs when multiple transactions select the same row and update it based on the original value.
- Example: User U1 increases salary, User U2 deducts tax based on the same original salary, resulting in one update being lost.
- Non-Repeatable Read
- Occurs when a transaction reads data before and after another transaction modifies it.
- Example: User U1 updates a table, User U2 reports on it, resulting in inconsistent reads.
- Phantom Read
- Occurs when the same query executed twice returns different rows.
- Example: User U1 inserts new rows, User U2 reports on the table, resulting in different row counts.
The Scheduler
- Severe problems can arise when two or more concurrent transactions are executed.
- The scheduler is a special DBMS process that establishes the order in which the operations are executed.
- It interleaves the execution of database operations to ensure serializability and isolation of transactions.
- The scheduler’s main job is to create a serializable schedule of transaction operations, yielding the same results as if the transactions were executed serially.
Concurrency Control with Locking
Locking
- Locking is one of the most common techniques used in concurrency control.
- A lock guarantees exclusive use of data items to a concurrent transaction.
- A transaction (e.g., T2) does not have access to a data item currently used by another transaction (e.g., T1).
- A transaction acquires a lock prior to data access; the lock is released when the transaction is completed.
Locking Modes
- Access Exclusive Lock: Prevents other transactions from accessing or modifying the locked object, used for destructive operations.
- Exclusive Lock: Prevents other transactions from modifying the locked object, but allows reading.
- Share Lock: Allows multiple transactions to read the same object concurrently, but prevents modifications.
- Row Share and Row Exclusive Lock: Controls concurrent access to individual rows of a table.
Deadlocks
- Deadlocks occur when two or more transactions are waiting for each other to release locks.
- Example: Tx_1 updates row 7 and tries to update row 3, while Tx_2 updates row 3 and tries to update row 7.
- PostgreSQL automatically detects deadlocks and aborts one of the involved transactions.
ANSI/SQL Isolation Levels
- Read Uncommitted
- Transactions may see not-yet-committed changes by other transactions.
- Offers no protection against data inconsistency.
- Not possible in Oracle, possible (but undesirable) in other RDBMSs like PostgreSQL.
- Read Committed
- Forces transactions to read only committed data.
- Data can change after it is read.
- Default level for most vendors.
- Repeatable Read
- Ensures that row queries return consistent results.
- Keeps read and write locks on the selected data until the transaction ends.
- Does not prevent new rows with the selected criteria from being added.
- Serializable
- Applies read and write locks until the transaction ends (similar to Repeatable Read).
- If a write collision is detected among transactions, only one is executed, and the rest fail.
- Ensures complete isolation but is slower.
Isolation Levels and Problems
Isolation Level | Dirty Read | Lost Updates | Non-Repeatable Read | Phantom Read |
---|---|---|---|---|
Read Uncommitted | Possible | Possible | Possible | Possible |
Read Committed | Not Possible | Possible | Possible | Possible |
Repeatable Read | Not Possible | Not Possible | Not Possible | Possible |
Serializable | Not Possible | Not Possible | Not Possible | Not Possible |
Locking Notes
- All commercial RDBMSs must implement safe locking.
- The challenge is to preserve speed of response.
- Developers should:
- Minimize lock contention by keeping transactions short.
- Avoid explicit locking unless necessary.
- Avoid deadlocks by accessing tables in the same order.
Summary
- Concurrency allows many users to work on the same data.
- However, concurrency can cause consistency problems such as dirty reads, lost updates, non-repeatable reads, and phantom reads.
- ANSI/SQL isolation levels in transactions aim to avoid these consistency problems.
- Locks can be used to prevent readers and writers from blocking each other, but locks can still introduce deadlocks.