本科课程-数据库知识点（英文）

slx · 2024 年6 月 19 日 05:48

第一讲Introduction

Lecture Objectives

Describe what Database Management Systems (DBMSs) are.
Recognize the problem of Data Redundancy.
Explain the basic Relational Data Model and some basic Relational Data Model concepts.
Relate SQL basic concepts.

Outline

DBMS
Data Redundancy
File System vs Databases
Relational Model
RDBMS
SQL
Questions and Answers

Storing & Managing Information

Relational database models focus on storing data efficiently.
They enable retrieval of specific information and obtaining new insights from stored data.

What is a Database and DBMS?

A database is a shared integrated computer structure that stores a collection of end-user data and metadata.
Metadata provides a description of the data characteristics and links (relationships) between sets of data.
A database is an organized electronic filing cabinet designed to define, manipulate, retrieve, and manage data.
A software called the Database Management System (DBMS) helps manage the database’s contents.

Role and Advantages of DBMSs

Improved data sharing.
Better data integration.
Minimized data inconsistency (storing different versions of the same data).
Improved data access (quick answers to ad hoc queries).
Improved decision making.

Why Database Design is Important?

Focus on transactional databases designed to support a company’s day-to-day operations.
Database design defines the structure of the database to create and maintain data consistency and improve operational speed.
A well-designed database facilitates data management through the generation of accurate and valuable information.
A poorly designed database may lead to errors and bad decision-making, potentially causing organizational failure.

Historical Roots: Manual File Systems

Managers in small organizations kept manual file systems to track important data.
Manual file systems are slow and cumbersome for querying.
Example questions:
- What products did I sell last month, quarter, or year?
- How does my sales volume this year compare to last year?

Historical Roots: Electronic File Systems

Initially similar to manual file systems but automated.
Created file structures and software to manage these structures and generate reports automatically.
File systems grew rapidly, leading to challenges in data management and the need for more robust computer systems.
Data redundancy and inconsistency were common due to the dispersion of data across different locations.

Data Redundancy: Islands of Information

Data redundancy leads to data inconsistency, with different versions of the same data existing in different places.
Example scenario: Sales agent updates address in HR but not in the sales department, leading to inconsistency.

Data Redundancy: Data Anomalies

Update Anomalies: If a location changes, many rows need to be updated.
Insertion Anomalies: Creating a department without employees results in many NULL values.
Deletion Anomalies: Deleting the last employee in a department can result in the loss of the department.

Database Systems vs File Systems

File Systems: Data spread across different locations, leading to redundancy and inconsistency.
Database Systems: Centralized data management reduces redundancy and ensures consistency.

Overview of the Relational Model

First proposed by British data scientist E.F. Codd in 1970.
Based on mathematical concepts of sets, tuples, and relations.
Set: Collection of items of the same type with no order or duplicates.
Tuple: Finite ordered list of elements (n-tuple is a sequence of n elements).
Relation: Set of n-tuples with no order on n-tuples.

Relational Model Components

Data stored within relations (tables).
Relation consists of attributes (columns) and tuples (rows).
Attributes have associated domains (set of all possible values).

Advantages of Relational DBMS (RDBMS)

Designed specifically for relational databases.
Based on sound mathematical theory.
Implemented with a declarative language for querying.
Provides a secure, consistent repository with scalable capability.
No unnecessary repetition of data.
Data mapped logically, independent of physical location/storage.

RDBMS Architecture

External Level: Customized user views of the data.
Conceptual Level: Community view of all available data, independent of storage details.
Internal Level: Describes physical storage of data.

Disadvantages of RDBMS

Requires substantial memory and processing power.
Needs rigorous data modeling.
Data redundancy and anomalies can occur with an ill-defined database model.
Best suited for environments where data structures and relationships are well-known.

Structured Query Language (SQL)

Standard language for relational databases accepted by ANSI and ISO/IEC.
Provides statements for querying, inserting, updating, deleting rows, and managing database objects and access.

SQL Sublanguages

Data Definition Language (DDL): Defines database structure.
Data Manipulation Language (DML): Manipulates data.
Data Control Language (DCL): Controls access to the database.
Transaction Control Language (TCL): Manages transactions.

Summary

The module will teach relational databases.
Historical roots lie in file systems, which suffer from data redundancy and anomalies.
Relational databases, based on solid mathematical principles, address many file system issues.
Adequate data modeling techniques are essential to prevent data anomalies in relational databases.
This overview provides a comprehensive understanding of databases, DBMS, and their historical context, highlighting the importance of well-designed relational databases and the role of SQL in managing them.

第二讲Data Modelling I

Lecture Objectives

Identify the three phases of database design.
Model data by creating an Entity-Relationship (ER) model.

Outline

Three Phases of Database Design
- Conceptual
- Logical
- Physical
Creating the Conceptual Model
Entity-Relationship (ER) Model
Questions and Answers

Criteria to Produce an Optimal Data Model

Structural validity: Consistency with the way the enterprise defines and organizes information.
Simplicity: Ease of understanding by Information Systems professionals and non-technical users.
Expressiveness: Ability to distinguish between different data, relationships between data, and constraints.
Non-redundancy: Exclusion of extraneous information; representation of any piece of information exactly once.
Shareability: Not specific to any particular application or technology, usable by many.
Integrity: Consistency with the way the enterprise uses and manages information.
Extensibility: Ability to evolve to support new requirements with minimal effect on existing users.
Representation: Ability to represent a model using an easily understood diagrammatic notation.

Three Phases of Design

Conceptual Design
- Independent of implementation considerations.
- Built using user requirements specifications.
- Feeds into the logical design phase.
Logical Design
- Enterprise model based on a specific data model (e.g., relational).
- Independent of DBMS and physical storage.
- Used by software applications.
- Conceptual model mapped to a logical data model.
- Ensures no information is lost.
- Includes normalization.
Physical Design
- How the data is stored in databases.
- Describes file organizations, indexes, and distribution for efficient data access.
- Focuses on security and efficient implementation of constraints/integrity rules.
- Tailored to a specific DBMS.

Conceptual Model

Entity-Relationship Model (ER)
- Described by Chen in 1976.
- Based on strong mathematical foundations (Set Theory, Mathematical Relations, Modern Algebra, Logic).
- Graphical and easy to understand.
- Several variations, including Chen and UML representations, and cardinality specifications (Crowfeet, SSADM, Min-Max).

Requirements Analysis for ER Model

Understand what the organization does.
Identify roles within the organization.
Determine information requirements for each type of user.
Identify business rules or constraints.
Iteratively refine and test the model.
Clarify assumptions as soon as possible.
Continuously verify with users.

ER Model Concepts

Entity Type: Group of objects with the same properties, identified as having independent existence.
Entity Occurrence: Uniquely identifiable instance of an entity type.
- Example: “Doctor” is an entity type; “Dr. Hibbert” and “Dr. Nick” are instances.
Attributes: Relevant properties (e.g., adjectives) about the entity.
- At least one identifying attribute or combination of attributes for each entity occurrence.
- Example attributes for a “Patient” entity might include hair color, date of birth, and religion.
Attribute Domain: The possible set of values for an attribute.
- Examples:
  - Hair_Color: brown, black, blonde, white, grey, brunette
  - Date_of_Birth: date
  - Student_ID: integer
Attribute Types:
- Simple Attribute: Single component with an independent existence (e.g., surname, salary).
- Composite Attribute: Multiple components (e.g., address, name).
- Single-Valued Attribute: Holds a single value for each entity instance (e.g., student number).
- Multi-Valued Attribute: Holds multiple values for each entity instance (e.g., company phone numbers).
- Derived Attribute: Value derivable from related attributes or another entity type (e.g., total salary = salary + bonus).

ER Model Example

Relationships: How entities are related.
- Use verbs to denote relationships (e.g., doctors treat patients).
- Each entity should be related to at least one other entity.
Relationship Types: Set of meaningful associations among entity types.
Relationship Occurrences: Unique associations including one instance from each participating entity type.
Relationship Representation: Lines between entity boxes denote relationships; relationships should be named.

Constraints and Assumptions

Constraints: Rules ensuring the validity of data (e.g., “Every patient must be registered with a doctor”).
Assumptions: Represent expectations about reality, to be clarified during the modeling process (e.g., “Patients will only have one appointment per day”).

Conceptual Model Summary

Steps to Create an ER Model:
- Identify entity types.
- Determine attributes.
- Identify relationship types.
- Associate attributes with entity or relationship types.
- Add additional constraints and assumptions as necessary.

Summary

Database design consists of three stages: Conceptual, Logical, and Physical.
Good database design is challenging, and skilled practitioners are rare.
Database design is not an exact science; multiple solutions can exist for the same problem.
The main output of the conceptual stage is the ER model, which includes entities, attributes, relationships, and constraints.

第三讲Data Modelling II

Lecture Objectives

Implement the remaining phases of the database design: Logical and Physical.
Understand relational database data integrity rules.
Describe and implement the Date-Codd notation.

Outline

Recap
- Main Entity Relationship Model (ER): Entity, Attributes, Relationships, Constraints
Three Phases of Database Design: Logical/Physical
- Logical Design
- Physical Design
Attributes domains
Keys and Foreign Keys
Integrity rules
Date-Codd Notation
Questions and Answers

Recap – Data Modelling

Main Entity Relationship Model (ER)
- Entity: Represents real-world objects or concepts.
- Attributes: Properties or details about the entities.
- Relationships: How entities relate to each other.
- Constraints: Rules that ensure data integrity.

Three Phases of Database Design

Conceptual Design

Independent of implementation considerations.
Based on user requirements specifications.
Provides input to the logical design phase.

Logical Design

Enterprise model based on a specific data model (e.g., relational).
Independent of DBMS and physical storage.
Used by software applications.
Ensures no information is lost.
Includes normalization.

Physical Design

Describes how data is stored in databases.
Focuses on file organizations, indexes, and distribution for efficient data access.
Implements security and integrity rules.
Tailored to a specific DBMS.

Properties of a Table (Relation)

2D structure comprising rows and columns.
Each row (tuple) describes a single entity of the relation.
No duplicate rows allowed.
Each column represents an attribute with a distinct name.
Each cell contains a single atomic value.
All values in a column must have the same data format.
Each attribute has a specified range of values (attribute domain).
The order of rows and columns is insignificant to the DBMS.
Each table must have a combination of attributes that uniquely identify each row.

Attributes Domains

Definition: The possible set of values for an attribute.
Examples:
- Hair_Color: brown, black, blonde, white, grey, brunette
- Date_of_Birth: date
- Student_ID: integer

Benefits of Establishing Data Types

Storage Efficiency: Optimizes space on disk.
Predictions: Allows the DBMS to manage memory allocation efficiently.
Data Consistency: Ensures only valid data is accepted.

Common Data Type Categories

Text or Character:
- char(5): Always 5 characters in length.
- varchar(100): Variable length up to 100 characters.
- varchar: Variable length string up to 65,535 bytes.
Integer:
- SMALLINT: Range -32,768 to 32,767, 2-byte size.
- INT: Range -2,147,483,648 to 2,147,483,647, 4-byte size.
- SERIAL: Automatically generated integers.
Floating-Point Number:
- float(n): Floating-point numbers with n precision, up to 8 bytes.
- float8 or real: 4-byte floating-point numbers.
Temporal Data Types:
- DATE: Stores dates.
- TIME: Stores time of day.
- TIMESTAMP: Stores date and time.
- TIMESTAMPTZ: Timezone-aware timestamp.
- INTERVAL: Stores periods of time.

Relational Database Keys

Primary Key: A candidate key selected to uniquely identify each occurrence of an entity type.
Composite Key: A candidate key that consists of two or more attributes.
Natural Key: Identifying key derived from the entity’s existing attributes (e.g., NHS number for a patient).
Surrogate Key: An artificial key introduced for efficiency, usually an integer (e.g., patient number).

Controlled Redundancy

Ensures that the relationships in the database work properly.
Common attributes link tables together.
Necessary duplication of values is not considered redundant.
Unnecessary duplication is considered redundancy.

Foreign Keys

Attributes whose values match values of a candidate key in another relation.
Used to combine information from multiple relations (tables).

Integrity Rules

Entity Integrity: Ensures the primary key has no null values and all entries are unique.
Referential Integrity: Ensures that foreign keys properly reference primary keys in another table.

Relational Model – Date-Codd

Codd-Date Definition:
- No diagrams, tables referred to as relations.
- Uses predicate logic and extensive mathematics.
- Based on Codd-Dates rules.
- Uses textual notation to express relations.

Date-Codd Steps

Name the Relation:
- Example: Employee
List Attribute Domains:
- Example: Empno: int, EName: char(50), Salary: float, DeptNo: int
Specify Keys and Attributes:
- Example: primary key Empno NOT NULL, foreign key DeptNo references Department NOT NULL
Add Constraints:
- Example: An employee can be assigned to a single department at most.

Example: Employee and Department Relations

Employee Relation

Attribute	Domain	Key
Empno	int	Primary Key, NOT NULL
EName	char(50)
Salary	float
DeptNo	int	Foreign Key, NOT NULL

Department Relation

Attribute	Domain	Key
DeptNo	int	Primary Key, NOT NULL
DeptName	char(50)
Loc	varchar(13)

Example with Constraints

Employee Relation

Attribute	Domain	Key and Constraints
Empno	int	Primary Key, NOT NULL
EName	char(50)
Salary	float
DeptNo	int	Foreign Key references Department, NOT NULL
Constraint: An employee can be assigned to a single department at most.

Department Relation

Attribute	Domain	Key and Constraints
DeptNo	int	Primary Key, NOT NULL
DeptName	char(50)
Loc	varchar(13)
Constraint: A department can have many or no employees assigned.

Summary

Concepts such as keys, foreign keys, and domains need to be defined in the logical/physical database design phases.
The model must follow entity and referential integrity rules.
The Codd-Date notation provides an alternative way to represent a database design.

第四讲Normalisation

Lecture Objectives

Identify and describe the steps of Normalisation.
Analyse data to define a strategy to implement the normalisation process to remove data anomalies, unnecessary and uncontrolled data redundancy, and ensure data quality.

Outline

Recap
Normalisation
Dependencies
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Summary
Q&A

Recap – Data Modelling

Main Entity Relationship Model (ER):
- Entity
- Attributes
- Relationships
- Constraints
- Domains
- Keys
- Foreign Keys

Table (Relation) Design Steps:

Collect User Requirements
Construct ER Diagram
Transform into Relations
Normalise

What is Normalisation?

Method of stripping down each relation to its essential attributes, avoiding repetition and data loss.
Technique usually results in more relations/tables, each with fewer attributes.
A well-designed database should comply with three levels of normalisation (1NF, 2NF, 3NF).
Much of normalisation theory is common sense but needs to be understood.

Functional Dependency of Attributes/Columns

Applies to attributes within a single relation/table.
Attribute B is functionally dependent on attribute A if for every value of A, there is only one possible value of B.
Functional dependency can also involve combinations of attributes (e.g., A,B,C → D).

Functional Dependency Example

Ename is functionally dependent on Empno:
- Empno → Ename
- For every value of Empno, there is only one possible value of Ename.

Full Functional Dependency

Functional dependencies are full if the dependent attribute depends on all the determinant attributes, not on a subset.

Normalisation – First Normal Form (1NF)

A relation is in 1NF if:
- There are no repeating groups in the table.
- There are no multi-valued attributes.
- All non-primary-key attributes are functionally dependent on the primary key.
- Properly defined relations are already in 1NF.

1NF Example

Initial Table:

Invoice(Inv_no, Cust_no, Cust_address, Inv_date, Item_code, Item_desc, Item_price, Quantity, Price)
1      10      XY1 WZ3      10/09/14   123       Bed         600         2        1200
2      15      WX4 QA9      11/09/14   124,127   Desk, Chair 150, 50     3, 3     600

Steps to achieve 1NF:

Split attributes with repeated values.
Move non-repeating and repeating attributes to separate relations.
Identify primary keys and establish relationships with foreign keys.

Normalisation – Second Normal Form (2NF)

A relation is in 2NF if:
- It is in 1NF.
- Every non-primary-key attribute is fully functionally dependent on the primary key.
Non-loss decomposition is used to achieve 2NF, ensuring no data loss.

2NF Example

Original Table:

Invoice(Inv_no, Cust_no, Cust_address, Inv_date)
1      10      XY1 WZ3      10/09/14
2      15      WX4 QA9      11/09/14

Invoice Detail:

Invoice_Detail(Inv_no, Item_code, Item_desc, Item_price, Quantity, Price)
1      123       Bed         600        2       1200
2      124       Desk        150        3       450
2      127       Chair       50         3       150

Steps to achieve 2NF:

Move attributes not fully dependent on the primary key to a new relation with the determining key.
Update original and new relations accordingly.

Normalisation – Third Normal Form (3NF)

A relation is in 3NF if:
- It is in 2NF.
- Every non-primary-key attribute depends only on the primary key (no transitive dependencies).

3NF Example

Steps to achieve 3NF:

Identify non-primary-key attributes that depend on other non-primary-key attributes.
Move these attributes to a new relation with the correct primary key.
Maintain relationships by preserving foreign keys.

Summary of Normalised Model

1NF: Remove repeating groups and define a primary key.
2NF: Ensure full functional dependency for non-primary-key attributes.
3NF: Remove transitive dependencies, ensuring non-primary-key attributes depend only on the primary key.

Higher Forms of Normalisation

4NF: Eliminates multiple multi-valued dependencies.
5NF: Eliminates all join dependencies not implied by candidate keys.
While 3NF is sufficient for most practical purposes, higher normal forms are available for specific scenarios.

In the Real World

Some denormalisation may be needed for performance reasons (e.g., summary information in a data warehouse or copies of tables at both ends of a network).
Always normalise first, then denormalise if necessary, knowing the reasons for doing so.

Summary

Normalisation is the process of structuring a relational database to reduce redundancy and improve data integrity.
A well-designed database should ideally comply with the third normal form (3NF) or higher.
Normalisation ensures the data is free from anomalies (insertion, update, and deletion anomalies).

第五讲Modelling III

Lecture Objectives

Identify advanced modelling concepts and structures.
Recognize possible solutions to unravel issues related to advanced data modelling structures.

Outline

Many-to-Many Relationships
More than One Relationship Between Two Entities
Recursive Relationships
Binary/Ternary Relationships
Weak Entities
ER Connection Traps
- Fan Traps
- Chasm Traps

Many-to-Many Relationships

Many-to-Many relationships should be resolved before transforming the ER model into a relational model.
Example: A student is registered in many modules, and each module has many students.
Can’t use foreign keys, as multiple values would be needed.

More than One Relationship Between Two Entities

It’s good practice for an entity to have at least one relationship with another entity in the diagram.
An entity may have more than one relationship with another entity.

Recursive Relationships

An entity may have a relationship with itself, sometimes called a ‘Pig’s ear’ relationship.
Example: An employee may supervise other employees.

Binary/Ternary Relationships

A relationship can involve more than two entities.
Example: An employee uses a skill on a project.
Ternary Relationship Example: A sale involving a car, customer, and salesperson.

Weak Entities

Weak entities are entities whose very existence depends on another entity.
They inherit at least part of their primary key from the entity to which they are related.
Example: Employees’ dependents, salary history.

ER Connection Traps

Problems may arise when designing a conceptual data model called connection traps.
Often due to a misinterpretation of the meaning of certain relationships.
Two main types of connection traps:
- Fan Traps
- Chasm Traps

Fan Traps

Occurs when a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous.
Example: Identifying which branch an employee works in when there are multiple paths between entities.

Fan Traps Example

Problem: Which branch does employee ‘SA9’ work in?
Solution: Clearly define the relationship between staff, division, and branch.

Chasm Traps

Occurs when a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences.
Example: Determining which surgery a doctor works in when the relationship is not explicitly defined.

Chasm Traps Example

Problem: Which surgery does doctor ‘Doc007’ work in?
Solution: Explicitly define the relationship between doctor, surgery, and office.

Summary of Advanced Modelling

Advanced modelling addresses potential inconsistencies during the implementation of relational models.
Many-to-Many relationships need a “composite entity” for implementation.
Recursive relationships can be solved using aliases during database queries.
Ternary and higher-order relationships are needed for more complex relationships.
There are two common problems when designing a conceptual model using ER diagrams: Fan Traps and Chasm Traps.

第六讲course work

Lecture Objectives

Produce the Conceptual Model in a Data Model Design example.
Verify the 3rd Normal Form of the model.

Outline

Conceptual Model Review
Normalization Review
Exercise

3 Phases of Design – Conceptual

The conceptual model used in an enterprise is independent of all implementation considerations.
The data model is built using user’s requirements specifications.
It feeds into the logical design phase.

Conceptual Model

ER Model – Getting Started

Requirements Analysis:
- What does the organization do?
- What are the roles within it?
- What are the information requirements for each kind of user?
- What are the ‘business rules’ or constraints?
- Modelling is iterative; the model needs to be refined and tested.
- Assumptions may be required during development but should be clarified ASAP.
- Keep checking with the users.

Normalization – First Normal Form (1NF)

A relation is in 1NF if:
- There are no repeating groups in the table.
- There are no multi-valued attributes.
- All non-primary-key attributes are functionally dependent on the primary key.
- Properly defined relations must have a unique primary key as a unique identifier.
- A relation not in 1NF might contain duplicate rows.

Normalization – Second Normal Form (2NF)

A relation is in 2NF if:
- It is in 1NF.
- Every non-primary-key attribute is fully functionally dependent on the primary key.
- A relation with a single-attribute primary key must be in 2NF by definition.
- We reduce data to 2NF by non-loss decomposition (usually need more relations but can’t afford to lose data).

Normalization – Third Normal Form (3NF)

A relation is in 3NF if:
- It is in 2NF.
- Every non-primary-key attribute depends only on the primary key (no transitive dependencies).
- We reduce data to 3NF by non-loss decomposition (usually need more relations but can’t afford to lose data).

Exercise Scenario

A university department offers courses. Each course has a name, such as “Computer Science,” and a degree, such as “BSc” or “MSc.” A course is made up of several modules. A module consists of a name, module code, year of study (i.e., 1, 2, 3, or 4), and the term in which it is taught (spring or autumn). Students are enrolled in a course and take modules of that course. A student has a name, a registration number, and is enrolled in a course. Each module is taught by a lecturer. A lecturer has a name and works in a department. Also, each lecturer is a tutor of a group of students. Students can make appointments with a lecturer. An appointment has a date and location between a lecturer and a student.

Entities and Attributes

Entities:

Module
- Attributes: Module_Code (PK), Module_Name, Year, Term
Appointment
- Attributes: Lecture_No (PK), Student_Reg (PK), Date (PK), Location
Course
- Attributes: Course_Name (PK), Degree
Student
- Attributes: Student_Reg (PK), Student_Name
Department
- Attributes: Dep_Code (PK), Dep_Name
Lecturer
- Attributes: Lecturer_No (PK), Lecturer_Name
  Relationships:
Courses contain Modules.
Students enroll in Courses.
Students take Modules.
Modules are taught by Lecturers.
Lecturers belong to Departments.
Lecturers tutor Students.
Students can make Appointments with Lecturers.

Conceptual Model

ER Model Concepts:

Entity: Group of objects with the same properties, identified by the enterprise as having an independent existence.
Attributes: Relevant property (information/adjectives) about the entity.
Relationship: How entities are related, often denoted by verbs. Relationships should be named and their degree shown.
Example:
Doctor treats Patients.
ER Diagram to illustrate entities, attributes, and relationships.

Normalization Process

1NF to 2NF:

Ensure no repeating groups or multi-valued attributes.
Ensure all non-primary-key attributes are fully functionally dependent on the primary key.
2NF to 3NF:
Ensure no transitive dependencies.
Decompose relations as necessary without losing data.

Summary

Advanced modeling is an iterative process.
Many-to-Many relationships require a “composite entity” for implementation.
Fan Traps and Chasm Traps need to be verified at the conceptual level.
3rd Normal Form verification is better achieved in the relational model by adding some dummy data.

Questions

Q&A session to address any queries.
Through the above content, students will understand the process of creating a conceptual model, verifying normalization to the 3rd normal form, and applying these concepts to practical exercises.

第七讲

Lecture Objectives

Define database tables and constraints in PostgreSQL.
Manipulate existing database tables in PostgreSQL.

Outline

DBMS
SQL Recap
SQL - Data Definition Language (DDL)
Create Tables, Domains, Data Types, Default Values
Database Integrity
Constraints
SQL - Data Manipulation Language (DML)
Summary
Q&A

Database Management System (DBMS)

Common DBMS view: Create multiple databases, e.g., Content Management, Sales, Payroll.
SQL statements to create a database, e.g., CREATE DATABASE ContentManagement.

SQL Language

Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
Transaction Control Language (TCL)

Usual Practice

Use a file of SQL commands (.SQL) as a script to create the database.
Constraints can be added in a separate file.
Scripts are rerunnable and can have comments.

SQL CREATE TABLE Statement

Used to create a new table in a database.

Syntax example:

CREATE TABLE EMP (
    EMPNO INTEGER,
    ENAME VARCHAR(10),
    JOB VARCHAR(9),
    MGR INT,
    HIREDATE DATE,
    SAL INTEGER,
    COMM INTEGER,
    DEPTNO INTEGER
);

Drop an existing table before creating it:
```
DROP TABLE IF EXISTS emp;
```

Creating Tables by Copying

Create a table as a copy of another:

CREATE TABLE new_table AS SELECT * FROM old_table;

Copy the structure without rows:

CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=2;

Common Data Types

Text or Character:
- char(5): Fixed length of 5 characters.
- varchar(100): Variable length up to 100 characters.
- varchar: Variable length string up to 65,535 bytes (PostgreSQL).
Integer:
- SMALLINT: Range -32,768 to 32,767, size 2 bytes.
- INT: Range -2,147,483,648 to 2,147,483,647, size 4 bytes.
- SERIAL: Automatically generated integers.
Floating-point Number:
- float(n): Floating-point numbers with n precision, up to 8 bytes.
- float8 or real: 4-byte floating-point numbers.
Temporal Data Types:
- DATE: Stores dates.
- TIME: Stores time of day values.
- TIMESTAMP: Stores date and time values.
- TIMESTAMPTZ: Timezone-aware timestamp.
- INTERVAL: Periods of time.

PostgreSQL Dates

DATE: Stores year, month, day, default format YYYY-MM-DD.
TIME: Stores hour, minute, second, millisecond, default format HH:MM:S.MS.
TIMESTAMP: Stores year, month, day, hour, minute, second, millisecond, default format YYYY-MM-DD HH:MM:S.MS.

Creating Domains

CREATE DOMAIN creates a new domain, essentially a data type with optional constraints.
```
CREATE DOMAIN validStaffNo AS INT CHECK (VALUE BETWEEN 0 AND 2000);
```

Use domain to create a new table:

CREATE TABLE staff4 (
    staffno validStaffNo,
    name VARCHAR(20),
    job VARCHAR(20)
);

Default Values

Default value assigned by the database if no specific value is given.

CREATE TABLE staff(
    staffno NUMERIC(4),
    name VARCHAR(20),
    job VARCHAR(20) DEFAULT 'Assistant' NOT NULL
);

Modifying Existing Tables

Add columns:

ALTER TABLE tablename ADD columnname datatype;

Drop columns:

ALTER TABLE tablename DROP COLUMN columnname;

Auto Generating Primary Key

PostgreSQL provides SERIAL and SEQUENCE facilities.

Using SERIAL:
```
CREATE TABLE table_name(id SERIAL);
```

Using SEQUENCE:

CREATE SEQUENCE table_name_id_seq;
CREATE TABLE table_name (
    id integer NOT NULL DEFAULT nextval('table_name_id_seq')
);

Database Integrity

Data integrity enforced by a series of constraints or rules.
Entity Integrity: No duplicate rows.
Domain Integrity: Valid attribute values, e.g., age 0-200.
Referential Integrity: Rows cannot be deleted if referenced elsewhere, e.g., department with employees.
User-Defined Integrity: Business rules, e.g., salary > commission.

Database Constraints

PRIMARY KEY: Uniquely identifies each record.
FOREIGN KEY: Ensures referenced value exists in the parent table.
CHECK: Enforces conditions.
NOT NULL: Ensures values are not null.
UNIQUE: Ensures unique values.

SQL Data Manipulation Language (DML)

INSERT: Insert data into tables.

INSERT INTO tablename(col_1,.., col_n) VALUES ('x_1',.., 'x_n');

UPDATE: Update existing rows.

UPDATE emp SET sal = 3000 WHERE empno = 7788;

DELETE: Delete rows.

DELETE FROM emp WHERE deptno = (SELECT deptno FROM dept WHERE dname = 'SALES');

Summary

SQL statements CREATE, ALTER, DROP are DDL; INSERT, UPDATE, DELETE, SELECT are DML.
Various PostgreSQL data types can be used, such as NUMERIC, VARCHAR, INTEGER.
Database integrity ensures business rules consistency.
Constraints ensure database integrity: PRIMARY KEY, FOREIGN KEY, CHECK, NOT NULL, UNIQUE.

第八讲

Lecture Objectives

Create transactions to perform multiple changes to the data as a logical unit of work.
Explain the ‘ACID’ model for implementing transactions.
Relate to the different database data control commands.
Identify how to implement transactions, grant, and revoke privileges in PostgreSQL.

Outline

Transactions
- Definition
- Boundaries
- Transaction States
Data Control Language (DCL)
ACID Model
Summary
Q&A

SQL and Its Four Sub-languages

Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
Transaction Control Language (TCL)

Transactions

A transaction is an action, or a series of actions, carried out as a logical unit of work by a user or an application that reads or updates the database contents.
It transforms the database from one consistent state into another and is used as a mechanism for concurrency and recovery control in the database.
It consists of one or more changes (INSERT, UPDATE, DELETE) associated with DML commands.

Transaction Example

Suppose you want to transfer $5 from Account A to Account B. This action can be broken down into the following steps:

Create a record to transfer $5 from Account A to Account B. This is typically the beginning of a database transaction.
Read the balance from Account A.
Subtract $5 from the balance of Account A.
Read the balance from Account B.
Add $5 to the balance of Account B.

The database can run these operations as a single logical unit — either all operations are executed, or none are.

Transaction Boundaries

A transaction begins:
- On login
- At the end of the last transaction
- In some RDBMSs, with: BEGIN TRANSACTION (e.g., PostgreSQL)
A transaction ends with:
- COMMIT / ROLLBACK
- Any DDL statement (e.g., CREATE, DELETE, TRUNCATE, etc.)
- Logout
- But not with an ungraceful exit, process kill, or interrupt

`COMMIT` Command

Marks the successful end of a transaction.
Means that changes are now permanent within the database.
AUTOCOMMIT mode is the default in PostgreSQL, meaning that every DML command is immediately made permanent.

Rolling Back Transactions

Use the SQL ROLLBACK command to revert actions, sometimes called ‘undo’.
All RDBMSs must support ROLLBACK.
Saving changes is expensive (consider a multi-million-row delete), so transactions should be as short as possible.

Savepoints

Not ANSI standard.
You can set up multiple savepoints within transactions:
- SAVEPOINT s1: DML before the savepoint is saved but not committed.
- ROLLBACK TO SAVEPOINT s1: Whole transaction can be committed or rolled back.
Most useful when SQL is embedded in a procedural language with conditional constructs.

Transaction States

Active: The transaction is active during its execution; it is the initial state.
Partially Committed: After the final statement has been executed, the transaction is partially committed.
Failed: Once it is discovered that the transaction cannot be executed normally, its state is failed.
Aborted: The transaction is aborted once it is rolled back and the database is restored to its state before the transaction’s execution.
Committed: The transaction is committed after it is completed successfully.

Data Control Language (DCL)

To allow other database users to view or modify the contents of any data in a database, you must grant them the privileges to do so.
Grant privileges syntax:
```
GRANT privileges ON object TO user;
```

Example of Granting Privileges

Grant some privileges (e.g., SELECT, INSERT, UPDATE, DELETE) on a table named products to a user named user111:
```
GRANT SELECT, INSERT, UPDATE, DELETE ON products TO user111;
```
Grant all permissions:
```
GRANT ALL ON products TO user111;
```
Grant only SELECT access (read-only):
```
GRANT SELECT ON products TO user111;
```

Example of Revoking Privileges

Revoke some privileges (e.g., UPDATE, DELETE) on a table named products from a user named user111:
```
REVOKE UPDATE, DELETE ON products FROM user111;
```
Revoke all permissions:
```
REVOKE ALL ON products FROM user111;
```

Revoke only SELECT access:

REVOKE SELECT ON products FROM user111;

ACID Transactions

Atomic: Either all changes (operations) of the transaction are reflected in the database, or none are.
Consistent: The database remains in a consistent state as transactions are executed in isolation.
Isolated: Transaction A cannot see Transaction B’s changes until B has finished (committed), although they are allowed to execute concurrently. Intermediate transaction results must be hidden from other concurrently executed transactions.
Durable: Committed changes of transactions must persist even in a system crash/failure.

Implementation of Atomicity and Durability

Supported and implemented in the recovery management component of the database system.
Example: Shadow-database (copy) scheme:
- All updates are made on the shadow copy of the database.
- The db_pointer points to the updated shadow copy after the transaction reaches a partial commit and updated pages have been flushed to disk.
- If the transaction fails, the consistent copy to which db_pointer refers is used, and the shadow copy is deleted.
- Assumptions: No hard disk failures, only one transaction active at a time, no handling of concurrent transactions needed.

Summary

A transaction is a logical unit of work. It consists of one or more changes to the database using one or more commands: INSERT, UPDATE, DELETE, etc.
Transactions must conform to the ‘ACID’ model.
Transactions are managed using the TCL commands, i.e., COMMIT and ROLLBACK.
Privileges to access and modify tables in the database can be set with DCL commands: GRANT, REVOKE.

第九讲Views & Stored Procedures

Objectives

Explain the meaning of Views in a database.
Explain what a Stored Procedure is and their purpose.
Identify the different database manipulations possible with Views.
Relate to the different steps needed to create a Stored Procedure.
Describe the different SQL commands to implement Views and Stored Procedures.

Outline

Views
- Definition
- Advantages
- Create & Query Views
Stored Procedures
- Definition
- Structure
- Function Arguments
- Variable Declaration
- Variable Assignments
Summary
Q&A

Views

Definition:

A view is a virtual table based on a SELECT statement.
The SELECT statement can obtain columns, computed columns, aliases, and aggregate functions from one or several tables.
A view is similar to a real table; the fields are obtained from the so-called base tables.
Syntax:

CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;

Advantages of Views over Tables:

Security: Users can be given permission to query a view but not the underlying base tables, thus limiting data exposure.
Simplicity: Views can simplify queries by combining data from multiple tables into a single view.
Efficiency: Views consume very little storage space since only the definition is stored, and the view is updated dynamically.
Up-to-date Information: Views are recreated on demand each time they are invoked, reflecting new data additions.
Creating Views in PostgreSQL:

Example: Create a view with employee number, name, and salary of all managers.

CREATE VIEW managers AS 
SELECT empno, ename, sal
FROM emp
WHERE job = 'MANAGER';

Querying Views:

Views can be used anywhere a table name is expected in SQL statements.
They can be used inside SELECT statements, stored procedures, or within other views.

Example:

SELECT empno
FROM managers;

This query delivers the same result as:

SELECT empno
FROM emp
WHERE job = 'MANAGER';

Renaming Attributes in a View in PostgreSQL:

You can give attributes different names in the view.

CREATE VIEW managers_view (id, name, location) AS 
SELECT emp.empno, emp.ename, dept.loc
FROM emp, dept
WHERE emp.deptno = dept.deptno AND emp.job = 'MANAGER';

Modifying a View:

Typically, views cannot be modified directly, but the underlying tables can be modified through the view.
SQL rules for updating views:
- The SELECT clause must list enough attributes to insert tuples into the view and the base tables. Remaining attributes will be filled with NULL or default values.
  Examples:

INSERT:

CREATE VIEW salesmen_view (id, name, salary, job) AS 
SELECT empno, ename, sal, job
FROM emp
WHERE job = 'SALESMAN';
INSERT INTO salesmen_view VALUES (8000, 'STAHL', 9000, 'SALESMAN');

DELETE:

DELETE FROM salesmen_view WHERE salary < 1500;

UPDATE:

UPDATE salesmen_view SET salary = 3000 WHERE name = 'STAHL';

DROP:
```
DROP VIEW salesmen_view;
```

Stored Procedures

Definition:

A stored procedure is a named collection of computations and SQL statements stored in the database.
They encapsulate and represent business transactions, allowing multiple SQL statements to be executed by calling the stored procedure.
Advantages:

Reduction of Network Traffic: Stored procedures are stored on the database server, eliminating the need to transmit SQL statements over the network.
Code Reusability and Isolation: They reduce code duplication by allowing code sharing and isolation, minimizing errors and development costs.
What is PL/pgSQL?

PL/pgSQL is a block-structured procedural language used to implement stored procedures in PostgreSQL.
It contains a declaration section and an executable section and is closely coupled with SQL.
Anatomy of a Stored Procedure:

Basic structure:

CREATE FUNCTION function_name (arguments)
RETURNS return_type AS $$
BEGIN
-- statements
END; $$
LANGUAGE plpgsql;

Example: Simple Standalone Procedure

Increment function:

CREATE FUNCTION increment(IN val integer) RETURNS integer AS $$
BEGIN
RETURN val + 1;
END; $$
LANGUAGE plpgsql;
SELECT increment(10);

Function Arguments:

Example:

CREATE OR REPLACE FUNCTION divisionIntegers(a integer, b integer) RETURNS float AS $$
BEGIN
RETURN a / b::float;
END; $$
LANGUAGE plpgsql;

Variable Declarations:

Variables can be declared within the DECLARE section of the function.

Example:

CREATE OR REPLACE FUNCTION circleArea(radius integer) RETURNS float AS $$
DECLARE
pi float;
BEGIN
pi := 3.14;
RETURN radius * radius * pi::float;
END; $$
LANGUAGE plpgsql;

Assignments of Composite Variables:

Assign values using SELECT statement:
```
SELECT expression INTO target;
```

Example:

CREATE OR REPLACE FUNCTION testCompositeDeclarations() RETURNS text AS $$
DECLARE
department dept%rowtype;
BEGIN
SELECT 50, 'PURCHASE', 'READING' INTO department.deptno, department.dname, department.loc;
RETURN department;
END; $$
LANGUAGE plpgsql;

Summary to Stored Procedures

Stored Procedures are collections of computations and SQL statements stored in the database.
They are block-structured and composed of a declaration section (for variables) and an executable section (for assignments and calculations).
Assignments in Stored Procedures can be done using the SQL SELECT command.

第十讲Views & Stored Procedures

Lecture Objectives

Understand the ‘I’ (Isolated) in ACID transactions
Define concurrency
Describe the problems of concurrent access
Outline the RDBMS mechanisms to control concurrency
Identify ANSI transaction isolation levels
Define locking control and deadlocks

Outline

Transactions and ACID review
Concurrency
Concurrency Problems
ANSI/SQL Isolation Levels
Locking

Transactions and ACID Review

Transactions

A transaction is an action, or a series of actions carried out as a logical unit of work, by a user or an application that reads or updates the database contents.
It transforms the database from one consistent state into another and is used as a mechanism for concurrency and recovery control in the database.
It consists of one or more changes (INSERT, UPDATE, DELETE) associated with DML commands.
ACID Transactions
Atomic: Either all changes (operations) of the transaction are reflected in the database, or none are.
Consistent: Preserves the database in a consistent state as transactions are executed in isolation.
Isolated: Transaction A cannot see Transaction B’s changes until B has finished (committed), although they are allowed to execute concurrently. Intermediate transaction results must be hidden from other concurrently executed transactions.
Durable: Committed changes of transactions must persist even in a system crash/failure.

Concurrency

Role and Advantages of DBMSs

Improved data sharing.
Better data integration.
Minimized data inconsistency (storing different versions of the same data).
Improved data access (quick answers to ad hoc queries).
Improved decision making.
Concurrent Database Use
Concurrency allows many users to work on the same data.
Maximizes data access.
Minimizes wait time.
Ensures data is always in a correct state with no violation of integrity constraints.
Concurrency is a major issue for every RDBMS.
Queries are less problematic than transactions because a transaction is a set of logically related changes.

Concurrency Problems

Dirty Read
- Occurs when a transaction reads a row that has been modified by another transaction but not yet committed.
- Example: User U1 is updating data, while User U2 is reporting data. If U1 rolls back, U2’s report is incorrect.
Lost Update
- Occurs when multiple transactions select the same row and update it based on the original value.
- Example: User U1 increases salary, User U2 deducts tax based on the same original salary, resulting in one update being lost.
Non-Repeatable Read
- Occurs when a transaction reads data before and after another transaction modifies it.
- Example: User U1 updates a table, User U2 reports on it, resulting in inconsistent reads.
Phantom Read
- Occurs when the same query executed twice returns different rows.
- Example: User U1 inserts new rows, User U2 reports on the table, resulting in different row counts.
  The Scheduler

Severe problems can arise when two or more concurrent transactions are executed.
The scheduler is a special DBMS process that establishes the order in which the operations are executed.
It interleaves the execution of database operations to ensure serializability and isolation of transactions.
The scheduler’s main job is to create a serializable schedule of transaction operations, yielding the same results as if the transactions were executed serially.

Concurrency Control with Locking

Locking

Locking is one of the most common techniques used in concurrency control.
A lock guarantees exclusive use of data items to a concurrent transaction.
A transaction (e.g., T2) does not have access to a data item currently used by another transaction (e.g., T1).
A transaction acquires a lock prior to data access; the lock is released when the transaction is completed.
Locking Modes

Access Exclusive Lock: Prevents other transactions from accessing or modifying the locked object, used for destructive operations.
Exclusive Lock: Prevents other transactions from modifying the locked object, but allows reading.
Share Lock: Allows multiple transactions to read the same object concurrently, but prevents modifications.
Row Share and Row Exclusive Lock: Controls concurrent access to individual rows of a table.
Deadlocks

Deadlocks occur when two or more transactions are waiting for each other to release locks.
Example: Tx_1 updates row 7 and tries to update row 3, while Tx_2 updates row 3 and tries to update row 7.
PostgreSQL automatically detects deadlocks and aborts one of the involved transactions.

ANSI/SQL Isolation Levels

Read Uncommitted
- Transactions may see not-yet-committed changes by other transactions.
- Offers no protection against data inconsistency.
- Not possible in Oracle, possible (but undesirable) in other RDBMSs like PostgreSQL.
Read Committed
- Forces transactions to read only committed data.
- Data can change after it is read.
- Default level for most vendors.
Repeatable Read
- Ensures that row queries return consistent results.
- Keeps read and write locks on the selected data until the transaction ends.
- Does not prevent new rows with the selected criteria from being added.
Serializable
- Applies read and write locks until the transaction ends (similar to Repeatable Read).
- If a write collision is detected among transactions, only one is executed, and the rest fail.
- Ensures complete isolation but is slower.
  Isolation Levels and Problems

Isolation Level	Dirty Read	Lost Updates	Non-Repeatable Read	Phantom Read
Read Uncommitted	Possible	Possible	Possible	Possible
Read Committed	Not Possible	Possible	Possible	Possible
Repeatable Read	Not Possible	Not Possible	Not Possible	Possible
Serializable	Not Possible	Not Possible	Not Possible	Not Possible

Locking Notes

All commercial RDBMSs must implement safe locking.
The challenge is to preserve speed of response.
Developers should:
- Minimize lock contention by keeping transactions short.
- Avoid explicit locking unless necessary.
- Avoid deadlocks by accessing tables in the same order.

Summary

Concurrency allows many users to work on the same data.
However, concurrency can cause consistency problems such as dirty reads, lost updates, non-repeatable reads, and phantom reads.
ANSI/SQL isolation levels in transactions aim to avoid these consistency problems.
Locks can be used to prevent readers and writers from blocking each other, but locks can still introduce deadlocks.

handsome · 2024 年6 月 19 日 08:13

好长，谢谢

magelaile · 2024 年6 月 19 日 08:32

真长谢谢

Ferrari · 2024 年6 月 19 日 08:34

真长。。。

YanTeng_Duan · 2024 年6 月 19 日 08:57

你真长

binq · 2024 年6 月 19 日 09:06

参考书是什么？

slx · 2024 年6 月 19 日 09:23

我们老师的ppt整理的，不知道参考书

话题		回复	浏览量
DDD（领域驱动设计）系列第三讲 - Repository模式软件开发	4	154	2024 年6 月 4 日
有关于Python笔记系列的一些Q&A 软件开发	3	283	2024 年7 月 2 日
学生物的研究生要为课题组搭建数据库网站需要学习哪些知识？快问快答	5	222	2024 年4 月 9 日
String 字符串操作新增了什么？软件开发	6	474	2024 年2 月 21 日
求es数据库学习资料快问快答	6	329	2024 年4 月 1 日