本科课程-数据库知识点(英文)

第一讲Introduction

Lecture Objectives

  • Describe what Database Management Systems (DBMSs) are.
  • Recognize the problem of Data Redundancy.
  • Explain the basic Relational Data Model and some basic Relational Data Model concepts.
  • Relate SQL basic concepts.

Outline

  1. DBMS
  2. Data Redundancy
  3. File System vs Databases
  4. Relational Model
  5. RDBMS
  6. SQL
  7. Questions and Answers

Storing & Managing Information

  • Relational database models focus on storing data efficiently.
  • They enable retrieval of specific information and obtaining new insights from stored data.

What is a Database and DBMS?

  • A database is a shared integrated computer structure that stores a collection of end-user data and metadata.
  • Metadata provides a description of the data characteristics and links (relationships) between sets of data.
  • A database is an organized electronic filing cabinet designed to define, manipulate, retrieve, and manage data.
  • A software called the Database Management System (DBMS) helps manage the database’s contents.

Role and Advantages of DBMSs

  • Improved data sharing.
  • Better data integration.
  • Minimized data inconsistency (storing different versions of the same data).
  • Improved data access (quick answers to ad hoc queries).
  • Improved decision making.

Why Database Design is Important?

  • Focus on transactional databases designed to support a company’s day-to-day operations.
  • Database design defines the structure of the database to create and maintain data consistency and improve operational speed.
  • A well-designed database facilitates data management through the generation of accurate and valuable information.
  • A poorly designed database may lead to errors and bad decision-making, potentially causing organizational failure.

Historical Roots: Manual File Systems

  • Managers in small organizations kept manual file systems to track important data.
  • Manual file systems are slow and cumbersome for querying.
  • Example questions:
    • What products did I sell last month, quarter, or year?
    • How does my sales volume this year compare to last year?

Historical Roots: Electronic File Systems

  • Initially similar to manual file systems but automated.
  • Created file structures and software to manage these structures and generate reports automatically.
  • File systems grew rapidly, leading to challenges in data management and the need for more robust computer systems.
  • Data redundancy and inconsistency were common due to the dispersion of data across different locations.

Data Redundancy: Islands of Information

  • Data redundancy leads to data inconsistency, with different versions of the same data existing in different places.
  • Example scenario: Sales agent updates address in HR but not in the sales department, leading to inconsistency.

Data Redundancy: Data Anomalies

  • Update Anomalies: If a location changes, many rows need to be updated.
  • Insertion Anomalies: Creating a department without employees results in many NULL values.
  • Deletion Anomalies: Deleting the last employee in a department can result in the loss of the department.

Database Systems vs File Systems

  • File Systems: Data spread across different locations, leading to redundancy and inconsistency.
  • Database Systems: Centralized data management reduces redundancy and ensures consistency.

Overview of the Relational Model

  • First proposed by British data scientist E.F. Codd in 1970.
  • Based on mathematical concepts of sets, tuples, and relations.
  • Set: Collection of items of the same type with no order or duplicates.
  • Tuple: Finite ordered list of elements (n-tuple is a sequence of n elements).
  • Relation: Set of n-tuples with no order on n-tuples.

Relational Model Components

  • Data stored within relations (tables).
  • Relation consists of attributes (columns) and tuples (rows).
  • Attributes have associated domains (set of all possible values).

Advantages of Relational DBMS (RDBMS)

  • Designed specifically for relational databases.
  • Based on sound mathematical theory.
  • Implemented with a declarative language for querying.
  • Provides a secure, consistent repository with scalable capability.
  • No unnecessary repetition of data.
  • Data mapped logically, independent of physical location/storage.

RDBMS Architecture

  • External Level: Customized user views of the data.
  • Conceptual Level: Community view of all available data, independent of storage details.
  • Internal Level: Describes physical storage of data.

Disadvantages of RDBMS

  • Requires substantial memory and processing power.
  • Needs rigorous data modeling.
  • Data redundancy and anomalies can occur with an ill-defined database model.
  • Best suited for environments where data structures and relationships are well-known.

Structured Query Language (SQL)

  • Standard language for relational databases accepted by ANSI and ISO/IEC.
  • Provides statements for querying, inserting, updating, deleting rows, and managing database objects and access.

SQL Sublanguages

  • Data Definition Language (DDL): Defines database structure.
  • Data Manipulation Language (DML): Manipulates data.
  • Data Control Language (DCL): Controls access to the database.
  • Transaction Control Language (TCL): Manages transactions.

Summary

  • The module will teach relational databases.
  • Historical roots lie in file systems, which suffer from data redundancy and anomalies.
  • Relational databases, based on solid mathematical principles, address many file system issues.
  • Adequate data modeling techniques are essential to prevent data anomalies in relational databases.
    This overview provides a comprehensive understanding of databases, DBMS, and their historical context, highlighting the importance of well-designed relational databases and the role of SQL in managing them.

第二讲Data Modelling I

Lecture Objectives

  • Identify the three phases of database design.
  • Model data by creating an Entity-Relationship (ER) model.

Outline

  1. Three Phases of Database Design
    • Conceptual
    • Logical
    • Physical
  2. Creating the Conceptual Model
  3. Entity-Relationship (ER) Model
  4. Questions and Answers

Criteria to Produce an Optimal Data Model

  • Structural validity: Consistency with the way the enterprise defines and organizes information.
  • Simplicity: Ease of understanding by Information Systems professionals and non-technical users.
  • Expressiveness: Ability to distinguish between different data, relationships between data, and constraints.
  • Non-redundancy: Exclusion of extraneous information; representation of any piece of information exactly once.
  • Shareability: Not specific to any particular application or technology, usable by many.
  • Integrity: Consistency with the way the enterprise uses and manages information.
  • Extensibility: Ability to evolve to support new requirements with minimal effect on existing users.
  • Representation: Ability to represent a model using an easily understood diagrammatic notation.

Three Phases of Design

  1. Conceptual Design
    • Independent of implementation considerations.
    • Built using user requirements specifications.
    • Feeds into the logical design phase.
  2. Logical Design
    • Enterprise model based on a specific data model (e.g., relational).
    • Independent of DBMS and physical storage.
    • Used by software applications.
    • Conceptual model mapped to a logical data model.
    • Ensures no information is lost.
    • Includes normalization.
  3. Physical Design
    • How the data is stored in databases.
    • Describes file organizations, indexes, and distribution for efficient data access.
    • Focuses on security and efficient implementation of constraints/integrity rules.
    • Tailored to a specific DBMS.

Conceptual Model

  • Entity-Relationship Model (ER)
    • Described by Chen in 1976.
    • Based on strong mathematical foundations (Set Theory, Mathematical Relations, Modern Algebra, Logic).
    • Graphical and easy to understand.
    • Several variations, including Chen and UML representations, and cardinality specifications (Crowfeet, SSADM, Min-Max).

Requirements Analysis for ER Model

  • Understand what the organization does.
  • Identify roles within the organization.
  • Determine information requirements for each type of user.
  • Identify business rules or constraints.
  • Iteratively refine and test the model.
  • Clarify assumptions as soon as possible.
  • Continuously verify with users.

ER Model Concepts

  • Entity Type: Group of objects with the same properties, identified as having independent existence.
  • Entity Occurrence: Uniquely identifiable instance of an entity type.
    • Example: “Doctor” is an entity type; “Dr. Hibbert” and “Dr. Nick” are instances.
  • Attributes: Relevant properties (e.g., adjectives) about the entity.
    • At least one identifying attribute or combination of attributes for each entity occurrence.
    • Example attributes for a “Patient” entity might include hair color, date of birth, and religion.
  • Attribute Domain: The possible set of values for an attribute.
    • Examples:
      • Hair_Color: brown, black, blonde, white, grey, brunette
      • Date_of_Birth: date
      • Student_ID: integer
  • Attribute Types:
    • Simple Attribute: Single component with an independent existence (e.g., surname, salary).
    • Composite Attribute: Multiple components (e.g., address, name).
    • Single-Valued Attribute: Holds a single value for each entity instance (e.g., student number).
    • Multi-Valued Attribute: Holds multiple values for each entity instance (e.g., company phone numbers).
    • Derived Attribute: Value derivable from related attributes or another entity type (e.g., total salary = salary + bonus).

ER Model Example

  • Relationships: How entities are related.
    • Use verbs to denote relationships (e.g., doctors treat patients).
    • Each entity should be related to at least one other entity.
  • Relationship Types: Set of meaningful associations among entity types.
  • Relationship Occurrences: Unique associations including one instance from each participating entity type.
  • Relationship Representation: Lines between entity boxes denote relationships; relationships should be named.

Constraints and Assumptions

  • Constraints: Rules ensuring the validity of data (e.g., “Every patient must be registered with a doctor”).
  • Assumptions: Represent expectations about reality, to be clarified during the modeling process (e.g., “Patients will only have one appointment per day”).

Conceptual Model Summary

  • Steps to Create an ER Model:
    • Identify entity types.
    • Determine attributes.
    • Identify relationship types.
    • Associate attributes with entity or relationship types.
    • Add additional constraints and assumptions as necessary.

Summary

  • Database design consists of three stages: Conceptual, Logical, and Physical.
  • Good database design is challenging, and skilled practitioners are rare.
  • Database design is not an exact science; multiple solutions can exist for the same problem.
  • The main output of the conceptual stage is the ER model, which includes entities, attributes, relationships, and constraints.

第三讲Data Modelling II

Lecture Objectives

  • Implement the remaining phases of the database design: Logical and Physical.
  • Understand relational database data integrity rules.
  • Describe and implement the Date-Codd notation.

Outline

  1. Recap
    • Main Entity Relationship Model (ER): Entity, Attributes, Relationships, Constraints
  2. Three Phases of Database Design: Logical/Physical
    • Logical Design
    • Physical Design
  3. Attributes domains
  4. Keys and Foreign Keys
  5. Integrity rules
  6. Date-Codd Notation
  7. Questions and Answers

Recap – Data Modelling

  • Main Entity Relationship Model (ER)
    • Entity: Represents real-world objects or concepts.
    • Attributes: Properties or details about the entities.
    • Relationships: How entities relate to each other.
    • Constraints: Rules that ensure data integrity.

Three Phases of Database Design

Conceptual Design

  • Independent of implementation considerations.
  • Based on user requirements specifications.
  • Provides input to the logical design phase.

Logical Design

  • Enterprise model based on a specific data model (e.g., relational).
  • Independent of DBMS and physical storage.
  • Used by software applications.
  • Ensures no information is lost.
  • Includes normalization.

Physical Design

  • Describes how data is stored in databases.
  • Focuses on file organizations, indexes, and distribution for efficient data access.
  • Implements security and integrity rules.
  • Tailored to a specific DBMS.

Properties of a Table (Relation)

  • 2D structure comprising rows and columns.
  • Each row (tuple) describes a single entity of the relation.
  • No duplicate rows allowed.
  • Each column represents an attribute with a distinct name.
  • Each cell contains a single atomic value.
  • All values in a column must have the same data format.
  • Each attribute has a specified range of values (attribute domain).
  • The order of rows and columns is insignificant to the DBMS.
  • Each table must have a combination of attributes that uniquely identify each row.

Attributes Domains

  • Definition: The possible set of values for an attribute.
  • Examples:
    • Hair_Color: brown, black, blonde, white, grey, brunette
    • Date_of_Birth: date
    • Student_ID: integer

Benefits of Establishing Data Types

  • Storage Efficiency: Optimizes space on disk.
  • Predictions: Allows the DBMS to manage memory allocation efficiently.
  • Data Consistency: Ensures only valid data is accepted.

Common Data Type Categories

  • Text or Character:
    • char(5): Always 5 characters in length.
    • varchar(100): Variable length up to 100 characters.
    • varchar: Variable length string up to 65,535 bytes.
  • Integer:
    • SMALLINT: Range -32,768 to 32,767, 2-byte size.
    • INT: Range -2,147,483,648 to 2,147,483,647, 4-byte size.
    • SERIAL: Automatically generated integers.
  • Floating-Point Number:
    • float(n): Floating-point numbers with n precision, up to 8 bytes.
    • float8 or real: 4-byte floating-point numbers.
  • Temporal Data Types:
    • DATE: Stores dates.
    • TIME: Stores time of day.
    • TIMESTAMP: Stores date and time.
    • TIMESTAMPTZ: Timezone-aware timestamp.
    • INTERVAL: Stores periods of time.

Relational Database Keys

  • Primary Key: A candidate key selected to uniquely identify each occurrence of an entity type.
  • Composite Key: A candidate key that consists of two or more attributes.
  • Natural Key: Identifying key derived from the entity’s existing attributes (e.g., NHS number for a patient).
  • Surrogate Key: An artificial key introduced for efficiency, usually an integer (e.g., patient number).

Controlled Redundancy

  • Ensures that the relationships in the database work properly.
  • Common attributes link tables together.
  • Necessary duplication of values is not considered redundant.
  • Unnecessary duplication is considered redundancy.

Foreign Keys

  • Attributes whose values match values of a candidate key in another relation.
  • Used to combine information from multiple relations (tables).

Integrity Rules

  • Entity Integrity: Ensures the primary key has no null values and all entries are unique.
  • Referential Integrity: Ensures that foreign keys properly reference primary keys in another table.

Relational Model – Date-Codd

  • Codd-Date Definition:
    • No diagrams, tables referred to as relations.
    • Uses predicate logic and extensive mathematics.
    • Based on Codd-Dates rules.
    • Uses textual notation to express relations.

Date-Codd Steps

  1. Name the Relation:
    • Example: Employee
  2. List Attribute Domains:
    • Example: Empno: int, EName: char(50), Salary: float, DeptNo: int
  3. Specify Keys and Attributes:
    • Example: primary key Empno NOT NULL, foreign key DeptNo references Department NOT NULL
  4. Add Constraints:
    • Example: An employee can be assigned to a single department at most.

Example: Employee and Department Relations

Employee Relation

Attribute Domain Key
Empno int Primary Key, NOT NULL
EName char(50)
Salary float
DeptNo int Foreign Key, NOT NULL

Department Relation

Attribute Domain Key
DeptNo int Primary Key, NOT NULL
DeptName char(50)
Loc varchar(13)

Example with Constraints

Employee Relation

Attribute Domain Key and Constraints
Empno int Primary Key, NOT NULL
EName char(50)
Salary float
DeptNo int Foreign Key references Department, NOT NULL
Constraint: An employee can be assigned to a single department at most.

Department Relation

Attribute Domain Key and Constraints
DeptNo int Primary Key, NOT NULL
DeptName char(50)
Loc varchar(13)
Constraint: A department can have many or no employees assigned.

Summary

  • Concepts such as keys, foreign keys, and domains need to be defined in the logical/physical database design phases.
  • The model must follow entity and referential integrity rules.
  • The Codd-Date notation provides an alternative way to represent a database design.

第四讲Normalisation

Lecture Objectives

  • Identify and describe the steps of Normalisation.
  • Analyse data to define a strategy to implement the normalisation process to remove data anomalies, unnecessary and uncontrolled data redundancy, and ensure data quality.

Outline

  • Recap
  • Normalisation
  • Dependencies
  • First Normal Form (1NF)
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Summary
  • Q&A

Recap – Data Modelling

  • Main Entity Relationship Model (ER):
    • Entity
    • Attributes
    • Relationships
    • Constraints
    • Domains
    • Keys
    • Foreign Keys

Table (Relation) Design Steps:

  1. Collect User Requirements
  2. Construct ER Diagram
  3. Transform into Relations
  4. Normalise

What is Normalisation?

  • Method of stripping down each relation to its essential attributes, avoiding repetition and data loss.
  • Technique usually results in more relations/tables, each with fewer attributes.
  • A well-designed database should comply with three levels of normalisation (1NF, 2NF, 3NF).
  • Much of normalisation theory is common sense but needs to be understood.

Functional Dependency of Attributes/Columns

  • Applies to attributes within a single relation/table.
  • Attribute B is functionally dependent on attribute A if for every value of A, there is only one possible value of B.
  • Functional dependency can also involve combinations of attributes (e.g., A,B,C → D).

Functional Dependency Example

  • Ename is functionally dependent on Empno:
    • Empno → Ename
    • For every value of Empno, there is only one possible value of Ename.

Full Functional Dependency

  • Functional dependencies are full if the dependent attribute depends on all the determinant attributes, not on a subset.

Normalisation – First Normal Form (1NF)

  • A relation is in 1NF if:
    • There are no repeating groups in the table.
    • There are no multi-valued attributes.
    • All non-primary-key attributes are functionally dependent on the primary key.
    • Properly defined relations are already in 1NF.

1NF Example

Initial Table:

Invoice(Inv_no, Cust_no, Cust_address, Inv_date, Item_code, Item_desc, Item_price, Quantity, Price)
1      10      XY1 WZ3      10/09/14   123       Bed         600         2        1200
2      15      WX4 QA9      11/09/14   124,127   Desk, Chair 150, 50     3, 3     600

Steps to achieve 1NF:

  1. Split attributes with repeated values.
  2. Move non-repeating and repeating attributes to separate relations.
  3. Identify primary keys and establish relationships with foreign keys.

Normalisation – Second Normal Form (2NF)

  • A relation is in 2NF if:
    • It is in 1NF.
    • Every non-primary-key attribute is fully functionally dependent on the primary key.
  • Non-loss decomposition is used to achieve 2NF, ensuring no data loss.

2NF Example

Original Table:

Invoice(Inv_no, Cust_no, Cust_address, Inv_date)
1      10      XY1 WZ3      10/09/14
2      15      WX4 QA9      11/09/14

Invoice Detail:

Invoice_Detail(Inv_no, Item_code, Item_desc, Item_price, Quantity, Price)
1      123       Bed         600        2       1200
2      124       Desk        150        3       450
2      127       Chair       50         3       150

Steps to achieve 2NF:

  1. Move attributes not fully dependent on the primary key to a new relation with the determining key.
  2. Update original and new relations accordingly.

Normalisation – Third Normal Form (3NF)

  • A relation is in 3NF if:
    • It is in 2NF.
    • Every non-primary-key attribute depends only on the primary key (no transitive dependencies).

3NF Example

Steps to achieve 3NF:

  1. Identify non-primary-key attributes that depend on other non-primary-key attributes.
  2. Move these attributes to a new relation with the correct primary key.
  3. Maintain relationships by preserving foreign keys.

Summary of Normalised Model

  • 1NF: Remove repeating groups and define a primary key.
  • 2NF: Ensure full functional dependency for non-primary-key attributes.
  • 3NF: Remove transitive dependencies, ensuring non-primary-key attributes depend only on the primary key.

Higher Forms of Normalisation

  • 4NF: Eliminates multiple multi-valued dependencies.
  • 5NF: Eliminates all join dependencies not implied by candidate keys.
  • While 3NF is sufficient for most practical purposes, higher normal forms are available for specific scenarios.

In the Real World

  • Some denormalisation may be needed for performance reasons (e.g., summary information in a data warehouse or copies of tables at both ends of a network).
  • Always normalise first, then denormalise if necessary, knowing the reasons for doing so.

Summary

  • Normalisation is the process of structuring a relational database to reduce redundancy and improve data integrity.
  • A well-designed database should ideally comply with the third normal form (3NF) or higher.
  • Normalisation ensures the data is free from anomalies (insertion, update, and deletion anomalies).

第五讲Modelling III

Lecture Objectives

  • Identify advanced modelling concepts and structures.
  • Recognize possible solutions to unravel issues related to advanced data modelling structures.

Outline

  • Many-to-Many Relationships
  • More than One Relationship Between Two Entities
  • Recursive Relationships
  • Binary/Ternary Relationships
  • Weak Entities
  • ER Connection Traps
    • Fan Traps
    • Chasm Traps

Many-to-Many Relationships

  • Many-to-Many relationships should be resolved before transforming the ER model into a relational model.
  • Example: A student is registered in many modules, and each module has many students.
  • Can’t use foreign keys, as multiple values would be needed.

More than One Relationship Between Two Entities

  • It’s good practice for an entity to have at least one relationship with another entity in the diagram.
  • An entity may have more than one relationship with another entity.

Recursive Relationships

  • An entity may have a relationship with itself, sometimes called a ‘Pig’s ear’ relationship.
  • Example: An employee may supervise other employees.

Binary/Ternary Relationships

  • A relationship can involve more than two entities.
  • Example: An employee uses a skill on a project.
  • Ternary Relationship Example: A sale involving a car, customer, and salesperson.

Weak Entities

  • Weak entities are entities whose very existence depends on another entity.
  • They inherit at least part of their primary key from the entity to which they are related.
  • Example: Employees’ dependents, salary history.

ER Connection Traps

  • Problems may arise when designing a conceptual data model called connection traps.
  • Often due to a misinterpretation of the meaning of certain relationships.
  • Two main types of connection traps:
    • Fan Traps
    • Chasm Traps

Fan Traps

  • Occurs when a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous.
  • Example: Identifying which branch an employee works in when there are multiple paths between entities.

Fan Traps Example

  • Problem: Which branch does employee ‘SA9’ work in?
  • Solution: Clearly define the relationship between staff, division, and branch.

Chasm Traps

  • Occurs when a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences.
  • Example: Determining which surgery a doctor works in when the relationship is not explicitly defined.

Chasm Traps Example

  • Problem: Which surgery does doctor ‘Doc007’ work in?
  • Solution: Explicitly define the relationship between doctor, surgery, and office.

Summary of Advanced Modelling

  1. Advanced modelling addresses potential inconsistencies during the implementation of relational models.
  2. Many-to-Many relationships need a “composite entity” for implementation.
  3. Recursive relationships can be solved using aliases during database queries.
  4. Ternary and higher-order relationships are needed for more complex relationships.
  5. There are two common problems when designing a conceptual model using ER diagrams: Fan Traps and Chasm Traps.

第六讲course work

Lecture Objectives

  • Produce the Conceptual Model in a Data Model Design example.
  • Verify the 3rd Normal Form of the model.

Outline

  1. Conceptual Model Review
  2. Normalization Review
  3. Exercise

3 Phases of Design – Conceptual

  • The conceptual model used in an enterprise is independent of all implementation considerations.
  • The data model is built using user’s requirements specifications.
  • It feeds into the logical design phase.

Conceptual Model

ER Model – Getting Started

  • Requirements Analysis:
    • What does the organization do?
    • What are the roles within it?
    • What are the information requirements for each kind of user?
    • What are the ‘business rules’ or constraints?
    • Modelling is iterative; the model needs to be refined and tested.
    • Assumptions may be required during development but should be clarified ASAP.
    • Keep checking with the users.

Normalization – First Normal Form (1NF)

  • A relation is in 1NF if:
    • There are no repeating groups in the table.
    • There are no multi-valued attributes.
    • All non-primary-key attributes are functionally dependent on the primary key.
    • Properly defined relations must have a unique primary key as a unique identifier.
    • A relation not in 1NF might contain duplicate rows.

Normalization – Second Normal Form (2NF)

  • A relation is in 2NF if:
    • It is in 1NF.
    • Every non-primary-key attribute is fully functionally dependent on the primary key.
    • A relation with a single-attribute primary key must be in 2NF by definition.
    • We reduce data to 2NF by non-loss decomposition (usually need more relations but can’t afford to lose data).

Normalization – Third Normal Form (3NF)

  • A relation is in 3NF if:
    • It is in 2NF.
    • Every non-primary-key attribute depends only on the primary key (no transitive dependencies).
    • We reduce data to 3NF by non-loss decomposition (usually need more relations but can’t afford to lose data).

Exercise Scenario

A university department offers courses. Each course has a name, such as “Computer Science,” and a degree, such as “BSc” or “MSc.” A course is made up of several modules. A module consists of a name, module code, year of study (i.e., 1, 2, 3, or 4), and the term in which it is taught (spring or autumn). Students are enrolled in a course and take modules of that course. A student has a name, a registration number, and is enrolled in a course. Each module is taught by a lecturer. A lecturer has a name and works in a department. Also, each lecturer is a tutor of a group of students. Students can make appointments with a lecturer. An appointment has a date and location between a lecturer and a student.

Entities and Attributes

Entities:

  • Module
    • Attributes: Module_Code (PK), Module_Name, Year, Term
  • Appointment
    • Attributes: Lecture_No (PK), Student_Reg (PK), Date (PK), Location
  • Course
    • Attributes: Course_Name (PK), Degree
  • Student
    • Attributes: Student_Reg (PK), Student_Name
  • Department
    • Attributes: Dep_Code (PK), Dep_Name
  • Lecturer
    • Attributes: Lecturer_No (PK), Lecturer_Name
      Relationships:
  • Courses contain Modules.
  • Students enroll in Courses.
  • Students take Modules.
  • Modules are taught by Lecturers.
  • Lecturers belong to Departments.
  • Lecturers tutor Students.
  • Students can make Appointments with Lecturers.

Conceptual Model

ER Model Concepts:

  • Entity: Group of objects with the same properties, identified by the enterprise as having an independent existence.
  • Attributes: Relevant property (information/adjectives) about the entity.
  • Relationship: How entities are related, often denoted by verbs. Relationships should be named and their degree shown.
    Example:
  • Doctor treats Patients.
  • ER Diagram to illustrate entities, attributes, and relationships.

Normalization Process

1NF to 2NF:

  • Ensure no repeating groups or multi-valued attributes.
  • Ensure all non-primary-key attributes are fully functionally dependent on the primary key.
    2NF to 3NF:
  • Ensure no transitive dependencies.
  • Decompose relations as necessary without losing data.

Summary

  • Advanced modeling is an iterative process.
  • Many-to-Many relationships require a “composite entity” for implementation.
  • Fan Traps and Chasm Traps need to be verified at the conceptual level.
  • 3rd Normal Form verification is better achieved in the relational model by adding some dummy data.

Questions

  • Q&A session to address any queries.
    Through the above content, students will understand the process of creating a conceptual model, verifying normalization to the 3rd normal form, and applying these concepts to practical exercises.

第七讲

Lecture Objectives

  • Define database tables and constraints in PostgreSQL.
  • Manipulate existing database tables in PostgreSQL.

Outline

  1. DBMS
  2. SQL Recap
  3. SQL - Data Definition Language (DDL)
  4. Create Tables, Domains, Data Types, Default Values
  5. Database Integrity
  6. Constraints
  7. SQL - Data Manipulation Language (DML)
  8. Summary
  9. Q&A

Database Management System (DBMS)

  • Common DBMS view: Create multiple databases, e.g., Content Management, Sales, Payroll.
  • SQL statements to create a database, e.g., CREATE DATABASE ContentManagement.

SQL Language

  • Data Definition Language (DDL)
  • Data Manipulation Language (DML)
  • Data Control Language (DCL)
  • Transaction Control Language (TCL)

Usual Practice

  • Use a file of SQL commands (.SQL) as a script to create the database.
  • Constraints can be added in a separate file.
  • Scripts are rerunnable and can have comments.

SQL CREATE TABLE Statement

  • Used to create a new table in a database.
  • Syntax example:
    CREATE TABLE EMP (
        EMPNO INTEGER,
        ENAME VARCHAR(10),
        JOB VARCHAR(9),
        MGR INT,
        HIREDATE DATE,
        SAL INTEGER,
        COMM INTEGER,
        DEPTNO INTEGER
    );
    
  • Drop an existing table before creating it:
    DROP TABLE IF EXISTS emp;
    

Creating Tables by Copying

  • Create a table as a copy of another:
    CREATE TABLE new_table AS SELECT * FROM old_table;
    
  • Copy the structure without rows:
    CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=2;
    

Common Data Types

  • Text or Character:
    • char(5): Fixed length of 5 characters.
    • varchar(100): Variable length up to 100 characters.
    • varchar: Variable length string up to 65,535 bytes (PostgreSQL).
  • Integer:
    • SMALLINT: Range -32,768 to 32,767, size 2 bytes.
    • INT: Range -2,147,483,648 to 2,147,483,647, size 4 bytes.
    • SERIAL: Automatically generated integers.
  • Floating-point Number:
    • float(n): Floating-point numbers with n precision, up to 8 bytes.
    • float8 or real: 4-byte floating-point numbers.
  • Temporal Data Types:
    • DATE: Stores dates.
    • TIME: Stores time of day values.
    • TIMESTAMP: Stores date and time values.
    • TIMESTAMPTZ: Timezone-aware timestamp.
    • INTERVAL: Periods of time.

PostgreSQL Dates

  • DATE: Stores year, month, day, default format YYYY-MM-DD.
  • TIME: Stores hour, minute, second, millisecond, default format HH:MM:S.MS.
  • TIMESTAMP: Stores year, month, day, hour, minute, second, millisecond, default format YYYY-MM-DD HH:MM:S.MS.

Creating Domains

  • CREATE DOMAIN creates a new domain, essentially a data type with optional constraints.
    CREATE DOMAIN validStaffNo AS INT CHECK (VALUE BETWEEN 0 AND 2000);
    
  • Use domain to create a new table:
    CREATE TABLE staff4 (
        staffno validStaffNo,
        name VARCHAR(20),
        job VARCHAR(20)
    );
    

Default Values

  • Default value assigned by the database if no specific value is given.
    CREATE TABLE staff(
        staffno NUMERIC(4),
        name VARCHAR(20),
        job VARCHAR(20) DEFAULT 'Assistant' NOT NULL
    );
    

Modifying Existing Tables

  • Add columns:
    ALTER TABLE tablename ADD columnname datatype;
    
  • Drop columns:
    ALTER TABLE tablename DROP COLUMN columnname;
    

Auto Generating Primary Key

  • PostgreSQL provides SERIAL and SEQUENCE facilities.
    • Using SERIAL:
      CREATE TABLE table_name(id SERIAL);
      
    • Using SEQUENCE:
      CREATE SEQUENCE table_name_id_seq;
      CREATE TABLE table_name (
          id integer NOT NULL DEFAULT nextval('table_name_id_seq')
      );
      

Database Integrity

  • Data integrity enforced by a series of constraints or rules.
  • Entity Integrity: No duplicate rows.
  • Domain Integrity: Valid attribute values, e.g., age 0-200.
  • Referential Integrity: Rows cannot be deleted if referenced elsewhere, e.g., department with employees.
  • User-Defined Integrity: Business rules, e.g., salary > commission.

Database Constraints

  • PRIMARY KEY: Uniquely identifies each record.
  • FOREIGN KEY: Ensures referenced value exists in the parent table.
  • CHECK: Enforces conditions.
  • NOT NULL: Ensures values are not null.
  • UNIQUE: Ensures unique values.

SQL Data Manipulation Language (DML)

  • INSERT: Insert data into tables.
    INSERT INTO tablename(col_1,.., col_n) VALUES ('x_1',.., 'x_n');
    
  • UPDATE: Update existing rows.
    UPDATE emp SET sal = 3000 WHERE empno = 7788;
    
  • DELETE: Delete rows.
    DELETE FROM emp WHERE deptno = (SELECT deptno FROM dept WHERE dname = 'SALES');
    

Summary

  • SQL statements CREATE, ALTER, DROP are DDL; INSERT, UPDATE, DELETE, SELECT are DML.
  • Various PostgreSQL data types can be used, such as NUMERIC, VARCHAR, INTEGER.
  • Database integrity ensures business rules consistency.
  • Constraints ensure database integrity: PRIMARY KEY, FOREIGN KEY, CHECK, NOT NULL, UNIQUE.

第八讲

Lecture Objectives

  • Create transactions to perform multiple changes to the data as a logical unit of work.
  • Explain the ‘ACID’ model for implementing transactions.
  • Relate to the different database data control commands.
  • Identify how to implement transactions, grant, and revoke privileges in PostgreSQL.

Outline

  1. Transactions
    • Definition
    • Boundaries
    • Transaction States
  2. Data Control Language (DCL)
  3. ACID Model
  4. Summary
  5. Q&A

SQL and Its Four Sub-languages

  • Data Definition Language (DDL)
  • Data Manipulation Language (DML)
  • Data Control Language (DCL)
  • Transaction Control Language (TCL)

Transactions

  • A transaction is an action, or a series of actions, carried out as a logical unit of work by a user or an application that reads or updates the database contents.
  • It transforms the database from one consistent state into another and is used as a mechanism for concurrency and recovery control in the database.
  • It consists of one or more changes (INSERT, UPDATE, DELETE) associated with DML commands.

Transaction Example

Suppose you want to transfer $5 from Account A to Account B. This action can be broken down into the following steps:

  1. Create a record to transfer $5 from Account A to Account B. This is typically the beginning of a database transaction.
  2. Read the balance from Account A.
  3. Subtract $5 from the balance of Account A.
  4. Read the balance from Account B.
  5. Add $5 to the balance of Account B.
  • The database can run these operations as a single logical unit — either all operations are executed, or none are.

Transaction Boundaries

  • A transaction begins:
    • On login
    • At the end of the last transaction
    • In some RDBMSs, with: BEGIN TRANSACTION (e.g., PostgreSQL)
  • A transaction ends with:
    • COMMIT / ROLLBACK
    • Any DDL statement (e.g., CREATE, DELETE, TRUNCATE, etc.)
    • Logout
    • But not with an ungraceful exit, process kill, or interrupt

COMMIT Command

  • Marks the successful end of a transaction.
  • Means that changes are now permanent within the database.
  • AUTOCOMMIT mode is the default in PostgreSQL, meaning that every DML command is immediately made permanent.

Rolling Back Transactions

  • Use the SQL ROLLBACK command to revert actions, sometimes called ‘undo’.
  • All RDBMSs must support ROLLBACK.
  • Saving changes is expensive (consider a multi-million-row delete), so transactions should be as short as possible.

Savepoints

  • Not ANSI standard.
  • You can set up multiple savepoints within transactions:
    • SAVEPOINT s1: DML before the savepoint is saved but not committed.
    • ROLLBACK TO SAVEPOINT s1: Whole transaction can be committed or rolled back.
  • Most useful when SQL is embedded in a procedural language with conditional constructs.

Transaction States

  • Active: The transaction is active during its execution; it is the initial state.
  • Partially Committed: After the final statement has been executed, the transaction is partially committed.
  • Failed: Once it is discovered that the transaction cannot be executed normally, its state is failed.
  • Aborted: The transaction is aborted once it is rolled back and the database is restored to its state before the transaction’s execution.
  • Committed: The transaction is committed after it is completed successfully.

Data Control Language (DCL)

  • To allow other database users to view or modify the contents of any data in a database, you must grant them the privileges to do so.
  • Grant privileges syntax:
    GRANT privileges ON object TO user;
    

Example of Granting Privileges

  • Grant some privileges (e.g., SELECT, INSERT, UPDATE, DELETE) on a table named products to a user named user111:
    GRANT SELECT, INSERT, UPDATE, DELETE ON products TO user111;
    
  • Grant all permissions:
    GRANT ALL ON products TO user111;
    
  • Grant only SELECT access (read-only):
    GRANT SELECT ON products TO user111;
    

Example of Revoking Privileges

  • Revoke some privileges (e.g., UPDATE, DELETE) on a table named products from a user named user111:
    REVOKE UPDATE, DELETE ON products FROM user111;
    
  • Revoke all permissions:
    REVOKE ALL ON products FROM user111;
    
  • Revoke only SELECT access:
    REVOKE SELECT ON products FROM user111;
    

ACID Transactions

  • Atomic: Either all changes (operations) of the transaction are reflected in the database, or none are.
  • Consistent: The database remains in a consistent state as transactions are executed in isolation.
  • Isolated: Transaction A cannot see Transaction B’s changes until B has finished (committed), although they are allowed to execute concurrently. Intermediate transaction results must be hidden from other concurrently executed transactions.
  • Durable: Committed changes of transactions must persist even in a system crash/failure.

Implementation of Atomicity and Durability

  • Supported and implemented in the recovery management component of the database system.
  • Example: Shadow-database (copy) scheme:
    • All updates are made on the shadow copy of the database.
    • The db_pointer points to the updated shadow copy after the transaction reaches a partial commit and updated pages have been flushed to disk.
    • If the transaction fails, the consistent copy to which db_pointer refers is used, and the shadow copy is deleted.
    • Assumptions: No hard disk failures, only one transaction active at a time, no handling of concurrent transactions needed.

Summary

  • A transaction is a logical unit of work. It consists of one or more changes to the database using one or more commands: INSERT, UPDATE, DELETE, etc.
  • Transactions must conform to the ‘ACID’ model.
  • Transactions are managed using the TCL commands, i.e., COMMIT and ROLLBACK.
  • Privileges to access and modify tables in the database can be set with DCL commands: GRANT, REVOKE.

第九讲Views & Stored Procedures

Objectives

  • Explain the meaning of Views in a database.
  • Explain what a Stored Procedure is and their purpose.
  • Identify the different database manipulations possible with Views.
  • Relate to the different steps needed to create a Stored Procedure.
  • Describe the different SQL commands to implement Views and Stored Procedures.

Outline

  1. Views
    • Definition
    • Advantages
    • Create & Query Views
  2. Stored Procedures
    • Definition
    • Structure
    • Function Arguments
    • Variable Declaration
    • Variable Assignments
  3. Summary
  4. Q&A

Views

Definition:

  • A view is a virtual table based on a SELECT statement.
  • The SELECT statement can obtain columns, computed columns, aliases, and aggregate functions from one or several tables.
  • A view is similar to a real table; the fields are obtained from the so-called base tables.
    Syntax:
CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition;

Advantages of Views over Tables:

  1. Security: Users can be given permission to query a view but not the underlying base tables, thus limiting data exposure.
  2. Simplicity: Views can simplify queries by combining data from multiple tables into a single view.
  3. Efficiency: Views consume very little storage space since only the definition is stored, and the view is updated dynamically.
  4. Up-to-date Information: Views are recreated on demand each time they are invoked, reflecting new data additions.
    Creating Views in PostgreSQL:
  • Example: Create a view with employee number, name, and salary of all managers.
    CREATE VIEW managers AS 
    SELECT empno, ename, sal
    FROM emp
    WHERE job = 'MANAGER';
    

Querying Views:

  • Views can be used anywhere a table name is expected in SQL statements.
  • They can be used inside SELECT statements, stored procedures, or within other views.
  • Example:
    SELECT empno
    FROM managers;
    
    This query delivers the same result as:
    SELECT empno
    FROM emp
    WHERE job = 'MANAGER';
    

Renaming Attributes in a View in PostgreSQL:

  • You can give attributes different names in the view.
    CREATE VIEW managers_view (id, name, location) AS 
    SELECT emp.empno, emp.ename, dept.loc
    FROM emp, dept
    WHERE emp.deptno = dept.deptno AND emp.job = 'MANAGER';
    

Modifying a View:

  • Typically, views cannot be modified directly, but the underlying tables can be modified through the view.
  • SQL rules for updating views:
    • The SELECT clause must list enough attributes to insert tuples into the view and the base tables. Remaining attributes will be filled with NULL or default values.
      Examples:
  • INSERT:
    CREATE VIEW salesmen_view (id, name, salary, job) AS 
    SELECT empno, ename, sal, job
    FROM emp
    WHERE job = 'SALESMAN';
    INSERT INTO salesmen_view VALUES (8000, 'STAHL', 9000, 'SALESMAN');
    
  • DELETE:
    DELETE FROM salesmen_view WHERE salary < 1500;
    
  • UPDATE:
    UPDATE salesmen_view SET salary = 3000 WHERE name = 'STAHL';
    
  • DROP:
    DROP VIEW salesmen_view;
    

Stored Procedures

Definition:

  • A stored procedure is a named collection of computations and SQL statements stored in the database.
  • They encapsulate and represent business transactions, allowing multiple SQL statements to be executed by calling the stored procedure.
    Advantages:
  1. Reduction of Network Traffic: Stored procedures are stored on the database server, eliminating the need to transmit SQL statements over the network.
  2. Code Reusability and Isolation: They reduce code duplication by allowing code sharing and isolation, minimizing errors and development costs.
    What is PL/pgSQL?
  • PL/pgSQL is a block-structured procedural language used to implement stored procedures in PostgreSQL.
  • It contains a declaration section and an executable section and is closely coupled with SQL.
    Anatomy of a Stored Procedure:
  • Basic structure:
    CREATE FUNCTION function_name (arguments)
    RETURNS return_type AS $$
    BEGIN
    -- statements
    END; $$
    LANGUAGE plpgsql;
    

Example: Simple Standalone Procedure

  • Increment function:
    CREATE FUNCTION increment(IN val integer) RETURNS integer AS $$
    BEGIN
    RETURN val + 1;
    END; $$
    LANGUAGE plpgsql;
    SELECT increment(10);
    

Function Arguments:

  • Example:
    CREATE OR REPLACE FUNCTION divisionIntegers(a integer, b integer) RETURNS float AS $$
    BEGIN
    RETURN a / b::float;
    END; $$
    LANGUAGE plpgsql;
    

Variable Declarations:

  • Variables can be declared within the DECLARE section of the function.
  • Example:
    CREATE OR REPLACE FUNCTION circleArea(radius integer) RETURNS float AS $$
    DECLARE
    pi float;
    BEGIN
    pi := 3.14;
    RETURN radius * radius * pi::float;
    END; $$
    LANGUAGE plpgsql;
    

Assignments of Composite Variables:

  • Assign values using SELECT statement:
    SELECT expression INTO target;
    

Example:

CREATE OR REPLACE FUNCTION testCompositeDeclarations() RETURNS text AS $$
DECLARE
department dept%rowtype;
BEGIN
SELECT 50, 'PURCHASE', 'READING' INTO department.deptno, department.dname, department.loc;
RETURN department;
END; $$
LANGUAGE plpgsql;

Summary to Stored Procedures

  • Stored Procedures are collections of computations and SQL statements stored in the database.
  • They are block-structured and composed of a declaration section (for variables) and an executable section (for assignments and calculations).
  • Assignments in Stored Procedures can be done using the SQL SELECT command.

第十讲Views & Stored Procedures

Lecture Objectives

  • Understand the ‘I’ (Isolated) in ACID transactions
  • Define concurrency
  • Describe the problems of concurrent access
  • Outline the RDBMS mechanisms to control concurrency
  • Identify ANSI transaction isolation levels
  • Define locking control and deadlocks

Outline

  1. Transactions and ACID review
  2. Concurrency
  3. Concurrency Problems
  4. ANSI/SQL Isolation Levels
  5. Locking

Transactions and ACID Review

Transactions

  • A transaction is an action, or a series of actions carried out as a logical unit of work, by a user or an application that reads or updates the database contents.
  • It transforms the database from one consistent state into another and is used as a mechanism for concurrency and recovery control in the database.
  • It consists of one or more changes (INSERT, UPDATE, DELETE) associated with DML commands.
    ACID Transactions
  • Atomic: Either all changes (operations) of the transaction are reflected in the database, or none are.
  • Consistent: Preserves the database in a consistent state as transactions are executed in isolation.
  • Isolated: Transaction A cannot see Transaction B’s changes until B has finished (committed), although they are allowed to execute concurrently. Intermediate transaction results must be hidden from other concurrently executed transactions.
  • Durable: Committed changes of transactions must persist even in a system crash/failure.

Concurrency

Role and Advantages of DBMSs

  • Improved data sharing.
  • Better data integration.
  • Minimized data inconsistency (storing different versions of the same data).
  • Improved data access (quick answers to ad hoc queries).
  • Improved decision making.
    Concurrent Database Use
  • Concurrency allows many users to work on the same data.
  • Maximizes data access.
  • Minimizes wait time.
  • Ensures data is always in a correct state with no violation of integrity constraints.
  • Concurrency is a major issue for every RDBMS.
  • Queries are less problematic than transactions because a transaction is a set of logically related changes.

Concurrency Problems

  1. Dirty Read
    • Occurs when a transaction reads a row that has been modified by another transaction but not yet committed.
    • Example: User U1 is updating data, while User U2 is reporting data. If U1 rolls back, U2’s report is incorrect.
  2. Lost Update
    • Occurs when multiple transactions select the same row and update it based on the original value.
    • Example: User U1 increases salary, User U2 deducts tax based on the same original salary, resulting in one update being lost.
  3. Non-Repeatable Read
    • Occurs when a transaction reads data before and after another transaction modifies it.
    • Example: User U1 updates a table, User U2 reports on it, resulting in inconsistent reads.
  4. Phantom Read
    • Occurs when the same query executed twice returns different rows.
    • Example: User U1 inserts new rows, User U2 reports on the table, resulting in different row counts.
      The Scheduler
  • Severe problems can arise when two or more concurrent transactions are executed.
  • The scheduler is a special DBMS process that establishes the order in which the operations are executed.
  • It interleaves the execution of database operations to ensure serializability and isolation of transactions.
  • The scheduler’s main job is to create a serializable schedule of transaction operations, yielding the same results as if the transactions were executed serially.

Concurrency Control with Locking

Locking

  • Locking is one of the most common techniques used in concurrency control.
  • A lock guarantees exclusive use of data items to a concurrent transaction.
  • A transaction (e.g., T2) does not have access to a data item currently used by another transaction (e.g., T1).
  • A transaction acquires a lock prior to data access; the lock is released when the transaction is completed.
    Locking Modes
  1. Access Exclusive Lock: Prevents other transactions from accessing or modifying the locked object, used for destructive operations.
  2. Exclusive Lock: Prevents other transactions from modifying the locked object, but allows reading.
  3. Share Lock: Allows multiple transactions to read the same object concurrently, but prevents modifications.
  4. Row Share and Row Exclusive Lock: Controls concurrent access to individual rows of a table.
    Deadlocks
  • Deadlocks occur when two or more transactions are waiting for each other to release locks.
  • Example: Tx_1 updates row 7 and tries to update row 3, while Tx_2 updates row 3 and tries to update row 7.
  • PostgreSQL automatically detects deadlocks and aborts one of the involved transactions.

ANSI/SQL Isolation Levels

  1. Read Uncommitted
    • Transactions may see not-yet-committed changes by other transactions.
    • Offers no protection against data inconsistency.
    • Not possible in Oracle, possible (but undesirable) in other RDBMSs like PostgreSQL.
  2. Read Committed
    • Forces transactions to read only committed data.
    • Data can change after it is read.
    • Default level for most vendors.
  3. Repeatable Read
    • Ensures that row queries return consistent results.
    • Keeps read and write locks on the selected data until the transaction ends.
    • Does not prevent new rows with the selected criteria from being added.
  4. Serializable
    • Applies read and write locks until the transaction ends (similar to Repeatable Read).
    • If a write collision is detected among transactions, only one is executed, and the rest fail.
    • Ensures complete isolation but is slower.
      Isolation Levels and Problems
Isolation Level Dirty Read Lost Updates Non-Repeatable Read Phantom Read
Read Uncommitted Possible Possible Possible Possible
Read Committed Not Possible Possible Possible Possible
Repeatable Read Not Possible Not Possible Not Possible Possible
Serializable Not Possible Not Possible Not Possible Not Possible

Locking Notes

  • All commercial RDBMSs must implement safe locking.
  • The challenge is to preserve speed of response.
  • Developers should:
    • Minimize lock contention by keeping transactions short.
    • Avoid explicit locking unless necessary.
    • Avoid deadlocks by accessing tables in the same order.

Summary

  • Concurrency allows many users to work on the same data.
  • However, concurrency can cause consistency problems such as dirty reads, lost updates, non-repeatable reads, and phantom reads.
  • ANSI/SQL isolation levels in transactions aim to avoid these consistency problems.
  • Locks can be used to prevent readers and writers from blocking each other, but locks can still introduce deadlocks.
1 个赞

好长,谢谢

1 个赞

真长 谢谢

真长。。。

你真长

参考书是什么?

我们老师的ppt整理的,不知道参考书

1 个赞