06 Aug Chapter 11?Review Questions 1-10 located at the end of your Database Systems?Chapter 11?book.?Please use complete sentences and proper grammar.? 1-?Chapter 13?Review Questions 1-15
Using a word document with a cover sheet –
Complete:
1- Chapter 11 Review Questions 1-10 located at the end of your Database Systems Chapter 11 book. Please use complete sentences and proper grammar.
1- Chapter 13 Review Questions 1-15 located at the end of your Database Systems Chapter 13 book. Please use complete sentences and proper grammar.
Database Systems: Design, Implementation, and
Management Tenth Edition
Chapter 13 Business Intelligence and Data
Warehouses
Objectives
In this chapter, you will learn: • How business intelligence provides a
comprehensive business decision support framework
• About business intelligence architecture, its evolution, and reporting styles
• About the relationship and differences between operational data and decision support data
• What a data warehouse is and how to prepare data for one
Database Systems, 10th Edition 2
Objectives (cont’d.)
• What star schemas are and how they are constructed
• About data analytics, data mining, and predictive analytics
• About online analytical processing (OLAP) • How SQL extensions are used to support
OLAP-type data manipulations
Database Systems, 10th Edition 3
The Need for Data Analysis
• Managers track daily transactions to evaluate how the business is performing
• Strategies should be developed to meet organizational goals using operational databases
• Data analysis provides information about short- term tactical evaluations and strategies
Database Systems, 10th Edition 4
Business Intelligence
• Comprehensive, cohesive, integrated tools and processes – Capture, collect, integrate, store, and analyze
data
– Generate information to support business decision making
• Framework that allows a business to transform: – Data into information
– Information into knowledge
– Knowledge into wisdom Database Systems, 10th Edition 5
Business Intelligence Architecture
• Composed of data, people, processes, technology, and management of components
• Focuses on strategic and tactical use of information
• Key performance indicators (KPI) – Measurements that assess company’s
effectiveness or success in reaching goals
• Multiple tools from different vendors can be integrated into a single BI framework
Database Systems, 10th Edition 6
Database Systems, 10th Edition 7
Business Intelligence Benefits
• Main goal: improved decision making • Other benefits
– Integrating architecture
– Common user interface for data reporting and analysis
– Common data repository fosters single version of company data
– Improved organizational performance
Database Systems, 10th Edition 8
Business Intelligence Evolution
Database Systems, 10th Edition 9
Database Systems, 10th Edition 10
Business Intelligence Technology Trends
• Data storage improvements • Business intelligence appliances • Business intelligence as a service • Big Data analytics • Personal analytics
Database Systems, 10th Edition 11
Decision Support Data
• BI effectiveness depends on quality of data gathered at operational level
• Operational data seldom well-suited for decision support tasks
• Need reformat data in order to be useful for business intelligence
Database Systems, 10th Edition 12
Operational Data vs. Decision Support Data
• Operational data – Mostly stored in relational database – Optimized to support transactions representing
daily operations
• Decision support data differs from operational data in three main areas: – Time span
– Granularity
– Dimensionality
Database Systems, 10th Edition 13
Database Systems, 10th Edition 14
Decision Support Database Requirements
• Specialized DBMS tailored to provide fast answers to complex queries
• Three main requirements – Database schema
– Data extraction and loading
– Database size
Database Systems, 10th Edition 15
Decision Support Database Requirements (cont’d.)
• Database schema – Complex data representations – Aggregated and summarized data – Queries extract multidimensional time slices
• Data extraction and filtering – Supports different data sources
• Flat files • Hierarchical, network, and relational databases • Multiple vendors
– Checking for inconsistent data Database Systems, 10th Edition 16
Decision Support Database Requirements (cont’d.)
• Database size – In 2005, Wal-Mart had 260 terabytes of data in
its data warehouses
– DBMS must support very large databases (VLDBs)
Database Systems, 10th Edition 17
The Data Warehouse
• Integrated, subject-oriented, time-variant, and nonvolatile collection of data – Provides support for decision making
• Usually a read-only database optimized for data analysis and query processing
• Requires time, money, and considerable managerial effort to create
Database Systems, 10th Edition 18
Database Systems, 10th Edition 19
Data Marts
• Small, single-subject data warehouse subset • More manageable data set than data
warehouse • Provides decision support to small group of
people • Typically lower cost and lower implementation
time than data warehouse
Database Systems, 10th Edition 20
Twelve Rules That Define a Data Warehouse
Database Systems, 10th Edition 21
Star Schemas
• Data-modeling technique – Maps multidimensional decision support data
into relational database
• Creates near equivalent of multidimensional database schema from relational data
• Easily implemented model for multidimensional data analysis while preserving relational structures
• Four components: facts, dimensions, attributes, and attribute hierarchies
Database Systems, 10th Edition 22
Facts
• Numeric measurements that represent specific business aspect or activity – Normally stored in fact table that is center of star
schema
• Fact table contains facts linked through their dimensions
• Metrics are facts computed at run time
Database Systems, 10th Edition 23
Dimensions
• Qualifying characteristics provide additional perspectives to a given fact
• Decision support data almost always viewed in relation to other data
• Study facts via dimensions • Dimensions stored in dimension tables
Database Systems, 10th Edition 24
Attributes
• Use to search, filter, and classify facts • Dimensions provide descriptions of facts
through their attributes • No mathematical limit to the number of
dimensions • Slice and dice: focus on slices of the data cube
for more detailed analysis
Database Systems, 10th Edition 25
Attribute Hierarchies
• Provide top-down data organization • Two purposes:
– Aggregation
– Drill-down/roll-up data analysis
• Determine how the data are extracted and represented
• Stored in the DBMS’s data dictionary • Used by OLAP tool to access warehouse
properly
Database Systems, 10th Edition 26
Star Schema Representation
• Facts and dimensions represented in physical tables in data warehouse database
• Many fact rows related to each dimension row – Primary key of fact table is a composite primary
key
– Fact table primary key formed by combining foreign keys pointing to dimension tables
• Dimension tables are smaller than fact tables • Each dimension record is related to thousands
of fact records Database Systems, 10th Edition 27
Performance-Improving Techniques for the Star Schema
• Four techniques to optimize data warehouse design: – Normalizing dimensional tables
– Maintaining multiple fact tables to represent different aggregation levels
– Denormalizing fact tables
– Partitioning and replicating tables
Database Systems, 10th Edition 28
Performance-Improving Techniques for the Star Schema (cont’d.)
• Dimension tables normalized to: – Achieve semantic simplicity – Facilitate end-user navigation through the
dimensions
• Denormalizing fact tables improves data access performance and saves data storage space
• Partitioning splits table into subsets of rows or columns
• Replication makes copy of table and places it in different location
Database Systems, 10th Edition 29
Data Analytics
• Subset of BI functionality • Encompasses a wide range of mathematical,
statistical, and modeling techniques – Purpose of extracting knowledge from data
• Tools can be grouped into two separate areas: – Explanatory analytics
– Predictive analytics
Database Systems, 10th Edition 30
Data Mining
• Data-mining tools do the following: – Analyze data – Uncover problems or opportunities hidden in
data relationships
– Form computer models based on their findings – Use models to predict business behavior
• Runs in two modes – Guided
– Automated
Database Systems, 10th Edition 31
Database Systems, 10th Edition 32
Predictive Analytics
• Employs mathematical and statistical algorithms, neural networks, artificial intelligence, and other advanced modeling tools
• Create actionable predictive models based on available data
• Models are used in areas such as: – Customer relationships, customer service,
customer retention, fraud detection, targeted marketing, and optimized pricing
Database Systems, 10th Edition 33
Online Analytical Processing
• Three main characteristics: – Multidimensional data analysis techniques – Advanced database support
– Easy-to-use end-user interfaces
Database Systems, 10th Edition 34
Multidimensional Data Analysis Techniques
• Data are processed and viewed as part of a multidimensional structure
• Augmented by the following functions: – Advanced data presentation functions
– Advanced data aggregation, consolidation, and classification functions
– Advanced computational functions
– Advanced data modeling functions
Database Systems, 10th Edition 35
Advanced Database Support
• Advanced data access features include: – Access to many different kinds of DBMSs, flat
files, and internal and external data sources
– Access to aggregated data warehouse data
– Advanced data navigation – Rapid and consistent query response times
– Maps end-user requests to appropriate data source and to proper data access language
– Support for very large databases
Database Systems, 10th Edition 36
Easy-to-Use End-User Interface
• Advanced OLAP features are more useful when access is simple
• Many interface features are “borrowed” from previous generations of data analysis tools – Already familiar to end users
– Makes OLAP easily accepted and readily used
Database Systems, 10th Edition 37
OLAP Architecture
• Three main architectural components: – Graphical user interface (GUI) – Analytical processing logic
– Data-processing logic
Database Systems, 10th Edition 38
OLAP Architecture (cont’d.)
• Designed to use both operational and data warehouse data
• In most implementations, data warehouse and OLAP are interrelated and complementary
• OLAP systems merge data warehouse and data mart approaches
Database Systems, 10th Edition 39
Database Systems, 10th Edition 40
Relational OLAP
• Relational online analytical processing (ROLAP) provides the following extensions: – Multidimensional data schema support within the
RDBMS
– Data access language and query performance optimized for multidimensional data
– Support for very large databases (VLDBs)
Database Systems, 10th Edition 41
Multidimensional OLAP
• Multidimensional online analytical processing (MOLAP) extends OLAP functionality to multidimensional database management systems (MDBMSs) – MDBMS end users visualize stored data as a 3D
data cube
– Data cubes can grow to n dimensions, becoming hypercubes
– To speed access, data cubes are held in memory in a cube cache
Database Systems, 10th Edition 42
Relational vs. Multidimensional OLAP
• Selection of one or the other depends on evaluator’s vantage point
• Proper evaluation must include supported hardware, compatibility with DBMS, etc.
• ROLAP and MOLAP vendors working toward integration within unified framework
• Relational databases use star schema design to handle multidimensional data
Database Systems, 10th Edition 43
Database Systems, 10th Edition 44
SQL Extensions for OLAP
• Proliferation of OLAP tools fostered development of SQL extensions
• Many innovations have become part of standard SQL
• All SQL commands will work in data warehouse as expected
• Most queries include many data groupings and aggregations over multiple columns
Database Systems, 10th Edition 45
The ROLLUP Extension
• Used with GROUP BY clause to generate aggregates by different dimensions
• GROUP BY generates only one aggregate for each new value combination of attributes
• ROLLUP extension enables subtotal for each column listed except for the last one – Last column gets grand total
• Order of column list important
Database Systems, 10th Edition 46
The CUBE Extension
• CUBE extension used with GROUP BY clause to generate aggregates by listed columns – Includes the last column
• Enables subtotal for each column in addition to grand total for last column – Useful when you want to compute all possible
subtotals within groupings
• Cross-tabulations are good candidates for application of CUBE extension
Database Systems, 10th Edition 47
Materialized Views
• A dynamic table that contains SQL query command to generate rows – Also contains the actual rows
• Created the first time query is run and summary rows are stored in table
• Automatically updated when base tables are updated
Database Systems, 10th Edition 48
Summary
• Business intelligence generates information used to support decision making
• BI covers a range of technologies, applications, and functionalities
• Decision support systems were the precursor of current generation BI systems
• Operational data not suited for decision support
Database Systems, 10th Edition 49
Summary (cont’d.)
• Data warehouse provides support for decision making – Usually read-only
– Optimized for data analysis, query processing
• Star schema is a data-modeling technique – Maps multidimensional decision support data
into a relational database
• Star schema has four components: – Facts, dimensions, attributes, and attribute
hierarchies Database Systems, 10th Edition 50
Summary (cont’d.)
• Data analytics – Provides advanced data analysis tools to extract
knowledge from business data
• Data mining – Automates the analysis of operational data to
find previously unknown data characteristics, relationships, dependencies, and trends
• Predictive analytics – Uses information generated in the data-mining
phase to create advanced predictive models
Database Systems, 10th Edition 51
Summary (cont’d.)
• Online analytical processing (OLAP) – Advanced data analysis environment that
supports decision making, business modeling, and operations research
• SQL has been enhanced with extensions that support OLAP-type processing and data generation
Database Systems, 10th Edition 52
- Database Systems: Design, Implementation, and Management Tenth Edition
- Objectives
- Objectives (cont’d.)
- The Need for Data Analysis
- Business Intelligence
- Business Intelligence Architecture
- PowerPoint Presentation
- Business Intelligence Benefits
- Business Intelligence Evolution
- Slide 10
- Business Intelligence Technology Trends
- Decision Support Data
- Operational Data vs. Decision Support Data
- Slide 14
- Decision Support Database Requirements
- Decision Support Database Requirements (cont’d.)
- Slide 17
- The Data Warehouse
- Slide 19
- Data Marts
- Twelve Rules That Define a Data Warehouse
- Star Schemas
- Facts
- Dimensions
- Attributes
- Attribute Hierarchies
- Star Schema Representation
- Performance-Improving Techniques for the Star Schema
- Performance-Improving Techniques for the Star Schema (cont’d.)
- Data Analytics
- Data Mining
- Slide 32
- Predictive Analytics
- Online Analytical Processing
- Multidimensional Data Analysis Techniques
- Advanced Database Support
- Easy-to-Use End-User Interface
- OLAP Architecture
- OLAP Architecture (cont’d.)
- Slide 40
- Relational OLAP
- Multidimensional OLAP
- Relational vs. Multidimensional OLAP
- Slide 44
- SQL Extensions for OLAP
- The ROLLUP Extension
- The CUBE Extension
- Materialized Views
- Summary
- Summary (cont’d.)
- Slide 51
- Slide 52
,
Database Systems: Design, Implementation, and
Management Eighth Edition
Chapter 11 Database Performance Tuning and
Query Optimization
Database Systems, 8th Edition 2
Objectives
• In this chapter, you will learn: – Basic database performance-tuning concepts – How a DBMS processes SQL queries – About the importance of indexes in query processing – About the types of decisions the query optimizer has
to make – Some common practices used to write efficient SQL
code – How to formulate queries and tune the DBMS for
optimal performance – Performance tuning in SQL Server 2005
Database Systems, 8th Edition 3
11.1 Database Performance-Tuning Concepts
• Goal of database performance is to execute queries as fast as possible
• Database performance tuning – Set of activities and procedures designed to
reduce response time of database system
• All factors must operate at optimum level with minimal bottlenecks
• Good database performance starts with good database design
Database Systems, 8th Edition 4
Database Systems, 8th Edition 5
Performance Tuning: Client and Server
• Client side – Generate SQL query that returns correct answer
in least amount of time • Using minimum amount of resources at server
– SQL performance tuning
• Server side – DBMS environment configured to respond to
clients’ requests as fast as possible • Optimum use of existing resources
– DBMS performance tuning
Database Systems, 8th Edition 6
DBMS Architecture
• All data in database are stored in data files • Data files
– Automatically expand in predefined increments known as extends
– Grouped in file groups or table spaces • Table space or file group:
– Logical grouping of several data files that store data with similar characteristics
Database Systems, 8th Edition 7
Basic DBMS architecture
Database Systems, 8th Edition 8
DBMS Architecture (continued)
• Data cache or buffer cache: shared, reserved memory area – Stores most recently accessed data blocks in RAM
• SQL cache or procedure cache: stores most recently executed SQL statements – Also PL/SQL procedures, including triggers and
functions
• DBMS retrieves data from permanent storage and places it in RAM
Database Systems, 8th Edition 9
DBMS Architecture (continued)
• Input/output request: low-level data access operation to/from computer devices, such as memory, hard disks, videos, and printers
• Data cache is faster than data in data files – DBMS does not wait for hard disk to retrieve data
• Majority of performance-tuning activities focus on minimizing I/O operations
• Typical DBMS processes: – Listener, User, Scheduler, Lock manager, Optimizer
Database Systems, 8th Edition 10
Database Statistics
• Measurements about database objects and available resources – Tables, Indexes, Number of processors used,
Processor speed, Temporary space available • Make critical decisions about improving query
processing efficiency • Can be gathered manually by DBA or automatically by
DBMS – UPDATE STATISTICS table_name [index_name] – Auto-Update and Auto-Create Statistics option
• 資料庫屬性 -> 自動更新統計資料 • 資料庫屬性 -> 自動建立統計資料
Database Systems, 8th Edition 11
Database Systems, 8th Edition 12
Ch08: dbcc show_statistics (customer, PK__CUSTOMER__24927208 )
Ch08: dbcc show_statistics (customer, CUS_UI1)
補充 SQL Server 2005
Database Systems, 8th Edition 13
11.2 Query Processing
• DBMS processes queries in three phases – Parsing
• DBMS parses the query and chooses the most efficient access/execution plan
– Execution • DBMS executes the query using chosen
execution plan
– Fetching • DBMS fetches the data and sends the result back
to the client
Database Systems, 8th Edition 14 Query Processing
Database Systems, 8th Edition 15
SQL Parsing Phase
• Break down query into smaller units • Transform original SQL query into slightly
different version of original SQL code – Fully equivalent
• Optimized query results are always the same as original query
– More efficient • Optimized query will almost always execute faster
than original query
Database Systems, 8th Edition 16
SQL Parsing Phase (continued) • Query optimizer analyzes SQL query and finds most
efficient way to access data – Validated for syntax compliance
– Validated against data dictionary • Tables, column names are correct • User has proper access rights
– Analyzed and decomposed into more atomic components
– Optimized through transforming into a fully equivalent but more efficient SQL query
– Prepared for execution by determining the execution or access plan
Database Systems, 8th Edition 17
SQL Parsing Phase (continued)
• Access plans are DBMS-specific – Translate client’s SQL query into series of
complex I/O operations – Required to read the data from the physical data
files and generate result set • DBMS checks if access plan already exists for
query in SQL cache • DBMS reuses the access plan to save time • If not, optimizer evaluates various plans
– Chosen plan placed in SQL cache
Database Systems, 8th Edition 18
Database Systems, 8th Edition 19
SQL Execution and Fetching Phase
• All I/O operations indicated in access plan are executed – Locks acquired – Data retrieved and placed in data cache – Transaction management commands processed
• Rows of resulting query result set are returned to client
• DBMS may use temporary table space to store temporary data – The server may send only the first 100 rows of 9000 rows
Database Systems, 8th Edition 20
Query Processing Bottlenecks
• Delay introduced in the processing of an I/O operation that slows the system – CPU
– RAM
– Hard disk
– Network
– Application code
Database Systems, 8th Edition 21
SQL 敘述 輸入完成 後先不要 執行查 詢 , 請按 下工具列 的顯示估 計執行計
劃鈕 :
Database Systems, 8th Edition 22
11.3 Indexes and Query Optimization
• Indexes – Crucial in speeding up data access
– Facilitate searching, sorting, and using aggregate functions as well as join operations
– Ordered set of values that contains index key and pointers
• More efficient to use index to access table than to scan all rows in table sequentially
Database Systems, 8th Edition 23
Indexes and Query Optimization • Data sparsity: number of different values a column
could possibly have • Indexes implemented using: ( 課本 p. 453)
– Hash indexes
– B-tree indexes: most common index type. Used in tables in which column values repeat a small number of times. The leaves contain pointers to records It is self-balanced.
– Bitmap indexes: 0/1
• DBMSs determine best type of index to use – Ex: CUST_LNAME with B-tree and REGION_CODE with
Bitmap indexes
Database Systems, 8th Edition 24B-tree and bitmap index representation
25
Index Representation for the CUSTOMER table
SELECT CUS_NAME FROM CUSTOMER WHERE CUS_STATE=‘FL’ Requires only 5 accesses to STATE_INDEX, 5 accesses to CUSTOMER
Database Systems, 8th Edition 26
11.4 Optimizer Choices
• Rule-based optimizer – Preset rules and points
– Rules assign a fixed cost to each operation
• Cost-based optimizer – Algorithms based on statistics about objects
being accessed
– Adds up processing cost, I/O costs, resource costs to derive total cost
Example
Database Systems, 8th Edition 27
SELECT P_CODE, P_DESCRIPT, P_PRICE, V_NAME, V_STATE FROM PRODUCT P, VENDOR V WHERE P.V_CODE=V.V_CODE AND V.V_STATE=‘FL’;
• With the following database statistics: – The PRODUCT table has 7000 rows – The VENDOR table has 300 rows – 10 vendors come from Florida – 1000 products come from vendors in Florida
Database Systems, 8th Edition 28
Example
Database Systems, 8th Edition 29
• Assume the PRODUCT table has the index PQOH_NDX in the P_QOH attribute
SELECT MIN(P_QOH) FROM PRODUCT
could be resolved by reading only the first entry in the PQOH_NDX index
Database Systems, 8th Edition 30
Using Hints to Affect Optimizer Choices
• Optimizer might not choose best plan • Makes decisions based on existing statistics
– Statistics may be old – Might choose less efficient decisions
• Optimizer hints: special instructions for the optimizer embedded in the SQL command text
Database Systems, 8th Edition 31
Oracle 版本
Database Systems, 8th Edition 32
MS SQL Server 的語法請參考:
http://msdn.microsoft.com/en-us/library/
