CMU DB System
Storage 1
- 
    

 - 
    
Virtual Memory is an abstraction for memory.
- The OS provide this abstraction to applications. So page fault/swapping is invisible to users.
 - physic memory+swapping area in HD
 
 - 
    
Database Page/Block
- fixed size
 - Unique ID as identifier
 
 - 
    
types of page

- page structure
        
- header
            
- page size
 - checksum
 - DBMS version
 - Transaction visibility
 - Compression Information
 
 - Data
            
- tuple-oriented
                
- Strawman idea: just store tuple sequentially
                    
- Problem: CRUD will be O(n) complexity since you could only traverse the page to find the tuple
 
 - Slotted Pages: split the page into smaller slots and store the infos of slots in the page header
                    
- In the header, the # of used slots will be recorded
 - the # of the first unused slot will also be recored.
 - O(1) complexity
 
 
 - Strawman idea: just store tuple sequentially
                    
 - Log-oriented: just keep appending. Deletion and modification all by appending new logs
                
- problem: read is too slow. we must traverse all the logs to confirm the value.
 - Solution: build index to to allow it to jump to the corresponding locations

 - periodically compact the logs
 - used by HBase, Cassandra, LevelDB
 
 
 - tuple-oriented
                
 
 - header
            
 - tuple layout
        
- essentially just sequence of bytes
 - so it’s the DBMS’s job to determine how to interpret them into attributes, etc.
 - tuple structure:
            
- header
                
- visibility info used for concurrency control
 - Bitmap for null values(0/1 array to represent the location of null value)
 
 - body
                
- attributes usually stored in the order you create them
 - Most DBMS won’t allow a tuple exceeds the size of a page
 
 
 - header
                
 
 - Most DBMSs have an indirection layer that keeps maps a page id to a file path and offset.
 - The most common representation of tuple: pageid+offset/slot.
 - 2 ways for DBMS to locate a DB page in the disk:
        
- Linked List: Header page holds pointers to list for free and data pages
 - Page Directory: DBMS maintains special pages that track locations of data pages.
 
 
 - page structure
        
 
Storage 2
- 
    
Disk-oriented DBMS - the most common way: directory page + slotted page body

 - 
    
If the size of tuple exceeds the page size/certain size, overflow page will be used

 - 
    
DBMS catalog:
- Tables, columns, indexes, views
 - Users, Permission
 - Internal Statistics
 
 - 
    
OLTP和OLAP: an important concept
- OLTP,也叫联机事务处理(Online Transaction Processing),表示事务性非常高的系统,一般都是高可用的在线系统,以小的事务以及小的查询为主,评估其系统的时候,一般看其每秒执行的Transaction以及Execute SQL的数量。在这样的系统中,单个数据库每秒处理的Transaction往往超过几百个,或者是几千个,Select 语句的执行量每秒几千甚至几万个。典型的OLTP系统有电子商务系统、银行、证券等,如美国eBay的业务数据库,就是很典型的OLTP数据库。OLTP系统最容易出现瓶颈的地方就是CPU与磁盘子系统。
 - OLAP,也叫联机分析处理(Online Analytical Processing)系统,有的时候也叫DSS决策支持系统,就是我们说的数据仓库。在这样的系统中,语句的执行量不是考核标准,因为一条语句的执行时间可能会非常长,读取的数据也非常多。所以,在这样的系统中,考核的标准往往是磁盘子系统的吞吐量(带宽),如能达到多少MB/s的流量
 
 - 
    
N Ary storage model
- 杭存储
 - the DBMS stores all attributes for a single tuple contiguously in a page
 - Ideal for OLTP workloads where queries tend to operate only on an individual entity and insertheavy workloads.
 
- Advantage:
        
- fast insert, update and delete
 - good for queries that need the entire tuple
 
 - Disadvantage:
        
- not good for scanning large portion of the table and/or subset of the attributes(lead to a lot of useless data)
 
 
 - 
    
Decomposition storage model
- 列存储模型
 - The DBMS stores the values of a single attribute for all tuples contiguously in a page.
 

