E-Book, Englisch, 852 Seiten
Reihe: The Morgan Kaufmann Series in Data Management Systems
Celko Joe Celko's SQL for Smarties
5. Auflage 2014
ISBN: 978-0-12-800830-0
Verlag: Elsevier Science & Techn.
Format: EPUB
Kopierschutz: Adobe DRM (»Systemvoraussetzungen)
Advanced SQL Programming
E-Book, Englisch, 852 Seiten
Reihe: The Morgan Kaufmann Series in Data Management Systems
ISBN: 978-0-12-800830-0
Verlag: Elsevier Science & Techn.
Format: EPUB
Kopierschutz: Adobe DRM (»Systemvoraussetzungen)
SQL for Smarties was hailed as the first book devoted explicitly to the advanced techniques needed to transform an experienced SQL programmer into an expert. Now, 20 years later and in its fifth edition, this classic reference still reigns supreme as the only book written by a SQL master that teaches programmers and practitioners to become SQL masters themselves! These are not just tips and techniques; also offered are the best solutions to old and new challenges. Joe Celko conveys the way you need to think in order to get the most out of SQL programming efforts for both correctness and performance. New to the fifth edition, Joe features new examples to reflect the ANSI/ISO Standards so anyone can use it. He also updates data element names to meet new ISO-11179 rules with the same experience-based teaching style that made the previous editions the classics they are today. You will learn new ways to write common queries, such as finding coverings, partitions, runs in data, auctions and inventory, relational divisions and so forth. SQL for Smarties explains some of the principles of SQL programming as well as the code. A new chapter discusses design flaws in DDL, such as attribute splitting, non-normal forum redundancies and tibbling. There is a look at the traditional acid versus base transaction models, now popular in NoSQL products. You'll learn about computed columns and the DEFERRABLE options in constraints. An overview of the bi-temporal model is new to this edition and there is a longer discussion about descriptive statistic aggregate functions. The book finishes with an overview of SQL/PSM that is applicable to proprietary 4GL vendor extensions. - New to the 5th Edition: - Overview of the bitemporal model - Extended coverage of descriptive statistic aggregate functions - New chapter covers flaws in DDL - Examination of traditional acid versus base transaction models - Reorganized to help you navigate related topics with ease - Expert advice from a noted SQL authority and award-winning columnist Joe Celko, who served on the ANSI SQL standards committee for over a decade - Teaches scores of advanced techniques that can be used with any product, in any SQL environment, whether it is SQL 92 or SQL 2011 - Offers tips for working around deficiencies and gives insight into real-world challenges
Joe Celko served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards. Mr. Celko is author a series of books on SQL and RDBMS for Elsevier/MKP. He is an independent consultant based in Austin, Texas. He has written over 1200 columns in the computer trade and academic press, mostly dealing with data and databases.
Autoren/Hrsg.
Weitere Infos & Material
1;Front Cover;1
2;Joe Celko's sql for Smarties: Advanced sql Programming;4
3;Copyright;5
4;Dedication;6
5;Contents;8
6;Introduction to the Fifth Edition;18
6.1;What is New in this Edition;18
6.2;Corrections and Additions;19
7;Part 1: Data Declaration Features;20
7.1;Chapter 1: Databases Versus File Systems;22
7.1.1;1.1. The Schema Statement;24
7.1.2;1.2. Tables as Entities;29
7.1.3;1.3. Tables as Relationships;29
7.1.3.1;1.3.1. E-R Diagrams;29
7.1.4;1.4. Rows Versus Records;32
7.1.5;1.5. Columns Versus Fields;33
7.2;Chapter 2: Transactions and Concurrency Control;36
7.2.1;2.1. Sessions;37
7.2.2;2.2. Transactions and ACID ;37
7.2.3;2.3. Concurrency Control;40
7.2.3.1;2.3.1. The Transaction Phenomena;40
7.2.4;2.4. The Isolation Levels;41
7.2.5;2.5. Pessimistic Concurrency Control;44
7.2.6;2.6. Snapshot Isolation Optimistic Concurrency;44
7.2.7;2.7. Logical Concurrency Control;47
7.2.8;2.8. Cap Theorem;48
7.2.9;2.9. Base;49
7.2.10;2.10. Server-Side Consistency;51
7.2.11;2.11. Error Handling;52
7.2.12;2.12. Deadlock and Livelocks;52
7.3;Chapter 3: Tables;54
7.3.1;3.1. Create Table Statements;55
7.3.1.1;3.1.1. Base Tables;55
7.3.1.2;3.1.2. [ Global | Local ] Temporary Tables;56
7.3.2;3.2. Column Definitions;57
7.3.2.1;3.2.1. DEFAULT Clause;57
7.3.2.2;3.2.2. NOT NULL Constraint;58
7.3.2.3;3.2.3. CHECK () Constraint;59
7.3.2.4;3.2.4. UNIQUE and PRIMARY KEY Constraints;60
7.3.2.5;3.2.5. REFERENCES Clause;61
7.3.2.6;3.2.6. Referential Actions;61
7.3.2.6.1;3.2.6.1. Nested UNIQUE Constraints;63
7.3.2.6.2;3.2.6.2. Overlapping Keys;67
7.3.3;3.3. Computed Columns;70
7.3.4;3.4. [ NOT ] DEFERRABLE Constraints;71
7.3.5;3.5. CREATE DOMAIN and CREATE SEQUENCE ;73
7.3.5.1;3.5.1. CREATE DOMAIN ;73
7.3.5.2;3.5.2. CREATE SEQUENCE ;73
7.3.5.2.1;3.5.2.1. Using the SEQUENCE ;76
7.3.6;3.6. Character Set Related Constructs;80
7.3.6.1;3.6.1. CREATE CHARACTER SET ;81
7.3.6.2;3.6.2. CREATE COLLATION ;81
7.3.6.3;3.6.3. CREATE TRANSLATION ;82
7.4;Chapter 4: Keys, Locators, and Generated Values;84
7.4.1;4.1. Key Types;85
7.4.1.1;4.1.1. Natural Keys;86
7.4.1.2;4.1.2. Artificial Keys;86
7.4.1.3;4.1.3. Exposed Physical Locators;87
7.4.2;4.2. Practical Hints for Denormalization;88
7.4.2.1;4.2.1. Row Sorting;89
7.5;Chapter 5: Normalization;94
7.5.1;5.1. Functional and Multivalued Dependencies;97
7.5.2;5.2. First Normal Form (1NF);97
7.5.2.1;5.2.1. Note on Repeating Groups;99
7.5.2.1.1;5.2.1.1. Repeating Columns;100
7.5.2.1.2;5.2.1.2. Parsing a List in a String;101
7.5.3;5.3. Second Normal Form (2NF);102
7.5.4;5.4. Third Normal Form (3NF);104
7.5.5;5.5. Elementary Key Normal Form (EKNF);105
7.5.6;5.6. Boyce-Codd Normal Form (BCNF);106
7.5.7;5.7. Fourth Normal Form (4NF);108
7.5.8;5.8. Fifth Normal Form (5NF);109
7.5.9;5.9. Domain-Key Normal Form (DKNF);111
7.5.10;5.10. Practical Hints for Normalization;120
7.5.11;5.11. Non-Normal Form Redundancy;122
7.5.11.1;5.11.1. Aggregation Level Redundancy;122
7.5.11.2;5.11.2. Entire Table Redundancy;123
7.5.11.3;5.11.3. Access Path Redundancy;124
7.5.11.4;5.11.4. Attribute Splitting;125
7.6;Chapter 6: VIEWs, Derived, and Other Virtual Tables;126
7.6.1;6.1. VIEWs in Queries;126
7.6.2;6.2. Updatable and Read-Only VIEWs;127
7.6.3;6.3. Types of VIEWs;129
7.6.3.1;6.3.1. Single-Table Projection and Restriction;130
7.6.3.2;6.3.2. Calculated Columns;130
7.6.3.3;6.3.3. Translated Columns;130
7.6.3.4;6.3.4. Grouped VIEWs;131
7.6.3.5;6.3.5. UNION-ed VIEWs;132
7.6.3.6;6.3.6. JOINs in VIEWs;134
7.6.3.7;6.3.7. Nested VIEWs;134
7.6.4;6.4. How VIEWs are Handled in the Database Engine;135
7.6.4.1;6.4.1. View Column List;136
7.6.4.2;6.4.2. VIEW Materialization;136
7.6.4.3;6.4.3. In-Line Text Expansion;137
7.6.4.4;6.4.4. Pointer Structures;139
7.6.4.5;6.4.5. Indexing and Views;140
7.6.5;6.5. WITH CHECK OPTION Clause;140
7.6.5.1;6.5.1. WITH CHECK OPTION as CHECK() clause;144
7.6.6;6.6. Dropping VIEWs;146
7.6.7;6.7. Materialized Query Tables;147
7.6.7.1;6.7.1. CREATE TABLE;147
7.6.7.2;6.7.2. REFRESH TABLE Statement;148
7.7;Chapter 7: Auxiliary Tables;150
7.7.1;7.1. Series Table;151
7.7.1.1;7.1.1. Enumerating a List;152
7.7.1.2;7.1.2. Mapping a Series into a Cycle;154
7.7.1.3;7.1.3. Building Buckets of Fixed Size;155
7.7.1.4;7.1.4. Replacing an Iterative Loop;156
7.7.2;7.2. Look-up Auxiliary Tables;158
7.7.2.1;7.2.1. Simple Translation Auxiliary Tables;160
7.7.2.2;7.2.2. Multiple Translation Auxiliary Tables;160
7.7.2.3;7.2.3. Multiple Parameter Auxiliary Tables;161
7.7.2.4;7.2.4. Range Auxiliary Tables;164
7.7.2.5;7.2.5. Set Auxiliary Tables;165
7.7.2.6;7.2.6. Hierarchical Auxiliary Tables;166
7.7.2.7;7.2.7. One True Look-up Table;167
7.7.3;7.3. Advance Auxiliary Function Tables;170
7.7.3.1;7.3.1. Inverse Functions with Auxiliary Tables;171
7.7.3.2;7.3.2. Interpolation with Auxiliary Function Tables;181
7.7.4;7.4. Global Constants Tables;183
7.7.4.1;7.4.1. Prime Numbers;183
7.7.4.2;7.4.2. Fibonacci Numbers;187
7.7.4.3;7.4.3. Random Order Values;188
7.7.4.4;Word Length;190
7.7.5;7.5. A Note on Converting Procedural Code to Tables;191
7.8;Chapter 8: Other Schema Objects;198
7.8.1;8.1. CREATE SCHEMA Statement;200
7.8.2;8.2. Schema Tables;203
7.8.3;8.3. Temporary Tables;203
7.8.4;8.4. CREATE ASSERTION Statement;204
7.8.5;8.5. CREATE DOMAIN Statement;204
7.8.6;8.6. CREATE COLLATION Statement;206
7.8.7;8.7. CREATE TRANSLATION Statement;206
7.8.8;8.8. CREATE PROCEDURE Statement;207
7.8.9;8.9. TRIGGERs ;207
7.8.10;8.10. The TRIGGER Model;208
7.8.10.1;8.10.1. DECLARE CURSOR Statement;210
7.8.10.2;8.10.2. Allocate Storage in the Host Program;210
7.8.10.3;8.10.3. DECLARE CURSOR Statement;210
7.8.10.4;8.10.4. OPEN Cursor Statement;212
7.8.10.5;8.10.5. FETCH FROM Cursor Statement;212
7.8.10.6;8.10.6. CLOSE Cursor Statement;214
7.8.10.7;8.10.7. The DEALLOCATE CURSOR Statement;214
7.8.10.8;8.10.8. Remarks About Cursors;215
7.9;Chapter 9: DDL Flaws to Avoid;218
7.9.1;9.1. Tibbling and Related Errors;218
7.9.1.1;9.1.1. ISO -11179 Standards;219
7.9.1.2;9.1.2. Data Type and Constraint Errors;221
7.9.2;9.2. Attribute Splitting;222
7.9.2.1;9.2.1. Schema Level Attribute Splitting;222
7.9.2.2;9.2.2. Column Level Attribute Splitting;222
7.9.2.3;9.2.3. Table Level Attribute Splitting;222
7.9.3;9.3. Overloading Design Flaws;222
7.9.3.1;9.3.1. Schema Level Overloading;223
7.9.3.2;9.3.2. Table Level Overloading;223
7.9.3.3;9.3.3. Column Level Overloading;226
7.9.4;9.4. Non-Normal Form Redundancy;227
7.9.4.1;9.4.1. Conflicting Relationships;228
8;Part 2: Data Types;230
8.1;Chapter 10: Numeric Data in SQL;232
8.1.1;10.1. Exact Numeric Data Types;233
8.1.1.1;10.1.1. Bit, Byte, and Boolean Data Types;233
8.1.2;10.2. Approximate Numeric Data Types;239
8.1.2.1;10.2.1. Float Versus Real Versus Double Precision;239
8.1.2.2;10.2.2. IEEE Floating Point Extensions;240
8.1.3;10.3. Numeric Type Conversions;240
8.1.3.1;10.3.1. Rounding and Truncating;241
8.1.3.2;10.3.2. CAST() Function;242
8.1.4;10.4. Four Function Arithmetic;242
8.1.4.1;10.4.1. Arithmetic and NULLs;244
8.1.5;10.5. Converting Values to and from NULL;245
8.1.5.1;10.5.1. NULLIF() Function;245
8.1.5.2;10.5.2. COALESCE() Function;246
8.1.6;10.6. Mathematical Functions;248
8.1.6.1;10.6.1. Number Theory Operators;248
8.1.6.2;10.6.2. Exponential Functions;250
8.1.6.3;10.6.3. Scaling Functions;251
8.1.6.4;10.6.4. Other Mathematical Functions;251
8.1.6.5;10.6.5. Converting Numbers to Words;251
8.1.7;10.7. IP Addresses;253
8.1.7.1;10.7.1. CHAR(39) Storage;253
8.1.7.2;10.7.2. Binary Storage;253
8.1.7.3;10.7.3. Separate SMALLINTs;253
8.2;Chapter 11: Character Data Types in SQL ;256
8.2.1;11.1. Problems with SQL Strings;257
8.2.1.1;11.1.1. Problems of String Equality;257
8.2.1.2;11.1.2. Problems of String Ordering;258
8.2.1.3;11.1.3. Problems of String Grouping;259
8.2.2;11.2. Standard String Functions;259
8.2.3;11.3. Common Vendor Extensions;261
8.2.3.1;11.3.1. Phonetic Matching;262
8.2.3.1.1;11.3.1.1. Soundex Functions;262
8.2.3.1.2;11.3.1.2. The Original Soundex;263
8.2.3.1.3;11.3.1.3. Metaphone;265
8.2.3.1.4;11.3.1.4. NYSIIS Algorithm;269
8.2.4;11.4. Cutter Tables;270
8.3;Chapter 12: Temporal Data Types in SQL ;272
8.3.1;12.1. Notes on Calendar Standards;272
8.3.2;12.2. The Nature of Temporal Data Models;276
8.3.3;12.3. SQL Temporal Data Types;277
8.3.3.1;12.3.1. Tips for Handling Dates, Timestamps, and Times;278
8.3.3.2;12.3.2. Date Format Standards;278
8.3.3.3;12.3.3. Time Format Standards;279
8.3.3.4;12.3.4. Basic Time;282
8.3.3.5;12.3.5. Time Zones;282
8.3.4;12.4. INTERVAL Data Types;284
8.3.5;12.5. Queries with Date Arithmetic;286
8.3.6;12.6. Use of NULL for “Eternity”;287
8.3.7;12.7. The OVERLAPS () Predicate;288
8.3.8;12.8. State-Transition Constraints;290
8.3.9;12.9. Calendar Tables;296
8.3.9.1;12.9.1. Report Period Tables;298
8.4;Chapter 13: Multiple Column Data Elements;300
8.4.1;13.1. Vector and Coordinate Data Elements;300
8.4.1.1;13.1.1. Longitude and Latitude;301
8.4.1.2;13.1.2. Storing an IPv4 Address in SQL;303
8.4.1.3;13.1.3. A Single VARCHAR(15) Column;303
8.4.1.4;13.1.4. One INTEGER Column;304
8.4.1.5;13.1.5. Four SMALLINT Columns;305
8.4.1.6;13.1.6. Storing an IPv6 Address in SQL;306
8.4.1.7;13.1.7. A Single CHAR(32) Column;306
8.4.1.8;13.1.8. Quantity and Unit Vectors;307
8.4.2;13.2. Hierarchical Data Elements;308
8.4.2.1;13.2.1. Social Security Numbers;308
8.4.2.2;13.2.2. Rational Numbers;311
8.4.2.3;13.2.3. Nondecimal and Mixed Units;312
8.4.2.4;13.2.4. Interrelated Columns;313
8.5;Chapter 14: NULL s—Missing Data in SQL ;314
8.5.1;14.1. Empty and Missing Tables;316
8.5.2;14.2. Missing Values in Columns;316
8.5.3;14.3. Context and Missing Values;318
8.5.4;14.4. Comparing NULLs ;319
8.5.5;14.5. NULLs and Logic;320
8.5.5.1;14.5.1. NULLS in Subquery Predicates;323
8.5.5.2;14.5.2. Logical Value Predicate;325
8.5.6;14.6. Math and NULLs ;325
8.5.7;14.7. Functions for NULLs ;326
8.5.8;14.8. NULLs and Host Languages;326
8.5.9;14.9. Design Advice for NULLs ;328
8.5.9.1;14.9.1. Avoiding NULLs from the Host Programs;330
8.5.10;14.10. A Note on Multiple NULL Values;331
8.6;Chapter 15: Table Operations;334
8.6.1;15.1. DELETE FROM Statement;334
8.6.1.1;15.1.1. The DELETE FROM Clause;335
8.6.1.2;15.1.2. The WHERE Clause;335
8.6.1.3;15.1.3. Deleting Based on Data in a Second Table;337
8.6.1.4;15.1.4. Deleting Within the Same Table;338
8.6.1.5;15.1.5. Redundant Duplicates in a Table;339
8.6.2;15.2. INSERT INTO Statement;341
8.6.2.1;15.2.1. INSERT INTO Clause;341
8.6.2.2;15.2.2. The Nature of Inserts;342
8.6.2.3;15.2.3. Bulk Load and Unload Utilities;342
8.6.3;15.3. The UPDATE Statement;343
8.6.3.1;15.3.1. The UPDATE Clause;343
8.6.3.2;15.3.2. The SET Clause;344
8.6.3.3;15.3.3. The WHERE Clause;346
8.6.3.4;15.3.4. Updating with a Second Table Before MERGE ;347
8.6.3.5;15.3.5. Using the CASE Expression in UPDATEs ;350
8.6.4;15.4. A Note on Flaws in a Common Vendor Extension;352
8.6.5;15.5. MERGE Statement;353
8.7;Chapter 16: Set Operations;358
8.7.1;16.1. UNION and UNION ALL;359
8.7.1.1;16.1.1. Order of Execution;361
8.7.1.2;16.1.2. Mixed UNION and UNION ALL Operators;362
8.7.1.3;16.1.3. UNION of Columns from the Same Table;363
8.7.2;16.2. INTERSECT and EXCEPT;363
8.7.2.1;16.2.1. INTERSECT and EXCEPT Without NULLs and Duplicates;366
8.7.2.2;16.2.2. INTERSECT and EXCEPT with NULLs and Duplicates;367
8.7.3;16.3. A Note on ALL and SELECT DISTINCT;368
8.7.4;16.4. Equality and Proper Subsets;369
9;Part 3: Row and Column Level Features;372
9.1;Chapter 17: Comparison or Theta Operators;374
9.1.1;17.1. Converting Data Types;375
9.1.1.1;17.1.1. Date Display Formats;377
9.1.1.2;17.1.2. Other Display Formats;377
9.1.2;17.2. Row Comparisons in SQL;377
9.1.3;17.3. IS [NOT] DISTINCT FROM Operator;380
9.1.4;17.4. Monadic Operators;381
9.1.4.1;17.4.1. IS NULL;381
9.1.4.2;17.4.2. IS [NOT] {TRUE | FALSE | UNKNOWN};382
9.1.4.3;17.4.3. IS [NOT] NORMALIZED;384
9.2;Chapter 18: Subquery Predicates;386
9.2.1;18.1. The UNIQUE Predicate;386
9.2.2;18.2. The [NOT] IN() Predicate;388
9.2.2.1;18.2.1. Optimizing the IN() Predicate;390
9.2.2.2;18.2.2. Replacing ORs with the IN() Predicate;393
9.2.2.3;18.2.3. NULLs and the IN() Predicate;393
9.2.2.4;18.2.4. IN() Predicate and Referential Constraints;396
9.2.2.5;18.2.5. IN() Predicate and Scalar Queries;397
9.2.3;18.3. [NOT] EXISTS() Predicate;399
9.2.3.1;18.3.1. EXISTS and NULLs;400
9.2.3.2;18.3.2. EXISTS and INNER JOINs;403
9.2.3.3;18.3.3. NOT EXISTS and OUTER JOINs;403
9.2.3.4;18.3.4. EXISTS() and Referential Constraints;404
9.2.3.5;18.3.5. EXISTS() and Quantifiers;405
9.2.3.6;18.3.6. EXISTS and Three Valued Logic;406
9.2.4;18.4. < theta > [SOME | ANY] < subquery > ;408
9.2.5;18.5. < theta > ALL < subquery > ;409
9.2.5.1;18.5.1. The ALL Predicate and Extrema Functions;411
9.3;Chapter 19: BETWEEN and OVERLAPS Predicates;414
9.3.1;19.1. The BETWEEN Predicate;414
9.3.1.1;19.1.1. Results with NULL Values;416
9.3.1.2;19.1.2. Results with Empty Sets;416
9.3.1.3;19.1.3. Programming Tips;416
9.3.2;19.2. OVERLAPS Predicate;417
9.3.2.1;19.2.1. Time Periods and OVERLAPS Predicate;417
9.4;Chapter 20: CASE Expression Family;430
9.4.1;20.1. CASE Expression;430
9.4.1.1;20.1.1. The COALESCE() and NULLIF() Functions;434
9.4.1.2;20.1.2. CASE Expressions with GROUP BY;434
9.4.1.3;20.1.3. CASE, CHECK() Clauses and Logical Implication;436
9.4.2;20.2. Subquery Expressions and Constants;439
9.5;Chapter 21: LIKE and SIMILAR TO Predicates;442
9.5.1;21.1. Tricks with Patterns;443
9.5.2;21.2. Results with NULL Values and Empty Strings;445
9.5.3;21.3. LIKE is Not Equality;445
9.5.4;21.4. Extending the LIKE Predicate with a Join;445
9.5.5;21.5. CASE Expressions and LIKE Predicates;446
9.5.6;21.6. SIMILAR TO Predicates;447
9.5.7;21.7. Tricks with Strings;448
9.5.7.1;21.7.1. String Character Content;448
9.5.7.2;21.7.2. Searching Versus Declaring a String;449
9.5.7.3;21.7.3. Creating an Index on a String;450
9.6;Chapter 22: Basic SELECT Statement;452
9.6.1;22.1. CTEs;452
9.6.2;22.2. FROM Clause;454
9.6.3;22.3. WHERE Clause;454
9.6.4;22.4. GROUP BY Clause;455
9.6.5;22.5. HAVING Clause;455
9.6.6;22.6. SELECT Clause;455
9.6.7;22.7. ORDER BY Clause;456
9.6.8;22.8. Nested Query Expressions and Orthogonality;456
9.7;Chapter 23: Basic Aggregate Functions;458
9.7.1;23.1. COUNT() Functions;459
9.7.1.1;23.1.1. Optimizing Aggregates with DISTINCT;462
9.7.2;23.2. SUM() Function;463
9.7.3;23.3. AVG() Function;465
9.7.3.1;23.3.1. Averages with Empty Groups;466
9.7.3.2;23.3.2. Averages Across Columns;469
9.7.4;23.4. Extrema Functions;470
9.7.4.1;23.4.1. Simple Extrema Functions;470
9.7.4.2;23.4.2. Generalized Extrema Functions;472
9.7.4.3;23.4.3. Multiple Criteria Extrema Functions;475
9.7.4.4;23.4.4. GREATEST() and LEAST() Functions;475
9.7.5;23.5. The LIST() Aggregate Function;478
9.7.5.1;23.5.1. LIST Aggregate with Recursive CTE;479
9.7.5.2;23.5.2. The LIST() Function by Crosstabs;480
9.7.6;23.6. The Mode Aggregate Function;481
9.7.7;23.7. The Median Aggregate Function;482
9.7.7.1;23.7.1. The Weighted Median;483
9.7.7.2;23.7.2. Modern Median;484
9.7.8;23.8. The PRD() Aggregate Function;484
9.7.8.1;23.8.1. PRD() Function by Expressions;485
9.7.8.2;23.8.2. The PRD() Aggregate Function by Logarithms;487
9.8;Chapter 24: Advance Descriptive Statistics;490
9.8.1;24.1. Binary Table Functions;490
9.8.2;24.2. Correlation;492
9.9;Chapter 25: OLAP Aggregation in SQL;494
9.9.1;25.1. Querying Versus Reporting;494
9.9.2;25.2. GROUPING Operators;495
9.9.2.1;25.2.1. GROUP BY GROUPING SET;495
9.9.2.2;25.2.2. ROLLUP;496
9.9.2.3;25.2.3. CUBES;497
9.9.2.4;25.2.4. OLAP Examples of SQL;497
9.9.3;25.3. The Window Clause;498
9.9.3.1;25.3.1. PARTITION BY Subclause;499
9.9.3.2;25.3.2. ORDER BY Subclause;499
9.9.3.3;25.3.3. Window Frame Subclause;501
9.9.4;25.4. Windowed Aggregate Functions;502
9.9.5;25.5. Ordinal Functions;503
9.9.5.1;25.5.1. Row Numbering;503
9.9.5.2;25.5.2. RANK() and DENSE_RANK();503
9.9.5.3;25.5.3. PERCENT_RANK() and CUME_DIST;504
9.9.5.4;25.5.4. Some Examples;505
9.9.6;25.6. Vendor Extensions;507
9.9.6.1;25.6.1. LEAD and LAG Functions;507
9.9.6.1.1;25.6.1.1. Example with Gaps;508
9.9.6.2;25.6.2. FIRST and LAST Functions;510
9.9.7;25.7. A Bit of History;511
9.10;Chapter 26: Advanced SELECT Statements;514
9.10.1;26.1. Correlated Subqueries;514
9.10.2;26.2. Infixed INNER JOINs ;519
9.10.3;26.3. OUTER JOINs ;521
9.10.3.1;26.3.1. A Bit of History;522
9.10.3.2;26.3.2. NULLs and OUTER JOINs ;527
9.10.3.3;26.3.3. NATURAL Versus Searched OUTER JOINs ;529
9.10.3.4;26.3.4. Self OUTER JOINs ;530
9.10.3.5;26.3.5. Two or More OUTER JOINs ;531
9.10.3.6;26.3.6. OUTER JOINs and Aggregate Functions;533
9.10.3.7;26.3.7. FULL OUTER JOIN ;533
9.10.4;26.4. UNION JOIN Operators;534
9.10.5;26.5. Scalar SELECT Expressions;536
9.10.6;26.6. Old Versus New JOIN Syntax;537
9.10.7;26.7. Constrained Joins;538
9.10.7.1;26.7.1. Inventory and Orders;539
9.10.7.2;26.7.2. Stable Marriages;540
9.10.7.3;26.7.3. Ball and Box Packing;545
9.10.8;26.8. Dr. Codd's T-Join;548
9.10.8.1;26.8.1. A Procedural Approach;552
9.10.9;26.9. Missing Values in Data;555
9.10.9.1;26.9.1. Last Known Value;556
9.10.9.2;26.9.2. Sequence Missing Readings;557
9.10.9.3;26.9.3. Smoothed Result;558
9.10.10;26.10. Missing and Mixed Data in Rows;560
10;Part 4: Data Structures in SQL;564
10.1;Chapter 27: Graphs in SQL ;566
10.1.1;27.1. Basic Graph Characteristics;567
10.1.1.1;27.1.1. All Nodes in the Graph;567
10.1.1.2;27.1.2. Path Endpoints;567
10.1.1.3;27.1.3. Reachable Nodes;568
10.1.1.4;27.1.4. Edges;568
10.1.1.5;27.1.5. Indegree and Outdegree;569
10.1.1.6;27.1.6. Source, Sink, Isolated, and Internal Nodes;570
10.1.2;27.2. Paths in a Graph;571
10.1.2.1;27.2.1. Length of Paths;572
10.1.2.2;27.2.2. Shortest Path;572
10.1.2.3;27.2.3. Paths by Iteration;573
10.1.2.4;27.2.4. Listing the Paths;576
10.1.3;27.3. Acyclic Graphs as Nested Sets;580
10.1.4;27.4. Adjacency Matrix Model;582
10.1.5;27.5. Points Inside Polygons;583
10.1.6;27.6. Taxicab Geometry;585
10.1.6.1;27.6.1. Taxi Versus Euclidean Distance;586
10.1.6.2;27.6.2. Taxi Shapes;587
10.1.7;27.7. Equivalence Classes and Cliques;588
10.1.7.1;27.7.1. Graph for Equivalence Relations;589
10.1.7.2;27.7.2. Reflexive Rows;590
10.1.7.3;27.7.3. Symmetric Rows;591
10.1.7.4;27.7.4. Transitive Rows;593
10.1.7.5;27.7.5. Cliques;594
10.1.7.6;27.7.6. Adding Members;597
10.1.8;27.8. Conclusion;597
10.2;Chapter 28: Trees and Hierarchies in SQL ;598
10.2.1;28.1. Adjacency List Model;599
10.2.2;28.2. Finding the Root Node;600
10.2.3;28.3. Finding Leaf Nodes;601
10.2.4;28.4. Finding Levels in a Tree;601
10.2.5;28.5. Tree Operations;602
10.2.5.1;28.5.1. Subtree Deletion;602
10.2.5.2;28.5.2. Subtree Insertion;603
10.2.6;28.6. Nested Sets Model;604
10.2.7;28.7. Finding Root and Leaf Nodes;606
10.2.8;28.8. Finding Subtrees;607
10.2.9;28.9. Finding Levels and Paths in a Tree;608
10.2.9.1;28.9.1. Finding the Height of a Tree;608
10.2.9.2;28.9.2. Finding Immediate Subordinates;608
10.2.9.3;28.9.3. Finding Oldest and Youngest Subordinates;609
10.2.9.4;28.9.4. Finding a Path;611
10.2.10;28.10. Functions in the Nested Sets Model;611
10.2.11;28.11. Deleting Nodes and Subtrees;612
10.2.11.1;28.11.1. Deleting Subtrees;613
10.2.11.2;28.11.2. Deleting a Single Node;613
10.2.11.3;28.11.3. Closing Gaps in the Tree;614
10.2.12;28.12. Summary Functions on Trees;615
10.2.13;28.13. Inserting and Updating Trees;622
10.2.13.1;28.13.1. Moving a Subtree Within a Tree;622
10.2.14;28.14. Converting Adjacency List to Nested Sets Model;625
10.2.15;28.15. Converting Nested Sets to Adjacency List Model;626
10.2.16;28.16. Comparing Nodes and Structure;626
10.3;Chapter 29: Queues;630
10.3.1;29.1. Basic DDL;630
10.3.2;29.2. Enqueue, Dequeue, and Empty Procedures;631
10.3.3;29.3. Rearrangement;632
10.3.4;29.4. Queues and Math;634
10.3.5;29.5. Priority Queues;635
10.3.6;29.6. FIFO and LIFO Queues;636
10.4;Chapter 30: Matrices in SQL;640
10.4.1;30.1. Arrays via Named Columns;641
10.4.2;30.2. Arrays via Subscript Columns;645
10.4.3;30.3. Matrix Operations in SQL;646
10.4.3.1;30.3.1. Matrix Equality;647
10.4.3.2;30.3.2. Matrix Addition;647
10.4.3.3;30.3.3. Matrix Multiplication;648
10.4.3.4;30.3.4. Other Matrix Operations;650
10.4.4;30.4. Flattening a Table into an Array;651
10.4.5;30.5. Comparing Arrays in Table Format;653
10.4.6;30.6. Other Matrix Operations;655
11;Part 5: Typical Queries;656
11.1;Chapter 31: Partitioning and Aggregating Data in Queries;658
11.1.1;31.1. Coverings and Partitions;658
11.1.1.1;31.1.1. Partitioning by Ranges;658
11.1.1.2;31.1.2. Partition by Functions;660
11.1.1.3;31.1.3. Partition by Sequential Order;661
11.1.2;31.2. Advanced Grouping, Windowed Aggregation, and OLAP in SQL;663
11.1.2.1;31.2.1. GROUPING Operators;665
11.1.2.2;31.2.2. GROUP BY GROUPING SET;666
11.1.2.3;31.2.3. ROLLUP;666
11.1.2.4;31.2.4. CUBES;667
11.1.2.5;31.2.5. OLAP Examples of SQL;668
11.1.2.6;31.2.6. The Window Clause;668
11.1.2.6.1;31.2.6.1. Partition by Subclause;669
11.1.2.6.2;31.2.6.2. ORDER BY Subclause;669
11.1.2.6.3;31.2.6.3. Window Frame Subclause;671
11.1.2.7;31.2.7. Windowed Aggregate Functions;673
11.1.2.8;31.2.8. Ordinal Functions;673
11.1.2.8.1;31.2.8.1. Row Numbering;673
11.1.2.8.2;31.2.8.2. RANK() and DENSE_RANK();673
11.1.2.8.3;31.2.8.3. PERCENT_RANK() and CUME_DIST;674
11.1.2.8.4;31.2.8.4. Some Examples;675
11.1.2.9;31.2.9. Vendor Extensions;677
11.1.2.9.1;31.2.9.1. LEAD and LAG Functions;677
11.1.2.9.2;31.2.9.2. FIRST and LAST Functions;678
11.1.2.9.3;31.2.9.3. NTILE Function;680
11.1.2.10;31.2.10. A Bit of History;682
11.2;Chapter 32: Sub-sequences, Regions, Runs, Gaps, and Islands;684
11.2.1;32.1. Finding Subregions of Size ( n);685
11.2.2;32.2. Numbering Regions;686
11.2.3;32.3. Finding Regions of Maximum Size;688
11.2.4;32.4. Bound Queries;692
11.2.5;32.5. Run and Sequence Queries;693
11.2.5.1;32.5.1. Filling in Missing Numbers;696
11.2.6;32.6. Summation of a Handmade Series;698
11.2.7;32.7. Swapping and Sliding Values in a List;701
11.2.8;32.8. Condensing a List of Numbers;703
11.2.9;32.9. Folding a List of Numbers;703
11.2.10;32.10. Coverings;704
11.2.11;32.11. Equivalence Classes and Cliques;710
11.2.11.1;32.11.1. Definition by Extension and Intention;711
11.2.11.2;32.11.2. Graphs in SQL;712
11.2.11.3;32.11.3. Reflexive Rows;713
11.2.11.4;32.11.4. Symmetric Rows;714
11.2.11.5;32.11.5. Transitive Rows;715
11.2.11.6;32.11.6. Cliques;717
11.3;Chapter 33: Auctions;720
11.3.1;33.1. General Types of Bidding;720
11.3.2;33.2. Types of Auctions;721
11.3.2.1;33.2.1. English Auctions;721
11.3.2.2;33.2.2. Japanese Auctions;721
11.3.2.3;33.2.3. Dutch Auctions;721
11.3.2.4;33.2.4. Vickrey Auctions;722
11.3.2.5;33.2.5. Auction Schema;722
11.3.3;33.3. LIFO and FIFO Inventory;723
11.3.3.1;33.3.1. LIFO Cost as a VIEW;725
11.3.3.2;33.3.2. CASE Expressions;726
11.3.3.3;33.3.3. Updating Inventory;727
11.3.4;33.4. Bin Packing;729
11.4;Chapter 34: Relational Division;732
11.4.1;34.1. Division with a Remainder;734
11.4.2;34.2. Exact Division;735
11.4.3;34.3. Note on Performance;736
11.4.4;34.4. Todd’s Division;737
11.4.5;34.5. Division with JOINs;740
11.4.6;34.6. Division with Set Operators;741
11.4.7;34.7. Romley’s Division;741
11.4.8;34.8. Boolean Expressions in Relational Division;745
11.5;Chapter 35: Temporal Queries;748
11.5.1;35.1. Temporal Math;749
11.5.2;35.2. Calendars;754
11.5.2.1;35.2.1. Holidays;755
11.5.2.2;35.2.2. Personal Calendars;756
11.5.3;35.3. Time Series;758
11.5.3.1;35.3.1. Gaps in a Time Series;758
11.5.3.2;35.3.2. Continuous Time Periods;761
11.5.3.2.1;35.3.2.1. Background;765
11.5.3.3;35.3.3. Missing Times in Contiguous Events;766
11.5.3.4;35.3.4. Locating Dates;771
11.5.3.5;35.3.5. Temporal Starting and Ending Points;772
11.5.3.5.1;35.3.5.1. Starting and Ending Times;773
11.5.4;35.4. Julian Dates;774
11.5.5;35.5. Other Temporal Functions;778
11.5.6;35.6. Multi-day Periods;779
11.5.6.1;35.6.1. Weeks;779
11.5.6.2;35.6.2. Report Periods;782
11.5.6.2.1;35.6.2.1. Report Period Table;782
11.5.7;35.7. Modeling Time in Tables;783
11.5.7.1;35.7.1. Using Duration Pairs;784
11.5.8;35.8. LEAD() and LAG() Functions;786
11.5.9;35.9. Problems with the Year 2000: A Historical Overview;786
11.5.9.1;35.9.1. The Zeros;787
11.5.9.2;35.9.2. Leap Year;788
11.5.9.3;35.9.3. The Millennium;789
11.5.9.4;35.9.4. Weird Dates in Legacy Data;790
11.5.9.5;35.9.5. The Aftermath;792
12;Part 6: Implementation and Coding Issues;794
12.1;Chapter 36: Procedural Semi-Procedural and Declarative Programming in SQL;796
12.1.1;36.1. Words Matter;797
12.1.2;36.2. Cleaning Code;797
12.2;Chapter 37: Nesting Levels in SQL;804
12.2.1;37.1. Derived Tables;805
12.2.2;37.2. Column Naming Rules;806
12.2.3;37.3. Scoping Rules;807
12.2.4;37.4. Exposed Table Names;809
12.2.5;37.5. Common Table Expressions (CTEs);810
12.2.6;37.6. LATERAL Tables;811
12.2.7;37.7. Programming Tips;812
12.3;Chapter 38: Embedded SQL, CLI Dynamic SQL, and SQL/PSM;814
12.3.1;38.1. Embedded SQL;814
12.3.2;38.2. SQL/CLI;815
12.3.3;38.3. Dynamic SQL;816
12.3.4;38.4. SQL/PSM History;816
12.3.4.1;38.4.1. SQL Statements;817
12.3.4.2;38.4.2. Compound Statements;818
12.3.4.3;38.4.3. SIGNAL and RESIGNAL Statements;819
12.3.4.4;38.4.4. Assignment Statements;820
12.3.4.5;38.4.5. Conditional Statements;820
12.3.4.6;38.4.6. Loops;822
12.3.4.7;38.4.7. PRINT Statement;823
12.3.4.8;38.4.8. CALL Statements;823
12.3.4.9;38.4.9. CREATE PROCEDURE and CREATE FUNCTION;824
12.3.5;38.5. CSV Parameters;826
13;Index;834
Chapter 1 Databases Versus File Systems
Abstract
A comparison of relational databases versus traditional file systems is discussed. Rows are not records; columns are not fields, and tables are not files. Keywords Database File system US standard railroad gauge COBOL FORTRAN C BASIC PL/I Java Procedural programming language OO programming language E-R diagrams Peter Chen Data Declaration Language (DDL) Data Control Language (DCL) It ain’t so much the things we don’t know that get us in trouble. It’s the things we know that ain’t so. —Artemus Ward (William Graham Sumner), American Writer and Humorist, 1834-1867 Perfecting oneself is as much unlearning as it is learning —Edsgar Dijkstra If you already have a background in data processing with traditional file systems, the first things to unlearn are (0) Databases are not file sets. (1) Tables are not files. (2) Rows are not records. (3) Columns are not fields. (4) Values in RDBMS are scalar, not structured (arrays, lists, meta-data). Do not feel ashamed of getting stuck in a conceptual rut; every new technology has this problem. The US standard railroad gauge (distance between the rails) is 4 ft, 8.5 in. This gauge is used because the English built railroads to that gauge and US railroads were built by English expatriates. Why did the English build railroads to that gauge? Because the first rail lines were built by the same people who built the pre-railroad tramways, and that’s the gauge they used. Why did those wheelwrights use that gauge then? Because the people who built the horse-drawn trams used the same jigs and tools that they used for building wagons, which used that wheel spacing. Why did the wagons use that odd wheel spacing? For the practical reason that any other spacing would break an axle on some of the old, long distance roads, because this is the measure of the old wheel ruts. So who built these old rutted roads? The first long distance roads in Europe were built by Imperial Rome for their legions and used ever since. The initial ruts were first made by Roman war chariots, which were of uniform military issue. The Imperial Roman chariots were made to be just wide enough to accommodate the back-ends of two war horses (this example is originally due to Professor Tom O’Hare, Germanic Languages, University of Texas at Austin; email: tohare@mail.utexas.edu). This story does not end there, however. Look at a NASA Space Shuttle and the two big booster rockets attached to the sides of the main fuel tank. These are solid rocket boosters or SRBs. The SRBs are made by Thiokol at their factory at Utah. The engineers who designed the SRBs might have preferred to make them a bit fatter, but the SRBs had to be shipped by train from the factory to the launch site in Florida. The railroad line from the factory runs through a tunnel in the mountains and the SRBs have to fit through that tunnel. The tunnel is slightly wider than the railroad track. So, the major design feature of what is arguably the world’s most advanced transportation system was determined by the width of a horse’s ass. In a similar fashion, modern data processing began with punch cards (Hollerith cards if you are really old) used by the Bureau of the Census. Their original size was that of a US dollar bill. This was set by their inventor, Herman Hollerith, because he could get furniture to store the cards from the US Treasury Department, just across the street. Likewise, physical constraints limited each card to 80 columns of holes in which to record a symbol. The influence of the punch card lingered on long after the invention of magnetic tapes and disk for data storage. This is why early video display terminals were 80 columns across. Even today, files which were migrated from cards to magnetic tape files or disk storage still use 80 column physical records. But the influence was not just on the physical side of data processing. The methods for handling data from the prior media were imitated in the new media. Data processing first consisted of sorting and merging decks of punch cards (later, sequential magnetic tape files) in a series of distinct steps. The result of each step feed into the next step in the process. Think of the assembly line in a factory. Databases and RDBMS in particular are nothing like the file systems that came with COBOL, FORTRAN, C, BASIC, PL/I, Java, or any of the procedural and OO programming languages. We used to say that SQL means “Scarcely Qualifies as a Language” because it has no I/O of its own. SQL depends on a host language to get and receive data to and from end users. 1.1 The Schema Statement
Programming languages are usually based on some underlying model; if you understand the model, the language makes much more sense. For example, FORTRAN is based on algebra. This does not mean that FORTRAN is exactly like algebra. But if you know algebra, FORTRAN does not look all that strange to you the way that LISP or APL would. You can write an expression in an assignment statement or make a good guess as to the names of library functions you have never seen before. Likewise, COBOL is based on English narratives of business processes. The design of COBOL files (and almost every other early programming language) was derived from paper forms. The most primitive form of a file is a sequence of records that are ordered within the file and referenced by physical position. You open a file (think file folder or in-basket on your desk) and then read a first record (think of the first paper form on the stack), followed by a series of next records (process the stack of paperwork, one paper form at a time) until you come to the last record to raise the end-of-file condition (put the file folder in the out-basket). Notice the work flow: 1. The records (paper forms) have to physically exist to be processed. Files are not virtual by nature. In fact, this mindset is best expressed by a quote from Samuel Goldwyn “a verbal contract ain’t worth the paper it is written on!” 2. You navigate among these records and perform actions, one record at a time. You can go backward or forward in the stack but nowhere else. 3. The actions you take on one file (think of a clerk with rubber stamps) have no effect on other files that are not in the same program. The files are like file folders in another in-basket. 4. Only programs (the clerk processing the paperwork) can change files. The in-basket will not pick up a rubber stamp and mark the papers by itself. The model for SQL is data kept in abstract sets, not in physical files. The “unit of work” in SQL is the whole schema, not individual tables. This is a totally different model of work! Sets are those mathematical abstractions you studied in school. Sets are not ordered and the members of a set are all of the same type. When you do an operation on a set, the action happens “all at once” to the entire membership. That is, if I ask for the subset of odd numbers from the set of positive integers, I get all them back as a single set. I do not build the set of odd numbers by sequentially inspecting one element at a time. I define odd numbers with a rule—“If the remainder is ± 1 when you divide the number by 2, it is odd”—that could test any integer and classify it. Parallel processing is one of many, many advantages of having a set-oriented model. In RDBMS, everything happens all at once. The Data Declaration Language (DDL) in SQL is what defines and controls access to the database content and maintains the integrity of that data for all programs that access the database. Data in a file is passive. It has no meaning until a program reads it. In COBOL, each program has a DATA DIVISION; in FORTRAN, each program has the FORMAT/READ statements; in Pascal, there is a RECORD declaration that serves the same purpose. Pick your non-SQL language. These constructs provide a template or parsing rules to overlay upon the records in the file, split them into fields and get the data into the host program. Each program can split up the sequence of characters in a record anyway it wishes name and name the fields as it wished. This can lead to “job security” programming; I worked in a shop in the 1970’s where one programmer would pick a theme (nations of the world, flowers, etc.) and name his fields “Afghanistan” or “Chrysanthemum” or worse. Nobody could read his code, so we could not fire him. Likewise, the Data Control Language (DCL) controls access to the schema objects that a user can create. Standard SQL divides the database users into USER and ADMIN roles. These schema objects require ADMIN privileges to be created, altered, or dropped (CREATE, ALTER, DROP, etc.). Those with...