E-Book, English, 416 pages
Zhang / Shmulevich Computational and Statistical Approaches to Genomics
2nd edition 2006
ISBN: 978-0-387-26288-8
Publisher: Springer US
Format: PDF
Copy protection: 1 - PDF Watermark
The second edition of this book adds eight new contributors to reflect a modern, cutting-edge approach to genomics. It contains the newest research results on genomic analysis and modeling using state-of-the-art methods from engineering, statistics, and genomics. These tools and models are then applied to real biological and clinical problems. The book's original seventeen chapters have also been updated to provide new initiatives and directions.
Further Information & Material
1;Contents;6
2;Preface;8
3;MICROARRAY IMAGE ANALYSIS AND GENE EXPRESSION RATIO STATISTICS;9
3.1;1. Introduction;9
3.2;2. Microarray Image Analysis;11
3.2.1;2.1 Target Segmentation and Clone Information Assignment;12
3.2.2;2.2 Background Detection;14
3.2.3;2.3 Target Detection;16
3.2.4;2.4 Intensity Measurement and Ratio Calculation;18
3.3;3. Ratio Statistics;19
3.3.1;3.1 Constant Coefficient of Variation;19
3.3.2;3.2 Ratio Confidence Interval;21
3.3.3;3.3 Ratio Normalization;22
3.3.4;3.4 Ratio Statistics for Low Signal-to-Noise Ratio;23
3.3.5;3.5 Measurement Quality Assessment;23
3.4;4. Conclusions;25
3.5;References;25
4;STATISTICAL CONSIDERATIONS IN THE ASSESSMENT OF CDNA MICROARRAY DATA OBTAINED USING AMPLIFICATION;28
4.1;1. Introduction;28
4.2;2. Amplification Methods;29
4.2.1;2.1 RNA Amplification;29
4.2.2;2.2 Fluorescent Signal Amplification;30
4.3;3. Data Analysis Strategy;31
4.4;4. An Example;35
4.4.1;4.1 Data Preprocessing;36
4.4.2;4.2 Data Analysis;37
4.5;5. Discussion;40
4.6;Acknowledgments;42
4.7;References;42
5;SOURCES OF VARIATION IN MICROARRAY EXPERIMENTS;44
5.1;Introduction;44
5.2;1. The Experiment;45
5.3;2. Experimental Design;45
5.4;3. Data Analysis;47
5.5;4. Discussion;52
5.6;Acknowledgments;53
5.7;References;53
6;STUDENTIZING MICROARRAY DATA;55
6.1;1. Introduction;55
6.2;2. Fold Differences and Error Models;56
6.3;3. A Case Study;56
6.3.1;3.1 Array Layout and Preprocessing;56
6.3.2;3.2 Single Channel Images;58
6.3.3;3.3 Replicate Ratios;60
6.3.4;3.4 Variance Fitting and Studentization;61
6.4;4. Discussion;65
6.5;Acknowledgments;65
6.6;References;65
7;EXPLORATORY CLUSTERING OF GENE EXPRESSION PROFILES OF MUTATED YEAST STRAINS;66
7.1;1. Introduction;66
7.2;2. The Data;67
7.3;3. Choosing the Metric;68
7.3.1;3.1 Methods;70
7.3.2;3.2 Results;70
7.4;4. Self-Organizing Map-Based Exploratory Clustering;71
7.4.1;4.1 Self-Organizing Maps;71
7.4.2;4.2 Overview of the Cluster Structure of the Data;72
7.4.3;4.3 Interpretation of the Clusters;74
7.5;5. Conclusions;77
7.6;Acknowledgments;77
7.7;Notes;77
7.8;References;78
8;SELECTING INFORMATIVE GENES FOR CANCER CLASSIFICATION USING GENE EXPRESSION DATA;80
8.1;1. Introduction;80
8.2;2. Selection of Informative Genes;82
8.3;3. Algorithms for the Selection Problem;84
8.3.1;3.1 The WINNOW Algorithm;84
8.3.2;3.2 A Simple Greedy Algorithm;85
8.3.3;3.3 Prediction by the Majority Voting;86
8.4;4. Computational Results;87
8.4.1;4.1 Comparison of Prediction Methods;87
8.4.2;4.2 Comparison of Selection Methods;88
8.5;5. Discussions;90
8.6;Acknowledgments;91
8.7;References;92
9;FINDING FUNCTIONAL STRUCTURES IN GLIOMA GENE-EXPRESSIONS USING GENE SHAVING CLUSTERING AND MDL PRINCIPLE;94
9.1;1. Introduction;94
9.2;2. Description of Processing Glioma Data Set;96
9.3;3. A Brief Review of “Gene Shaving” (GS);98
9.4;4. The GS-MDL Clustering Algorithm;100
9.4.1;4.1 Background on Mixture Models for Gene Expression Data and Traditional Estimation Methods;101
9.4.2;4.2 MDL Estimation of the Number of Clusters;104
9.5;5. Functional Insights in Clustering Glioma Gene-Expression;107
9.6;Note;119
9.7;References;119
10;DESIGN ISSUES AND COMPARISON OF METHODS FOR MICROARRAY-BASED CLASSIFICATION;124
10.1;1. Introduction;124
10.2;2. Classification Rules;125
10.3;3. Some Specific Classification Rules;126
10.4;4. Constrained Classifiers;129
10.5;5. Perceptrons and Neural Networks;130
10.6;6. Error Estimation;134
10.7;7. Feature Selection;136
10.8;8. Illustration of Classification Techniques on Microarray Data;138
10.9;9. Conclusion;139
10.10;References;140
11;ANALYZING PROTEIN SEQUENCES USING SIGNAL ANALYSIS TECHNIQUES;142
11.1;1. Introduction;142
11.2;2. Frequency Analysis of Proteins;144
11.3;3. Time-Frequency Analysis;147
11.3.1;3.1 Non-Stationary Signals;147
11.3.2;3.2 Wavelet Transform;148
11.3.3;3.3 Wigner-Ville Distribution;149
11.3.4;3.4 Interference Terms;151
11.4;4. Application of Time-Frequency Analysis to Protein Families;153
11.4.1;4.1 Fibroblast Growth Factors;153
11.4.2;4.2 Homeodomain Proteins;155
11.5;5. Selection of Amino Acid Mappings;155
11.5.1;5.1 Amino Acid Indices;155
11.5.2;5.2 Information Theory;159
11.5.3;5.3 Analysis;162
11.6;6. Conclusions;164
11.7;Notes;165
11.8;References;165
12;SCALE-DEPENDENT STATISTICS OF THE NUMBERS OF TRANSCRIPTS AND PROTEIN SEQUENCES ENCODED IN THE GENOME;167
12.1;1. Introduction;167
12.2;2. Distributions of the Gene Expression Levels;169
12.2.1;2.1 Empirical Distributions;169
12.2.2;2.2 Effect of Sample Size on the Distribution Shape;172
12.3;3. Probability Distribution and an Estimator of the Total Number of Expressed Genes;174
12.4;4. Determination of the Number of Expressed Genes and GELPF in a Single Cell;176
12.4.1;4.1 The Number of Expressed Genes and GELPF in a Single Yeast Cell;176
12.4.2;4.2 Estimate of the Number of Expressed Genes and the GELPF in a Human Cell;178
12.5;5. Global Transcription Response to Damaging Factors;180
12.6;6. Stochastic and Descriptive Models of Gene Expression Process;183
12.7;7. Probability Distributions of the Numbers of Putative Proteins by Their Cluster Sizes and DNA-binding Proteins by the Regulated Promoters;186
12.8;8. Protein Domain Statistics in Proteomes;188
12.8.1;8.1 Statistical Analysis of Proteome Complexity;188
12.8.2;8.2 Prediction of the Numbers of Protein-Coding Genes in the Genome and of Protein Domains in the Entire Proteome;197
12.9;9. Conclusion;201
12.10;Acknowledgments;203
12.11;Appendix A: Infinity Limit for Population Growth Associated with the Generalized Pareto Probability Distribution;203
12.12;Appendix B: Population Growth Curve for the Number of Human Expressed Genes;205
12.13;Notes;205
12.14;References;207
13;STATISTICAL METHODS IN SERIAL ANALYSIS OF GENE EXPRESSION (SAGE);213
13.1;1. Introduction;213
13.2;2. Biology and Bioinformatics Background;214
13.3;3. Estimation;216
13.3.1;3.1 Point Estimation (Counts, Errors, Size);217
13.3.2;3.2 Estimation by Interval (Error-Bars);232
13.4;4. Differential Expression Detection;233
13.4.1;4.1 Single-Library or “Pseudo-library”;234
13.4.2;4.2 Replicated Libraries in One Class;238
13.4.3;4.3 Multiple Libraries Outlier Finding;241
13.5;5. Illustration of Methods Application;241
13.6;6. Conclusions;244
13.7;Acknowledgments;245
13.8;References;245
14;NORMALIZED MAXIMUM LIKELIHOOD MODELS FOR BOOLEAN REGRESSION WITH APPLICATION TO PREDICTION AND CLASSIFICATION IN GENOMICS;248
14.1;1. Introduction;248
14.2;2. The NML Model for Bernoulli Strings;250
14.3;3. The NML Model for Boolean Regression;252
14.3.1;3.1 The NML Model for the Boolean Class M(, k, f);253
14.3.2;3.2 The NML Model for the Boolean Class M(., k, f);255
14.3.3;3.3 A Two Part Code for the Boolean Class M(, k, f);259
14.3.4;3.4 A Two Part Code for the Boolean Class M(., k, f);260
14.4;4. Experimental Results;260
14.4.1;4.1 The NML Model for the Boolean Regression Models With k = 1;261
14.4.2;4.2 The NML Model for the Boolean Regression Models with k = 2;262
14.4.3;4.3 The NML Model for the Boolean Regression Models with k = 3;264
14.4.4;4.4 Extension of the Classification for Unseen Cases of the Boolean Regressors;266
14.4.5;4.5 Estimation of Classification Errors Achieved with Boolean Regression Models with k = 3;267
14.5;5. Conclusions;268
14.6;References;270
15;INFERENCE OF GENETIC REGULATORY NETWORKS VIA BEST-FIT EXTENSIONS;272
15.1;1. Introduction;272
15.2;2. Boolean Networks;274
15.3;3. The Best-Fit Extension Problem;275
15.4;4. Simulation Analysis;282
15.5;5. Conclusions;287
15.6;References;287
16;REGULARIZATION AND NOISE INJECTION FOR IMPROVING GENETIC NETWORK MODELS;292
16.1;1. Introduction;292
16.2;2. Current Approaches to Tackling the Dimensionality Problem;293
16.3;3. Learning Genetic Network Models;294
16.4;4. Robust Methods;297
16.5;5. Noise Injection is Equivalent to Regularization;299
16.6;6. Comparison with Other Models;301
16.7;7. Discussion;305
16.8;Acknowledgments;306
16.9;References;306
17;PARALLEL COMPUTATION AND VISUALIZATION TOOLS FOR CODETERMINATION ANALYSIS OF MULTIVARIATE GENE EXPRESSION RELATIONS;309
17.1;1. Introduction;309
17.2;2. Codetermination Algorithm;310
17.3;3. Prediction System Design;311
17.4;4. Parallel Analysis of Gene Expression (PAGE);312
17.4.1;4.1 The Three Sequential Algorithms and Motivation for Parallel Implementation;313
17.4.2;4.2 Parallel Implementation;313
17.4.3;4.3 Parallelization Methods;314
17.4.4;4.4 Parallel Versions of Algorithms;316
17.5;5. Visualization of Gene Expression (VOGE);318
17.6;6. Summary and Conclusions;321
17.7;Acknowledgments;321
17.8;References;321
18;SINGLE NUCLEOTIDE POLYMORPHISMS AND THEIR APPLICATIONS;323
18.1;1. Introduction;323
18.2;2. SNPs and Genotype-Phenotype Association;326
18.3;3. SNPs, Haplotypes and Genetic Association;330
18.3.1;3.1 Haplotype Methods for Genetic Association;332
18.3.2;3.2 Estimating Haplotypes with SNPs;334
18.4;4. SNPs and Haplotype Blocks;337
18.4.1;4.1 Linkage Disequilibrium;338
18.4.2;4.2 Haplotype Blocks;340
18.4.3;4.3 Simulations;345
18.4.4;4.4 Applications;348
18.4.5;4.5 Tagging SNPs;349
18.5;5. Conclusions;351
18.6;6. Resources;351
18.6.1;6.1 Selected Haplotype Reconstruction Software;351
18.6.2;6.2 Tagging SNP Software;352
18.7;Acknowledgments;353
18.8;References;353
19;THE CONTRIBUTION OF ALTERNATIVE TRANSCRIPTION AND ALTERNATIVE SPLICING TO THE COMPLEXITY OF MAMMALIAN TRANSCRIPTOMES;362
19.1;1. Introduction;362
19.2;2. Alternative Splicing in Mouse;363
19.3;3. Impact of Alternative Splicing on the Coding Potential;368
19.4;4. Alternative Splicing and Alternative Transcription;371
19.5;5. Regulation of Splicing;374
19.5.1;5.1 Length Distribution is Different Between Constitutive and Cryptic Exons;374
19.5.2;5.2 Constitutive and Cryptic Exons Differ in their Flanking Splice Signals;376
19.5.3;5.3 Constitutive Exons are Enriched in Known Splice Enhancer Motifs;377
19.5.4;5.4 Recruitment of Repeat Sequences in Alternative Splicing;380
19.6;6. Alternative Splicing of Regulatory Factors;382
19.6.1;6.1 Alternative Splicing of Zinc Finger-Containing Proteins;383
19.7;7. Conclusions;384
19.8;Notes;386
19.9;References;386
20;COMPUTATIONAL IMAGING, AND STATISTICAL ANALYSIS OF TISSUE MICROARRAYS: QUANTITATIVE AUTOMATED ANALYSIS OF TISSUE MICROARRAYS;392
20.1;1. Introduction;392
20.2;2. Oxidation and Storage;395
20.3;3. Fixation and Antigen Retrieval;396
20.4;4. Standardization of Immunohistochemistry;398
20.5;5. Quantitative Immunohistochemistry;400
20.6;6. Fluorescence-Based Platforms for Quantitative Analysis;403
20.7;References;408
21;Index;415




