Jambu | Exploratory and Multivariate Data Analysis | E-Book | sack.de
E-Book

E-Book, English, 432 pages

Series: Statistical Modeling and Decision Science

Jambu Exploratory and Multivariate Data Analysis


1st edition 1991
ISBN: 978-0-08-092367-3
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: 6 - ePub Watermark




With a useful index of notations at the beginning, this book explains and illustrates the theory and application of data analysis methods, from univariate to multidimensional, and shows how to learn and use them efficiently. The book is well illustrated and offers a useful, well-documented review of the most important data analysis techniques.
- Describes, in detail, exploratory data analysis techniques from the univariate to the multivariate ones
- Features a complete description of correspondence analysis and factor analysis techniques as multidimensional statistical data analysis techniques, illustrated with concrete and understandable examples
- Includes a modern and up-to-date description of clustering algorithms with many properties, which gives clustering a new role in data analysis techniques


Target audience


Adult: General. Academic/professional/technical: Undergraduate. Academic/professional/technical: Postgraduate




Further information & material


1;Front Cover;1
2;Exploratory and Multivariate Data Analysis;4
3;Copyright Page;5
4;Table of Contents;8
5;Dedication;6
6;Preface;12
7;Chapter 1. General Presentation;16
7.1;1. Introduction;16
7.2;2. Examples of Applications;20
7.3;3. Steps in Data Exploration: Management, Analysis, Synthesis;29
7.4;4. Computer Aspects;32
8;Chapter 2. Statistical Data Exploration;34
8.1;1. Statistics;34
8.2;2. Fields of Statistical Data Exploration;34
8.3;3. Statistics and Experiments;35
8.4;4. Data Analysis, Inductive and Deductive Statistics;36
8.5;5. Variables, Statistical Sets, and Data Sets;36
9;Chapter 3. 1-D Statistical Data Analysis;42
9.1;1. Introduction;42
9.2;2. 1-D Analysis of a Quantitative Variable;42
9.3;3. 1-D Analysis of a Categorical Variable;65
9.4;4. 1-D Analysis of a Categorical Variable with Multiple Forms;68
9.5;5. 1-D Analysis of Time Series or Chronological Variables;68
9.6;6. Statistical Maps or Cartograms;76
10;Chapter 4. 2-D Statistical Data Analysis;78
10.1;1. Introduction;78
10.2;2. 2-D Analysis of Two Categorical Variables;79
10.3;3. 2-D Analysis of Two Quantitative Variables;88
10.4;4. 2-D Analysis of a Quantitative Variable and a Categorical Variable;105
10.5;5. 2-D Analysis of a Quantitative Variable and a Categorical Variable with Multiple Forms;109
10.6;6. Conclusion;109
11;Chapter 5. N-D Statistical Data Analysis;110
11.1;1. Introduction;110
11.2;2. Joint 3-D Statistical Data Analysis;110
11.3;3. Joint N-D Statistical Data Analysis;117
11.4;4. Cartograms and N-D Analysis;127
12;Chapter 6. Factor Analysis of Individuals–Variables Data Sets;128
12.1;1. Introduction;128
12.2;2. From Linear Adjustment to Factor Analysis;128
12.3;3. From the Origin of Factor Analysis to Modern Factor Analysis Techniques;132
12.4;4. Mathematical Description of Modern Factor Analysis;132
12.5;5. Factor Analysis Formulas;139
13;Chapter 7. Principal Components Analysis;140
13.1;1. Basic Data Sets;140
13.2;2. Different Patterns of Principal Components Analysis;140
13.3;3. Standardized Principal Components Analysis;142
13.4;4. Interpretation of Principal Components Analysis;146
13.5;5. Classifying Supplementary Points into Graphics;158
13.6;6. Rules for Selecting Significant Axes and Elements;168
13.7;7. Standardized Principal Components Analysis Formulas;174
13.8;8. Applications and Case Studies;175
14;Chapter 8. 2-D Correspondence Analysis;184
14.1;1. Introduction;184
14.2;2. Basic Correspondence Data Sets;185
14.3;3. Mathematical Description of Correspondence Analysis;186
14.4;4. Geometric Representation of the Sets I and J;191
14.5;5. Interpretation of the 2-D Correspondence Analysis;204
14.6;6. Factor Graphics;215
14.7;7. Classifying Supplementary Points into Graphics;218
14.8;8. Rules for Selecting Significant Axes and Elements;225
14.9;9. 2-D Correspondence Analysis Formulas;231
14.10;10. Patterns of Clouds of Points;236
14.11;11. Patterns of Acceptable Data Sets;240
14.12;12. Case Studies;247
15;Chapter 9. N-D Correspondence Analysis;256
15.1;1. Introduction;256
15.2;2. Basic Data Sets;257
15.3;3. Equivalence between Analyses of bJJ and kIJ;263
15.4;4. Interpretation of N-D Correspondence Analysis;277
15.5;5. Factor Graphics;288
15.6;6. Classifying Supplementary Points into Graphics;293
15.7;7. Rules for Selecting Significant Axes and Points of N(I), N(J), and N(Q);300
15.8;8. N-D Correspondence Analysis Formulas;302
15.9;9. Patterns of Acceptable Data Sets;303
15.10;10. Case Studies;309
16;Chapter 10. Classification of Individuals–Variables Data Sets;320
16.1;1. Introduction;320
16.2;2. Basic Data Sets;321
16.3;3. The Mathematical Description of Classifications;321
16.4;4. Partitioning Methods;325
16.5;5. Hierarchical Classification Methods;346
16.6;6. Specific Applications;409
16.7;7. Case Studies;413
17;Chapter 11. Classification and Analysis of Proximities Data Sets;422
17.1;1. Introduction;422
17.2;2. Proximities Data Sets;422
17.3;3. Proximities Data Sets from Individuals–Variables Data Sets;423
17.4;4. Elementary Description of Proximities Data Sets;427
17.5;5. Factor Analysis of Proximities Data Sets;427
17.6;6. Classification of Proximities Data Sets;431
17.7;7. Computation of Contributions;432
17.8;8. Conclusion;433
18;Chapter 12. Computer Aspects of Exploratory and Multivariate Data Analysis;434
18.1;1. Place of Exploratory and Multivariate Data Analysis in Statistics;434
18.2;2. Basic Factors for Exploratory and Multivariate Data Analysis Software;437
18.3;3. Data Analysis Libraries;438
18.4;4. Future Prospects;440
19;Appendix 1. List of Notations;442
20;Appendix 2. Reference Data Sets;454
21;References;480
22;Author Index;484
23;Subject Index;486


Chapter 2 Statistical Data Elaboration
1 Statistics
Statistics has a double meaning. First, statistics is concerned with scientific methods for collecting, organizing, summarizing, presenting, and analyzing data, as well as drawing valid conclusions and making relevant decisions on the basis of such analysis. In another sense, statistics denotes the data themselves: we can speak of economic statistics, geophysical statistics, employment statistics, accident statistics, financial statistics, population statistics, etc. For data to qualify as statistics, the data sets must be comparable, representative, and coherent, and must have been produced systematically so that relevant or significant comparisons or computations can be made. Not all data are statistical data, i.e., able to be analyzed by a statistical method. Keep in mind, however, that a statistical study does not stop at data elaboration, which is carried out with a view to future data analysis. Data must be processed to highlight the most significant or most particular features.
2 Fields of Statistical Data Exploration
At the beginning, statistics was employed in economics. Remember the example of nilometers built along the Nile in Egypt, which were used to measure the height of the Nile floods at different points along the river; this allowed the estimation of the harvest size, and therefore the collection of equivalent income taxes. In the seventeenth century, some applications appeared in different fields: botany, systematics, natural sciences, taxonomy (Linné, Buffon, Adanson). In the nineteenth century, statistics grew rapidly in importance because of the progress in biology, then psychometry and agronomics (Fisher). Later, statistics was used in physics, astronomy, thermodynamics, and meteorology. Finally, in the twentieth century, statistics has been extended to studying industrial problems such as reliability, quality control, and production control. Statistics has since become an accepted tool in business management control, marketing studies, quality of service, opinion surveys, planning, and forecasting. Thus, statistics is now a decision-making tool as well as a specific method for improving fundamental knowledge.
3 Statistics and Experiments
Statistics and experimental methods are concerned with objective data based on observations. An experimental method, however, applies only to specific observations resulting from experiments, whereas statistics uses a larger class of observations; an experiment aims to replace the system of possible causes by a simpler system in which only one cause varies at a time. Consider the study of a gas under the action of three variables: temperature, volume, and pressure. At constant temperature, observations are made to highlight the relationship between volume and pressure, and then observations are made at constant pressure to study the relationship between volume and temperature. Generally, an experimental method can be applied whenever the conditions of observation can be fixed by the experimenter, can be continuously modified, and can be repeated.
In some of the areas mentioned above, it is obvious that an experimental method cannot be used. An observer of economic facts, or a manager, cannot experiment; he records facts as they are. For instance, to study the consumption of a product with respect to its price, the analyst cannot make the price vary to see how the consumption level varies. The only solution is to observe from time to time the price level and the related consumption. In contrast to experimental methods, data processed by statistics involve many factors, and so factors must be identified, recorded, and processed in a different way. The economist, the sociologist, or the meteorologist cannot experiment. They cannot have any influence on data, which are recorded independently of any action. The agronomist knows that several factors can influence the yield of a plant, but knows that he cannot make these factors vary. Therefore, he must study how combinations of factors vary simultaneously.
Hence, statistics and data analysis are scientific methods that are able to draw broader and more valid conclusions than those drawn from observed facts alone.
4 Data Analysis, Inductive and Deductive Statistics
Two main processes may be distinguished in statistics: the inductive process and the deductive process. The inductive process is based on mathematical models using the technique of probability calculus. Given mathematical hypotheses, the models are applied to concrete cases. The fundamental question for the scientist (and also the user) is the question of the validity of these mathematical models, which are too often wrongly considered as suitable (they can be correct by construction, but cannot be made relevant to the situation being studied). In the past, models were built because of the impossibility of computing and studying many different situations, but this may be less so now. The second process is the deductive process, which is based on deduction from observed facts only. This may result in building or discovering a mathematical model, derived from the data and not vice versa. Data analysis is concerned with the deductive process: from the data to the model. That is the way followed by well-known modern data analysts such as Benzecri (1972, 1973, 1980, 1981, 1982, 1984), Tukey (1977), and Hayashi (1988). Great progress has been made thanks to modern computers, which are able to process millions of data values and to display numerous data simultaneously, allowing a scientific dialogue between the data and the analyst. The inductive process can be applied to restricted categories of events; the deductive process allows the rapid synthesis of large data sets. This explains the importance of data analysis.
5 Variables, Statistical Sets, and Data Sets
5.1 Variables
Three main types of data can be distinguished: continuous variables, more often called quantitative variables; discrete variables, more often called categorical or qualitative variables; and chronological variables, which in fact involve quantitative or qualitative variables taken at specified times, usually at equal time intervals.
5.1.1 Continuous or Quantitative Variables
A quantitative variable, denoted by X, is a variable capable of having an infinite number of values. Measurements, ratios, and percentages can give quantitative variables. For example, the size, weight, and cranium perimeter of babies at birth are three quantitative variables. The following data sets involve quantitative variables: car models (Appendix 2, §1); measurements of skulls (Appendix 2, §6); samples of steel (Appendix 2, §8); indicators of quality of service in a telephone network (Appendix 2, §14). A counter-example: the number of children in a family is not a quantitative variable, as it cannot take all possible values; a family can have 2, 3, or 4 children but not 2.4.
5.1.2 Discrete or Categorical Variables
A discrete or categorical variable, denoted by X, is a variable that takes on a finite number of numerical values, categories, or codes. For example, the number of children in a family, sex, marital status, income tax class, and eye color are categorical variables. The responses to a questionnaire generally give categorical variables. Among these categorical variables, different subtypes can be distinguished: variables with multiple forms; logical variables; categorical variables built from quantitative ones; or preference variables:
(a) Variables with multiple forms. These often occur in the use of questionnaires. Generally, a question has only one response, which is called a form. It can happen, however, that several responses are given to one question; such variables are called variables with multiple forms.
Here is an example of such a situation. In a survey on new services in telecommunications, one question is: What are the reasons for using the Minitel services?
Response 1: To look for precise information (code 1).
Response 2: For curiosity (code 2).
Response 3: For fun (code 3).
Response 4: To learn how to use it (code 4).
Response 5: To show it to people (code 5).
Response 6: For rapidity (code 6).
Response 7: Because it is practical (code 7).
The interviewee can give as many responses as he wants.
(b) Dummy variables from quantitative variables. From a quantitative variable a categorical variable can be built as follows: the range is divided into equal-sized intervals, each of which is assigned a code. Each original value of the quantitative variable is replaced by its associated code. In this way, the categorical variable is built. Consider the following example: a quantitative variable has a range of [0, 100]; the range is divided into the subintervals [0, 25], ]25, 50], ]50, 75], ]75, 100]. The values of the associated categorical variable, called dummy variables, are 1, 2, 3, 4: [0, 25] gives 1; ]25, 50] gives 2; ]50, 75] gives 3; ]75, 100] gives 4.
(c) Logical variables. A logical variable is a discrete variable whose only values are one or zero. Generally, they correspond to the presence (one) or absence (zero) of an attribute. They occur in specific domains such as archaeology, psychology, economics, and marketing as dummy...
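
The codings described in (a) to (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's own procedure: the function names `code_into_classes` and `to_logical_variables` are hypothetical, and representing a variable with multiple forms as one logical (0/1) variable per response code is an assumed convention that the excerpt itself does not prescribe.

```python
import math


def code_into_classes(value, low=0.0, high=100.0, n_classes=4):
    """Return the class code (1..n_classes) of `value` on [low, high].

    Classes have equal width and are closed on the right, as in the
    example of (b): [0, 25] -> 1, ]25, 50] -> 2, ]50, 75] -> 3,
    ]75, 100] -> 4.
    """
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    width = (high - low) / n_classes
    # ceil sends a boundary value such as 25 to the lower class;
    # max(1, ...) keeps the lower bound `low` itself in class 1.
    return max(1, math.ceil((value - low) / width))


def to_logical_variables(responses, codes=range(1, 8)):
    """Represent the response codes chosen by one interviewee as 0/1 values.

    Assumed convention: one logical variable per possible code,
    1 = response given, 0 = response not given.
    """
    return [1 if code in responses else 0 for code in codes]


# A value of 25 falls in [0, 25]; a value just above it falls in ]25, 50].
print(code_into_classes(25), code_into_classes(25.5))

# An interviewee who answered codes 1, 3, and 7 to the Minitel question.
print(to_logical_variables({1, 3, 7}))
```

Using `math.ceil` rather than integer division is what makes the intervals closed on the right, so that 25, 50, and 75 are assigned to the lower of the two neighbouring classes, as in the text.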


