Krause / Olson The Basics of S-PLUS
4th edition, 2005
ISBN: 978-0-387-28390-6
Publisher: Springer US
Format: PDF
Copy protection: 1 - PDF Watermark
E-book, English, 444 pages
Series: Statistics and Computing
Target audience
Professional/practitioner
Further information & material
Graphical User Interface.- A First Session.- A Second Session.- Graphics.- Trellis Graphics.- Exploring Data.- Statistical Modeling.- Programming.- Object-Oriented Programming.- Input and Output.- Tips and Tricks.- S-Plus Internals.- Information Sources on and Around S-Plus.- R.
7 Exploring Data (p. 193)
In the preceding chapters, we have laid the foundation for understanding the concepts and ideas of the S-Plus system. We explored basic ideas and how to use S-Plus for performing calculations, and we have seen how data can be generated, stored, and accessed. Furthermore, we also looked at how data can be displayed graphically. All this will be useful as we explore real data sets in this chapter. We will explore data sets that come with S-Plus, specifically the Barley and Geyser data sets.
Rather than presenting a list of available statistical functions, we will go through a typical data analysis as a way of introducing the more useful and common commands and the kind of output we’ll encounter. We chose to use S-Plus data sets so you can follow along with the analysis we present and complete the exercises at the end of this chapter. We divide the data analysis into two categories: "descriptive" and "graphical" exploration. Further sections cover distributions and related functions, confirmatory statistics and hypothesis testing, and missing and infinite values.
7.1 Descriptive Data Exploration
We will now explore the different variables contained in the Barley data set. We will first analyze the variables in one dimension, or, in other words, we will take a univariate approach. The analysis of the dependence between the variables and the exploration of higher-dimensional structure follows later.
The Barley Data Set
The Barley data are measurements of yield in bushels per acre at different sites. The analysis comprises 6 sites planting 10 different varieties of barley in 2 successive years, 1931 and 1932. The data set therefore contains 120 measurements of barley yield. Our main goal will be to investigate differences in barley yields given by the different variable constellations, such as the 1931 harvest of the fifth variety on site 4 and the 1932 harvest of the seventh variety at the same site.
Just enter
> barley
to see the data.
Exploratory data analysis (EDA) is an approach to investigating data that stresses the need to know more about the structure and information inherent in the data. The methods used with this approach are referred to as descriptive, as opposed to confirmatory. Descriptive simply means that simple summaries are used to describe the data: their shapes, sizes, relationships, and the like. Examples of descriptive statistics are means, medians, standard deviations, ranges, and so on.
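As a minimal sketch of the descriptive statistics just listed, the commands below compute them for the barley yields. The column name yield is an assumption about how the data frame is laid out, and the first line is only a convenience for R users, where the same data ship with the lattice package; in S-Plus the condition is false and the line does nothing.

```r
# In S-Plus, barley already exists; in R, fetch it from the lattice package.
if (!exists("barley")) data(barley, package = "lattice")

y <- barley$yield     # yields in bushels per acre (column name assumed)

mean(y)               # arithmetic mean
median(y)             # middle value of the sorted yields
sqrt(var(y))          # standard deviation, as the square root of the variance
range(y)              # smallest and largest yield
```

The standard deviation is written as sqrt(var(y)) so that the same line works in both S-Plus and R, without relying on a dedicated standard deviation function.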
Given the basic information about the Barley data, the following analysis is intended to gain more information and structural knowledge about the numbers we have.
A typical place to begin is, of course, looking at the data. If the data set is small, we can easily look at it simply by printing it out. We check the data size by entering
> dim(barley)
[1] 120   4
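The 120 rows can be checked against the design described earlier (6 sites, 10 varieties, 2 years). The following is a sketch only, assuming the columns are named site, variety, and year; the helper names n.site, n.variety, and n.year are introduced here for illustration.

```r
# In S-Plus, barley already exists; in R, fetch it from the lattice package.
if (!exists("barley")) data(barley, package = "lattice")

names(barley)         # column names of the data frame
barley[1:5, ]         # print only the first five rows

# Count the distinct values of each design variable (column names assumed).
n.site    <- length(unique(barley$site))
n.variety <- length(unique(barley$variety))
n.year    <- length(unique(barley$year))

# If every combination occurs exactly once, this product equals the row count.
n.site * n.variety * n.year

# Cross-tabulate site by year: each cell counts the varieties grown there.
table(barley$site, barley$year)
```

Printing a few rows with barley[1:5, ] is a handy alternative to printing the whole data set once it grows beyond a screenful.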