E-book, English, 150 pages, format (W × H): 152 mm x 229 mm
Talia / Trunfio / Marozzo Data Analysis in the Cloud
1st edition, 2015
ISBN: 978-0-12-802914-5
Publisher: Academic Press
Format: EPUB
Copy protection: ePub watermark
Models, Techniques and Applications
Series: Computer Science Reviews and Trends
Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud.
Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on MapReduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis.
- Introduces data analysis techniques and cloud computing concepts
- Describes cloud-based models and systems for Big Data analytics
- Provides examples of the state-of-the-art in cloud data analysis
- Explains how to develop large-scale data mining applications on clouds
- Outlines the main research trends in the area of scalable Big Data analysis
- Introduction to Data Mining and Cloud Computing
- Introduction to Cloud Computing
- Models and Techniques for Cloud-based Data Analysis
- Designing and Supporting Scalable Data Analytics
- Research Trends in Big Data Analysis
Chapter 2 Introduction to Cloud Computing
Abstract
This chapter introduces the basic concepts of cloud computing, which provides scalable storage and processing services that can be used for extracting knowledge from big data repositories. Section 2.1 defines cloud computing and discusses the main service and deployment models adopted by cloud providers. The section also describes some cloud platforms that can be used to implement applications and frameworks for distributed data analysis. Section 2.2 discusses more specifically how cloud computing technologies can be used to implement distributed data analysis systems. The section identifies the main requirements that should be satisfied by a distributed data analysis system, and then discusses how a cloud platform can be used to fulfill such requirements.
Keywords
cloud computing; cloud service models; cloud deployment models; Microsoft Azure; Amazon Web Services; OpenNebula; OpenStack; cloud models for distributed data analysis
2.1. Cloud computing: definition, models, and architectures
As discussed in the previous chapter, an effective way to extract useful knowledge from big data repositories in reasonable time is to exploit parallel and distributed data mining techniques. It is also necessary and helpful to work with data analysis environments that allow effective and efficient access to, management of, and mining of such repositories. For example, a scientist can use a data analysis environment to run complex data mining algorithms, validate models, and compare and share results with colleagues located worldwide. In the past few years, clouds have emerged as effective computing platforms for facing the challenge of extracting knowledge from big data repositories, as well as for providing effective and efficient data analysis environments to both researchers and companies.

From a client perspective, the cloud is an abstraction for remote, infinitely scalable provisioning of computation and storage resources. From an implementation point of view, cloud systems are based on large sets of computing resources, located somewhere “in the cloud”, which are allocated to applications on demand (Barga et al., 2011). Thus, cloud computing can be defined as a distributed computing paradigm in which all the resources, dynamically scalable and often virtualized, are provided as services over the Internet. Virtualization is a software-based technique that separates applications from the physical computing infrastructure and allows various “virtual” computing resources to be created on the same hardware. It is a basic technology powering cloud computing, as it makes it possible to run different operating environments and multiple applications concurrently on the same server. Unlike other distributed computing paradigms, cloud users are not required to have knowledge of, expertise in, or control over the technology infrastructure in the “cloud” that supports them.
A number of features define cloud applications, services, data, and infrastructure:

- Remotely hosted: Services and/or data are hosted on remote infrastructure.
- Ubiquitous: Services or data are available from anywhere.
- Pay-per-use: The result is a utility computing model similar to that of traditional utilities, like gas and electricity, where you pay for what you use.

We can also use the popular National Institute of Standards and Technology (NIST) definition of cloud computing to highlight its main features (Mell and Grance, 2011): “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction”. From the NIST definition, we can identify five essential characteristics of cloud computing systems: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Cloud systems can be classified on the basis of their service model (Software as a Service, Platform as a Service, Infrastructure as a Service) and their deployment model (public cloud, private cloud, hybrid cloud).

2.1.1. Service Models
Cloud computing vendors provide their services according to three main models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).

Software as a Service defines a delivery model in which software and data are provided over the Internet to customers as ready-to-use services. Specifically, software and associated data are hosted by providers, and customers access them without needing any additional hardware or software. Moreover, customers normally pay a monthly or yearly fee, with no additional purchase of infrastructure or software licenses. Examples of common SaaS applications are Webmail systems (e.g., Gmail), calendars (Yahoo Calendar), document management (Microsoft Office 365), image manipulation (Photoshop Express), customer relationship management (Salesforce), and others.

In the Platform as a Service model, cloud vendors deliver a computing platform that typically includes databases, application servers, and a development environment for building, testing, and running custom applications. Developers can focus on deploying applications, since cloud providers are in charge of the maintenance and optimization of the environment and the underlying infrastructure. Customers are thus helped in application development, as they can use a set of modular “environment” services that can be easily integrated. Normally, the applications are developed as ready-to-use SaaS. Google App Engine, Microsoft Azure, and Salesforce.com are examples of PaaS cloud environments.

Finally, Infrastructure as a Service is an outsourcing model under which customers rent resources such as CPUs and disks, or more complex resources such as virtualized servers or operating systems, to support their operations (e.g., Amazon EC2, RackSpace Cloud). IaaS users normally have system and network administration skills, as they must deal with configuration, operation, and maintenance tasks.
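One way to summarize the three service models is by asking who manages each layer of the computing stack. The sketch below is a toy illustration of this idea in Python; the layer names and the exact boundaries between customer and provider responsibility are simplified assumptions, and the real split varies between providers and offerings.

```python
# Illustrative stack, from the application layer down to physical networking.
STACK = ["application", "data", "runtime", "middleware",
         "os", "virtualization", "servers", "storage", "networking"]

# Index into STACK at which the *provider* takes over responsibility
# under each service model (an assumption for illustration only).
PROVIDER_MANAGES_FROM = {"saas": 0, "paas": 2, "iaas": 5}

def managed_by_customer(model: str) -> list:
    """Return the layers the customer must operate under the given model."""
    return STACK[:PROVIDER_MANAGES_FROM[model]]

print(managed_by_customer("saas"))  # [] -- everything is provided as a service
print(managed_by_customer("paas"))  # ['application', 'data']
print(managed_by_customer("iaas"))  # ['application', 'data', 'runtime', 'middleware', 'os']
```

Under this reading, SaaS users manage nothing below the application interface, PaaS developers own their application and data while the platform handles the runtime downward, and IaaS users take responsibility for everything above the virtualization layer.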
Compared to the PaaS approach, the IaaS model has higher system administration costs for the user; on the other hand, IaaS allows full customization of the execution environment. Developers can scale their services up or down by adding or removing virtual machines, which can easily be instantiated from virtual machine images. Table 2.1 describes how the three service models satisfy the requirements of developers and final users, in terms of flexibility, scalability, portability, security, maintenance, and costs.

Table 2.1 How SaaS, PaaS, and IaaS Satisfy the Requirements of Developers and Final Users

Flexibility
- SaaS: Users can customize the application interface and control its behavior, but cannot decide which software and hardware components are used to support its execution.
- PaaS: Developers write, customize, and test their application using libraries and supporting tools compatible with the platform.
- IaaS: Users can choose what kind of virtual storage and compute resources are used for executing their application. Developers have to build the servers that will host their applications, and configure the operating system and software modules on top of such servers.

Scalability
- SaaS: The underlying computing and storage resources normally scale automatically to match application demand, so that users do not have to allocate resources manually. The result depends only on the level of elasticity provided by the cloud system.
- PaaS: Like the SaaS model, the underlying computing and storage resources normally scale automatically.
- IaaS: Developers can use new storage and compute resources, but their applications must be scalable and allow the dynamic inclusion of new resources.

Portability
- SaaS: There can be problems in moving applications to other providers, since some software and tools may not work on different systems. For example, application data may be in a format that cannot be read by another provider.
- PaaS: Applications can be moved to another provider only if the new provider shares the required platform tools and services with the old one.
- IaaS: If a provider allows users to download a virtual...
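The rapid elasticity discussed above, adding or removing virtual machines to match demand, is often driven in practice by simple threshold rules on a load metric. The following minimal Python sketch illustrates one scaling decision; the thresholds, CPU metric, and VM limits are illustrative assumptions, and real autoscalers use provider APIs and richer signals.

```python
def autoscale(n_vms: int, avg_cpu: float,
              scale_up_at: float = 0.8, scale_down_at: float = 0.3,
              min_vms: int = 1, max_vms: int = 10) -> int:
    """Return the new number of VMs after one threshold-based scaling decision.

    avg_cpu is the average CPU utilization (0.0-1.0) across current VMs.
    """
    if avg_cpu > scale_up_at and n_vms < max_vms:
        return n_vms + 1   # scale out: instantiate another VM from an image
    if avg_cpu < scale_down_at and n_vms > min_vms:
        return n_vms - 1   # scale in: release an under-used VM
    return n_vms           # load is within the acceptable band

print(autoscale(3, 0.92))  # 4: high load, add a VM
print(autoscale(3, 0.10))  # 2: low load, remove a VM
print(autoscale(1, 0.10))  # 1: never drop below min_vms
```

The same rule applied repeatedly over monitoring intervals yields the pay-as-you-go behavior described earlier: capacity, and therefore cost, follows demand.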