E-book, English, 150 pages, format (W × H): 152 mm x 229 mm
Talia / Trunfio / Marozzo Data Analysis in the Cloud
1st edition, 2015
ISBN: 978-0-12-802914-5
Publisher: Academic Press
Format: EPUB
Copy protection: ePub watermark
Models, Techniques and Applications
Series: Computer Science Reviews and Trends
Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud.
Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on MapReduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis.
- Introduces data analysis techniques and cloud computing concepts
- Describes cloud-based models and systems for Big Data analytics
- Provides examples of the state-of-the-art in cloud data analysis
- Explains how to develop large-scale data mining applications on clouds
- Outlines the main research trends in the area of scalable Big Data analysis
- Introduction to Data Mining and Cloud Computing
- Introduction to Cloud Computing
- Models and Techniques for Cloud-based Data Analysis
- Designing and Supporting Scalable Data Analytics
- Research Trends in Big Data Analysis
Chapter 2 Introduction to Cloud Computing
Abstract
This chapter introduces the basic concepts of cloud computing, which provides scalable storage and processing services that can be used for extracting knowledge from big data repositories. Section 2.1 defines cloud computing and discusses the main service and deployment models adopted by cloud providers. The section also describes some cloud platforms that can be used to implement applications and frameworks for distributed data analysis. Section 2.2 discusses more specifically how cloud computing technologies can be used to implement distributed data analysis systems. The section identifies the main requirements that should be satisfied by a distributed data analysis system, and then discusses how a cloud platform can be used to fulfill such requirements.
Keywords
cloud computing; cloud service models; cloud deployment models; Microsoft Azure; Amazon Web Services; OpenNebula; OpenStack; cloud models for distributed data analysis
2.1. Cloud computing: definition, models, and architectures
As discussed in the previous chapter, an effective way to extract useful knowledge from big data repositories in reasonable time is to exploit parallel and distributed data mining techniques. It is also necessary and helpful to work with data analysis environments that allow effective and efficient access to, management of, and mining of such repositories. For example, a scientist can use a data analysis environment to run complex data mining algorithms, validate models, and compare and share results with colleagues located worldwide. In the past few years, clouds have emerged as effective computing platforms for facing the challenge of extracting knowledge from big data repositories, as well as for providing effective and efficient data analysis environments to both researchers and companies.

From a client perspective, the cloud is an abstraction for remote, infinitely scalable provisioning of computation and storage resources. From an implementation point of view, cloud systems are based on large sets of computing resources, located somewhere “in the cloud”, which are allocated to applications on demand (Barga et al., 2011). Thus, cloud computing can be defined as a distributed computing paradigm in which all the resources, dynamically scalable and often virtualized, are provided as services over the Internet. Virtualization is a software-based technique that separates applications from the physical computing infrastructure and allows various “virtual” computing resources to be created on the same hardware. It is a basic technology powering cloud computing, as it makes it possible to run different operating environments and multiple applications concurrently on the same server. Unlike other distributed computing paradigms, cloud users are not required to have knowledge of, expertise in, or control over the technology infrastructure in the “cloud” that supports them.
A number of features define cloud applications, services, data, and infrastructure:

- Remotely hosted: Services and/or data are hosted on remote infrastructure.
- Ubiquitous: Services or data are available from anywhere.
- Pay-per-use: The result is a utility computing model similar to that of traditional utilities, like gas and electricity, where you pay for what you use.

We can also use the popular National Institute of Standards and Technology (NIST) definition of cloud computing to highlight its main features (Mell and Grance, 2011): “Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction”. From the NIST definition, we can identify five essential characteristics of cloud computing systems: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Cloud systems can be classified on the basis of their service model (Software as a Service, Platform as a Service, Infrastructure as a Service) and their deployment model (public cloud, private cloud, hybrid cloud).

2.1.1. Service Models
Cloud computing vendors provide their services according to three main models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS).

Software as a Service defines a delivery model in which software and data are provided over the Internet to customers as ready-to-use services. Specifically, software and associated data are hosted by providers, and customers access them without needing any additional hardware or software. Moreover, customers normally pay a monthly or yearly fee, with no additional purchase of infrastructure or software licenses. Examples of common SaaS applications are Webmail systems (e.g., Gmail), calendars (Yahoo Calendar), document management (Microsoft Office 365), image manipulation (Photoshop Express), customer relationship management (Salesforce), and others.

In the Platform as a Service model, cloud vendors deliver a computing platform that typically includes databases, application servers, and a development environment for building, testing, and running custom applications. Developers can focus on deploying applications, since cloud providers are in charge of the maintenance and optimization of the environment and the underlying infrastructure. Customers are thus helped in application development, as they can use a set of modular “environment” services that can be easily integrated. Normally, the applications are developed as ready-to-use SaaS. Google App Engine, Microsoft Azure, and Salesforce.com are examples of PaaS cloud environments.

Finally, Infrastructure as a Service is an outsourcing model under which customers rent resources such as CPUs and disks, or more complex resources such as virtualized servers or operating systems, to support their operations (e.g., Amazon EC2, RackSpace Cloud). IaaS users normally have system and network administration skills, as they must deal with configuration, operation, and maintenance tasks.
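One way to summarize the three service models is by asking who manages each layer of the computing stack. The sketch below is a toy illustration of this idea in Python; the layer names and the exact boundaries between customer and provider responsibility are simplified assumptions, and the real split varies between providers and offerings.

```python
# Illustrative stack, from the application layer down to physical networking.
STACK = ["application", "data", "runtime", "middleware",
         "os", "virtualization", "servers", "storage", "networking"]

# Index into STACK at which the *provider* takes over responsibility
# under each service model (an assumption for illustration only).
PROVIDER_MANAGES_FROM = {"saas": 0, "paas": 2, "iaas": 5}

def managed_by_customer(model: str) -> list:
    """Return the layers the customer must operate under the given model."""
    return STACK[:PROVIDER_MANAGES_FROM[model]]

print(managed_by_customer("saas"))  # [] -- everything is provided as a service
print(managed_by_customer("paas"))  # ['application', 'data']
print(managed_by_customer("iaas"))  # ['application', 'data', 'runtime', 'middleware', 'os']
```

Under this reading, SaaS users manage nothing below the application interface, PaaS developers own their application and data while the platform handles the runtime downward, and IaaS users take responsibility for everything above the virtualization layer.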
Compared to the PaaS approach, the IaaS model has higher system administration costs for the user; on the other hand, IaaS allows full customization of the execution environment. Developers can scale their services up or down by adding or removing virtual machines, which can easily be instantiated from virtual machine images. Table 2.1 describes how the three service models satisfy the requirements of developers and final users, in terms of flexibility, scalability, portability, security, maintenance, and costs.

Table 2.1 How SaaS, PaaS, and IaaS Satisfy the Requirements of Developers and Final Users

Flexibility
- SaaS: Users can customize the application interface and control its behavior, but cannot decide which software and hardware components are used to support its execution.
- PaaS: Developers write, customize, and test their application using libraries and supporting tools compatible with the platform.
- IaaS: Users can choose what kind of virtual storage and compute resources are used for executing their application. Developers have to build the servers that will host their applications, and configure the operating system and software modules on top of such servers.

Scalability
- SaaS: The underlying computing and storage resources normally scale automatically to match application demand, so that users do not have to allocate resources manually. The result depends only on the level of elasticity provided by the cloud system.
- PaaS: Like the SaaS model, the underlying computing and storage resources normally scale automatically.
- IaaS: Developers can use new storage and compute resources, but their applications must be scalable and allow the dynamic inclusion of new resources.

Portability
- SaaS: There can be problems in moving applications to other providers, since some software and tools may not work on different systems. For example, application data may be in a format that cannot be read by another provider.
- PaaS: Applications can be moved to another provider only if the new provider shares the required platform tools and services with the old one.
- IaaS: If a provider allows users to download a virtual...
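The rapid elasticity discussed above, adding or removing virtual machines to match demand, is often driven in practice by simple threshold rules on a load metric. The following minimal Python sketch illustrates one scaling decision; the thresholds, CPU metric, and VM limits are illustrative assumptions, and real autoscalers use provider APIs and richer signals.

```python
def autoscale(n_vms: int, avg_cpu: float,
              scale_up_at: float = 0.8, scale_down_at: float = 0.3,
              min_vms: int = 1, max_vms: int = 10) -> int:
    """Return the new number of VMs after one threshold-based scaling decision.

    avg_cpu is the average CPU utilization (0.0-1.0) across current VMs.
    """
    if avg_cpu > scale_up_at and n_vms < max_vms:
        return n_vms + 1   # scale out: instantiate another VM from an image
    if avg_cpu < scale_down_at and n_vms > min_vms:
        return n_vms - 1   # scale in: release an under-used VM
    return n_vms           # load is within the acceptable band

print(autoscale(3, 0.92))  # 4: high load, add a VM
print(autoscale(3, 0.10))  # 2: low load, remove a VM
print(autoscale(1, 0.10))  # 1: never drop below min_vms
```

The same rule applied repeatedly over monitoring intervals yields the pay-as-you-go behavior described earlier: capacity, and therefore cost, follows demand.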