
What is Apache Spark

Jun 28, 2019 01:52 PM

Spark is an open source cluster computing system based on in-memory computing, designed to make data analysis faster. Spark is small and lean: it was developed by a small team led by Matei Zaharia at the AMP Lab at the University of California, Berkeley. It is written in Scala, and the core of the project originally consisted of only 63 Scala files, making the codebase remarkably compact.

Spark is an open source cluster computing environment similar to Hadoop, but there are differences between the two, and those differences make Spark superior for certain workloads. In particular, Spark provides in-memory distributed datasets, which optimize iterative workloads in addition to supporting interactive queries.

Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, which makes it possible to manipulate distributed data sets as easily as local collection objects.

Although Spark was created to support iterative jobs on distributed data sets, it is actually complementary to Hadoop and can run in parallel on the Hadoop file system. This behavior is supported through a third-party cluster framework called Mesos. Developed by the UC Berkeley AMP Lab (Algorithms, Machines, and People Lab), Spark can be used to build large-scale, low-latency data analysis applications.

Spark Cluster Computing Architecture
Although Spark has similarities with Hadoop, it provides a new cluster computing framework with useful differences. First, Spark is designed for a specific type of cluster computing workload: workloads that reuse a working set of data across parallel operations, such as machine learning algorithms. To optimize these workloads, Spark introduces the concept of in-memory cluster computing, where data sets are cached in memory to reduce access latency.
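To illustrate why in-memory caching matters for iterative workloads, here is a minimal sketch that reuses a cached dataset across several passes. The master URL, the input file "points.txt", and the computation itself are assumptions made for the example, not part of the original article.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IterativeDemo {
  def main(args: Array[String]): Unit = {
    // Assumed local master and input file for illustration.
    val sc = new SparkContext(new SparkConf().setAppName("IterativeDemo").setMaster("local[*]"))

    // Parse the input once and ask Spark to keep the result in memory.
    val values = sc.textFile("points.txt").map(_.trim.toDouble).cache()

    // Each pass reads the cached in-memory data instead of re-reading the file.
    var total = 0.0
    for (i <- 1 to 10) {
      total += values.map(_ * i).sum()
    }
    println(total)
    sc.stop()
  }
}
```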

Spark also introduces an abstraction called the Resilient Distributed Dataset (RDD). An RDD is a collection of read-only objects distributed across a set of nodes. These collections are resilient: if part of a data set is lost, it can be reconstructed. Reconstruction relies on a fault-tolerant mechanism that maintains "lineage", that is, the information needed to rebuild a portion of the dataset from the operations that derived it. An RDD is represented as a Scala object and can be created in several ways: from a file; as a parallelized collection (slices spread across the nodes); by transforming another RDD; or by changing the persistence of an existing RDD, for example by requesting that it be cached in memory.
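A minimal sketch of the creation paths just described, using the RDD API; the file name, numbers, and master URL are illustrative assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddCreationDemo {
  def main(args: Array[String]): Unit = {
    // Assumed local master and file name "data.txt" for illustration.
    val sc = new SparkContext(new SparkConf().setAppName("RddCreationDemo").setMaster("local[*]"))

    val fromFile    = sc.textFile("data.txt")                  // created from a file
    val fromSlices  = sc.parallelize(1 to 1000, numSlices = 4) // a parallelized collection, spread over 4 slices
    val transformed = fromSlices.map(n => n * n)               // derived by transforming another RDD (lineage is recorded)
    transformed.persist(StorageLevel.MEMORY_ONLY)              // change persistence: request in-memory caching

    println(transformed.count() + fromFile.count())
    sc.stop()
  }
}
```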

Applications in Spark are called drivers, and these drivers implement operations that are performed on a single node or in parallel on a set of nodes. Like Hadoop, Spark supports single-node clusters or multi-node clusters. For multi-node operation, Spark relies on the Mesos cluster manager. Mesos provides an efficient platform for resource sharing and isolation for distributed applications. This setup allows Spark and Hadoop to coexist in a shared pool of nodes.
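Below is a sketch of a minimal driver program; the same code can run on a single node or on a Mesos cluster, depending only on the master URL supplied. The URLs shown are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MinimalDriver {
  def main(args: Array[String]): Unit = {
    // The master URL decides where the driver's operations run:
    // "local[*]" uses a single node; a URL like "mesos://host:5050" would target a Mesos cluster.
    val master = if (args.nonEmpty) args(0) else "local[*]"
    val sc = new SparkContext(new SparkConf().setAppName("MinimalDriver").setMaster(master))

    val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
    println(s"even numbers: $evens")
    sc.stop()
  }
}
```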

