# What does Apache Spark mean?
Apache Spark is an open-source cluster computing system built around in-memory computing, designed to make data analysis faster. Spark started out as a compact project, developed by a small team led by Matei Zaharia at the AMPLab at the University of California, Berkeley. It is written in Scala, and the core of the early project consisted of only 63 Scala files, which made it short and concise.
5 major advantages of Apache Spark:
1. Higher performance, because data is loaded into the distributed memory of the cluster hosts. Data can be iterated over quickly and cached for frequent later access. A commonly quoted claim is that Spark can be up to 100 times faster than Hadoop MapReduce when all the data fits in memory, and up to 10 times faster when memory cannot hold all the data.
2. Standard APIs in Java, Scala, Python, and SQL (for interactive queries) make it accessible to developers from many backgrounds. It also includes machine learning libraries that can be used out of the box.
3. Compatibility with the existing Hadoop ecosystem, both v1 (via SIMR) and 2.x (via YARN), so organizations can migrate smoothly.
4. Easy to download and install. The convenient shell (REPL: Read-Eval-Print Loop) lets you learn the API interactively.
5. Higher productivity thanks to high-level abstractions, so that you can focus on the computation itself.
In addition, Apache Spark is implemented in Scala, and its code is very concise.
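To make the first advantage more concrete, here is a minimal single-machine sketch in plain Python of why caching helps iterative workloads. It does not use Spark and is not Spark's API: `ToyDataset`, `scan`, `cache`, and `iterate` are hypothetical names invented for this illustration, and the list passed in stands in for a slow read from disk.

```python
from typing import Callable, Iterable, List, Optional


class ToyDataset:
    """Toy single-process model of a cacheable dataset. Not Spark's API."""

    def __init__(self, load: Callable[[], Iterable[float]]) -> None:
        self._load = load                        # stands in for a slow disk read
        self._cached: Optional[List[float]] = None
        self.loads = 0                           # how often the source was read

    def scan(self) -> List[float]:
        """Return all records, from memory if cached, else from the source."""
        if self._cached is not None:
            return self._cached
        self.loads += 1
        return list(self._load())

    def cache(self) -> "ToyDataset":
        """Read the source once and keep the records in memory."""
        if self._cached is None:
            self._cached = self.scan()
        return self


def iterate(dataset: ToyDataset, steps: int) -> float:
    """Scan the dataset once per step, as an iterative algorithm would."""
    guess = 0.0
    for _ in range(steps):
        data = dataset.scan()
        mean = sum(data) / len(data)
        guess += 0.5 * (mean - guess)            # move halfway toward the mean
    return guess


if __name__ == "__main__":
    uncached = ToyDataset(lambda: [1.0, 2.0, 3.0])
    iterate(uncached, 5)
    print("reads without cache:", uncached.loads)

    cached = ToyDataset(lambda: [1.0, 2.0, 3.0]).cache()
    iterate(cached, 5)
    print("reads with cache:", cached.loads)
```

In this toy model the uncached dataset goes back to its source on every pass, while the cached one reads it once and serves later passes from memory. Spark applies the same idea across the memory of a whole cluster, which this sketch does not attempt to model.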