1. Java programming
Java programming is the foundation of big data development. Many big data technologies, such as Hadoop, MapReduce, and Spark, are built on the Java platform. Therefore, if you want to learn big data well, Java programming is a necessary skill!
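As a taste of the Java fundamentals this work builds on, here is a minimal, self-contained sketch using collections and generics, the same building blocks the big data frameworks above rely on (the class and method names are illustrative, not from any framework):

```java
import java.util.Arrays;
import java.util.List;

// Minimal Java sketch: typed collections and a simple loop, the kind of
// basics big data frameworks assume you already know.
public class JavaBasics {
    // Sum a list of integers; an illustrative helper only.
    public static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4);
        System.out.println(sum(data)); // prints 10
    }
}
```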
(Recommended learning: Java introductory tutorials)
2. Linux operation and maintenance
Enterprise big data development is usually carried out on the Linux operating system. Therefore, if you want to work in big data, you need to master Linux system operations and the related commands.
3. Hadoop
Hadoop is a software framework for distributed processing of large amounts of data. HDFS and MapReduce are its two core designs: HDFS provides storage for massive data, while MapReduce provides computation over it. Hadoop is an essential framework skill for big data development.
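The MapReduce model described above can be sketched in plain Java, without any Hadoop dependency: a map phase that emits (word, 1) pairs and a reduce phase that sums counts per key. A real Hadoop job would distribute these phases across a cluster; this is a single-process illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the MapReduce word-count idea (no Hadoop dependency).
public class WordCountSketch {
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // "Map" phase: split each line into words, emitting (word, 1).
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    // "Reduce" phase: sum the 1s for each distinct key.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data", "big ideas");
        System.out.println(wordCount(lines));
    }
}
```

In real Hadoop, the map and reduce steps run as separate tasks on different machines, with a shuffle in between that groups values by key.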
4. Zookeeper
ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It is software that provides consistency services for distributed applications, including configuration maintenance, naming services, distributed synchronization, and group services.
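The configuration-maintenance role mentioned above can be illustrated with a toy watch mechanism: clients register a watcher on a key and are notified when its value changes. Real ZooKeeper does this over a replicated, strongly consistent tree of znodes via its client API; this single-process sketch only shows the notification pattern, and all names here are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy sketch of configuration maintenance with watches. This is NOT the
// ZooKeeper API; it is an in-memory illustration of the watch pattern.
public class ConfigWatchSketch {
    private final Map<String, String> config = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

    // Register a callback to be invoked whenever the key's value changes.
    public void watch(String key, Consumer<String> onChange) {
        watchers.computeIfAbsent(key, k -> new ArrayList<>()).add(onChange);
    }

    // Update a value and notify every watcher of that key.
    public void set(String key, String value) {
        config.put(key, value);
        for (Consumer<String> w : watchers.getOrDefault(key, List.of())) {
            w.accept(value);
        }
    }

    public String get(String key) {
        return config.get(key);
    }
}
```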
5. Hive
Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides a simple SQL-style query capability, converting SQL statements into MapReduce jobs for execution. This makes it well suited to statistical analysis of data warehouses.
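To see what a Hive query conceptually compiles down to, consider a SQL statement like `SELECT dept, COUNT(*) FROM emp GROUP BY dept` (the table and column names are invented for illustration). Hive would turn this into a distributed MapReduce job; the same grouping aggregation can be sketched in-memory in plain Java:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory sketch of the aggregation behind a hypothetical Hive query:
//   SELECT dept, COUNT(*) FROM emp GROUP BY dept
// Hive itself would run this as a distributed MapReduce job.
public class GroupBySketch {
    public static Map<String, Long> countByDept(List<String> depts) {
        return depts.stream()
                .collect(Collectors.groupingBy(d -> d, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> depts = Arrays.asList("sales", "hr", "sales");
        System.out.println(countByDept(depts));
    }
}
```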
6. HBase
This is the NoSQL database in the Hadoop ecosystem. Its data is stored as key-value pairs, and each key is unique, so it can be used to deduplicate data. Compared with MySQL, it can store a much larger volume of data.
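The deduplication property described above follows directly from key uniqueness: writing the same key twice overwrites the old value, so duplicates collapse. The sketch below shows this with a plain in-memory map; it is not the HBase API, and the row keys are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrates why a key-value store with unique keys deduplicates data:
// a second write to the same key simply replaces the first value.
// This is a plain in-memory map, not the HBase client API.
public class KeyValueDedup {
    public static Map<String, String> dedup(List<String[]> rows) {
        Map<String, String> store = new LinkedHashMap<>();
        for (String[] row : rows) {
            store.put(row[0], row[1]); // same key -> last write wins
        }
        return store;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"user1", "a"},
                new String[]{"user1", "b"}); // duplicate key
        System.out.println(dedup(rows)); // only one entry survives
    }
}
```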
7. Kafka
Kafka is a high-throughput distributed publish-subscribe messaging system. It can handle all the action-stream data of a consumer-scale website, unify online and offline message processing through Hadoop's parallel loading mechanism, and deliver real-time messages through a cluster.
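The publish-subscribe model Kafka implements can be sketched with a toy in-memory broker: producers publish records to a named topic, and every subscriber of that topic receives them. Real Kafka adds partitions, durable logs, and consumer groups; none of that is modeled here, and the topic name is invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory publish-subscribe broker illustrating the model Kafka
// implements. This is NOT the Kafka client API.
public class PubSubSketch {
    private final Map<String, List<List<String>>> topics = new HashMap<>();

    // Subscribe to a topic; messages will be delivered into the returned list.
    public List<String> subscribe(String topic) {
        List<String> inbox = new ArrayList<>();
        topics.computeIfAbsent(topic, t -> new ArrayList<>()).add(inbox);
        return inbox;
    }

    // Publish a message to every subscriber of the topic.
    public void publish(String topic, String message) {
        for (List<String> inbox : topics.getOrDefault(topic, List.of())) {
            inbox.add(message);
        }
    }

    public static void main(String[] args) {
        PubSubSketch broker = new PubSubSketch();
        List<String> clicks = broker.subscribe("page-clicks");
        broker.publish("page-clicks", "user1:/home");
        System.out.println(clicks); // the subscriber received the message
    }
}
```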
8. Spark
Spark is a fast, general-purpose computing engine designed for large-scale data processing. It has the advantages of Hadoop MapReduce, but unlike MapReduce, the intermediate results of a job can be kept in memory, eliminating the need to read and write HDFS between steps. This makes Spark better suited to iterative algorithms, such as those used in data mining and machine learning.
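The benefit of in-memory intermediates can be sketched with a simple iterative loop: each iteration reuses the previous result directly from memory, whereas a chain of MapReduce jobs would write each intermediate to HDFS and read it back. The per-element update below is a made-up toy step, not a real mining or learning algorithm, and this is plain Java, not the Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of why keeping intermediate results in memory helps iterative
// algorithms: each pass consumes the previous in-memory result instead of
// re-reading it from disk, as chained MapReduce jobs would via HDFS.
public class IterativeSketch {
    public static List<Double> iterate(List<Double> data, int iterations) {
        List<Double> current = data;
        for (int i = 0; i < iterations; i++) {
            List<Double> next = new ArrayList<>();
            for (double v : current) {
                next.add(v / 2.0); // toy per-element update, for illustration
            }
            current = next;        // intermediate result stays in memory
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(iterate(Arrays.asList(8.0), 3)); // [1.0]
    }
}
```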