Zhihu is a real online question-and-answer community with a friendly, rational atmosphere that connects experts from all walks of life. Users share their professional knowledge, experience and insights, continuously contributing high-quality information to the Chinese Internet.
Perhaps many people don't know that Zhihu is the largest UGC (user-generated content) community on the Chinese Internet after Baidu Tieba and Douban. In the three years since it was founded, Zhihu has grown from nothing to more than 100 servers. It currently has more than 11 million registered users and more than 80 million monthly visitors; the site serves more than 220 million page views (PVs) per month and nearly 2,500 dynamic requests per second.
At the ArchSummit Beijing 2014 conference, Zhihu co-founder and CTO Li Shenshen gave Zhihu's first comprehensive technical talk in the more than three years since its founding.
Initial architecture selection
When work on the Zhihu product actually started in October 2010, there were only two engineers, including Li Shenshen; by the launch in December 2010, there were four.
Zhihu's main development language is Python. Python is simple yet powerful, quick to pick up and efficient to develop in, and it has an active community; the team also likes it.
Zhihu uses the Tornado framework. Tornado supports asynchronous I/O, which makes it well suited to real-time Comet applications; it is simple and lightweight with a low learning cost; and it has a mature production case in FriendFeed plus community support from Facebook. One characteristic of Zhihu's product is that it wants to keep a long-lived connection to the browser so that it can push feeds and notifications in real time, so Tornado is a good fit.
Initially the whole team focused on building product features. Everything else was done in the simplest way possible to save time and money, which of course caused some problems later on.
The initial idea was to use cloud hosting to save costs. Zhihu's first server was a Linode host with 512MB of memory. After launch, however, the closed beta was more popular than expected, and many users reported that the site was very slow. Cross-border network latency was higher than expected, especially because network quality within China is uneven and access conditions vary across the country. Because of this problem, together with the domain registration required at the time, Zhihu went back to the old path of buying machines and finding a data center.
After buying machines and finding a data center, the team ran into new problems: the service went down frequently. The service provider's machines kept having memory problems and rebooted at every turn. Eventually a machine crashed and could not be restored. At that point Zhihu made both the web tier and the database highly available. That is what a startup is like: you never know what problem you will face when you wake up tomorrow morning.
This is the architecture diagram at that stage, with both the web tier and the database running in a master-slave setup. The image service at that time was hosted on Youpaiyun. In addition to master-slave replication, reads and writes were separated for better performance. To solve synchronization problems, a server was added to run offline scripts so that they would not delay responses from online services. In addition, to improve intranet throughput and latency, network equipment was replaced, increasing the throughput of the entire intranet by 20 times.
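The read/write separation described above can be illustrated with a minimal sketch. It is not Zhihu's code: the `ReadWriteRouter` class, the host names and the rule "statements starting with SELECT are reads" are all assumptions made for illustration.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and spread reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replica is configured.
        self._readers = itertools.cycle(replicas or [master])

    def connection_for(self, sql):
        # Assumption: SELECT statements are reads, everything else is a write.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.master


# Hypothetical host names, used only for the example.
router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
```

Reads rotate over the replicas in round-robin order, while every write goes to the single master, so replication lag only affects read freshness and never write ordering.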
By the first half of 2011, Zhihu already depended heavily on Redis. Besides the initial uses for queues and search, it was later used as a cache as well. Single-machine storage became a bottleneck, so sharding was introduced and consistency was implemented.
The Zhihu team believes in tools and believes that tools improve efficiency. A tool is really a process: there is no best tool, only the most suitable one, and the choice keeps changing as the state of the system and the environment change. Tools developed or used by Zhihu include Profiling (function-level request tracing for analysis and tuning), Werkzeug (convenient debugging), Puppet (configuration management) and Shipit (one-click deployment or rollback).
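The talk does not say how keys were mapped to Redis shards. One common way to shard a key-value store is a consistent hash ring, sketched below purely as an assumption; the `HashRing` class, the node names and the choice of MD5 are illustrative and not taken from Zhihu.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: each node owns many points on a circle."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical shard names.
ring = HashRing(["redis-a", "redis-b", "redis-c"])
shard = ring.node_for("user:42")
```

The useful property of this scheme is that adding a shard only moves the keys that now belong to the new shard; all other keys stay where they were, which limits cache misses during resharding.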
Log system
Zhihu was originally invitation-only. In the second half of 2011 it launched application-based registration: users without an invitation code could apply to register by filling in some information. The number of users reached a new level, and some accounts began posting advertisements, which had to be removed. This put the need for a logging system on the agenda.
The log system had to support distributed collection, centralized storage, real-time delivery and subscription, and it had to be simple. The team evaluated several open-source systems at the time. Scribe was good overall but did not support subscriptions. Kafka is developed in Scala, and the team had little Scala experience. Flume was similar and heavier. So the development team chose to build its own logging system, Kids (Kids Is Data Stream). As the name suggests, Kids is used to aggregate various data streams.
Kids draws on Scribe's ideas. On each server, Kids can be configured as an Agent or a Server. An Agent accepts messages directly from the application; after aggregating them, it can pass them to the next Agent or directly to the central Server. Log subscribers can read from the Server or from Agents on certain central nodes.
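The Agent/Server topology described above can be sketched in a few lines. This is an in-memory illustration only, not the Kids API: the `Server` and `Agent` classes, the `batch_size` parameter and the topic names are hypothetical.

```python
class Server:
    """Central node: stores messages per topic and pushes them to subscribers."""

    def __init__(self):
        self.stored = {}       # topic -> list of messages
        self.subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def receive(self, topic, message):
        self.stored.setdefault(topic, []).append(message)
        for callback in self.subscribers.get(topic, []):
            callback(message)


class Agent:
    """Per-host node: buffers messages and forwards them upstream."""

    def __init__(self, upstream, batch_size=2):
        self.upstream = upstream  # another Agent or the Server
        self.batch_size = batch_size
        self.buffer = []

    def receive(self, topic, message):
        self.buffer.append((topic, message))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        pending, self.buffer = self.buffer, []
        for topic, message in pending:
            self.upstream.receive(topic, message)


server = Server()
relay = Agent(server, batch_size=1)  # an intermediate Agent
agent = Agent(relay, batch_size=2)   # the Agent next to the application
seen = []
server.subscribe("web", seen.append)
agent.receive("web", "GET /")
agent.receive("web", "POST /answer")
```

Because an Agent and the Server expose the same `receive` method, Agents can be chained to any depth before the stream reaches the central node.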
The specific details are shown in the figure below:
Zhihu also built a web tool on top of Kids, Kids Explorer, which supports real-time viewing of online logs. It has become the most important tool for debugging production problems.
Kids has been open sourced on GitHub.
Event-driven architecture
Zhihu's product has one notable characteristic. In the earliest version, the only follow-up work after an answer was added was updating notifications and updating the feed. As features grew, more follow-up operations appeared, such as updating the index, updating counts and content review. Handled in the traditional way, this logic would keep growing and become very hard to maintain. The scenario is well suited to an event-driven approach, so the development team reworked the whole architecture into an event-driven one.
The first thing needed is a message queue that can accept a variety of events and meets high consistency requirements. For this, Zhihu developed a small tool called Sink. When Sink receives a message, it first persists it locally and then distributes it. If the machine goes down, the messages can be fully recovered on restart, so none are lost. Sink then puts the message into the task queue through the Miller development framework. Sink is more like a serial message subscription service, but tasks need to be processed in parallel, so Beanstalkd comes in to manage the whole task lifecycle. The architecture is shown below:
For example, when a user answers a question, the system first writes the answer to MySQL, puts a message into Sink, and then returns the answer to the user. Sink sends tasks to Beanstalkd through Miller, and Workers pick up the tasks and process them on their own.
When the system first went online, there were 10 messages per second, generating 70 tasks. There are now 100 events per second generating 1,500 tasks, all handled by the current event-driven architecture.
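The "persist first, then distribute" behaviour attributed to Sink can be sketched as follows. This is a minimal illustration under stated assumptions, not Sink itself: the `Sink` class below, its JSON-lines log file and the `enqueue` callable (standing in for Miller and Beanstalkd) are all hypothetical.

```python
import json
import os
import tempfile


class Sink:
    """Append every message to a local log before handing it to the queue."""

    def __init__(self, log_path, enqueue):
        self.log_path = log_path
        self.enqueue = enqueue  # callable standing in for the task queue

    def publish(self, message):
        # 1. Persist first, so a crash after this point loses nothing.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(message) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then distribute the message.
        self.enqueue(message)

    def replay(self):
        """Re-dispatch every persisted message, e.g. after a restart."""
        if not os.path.exists(self.log_path):
            return 0
        count = 0
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self.enqueue(json.loads(line))
                count += 1
        return count


log_path = os.path.join(tempfile.mkdtemp(), "sink.log")
queue = []
Sink(log_path, queue.append).publish({"event": "answer_created", "answer_id": 1})

# Simulate a restart: a fresh Sink recovers the message from the log.
recovered = []
replayed = Sink(log_path, recovered.append).replay()
```

In this sketch `replay` re-dispatches everything in the log, so delivery is at-least-once and workers would need to tolerate duplicates; a real system would also track which messages were already dispatched.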
Page rendering optimization
In 2013 Zhihu had millions of PVs every day. Page rendering is computationally intensive and, because it has to fetch data, I/O-intensive as well. The development team therefore split pages into components and upgraded the data-fetching mechanism. Zhihu fetches data layer by layer, from top to bottom, following the structure of the page's component tree. Data already obtained by an upper layer does not need to be fetched again by the layers below, so a page with a given number of layers needs roughly that many data fetches.
Building on this idea, Zhihu created a template rendering framework, ZhihuNode.
After a series of improvements, page performance improved greatly: the question page went from 500ms to 150ms, and the feed page from 1s to 600ms.
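The layer-by-layer fetching idea can be sketched as a breadth-first walk over a component tree with one batched request per layer. This is an illustration under assumptions, not ZhihuNode: the `Component` class, the `fetch_by_layer` function and the key names are hypothetical.

```python
class Component:
    def __init__(self, name, needs=(), children=()):
        self.name = name
        self.needs = list(needs)      # data keys this component renders
        self.children = list(children)


def fetch_by_layer(root, batch_fetch):
    """Walk the tree top-down, issuing at most one batched fetch per layer."""
    data, calls = {}, 0
    layer = [root]
    while layer:
        # Keys already loaded by an upper layer are not requested again.
        wanted = []
        for comp in layer:
            for key in comp.needs:
                if key not in data and key not in wanted:
                    wanted.append(key)
        if wanted:
            data.update(batch_fetch(wanted))
            calls += 1
        layer = [child for comp in layer for child in comp.children]
    return data, calls


requests = []


def fake_batch_fetch(keys):
    # Stand-in for a batched database or cache read.
    requests.append(list(keys))
    return {key: f"<{key}>" for key in keys}


page = Component("question_page", ["question:1"], [
    Component("answer_list", ["answer:7", "answer:8"], [
        Component("author_card", ["user:3", "question:1"]),
    ]),
    Component("sidebar", ["user:3"]),
])
data, calls = fetch_by_layer(page, fake_batch_fetch)
```

Here the three-layer page needs only two fetches, because everything the bottom layer needs was already loaded by the layers above it.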
Service-oriented architecture (SOA)
As Zhihu's features grew more complex, the whole system grew larger and larger. How did Zhihu move to a service-oriented design?
First, a basic RPC framework is needed. Zhihu's RPC framework went through several versions.
The first version was Wish, a model with strictly defined serialization. Its transport layer used STP, a very simple home-grown transport protocol that runs on TCP. It worked well at first, when there were only one or two services. As the number of services grew, problems appeared. First, ProtocolBuffer generates descriptor code that is verbose and looks ugly in the code base. Second, the strict definitions made it inconvenient to use.
At that point an engineer developed a new RPC framework, Snow, which uses plain JSON for serialization. The problem with loose data definitions is that when a service is upgraded or its data structure is rewritten, it is hard to know which services depend on it and hard to notify them, so errors occurred often.
So a third RPC framework was released. Its authors wanted to combine the strengths of the previous two: keep Snow's simplicity while requiring a relatively strict serialization protocol. This version introduced Apache Avro and added a mechanism that makes the transport layer and the serialization layer pluggable: either JSON or Avro can be used for serialization, and either STP or a binary protocol for transport.
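A pluggable design of this kind can be sketched by having the client depend only on two small interfaces, a codec and a transport. The sketch below is an assumption-laden illustration, not Zhihu's framework: every class name is hypothetical, `StrictCodec` is only a stand-in for a schema-based codec such as Avro, and `LoopbackTransport` stands in for a real network transport such as STP.

```python
import json


class JsonCodec:
    """Loose serialization: any JSON-compatible dict is accepted."""

    def dumps(self, obj):
        return json.dumps(obj).encode("utf-8")

    def loads(self, data):
        return json.loads(data.decode("utf-8"))


class StrictCodec(JsonCodec):
    """Stand-in for a schema-based codec: rejects unexpected fields."""

    def __init__(self, allowed_fields):
        self.allowed = set(allowed_fields)

    def dumps(self, obj):
        unknown = set(obj) - self.allowed
        if unknown:
            raise ValueError(f"unexpected fields: {sorted(unknown)}")
        return super().dumps(obj)


class LoopbackTransport:
    """Stand-in transport: hands the request bytes straight to a handler."""

    def __init__(self, handler):
        self.handler = handler

    def send(self, payload):
        return self.handler(payload)


class RpcClient:
    def __init__(self, codec, transport):
        self.codec = codec          # pluggable serialization layer
        self.transport = transport  # pluggable transport layer

    def call(self, method, **params):
        request = self.codec.dumps({"method": method, "params": params})
        return self.codec.loads(self.transport.send(request))


def make_server(codec, methods):
    def handle(payload):
        request = codec.loads(payload)
        result = methods[request["method"]](**request["params"])
        return codec.dumps({"result": result})
    return handle


results = []
for codec in (JsonCodec(), StrictCodec({"method", "params", "result"})):
    server = make_server(codec, {"add": lambda a, b: a + b})
    client = RpcClient(codec, LoopbackTransport(server))
    results.append(client.call("add", a=1, b=2))
```

Swapping the codec or the transport requires no change to `RpcClient`, which is the point of making both layers pluggable.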
Next, with service registration and discovery in place, a caller only needs the name of a service to find which machines it runs on. Zhihu also has matching tuning tools and developed its own tracing system based on Zipkin.
By calling relationship, Zhihu's services fall into three layers: an aggregation layer, a content layer and a base layer. By attribute, they fall into three categories: data services, logic services and channel services. Data services mainly store special data types, such as the image service. Logic services are CPU-intensive, computation-heavy operations, such as defining and parsing the answer format. Channel services have no storage and mostly forward data, such as Sink.
This is the overall architecture after the move to a service-oriented design.
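Name-based discovery can be illustrated with a tiny in-memory registry. The talk does not describe how Zhihu's registry is built, so the `ServiceRegistry` class, its methods and the addresses below are hypothetical.

```python
import random


class ServiceRegistry:
    """Map a service name to the addresses currently serving it."""

    def __init__(self):
        self._instances = {}  # service name -> set of "host:port"

    def register(self, name, address):
        self._instances.setdefault(name, set()).add(address)

    def deregister(self, name, address):
        self._instances.get(name, set()).discard(address)

    def lookup(self, name):
        addresses = sorted(self._instances.get(name, ()))
        if not addresses:
            raise LookupError(f"no instance registered for {name!r}")
        return addresses

    def pick(self, name):
        # Simplest possible load balancing: choose a random instance.
        return random.choice(self.lookup(name))


registry = ServiceRegistry()
registry.register("image", "10.0.0.5:9000")
registry.register("image", "10.0.0.6:9000")
registry.deregister("image", "10.0.0.5:9000")
address = registry.pick("image")
```

Callers only ever mention the name `"image"`; which machine answers is decided at lookup time, so instances can be added or removed without touching client code.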
Product service
There are roughly four functional areas on the Zhihu homepage. On the left is "Latest News", which takes up about 70% of the home page and mainly displays the latest questions and answers from the people the user follows. In this section, besides viewing the latest questions and answers, users can also take part in questions that interest them through functions such as "Settings", "Follow Issues", "Add Comments", "Share", "Thank You" and "Favorites". For example, with the "Settings" function, users can choose to block topics. Under a question followed by someone the user follows, the user can also follow the question, add comments, and so on.
The upper right of the home page holds information related to managing the user's own activity on Zhihu: "My Drafts", "My Collections", "All Questions", "Questions I Follow" and "Questions Invited to Me". In the middle of the right side is the off-site invitation function, "Invite friends to join Zhihu", through which users can invite friends to the Zhihu community by email or Sina Weibo. The middle and lower right shows topics the user follows or is interested in, along with recommended sections. For topic and user recommendations, Zhihu may on the one hand aggregate the topics users follow, and on the other hand achieve fairly accurate recommendations by recording and analyzing user behavior data on the site. It is particularly worth mentioning that the "Topic Square" section at the lower right presents all topic classification tags, giving users a good way to find information besides search and navigation.
The Zhihu topic page can be divided into two sections, as shown in Figure 2: "Topic Updates" and "Frequently Visited Topics". On the left is "Topic Updates", which takes up about 70% of the page. In this section, users can view the questions (presented in chronological order) under the topics they are interested in, and they can also "pin" and "unfollow" those topics.
At the bottom right is the "Frequently Visited Topics" section. Here users can see details of the topics they follow, such as sub-topics, number of followers, and recent activity.
The Zhihu notification page can be divided into four sections, as shown in Figure 3. "All notifications" on the left lists information about questions the user follows and answers from other users (presented in chronological order). On the right, the user activity summary, "Invite friends to join Zhihu", and the topic and topic recommendation sections are the same as on the homepage, so they are not described again here.
Zhihu's personal homepage is roughly divided into five sections: "Personal Information", "Personal Answers", "Personal Homepage", "Search User Questions and Answers", "Followers and Followed Information" and "Following Topics". The details are shown in Figure 4.
In the "Profile" section, users can click "View Details" to view the user's "Personal Achievements" (including the number of "Likes", "Thanks", "Collections" and "Shares"), "Professional Experience", "Residential Information", "Educational Experience" and "Skills". Zhihu users can fill in these five kinds of information about themselves by clicking "Edit My Profile".
At the lower left is the "Personal Answers" section, which lists the user's answers to questions (sorted by number of approvals in descending order, or by answer time with the most recent first). Together, the "Personal Information" and "Personal Answers" sections take up about 70% of the page.
At the upper right is the "Personal Homepage" section, a summary of the user's latest activity, questions, answers, collections and logs.
In the middle of the right side is a search box, which can be used to search a specific user's questions and answers.
The middle and lower right shows information about the user's followers and followed topics. Users can click the relevant icons to jump straight to the corresponding section.
The Zhihu question page is the most important page on Zhihu. Here users can read, edit, and answer specific questions.
The question page can be roughly divided into six parts according to function: "Question Answers", "Follow Function", "Invitation Function", "Related Question Links", "Sharing Function" and "Question Status".
On the left side is the "Question and Answer" section, which takes up about 70% of the page. In this section, users can edit, comment on, report and vote on the question and its answers. Users can edit a question, its labels and its supplementary description if they feel something is inappropriate. They can also comment on or report anything they find inappropriate or interesting. For the answers, users can sort them in the way that suits them (Zhihu provides three presentation methods: sorting by votes, sorting by time, and showing answers from people the user follows).
In addition, it is worth mentioning that there are two triangles on the left side of each answer, one above the other, representing approval and disapproval, as shown in Figure 6. Users can act on each answer according to their own knowledge, understanding, or interests.
On the right side of the page, from top to bottom, the first item is the "Follow" function. Here users can follow questions, which is a bit like the follow function of Sina Weibo. The difference is that following on Zhihu mainly targets specific questions, while following on Sina Weibo mainly targets specific users.
Further down on the right side is the "Invite Others to Answer Questions" section. It works the same way as the invitation functions introduced for the Zhihu homepage and notification page, so it is not described again here.
Further down are links to related questions. Most websites offer recommendations of this kind. Although the approach is relatively mature in terms of technology and experience, its effectiveness is not beyond criticism. For related question links, Zhihu mainly makes machine recommendations from the characteristics of the specific question through corresponding algorithms. It does not personalize recommendations to different users' interests (personalization is a future trend on the Internet, and e-commerce platforms pay particular attention to this technology).
Further down is the question sharing function. Users can share Zhihu questions off-site through "Weibo" and "email" and on-site through private messages.
At the bottom right is the question status. Here users can see when the question was last active, how many times it has been viewed, how many people follow the related topics, and how many people follow the question.
User experience
1. To be precise, Zhihu is more like a forum: users hold discussions around a topic of interest, and you can follow people who share your interests. For conceptual explanations, online encyclopedias already cover almost every question; what sets Zhihu apart is that it brings divergent thinking together. Zhihu encourages discussion during the Q&A process to widen the range of views on a question, and it encourages answers that are open-ended and that can be referenced like a wiki.
2. It is more exclusive than a forum. Every registered user on Zhihu has a PR (Person Rank), and every action you take directly affects your PR value. Answers are sorted by the number of approval votes; when the vote counts are equal, they are sorted by the author's PR value, and answers considered invalid are hidden. This filters out a fair amount of spam.
3. Zhihu once insisted on a strict invitation system, first to ensure the authenticity of users' quasi-real-name identities, and second to avoid generating too much spam. Quasi-real names make it easier for users to put targeted questions to the people they are interested in. This resembles "Everyone Asks Everyone", a very interesting column in Han Han's discontinued "Solo Group"; in other words, that column was a real-life version of Zhihu. At the same time, the strict invitation system gave Zhihu a strong atmosphere of rigor, represented by keso, who can convince others without saying anything.
Since March 2013, Zhihu has opened registration to the public.
4. Credit-based SNS relationships. Perhaps, purely as an integration of SNS and Q&A, the domestic site Renren should be able to develop more quickly; but as mentioned above, the strict invitation system excludes a considerable amount of invalid information. If Renren also launched social Q&A, it would inevitably bring in your existing friends, and these friends obviously cannot all be people who are interested in what you care about. This almost rules out the possibility of any large Internet company entering Quora-type Q&A.
The reason is that large Internet companies generally have a broad audience, whereas Quora-type Q&A depends not simply on popularity but on the value-to-information ratio (valuable information / total information), that is, on the amount of elite information generated.
However, Thousand Oaks quietly launched Jingwei.com, a vertical SNS that has gathered a considerable number of professionals. If Thousand Oaks uses it as an entry point to integrate Quora-like Q&A, it still has considerable potential.
5. Compared with Quora, Zhihu uses blue as its main color. Its features also still need improvement relative to Quora, such as the best topic under a certain topic.
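The answer ordering described in point 2 above (approval votes first, author PR as the tie-breaker, invalid answers hidden) can be written as a short sketch. How PR is actually computed is not described in the text, so the `Answer` fields and the sample values here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    author: str
    upvotes: int
    author_pr: float    # hypothetical Person Rank value of the author
    valid: bool = True  # False for answers judged invalid


def rank_answers(answers):
    """Hide invalid answers, then sort by votes and break ties by author PR."""
    visible = [a for a in answers if a.valid]
    return sorted(visible, key=lambda a: (a.upvotes, a.author_pr), reverse=True)


answers = [
    Answer("a", upvotes=10, author_pr=1.0),
    Answer("b", upvotes=10, author_pr=3.5),
    Answer("c", upvotes=25, author_pr=0.2),
    Answer("spam", upvotes=99, author_pr=0.0, valid=False),
]
ranked = [a.author for a in rank_answers(answers)]
```

In this example `c` comes first on votes alone, `b` beats `a` on PR because their votes are tied, and the invalid answer never appears regardless of its vote count.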