


What LinkedIn learned from using large language models to serve its billion users
With more than 1 billion users worldwide, LinkedIn continues to challenge the limits of today’s enterprise technology. Few companies operate quite like LinkedIn, or have similarly vast data resources.
The business- and employment-focused social media platform connects qualified candidates with potential employers, and helping fill job vacancies is its core business. It is also important that posts on the platform reflect the needs of both employers and consumers. Under LinkedIn's model, these matching processes have always relied on technology.
By the summer of 2023, when GenAI was first gaining steam, LinkedIn began to consider whether to leverage large language models (LLMs) to match candidates with employers and make the flow of information more useful.
So the social media giant embarked on a GenAI journey and is now reporting the results of its experience with Microsoft's Azure OpenAI service. CIOs across all industries can learn some lessons from LinkedIn along the way. As most CIOs have experienced, adopting emerging technologies comes with trials and setbacks. LinkedIn's situation was no different, and according to Juan Bottaro, the company's principal software engineer and head of technology, its road to working with LLMs has been anything but smooth.
Bottaro said the initial results "feel incomplete" and "don't connect enough dots."
The initial wave of hype surrounding GenAI didn't help.
"LLM is something new, and it feels like it can solve all problems," Bottaro said. “We didn’t start out with a very clear idea of what LLM could do.”
For example, early versions of the improved job matching effort were, to put it charitably, rude. Or at least too literal.
"It's not practical to click 'Evaluate my suitability for this job' and get 'You're not a good fit at all,'" Bottaro said. "We want [responses] to be factually accurate but also empathetic. Some members may be considering a career change for which they are not currently well suited and need help understanding the gaps and what to do next."
So an important early lesson for LinkedIn was to tune the LLM to meet audience expectations, and to help the LLM respond in a way that may not be human, but is at least humane.
Speed Matters
Although LinkedIn has over a billion members, much of the LLM-powered job search functionality was initially targeted at Premium members, a relatively small group. (LinkedIn declined to say how many Premium members it has.)
When operating at such a scale, speed is crucial, especially in something as nuanced as matching candidates to relevant positions. Here, it was thought that LLMs would help because an oft-cited advantage of LLMs is their speed, allowing them to complete complex steps quickly. But that's not the case with LinkedIn's deployment, Bottaro said.
"I wouldn't say LLM is fast. I don't think speed is an advantage," he said.
Speed can be defined in many ways. While operationally LLM may not be as fast as hoped, Bottaro said the acceleration of the overall deployment process is astounding. "The superpower of this new technology is that you can create prototypes very quickly, somewhere between two and three months. Before this technology, that was not possible," he said.
When asked how long various aspects of the project would take without an LLM, Bottaro said some might not be completed at all, while other elements "could take several years."
As an example, Bottaro mentioned the part of the system aimed at understanding intent. Without an LLM, this would have taken two to three months, but the LLM mastered it in "less than a week."
Cost Considerations
One aspect Bottaro calls a "barrier" is cost. And cost means different things at different stages of a project, as LinkedIn's experience shows.
"The amount of money we spend on development is minuscule," Bottaro said. But when it comes to providing data to LinkedIn's customers, the costs skyrocket.
"Even if it's just for a few million members," Bottaro said, possibly hinting at the number of premium members, prices have soared. That's because LLM pricing - at least LinkedIn's licensing agreement with Microsoft (its LLM provider and parent company) - is based on usage, specifically the usage of input and output tokens.
Tarun Thummala, CEO of an AI vendor, explained in a LinkedIn post unrelated to the project that an LLM input or output token is roughly equivalent to 0.75 words. LLM providers typically sell tokens by the thousand or million. For example, the Azure OpenAI service LinkedIn uses charges $30 per 1 million GPT-4 8K input tokens and $60 per 1 million GPT-4 8K output tokens in the US East region.
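To see why serving costs dwarf development costs at this scale, a rough back-of-the-envelope estimate using the quoted rates is enough. The member count, request volume, and prompt sizes in the sketch below are illustrative assumptions, not LinkedIn's figures.

```python
# Rough cost estimate using the GPT-4 8K rates quoted above.
# Member count, requests per member, and word counts are illustrative
# assumptions, not LinkedIn figures.

INPUT_PRICE_PER_1K_TOKENS = 0.03   # $30 per 1M input tokens
OUTPUT_PRICE_PER_1K_TOKENS = 0.06  # $60 per 1M output tokens
WORDS_PER_TOKEN = 0.75             # rule of thumb cited above

def monthly_cost(members: int, requests_per_member: int,
                 input_words: int, output_words: int) -> float:
    """Estimate monthly spend for a token-metered LLM feature."""
    input_tokens = input_words / WORDS_PER_TOKEN
    output_tokens = output_words / WORDS_PER_TOKEN
    per_request = (input_tokens / 1000) * INPUT_PRICE_PER_1K_TOKENS \
                + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K_TOKENS
    return members * requests_per_member * per_request

# A few million members, 10 requests each per month, ~1,500-word prompts
# (profile plus job description) and ~300-word responses:
print(f"${monthly_cost(3_000_000, 10, 1_500, 300):,.0f} per month")  # ~$2.5M
```

Even with modest per-request sizes, usage-based pricing multiplied across millions of members quickly reaches millions of dollars a month, which is exactly the cost cliff Bottaro describes between development and serving.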
Assessment Challenge
Another goal LinkedIn set for its project is automated assessment. Evaluating LLMs for accuracy, relevance, safety, and other concerns has always been a challenge. Leading organizations and LLM providers have been trying to automate some of this work, but according to LinkedIn, that capability is "still a work in progress."
Without automated assessment, LinkedIn reports that “engineers can only rely on visual inspection of results and testing on a limited sample set, often with a delay of more than 1 day before the metrics are known.”
The company is building a model-based evaluator to help estimate key LLM metrics such as overall quality score, hallucination rate, coherence, and responsible-AI violations. Doing so will speed up experimentation, and while the company's engineers have had some success with hallucination detection, they said they are not done yet in this area.
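LinkedIn has not published the internals of its evaluator, but the general "model as judge" pattern it describes can be sketched simply: a second model scores each generated answer against a rubric. The rubric wording, scoring scale, and use of the OpenAI Python client below are illustrative assumptions, not LinkedIn's implementation.

```python
# Minimal sketch of a model-based evaluator ("LLM as judge").
# Assumes the OpenAI Python client (v1+) and an OPENAI_API_KEY in the
# environment; the rubric and metrics are illustrative, not LinkedIn's.
import json
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "You are grading an AI-generated career-advice answer. "
    "Return only JSON with integer scores 1-5 for overall_quality and "
    "coherence, and a boolean hallucination that is true if the answer "
    "states facts not supported by the provided context."
)

def evaluate(context: str, answer: str) -> dict:
    """Ask a judge model to score one generated answer."""
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic grading
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"Context:\n{context}\n\nAnswer:\n{answer}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)

# scores = evaluate(job_posting_text, generated_advice)
# e.g. {"overall_quality": 4, "coherence": 5, "hallucination": false}
```

Running a judge like this over every experiment batch replaces manual spot checks, which is why it shortens the day-plus feedback loop the engineers describe.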
Data Quality
Part of the challenge LinkedIn encounters with its job matching efforts comes down to data quality issues on both sides: the employer's and the potential employee's.
An LLM can only work with the data it is given, and sometimes job postings do not accurately or comprehensively describe the skills an employer is seeking. On the other side, some job seekers post poor resumes that do not effectively reflect their extensive experience in problem solving and other areas.
Here, Bottaro sees potential for LLMs to help both employers and job seekers. By improving that writing, both sides benefit, because the company's job matching LLM works more effectively when its input data is of higher quality.
User Experience
When dealing with such a large membership base, accuracy and relevance metrics can "give a false sense of comfort," Bottaro said. For example, if LLM "gets it right 90 percent of the time, that means 1 in 10 people will have a bad experience," he said.
What makes this deployment even more difficult is the extreme nuance and judgment involved in providing useful, helpful, and accurate answers.
"How do you define what is good and what is bad? We spent a lot of time working with linguists to develop guidance on how to provide comprehensive representation. We also did a lot of user research," Bottaro explain. "How do you train people to write the right response? How do you define the task, dictate what the response should look like? The product might try to be constructive or helpful. It doesn't try to assume too much, because that's where the illusion starts. We're very interested in responses We take great pride in our consistency.”
Real-Time Operations
LinkedIn's sheer scale creates another challenge for job matching. With a billion members, a job ad may receive hundreds or even thousands of responses within minutes of being posted. Many job seekers won't bother applying if they see that hundreds of people have already applied. That requires the LLM to find matching members very quickly and notify them before less qualified applicants submit their materials. Even then, whether members see the notification and respond in time remains an open question.
On the employer's side, the challenge is finding the most suitable candidates, not necessarily the ones who respond fastest. Some companies are reluctant to publish salary ranges, which further complicates matters for both sides, because the most qualified candidates may not be interested in a position without knowing how much it pays. That is a problem the LLM cannot solve.
APIs and RAG
LinkedIn's vast database contains a wealth of unique information about individuals, employers, skills, and courses, but its LLMs were not trained on that data. As a result, according to LinkedIn engineers, those assets cannot currently be used for any inferencing or response-generating activities because of how they are stored and served.
Here, retrieval-augmented generation (RAG) is the typical solution. By building pipelines to internal APIs, enterprises can "augment" LLM prompts with additional context to better guide and constrain the model's responses. Most of LinkedIn's data is exposed through RPC APIs, which company engineers say are "convenient for humans to call programmatically" but "not LLM friendly."
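The basic pattern is straightforward to sketch: fetch the relevant records from internal services, then prepend them to the prompt as context. The fetch functions and data shapes below are hypothetical placeholders, not LinkedIn's actual endpoints.

```python
# Minimal RAG sketch: retrieve records from internal services, then augment
# the prompt with them before calling the model. The fetch functions are
# hypothetical stand-ins for internal APIs, not LinkedIn's.

def fetch_member_profile(member_id: str) -> str:
    # Placeholder: a real pipeline would call an internal profile service.
    return "Skills: Python, SQL. Experience: 5 years of data analysis."

def fetch_job_posting(job_id: str) -> str:
    # Placeholder: a real pipeline would call an internal jobs service.
    return "Senior Data Analyst. Requires SQL, dashboarding, stakeholder work."

def build_prompt(member_id: str, job_id: str, question: str) -> str:
    """Assemble a prompt that grounds the model in retrieved data."""
    context = (
        f"Member profile:\n{fetch_member_profile(member_id)}\n\n"
        f"Job posting:\n{fetch_job_posting(job_id)}\n"
    )
    return (
        "Answer using only the context below; if it is insufficient, say so.\n\n"
        f"{context}\nQuestion: {question}"
    )

print(build_prompt("member-123", "job-456", "Am I a good fit for this role?"))
```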
To make its RPC APIs usable this way, LinkedIn engineers "wrapped skills" around them, giving each one an "LLM-friendly description of what the API does and when to use it," along with configuration details, input and output schemas, and all the logic needed to map the LLM-facing version of each API to its underlying (actual) RPC version.
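LinkedIn has not published its skill format, but the idea maps closely to the familiar tool- or function-calling pattern: each skill pairs a natural-language description and an input/output schema with the code that calls the underlying RPC endpoint. The structure below is an illustration of that pattern under those assumptions, not LinkedIn's implementation.

```python
# Illustrative "skill" wrapper: an LLM-friendly description plus a JSON
# schema, mapped onto a hypothetical underlying RPC call. This mirrors the
# common tool/function-calling pattern; it is not LinkedIn's actual format.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Skill:
    name: str
    description: str              # what the API does and when to use it
    input_schema: dict            # JSON Schema for the arguments the LLM supplies
    handler: Callable[..., Any]   # maps the LLM call to the real RPC endpoint

def search_jobs_rpc(title: str, location: str = "") -> dict:
    # Placeholder for the underlying (actual) RPC version of the API.
    return {"results": [{"job_id": "42", "title": title, "location": location}]}

search_jobs = Skill(
    name="search_jobs",
    description="Search current job postings by title and location. "
                "Use when the member asks about open roles.",
    input_schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "location": {"type": "string"},
        },
        "required": ["title"],
    },
    handler=search_jobs_rpc,
)

# When the model chooses this skill, the arguments it emits are validated
# against input_schema and forwarded to the handler:
print(search_jobs.handler(title="Data Analyst", location="Remote"))
```

The description and schema are what get surfaced to the model; the handler stays on the application side, so the model never needs to know how the underlying RPC service is actually invoked.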
LinkedIn engineers wrote in a statement: "Skills like this enable the LLM to perform a variety of actions related to our products, such as viewing profiles, searching for articles/people/jobs/companies, and even querying internal analytics systems. The same technology is also used to call non-LinkedIn APIs such as Bing search and news." This approach not only extends what the LLM can do but also integrates it with existing infrastructure, allowing the LLM to be used more widely across the enterprise.