Supervised vs. unsupervised learning: Experts define the gap
Learn the characteristics of supervised learning, unsupervised learning and semi-supervised learning, and how they are applied in machine learning projects.
When discussing artificial intelligence, supervised learning often gets the most attention because it is usually the last step in creating an AI model and can be used for tasks such as image recognition, better predictions, product recommendations and lead scoring.
In contrast, unsupervised learning tends to work behind the scenes, earlier in the AI development life cycle: it is often used to lay the foundation on which supervised learning's magic unfolds, much like the grunt work that allows a manager to shine. As explained later, both modes of machine learning can be effectively applied to business problems.
On a technical level, the difference between supervised and unsupervised learning is whether the raw data used to create the algorithm is pre-labeled (supervised learning) or not (unsupervised learning).
Let's get started.
What is supervised learning?
In supervised learning, data scientists provide the algorithm with labeled training data and define the variables they want the algorithm to assess for correlations.
Both the input data and the output variables of the algorithm are specified in the training data. For example, if you want to use supervised learning to train an algorithm to determine whether an image contains a cat, you create a label for each image used in the training data to indicate whether or not the image contains a cat.
As we explain in our definition of supervised learning: "[A] computer algorithm is trained on input data labeled for a specific output. The model is trained until it can detect the underlying patterns and relationships between the input data and the output labels, enabling it to produce accurate labeling results when presented with never-before-seen data." Common types of supervised algorithms include classification, decision trees, regression and predictive modeling, which you can learn about in Arcitura Education's machine learning tutorials.
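To make the idea of training on labeled data concrete, here is a minimal sketch in Python. It is not taken from any production system: the two numeric features (ear_pointiness, whisker_length), the sample values and the choice of a nearest-centroid classifier are all assumptions made for illustration. Real image classifiers work on pixel data with far more capable models.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in sums[label])
        for label in sums
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical labeled training data: (ear_pointiness, whisker_length) -> label
training_data = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.7, 0.7), "cat"),
    ((0.2, 0.1), "not_cat"),
    ((0.1, 0.3), "not_cat"),
    ((0.3, 0.2), "not_cat"),
]

model = train(training_data)
# A never-before-seen example is labeled using what was learned from the labels.
prediction = predict(model, (0.85, 0.75))
```

The labels do all the steering here: without the "cat" and "not_cat" tags on each training example, the `train` function would have nothing to average over.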
Supervised machine learning techniques are used in a variety of business applications, including the following:
- Personalized marketing.
- Insurance/credit underwriting decisions.
- Fraud detection.
- Spam filtering.
What is unsupervised learning?
In unsupervised learning, an algorithm suited to this approach (such as K-means clustering) is trained on unlabeled data. The algorithm scans the data set looking for any meaningful associations within it. In other words, unsupervised learning identifies patterns and similarities within the data rather than relating them to some external measure.
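As a rough illustration of K-means clustering on unlabeled data, the sketch below implements the two alternating steps of the algorithm in plain Python. The customer figures are invented, and the deterministic choice of starting centroids is a simplification assumed here to keep the example reproducible; library implementations typically use randomized or k-means++ initialization.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group unlabeled points into k clusters; returns (centroids, assignments)."""
    # Simplified, deterministic start: take k evenly spaced points as centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Hypothetical unlabeled data: (annual_spend, visits_per_month) per customer
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(customers, k=2)
```

No labels are supplied anywhere. The algorithm only reports that the first three customers resemble one another and the last three resemble one another; deciding what those groups mean is left to a human.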
This approach is useful when you don't know what you are looking for, and less useful when you do. If you show an unsupervised algorithm many thousands or millions of images, it might group together a subset of images that humans would recognize as felines. In contrast, a supervised algorithm trained on labeled images of cats versus canines can identify images of cats with a high degree of confidence. But this approach comes with a trade-off: if a supervised learning project requires millions of labeled images to develop a model, the machine-generated predictions require a lot of human effort.
There is a middle ground: semi-supervised learning.
What is semi-supervised learning?
Semi-supervised learning is an effective way to combine unsupervised and supervised learning. It uses an unsupervised learning algorithm to generate labels automatically through some workflow, and then feeds those labels to a supervised learning algorithm. In this approach, humans manually label some of the images, an unsupervised learning algorithm guesses the labels of the others, and finally all the labels and images are fed to a supervised learning algorithm to create an AI model.
One benefit of semi-supervised learning is that it can reduce the cost of machine learning on large data sets. According to Aaron Kalb, co-founder and chief innovation officer of enterprise data catalog platform Alation, if humans can label 0.01% of millions of samples, the computer can take advantage of those labels to significantly improve its prediction accuracy.
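The workflow described above (a few human labels, machine-guessed labels for the rest, then supervised training on everything) can be sketched as follows. This is only an illustration under stated assumptions: the data is invented, and the guessing step simply copies the label of the nearest human-labeled sample, which stands in for whatever labeling workflow a real project would use.

```python
from math import dist


def guess_labels(labeled, unlabeled):
    """Pseudo-label each unlabeled point with the label of its nearest labeled point."""
    return [
        (point, min(labeled, key=lambda example: dist(example[0], point))[1])
        for point in unlabeled
    ]


def train_centroids(examples):
    """Supervised step: compute the mean feature vector for each label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(axis) / len(rows) for axis in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical data: two human-labeled samples and four unlabeled ones.
human_labeled = [((0.9, 0.9), "cat"), ((0.1, 0.1), "not_cat")]
unlabeled = [(0.8, 0.7), (0.7, 0.9), (0.2, 0.3), (0.3, 0.1)]

# Step 1: the machine guesses labels for the samples humans did not label.
machine_labeled = guess_labels(human_labeled, unlabeled)
# Step 2: all labels, human and machine-made, train the supervised model.
model = train_centroids(human_labeled + machine_labeled)
```

Only two of the six samples needed a human, yet the final model is trained on all six. The risk, of course, is that a wrong guess in step 1 is learned as if it were true, which is why the human-confirmation loop Kalb describes later matters.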
What is reinforcement learning?
Another machine learning method is reinforcement learning. Typically used to teach a machine to complete a sequence of steps, reinforcement learning differs from both supervised and unsupervised learning. Data scientists program an algorithm to perform a task, giving it positive or negative cues, or reinforcement, as it works out how to complete the task. The programmer sets the rules for the rewards but leaves it to the algorithm to decide which steps it needs to take to maximize the reward and complete the task.
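A small sketch can show the division of labor described above: the programmer defines the reward rule, and the algorithm works out the steps. The example below uses tabular Q-learning, one common reinforcement learning algorithm chosen here for illustration. The five-cell corridor task, the reward of 1.0 at the goal and all parameter values are assumptions, not details from the article.

```python
import random

GOAL = 4            # rightmost cell of a hypothetical 5-cell corridor
ACTIONS = (+1, -1)  # step right or step left


def step(state, action):
    """Environment rules set by the programmer: reward 1.0 only at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error; returns the Q-table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
            if state == GOAL:
                break
    return q


q_table = train()
# The learned policy: the best-valued action in every non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
```

Nothing in the code tells the agent to walk right. It is only told that reaching the last cell pays off, and the preference for stepping right in every cell emerges from the rewards it collects.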
When should you use supervised learning versus unsupervised learning?
Shivani Rao, machine learning manager at LinkedIn, said best practices for taking a supervised or unsupervised machine learning approach often depend on the environment, the assumptions you can make about the data and the application.
The choice of using supervised versus unsupervised machine learning algorithms will also change over time, Rao said. In the early stages of the model building process, data is often unlabeled, while labeled data can emerge in later stages of modeling.
For example, for the problem of predicting whether a LinkedIn member will watch a course video, the first model uses an unsupervised technique. Once these suggestions are served, a metric that records whether someone clicks on a suggestion provides new data from which to generate labels.
LinkedIn also uses this technique to tag online courses with the skills that students may want to acquire. Human taggers, such as authors, publishers or students, can provide a precise and accurate list of the skills taught in a course, but they are unlikely to provide an exhaustive list of such skills. Therefore, these data labels can be considered incomplete. These types of problems can use semi-supervised techniques to help build a more exhaustive set of labels.
Bharath Thota, an expert in data science and advanced analytics and a partner at consulting firm Kearney, said practical considerations often drive his team's choice between supervised and unsupervised learning.
"We choose supervised learning for applications when labeled data is available and the goal is to predict or classify future observations," Thota said. "We use unsupervised learning when labeled data is not available and the goal is to build strategies by identifying patterns or segments from the data."
Kalb said Alation data scientists use unsupervised learning internally for a variety of applications. For example, they developed a collaborative human-machine process for translating obscure data object names into human language, such as "na_gr_rvnu_ps" into "North American total professional services revenue." In this case, the machine guesses, humans confirm and the machine learns.
"You can think of it as semi-supervised learning in an iterative loop, creating a virtuous cycle of improved accuracy," Kalb said.
5 Unsupervised Learning Techniques
At a high level, supervised learning techniques tend to focus on either linear regression (fitting a model to a set of data points to make predictions) or classification problems (Does the image have a cat?).
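Fitting a model to a set of data points can be shown with the simplest case, a straight line fitted by ordinary least squares. In the sketch below, the ad spend and sales figures are invented and chosen to lie exactly on the line y = 2x + 1 so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical labeled data: ad spend (input) and the sales observed (output)
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, sales)
# Use the fitted line to predict sales for an unseen level of ad spend.
forecast = slope * 5.0 + intercept
```

The observed sales figures play the role of labels here: they are the known outputs the line is fitted against, which is what makes regression a supervised technique.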
Unsupervised learning techniques often complement the work of supervised learning by slicing and dicing the raw data set in a variety of ways, including the following:
Data clustering. Data points with similar characteristics are grouped together to help understand and explore the data more effectively. For example, a company might use data clustering methods to segment customers into groups based on their demographics, interests, purchasing behaviors and other factors.
Dimensionality reduction. Each variable in a data set is treated as a separate dimension. However, many models work better by analyzing specific relationships between variables. A simple example of dimensionality reduction is using profit as a single dimension that represents income minus expenses, which are two separate dimensions. More complex new variables can be generated using algorithms such as principal component analysis, autoencoders, algorithms that convert text into vectors, or t-distributed stochastic neighbor embedding. Dimensionality reduction can help reduce the problem of overfitting, in which a model works well on a small data set but does not generalize well to new data. The technique also enables companies to visualize high-dimensional data in 2D or 3D forms that humans can easily understand.
Anomaly or outlier detection.
Transfer learning. These algorithms utilize models trained on related but different tasks. For example, transfer learning techniques make it easy to fine-tune a classifier trained on Wikipedia articles to label any type of new text with the correct topics. LinkedIn's Rao says this is one of the most effective and quickest ways to solve the problem of unlabeled data.
Graph-based algorithms. These techniques try to build a graph that captures the relationship between data points, Rao said. For example, if each data point represents a LinkedIn member with a skill, you can represent the members using a graph, where edges represent skill overlap between members. Graph algorithms can also help transfer labels from known data points to unknown but closely related data points. Unsupervised learning can also be used to build graphs between different types of entities (sources and targets). The stronger the edge, the higher the affinity of the source node to the target node. For example, LinkedIn uses them to match members with skills-based courses.
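The graph idea Rao describes can be sketched in a few lines of Python. This is not LinkedIn's implementation: the member names, skills and course labels are invented, the edge weight is assumed to be the Jaccard overlap of two members' skill sets, and label transfer is reduced to copying the label of the most strongly connected labeled neighbor.

```python
# Hypothetical members and their skills
member_skills = {
    "ana": {"python", "sql", "statistics"},
    "ben": {"python", "sql", "spark"},
    "cho": {"design", "figma"},
    "dee": {"design", "figma", "branding"},
}
# Labels are known for only some members (hypothetical course matches)
known_labels = {"ana": "data science course", "cho": "ux design course"}


def jaccard(a, b):
    """Edge weight: shared skills divided by all skills of the two members."""
    return len(a & b) / len(a | b)


def build_graph(skills):
    """Connect every pair of members whose skill sets overlap."""
    graph = {name: {} for name in skills}
    names = sorted(skills)
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            weight = jaccard(skills[left], skills[right])
            if weight > 0:
                graph[left][right] = weight
                graph[right][left] = weight
    return graph


def transfer_labels(graph, labels):
    """Give each unlabeled node the label of its strongest labeled neighbor."""
    result = dict(labels)
    for node, edges in graph.items():
        if node in labels:
            continue
        labeled_neighbors = {n: w for n, w in edges.items() if n in labels}
        if labeled_neighbors:
            strongest = max(labeled_neighbors, key=labeled_neighbors.get)
            result[node] = labels[strongest]
    return result


graph = build_graph(member_skills)
all_labels = transfer_labels(graph, known_labels)
```

In this toy graph, "ben" shares two of four skills with "ana" and none with "cho", so the data science label travels along that edge, while "dee" inherits the design label from "cho".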