Many people feel that machine learning is out of reach, a mysterious technology understood only by a handful of specialist researchers.
After all, you are asking a machine that lives in a binary world to form its own understanding of the real world; you are teaching it how to think. This article, however, is not the obscure, formula-laden piece you might expect. Like the basic common sense that helps us understand the world (Newton's laws of motion, the fact that work needs to get done, supply and demand), the best methods and concepts in machine learning should be concise and clear. Unfortunately, most of the machine learning literature is cluttered with complex notation, obscure formulas, and needless padding, and that is what builds a thick wall around the simple, fundamental ideas of machine learning.
Let's look at a practical example. Suppose we want to add a "you may also like" recommendation feature at the end of each article. How would we implement it?
Here is a simple first attempt:
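def similar_posts(post)
  # Use the words in this post's title as keywords
  title_keywords = post.title.split(' ')
  # Sort all posts by how many of those keywords appear in their bodies
  Post.all.to_a.sort do |post1, post2|
    post1_title_intersection = post1.body.split(' ') & title_keywords
    post2_title_intersection = post2.body.split(' ') & title_keywords
    # Compare post2 with post1 so that larger overlaps come first (descending order)
    post2_title_intersection.length <=> post1_title_intersection.length
  end[0..9] # keep the ten best matches
end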
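To try this ranking without a database, here is a minimal self-contained sketch. A hypothetical `Post` struct stands in for the article's model, and the candidate posts are passed in explicitly instead of coming from `Post.all`. It keeps the same score (how many words of the target title occur in each candidate's body) but sorts with `sort_by` on the negated count, and, unlike the code above, it skips the target post itself.

```ruby
# Hypothetical stand-in for the article's Post model: only a title and a body.
Post = Struct.new(:title, :body)

# Score every other post by how many words of `post`'s title occur in its body,
# then return the `limit` best matches, largest overlap first.
def similar_posts(post, all_posts, limit = 10)
  title_keywords = post.title.split(' ')
  candidates = all_posts.reject { |other| other.equal?(post) } # do not recommend a post to itself
  candidates.sort_by { |other| -(other.body.split(' ') & title_keywords).length }.first(limit)
end

posts = [
  Post.new('Support teams improve quality', 'How support teams work'),
  Post.new('Design matters', 'Why design and quality matter'),
  Post.new('Hiring', 'Notes on interviewing engineers')
]
p similar_posts(posts[0], posts, 2).map(&:title) # prints ["Design matters", "Hiring"]
```

Matching is exact and case-sensitive, so "Support" in a title does not match "support" in a body; the score only counts literal shared words, with no notion of what an article is about.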
Using this method to find articles similar to the blog post "How Support Teams Improve Product Quality", we got the following top ten most relevant articles:
As you can see, the reference article is about how support teams can work effectively, yet the results include articles on analyzing customer groups and on the merits of design, which have nothing to do with it. We can do better.
Now let's solve this problem with a genuine machine learning approach, in two steps:
1. Represent the articles in mathematical form
If we can represent articles mathematically, we can plot how similar they are to one another and identify distinct clusters:
As the figure above shows, mapping each article to a point in a coordinate system is not hard. It takes two steps:
(1) Collect every distinct word that appears in any article.
(2) For each article, build a vector with one entry per word: 1 if the word appears in the article, 0 otherwise.
The Ruby code is as follows:
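@posts = Post.all
# Step 1: collect every distinct word used across all post bodies
@words = @posts.map do |p|
  p.body.split(' ')
end.flatten.uniq
# Step 2: turn each post into a vector with one entry per word in @words:
# 1 if the word appears in the post's body, 0 otherwise
@vectors = @posts.map do |p|
  body_words = p.body.split(' ') # compare whole words, not substrings
  @words.map do |w|
    body_words.include?(w) ? 1 : 0
  end
end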
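For experimenting outside a Rails app, here is a minimal self-contained sketch of the same two steps, assuming plain strings stand in for post bodies. The helper names `vocabulary` and `to_vector` are illustrative, not from the article. Like the code above, it splits on single spaces and is case-sensitive; it checks whole-word membership, so a vocabulary word such as "in" does not match inside "Internal".

```ruby
# Illustrative helper (name not from the article): every distinct word in any body.
def vocabulary(bodies)
  bodies.flat_map { |body| body.split(' ') }.uniq
end

# Illustrative helper: one 0/1 entry per vocabulary word, 1 if the word occurs
# in this body as a whole word, 0 otherwise.
def to_vector(body, words)
  body_words = body.split(' ')
  words.map { |w| body_words.include?(w) ? 1 : 0 }
end

bodies = ['Hello Blog Post Reader', 'Hello Internal Blog']
words = vocabulary(bodies)
vectors = bodies.map { |body| to_vector(body, words) }
p words   # prints ["Hello", "Blog", "Post", "Reader", "Internal"]
p vectors # prints [[1, 1, 1, 1, 0], [1, 1, 0, 0, 1]]
```

Every vector has one entry per distinct word in the whole collection, so with real articles the vectors become long and mostly zeros.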
Suppose the value of @words is:
["Hello", "Internal", "Internal Communication", "Reader", "Blog", "Post"]
If the content of an article is "Hello Blog Post Reader", then its corresponding array is:
[1,0,0,1,1,1]
Of course, we can't plot this six-dimensional point with simple tools the way we plot points on a two-dimensional grid, but the underlying concepts, such as the distance between two points, carry over directly to higher dimensions. That is why a two-dimensional example still illustrates the idea.
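To make "distance carries over to higher dimensions" concrete, here is a sketch of Euclidean distance that works for any number of coordinates. Euclidean distance is an assumption: the article does not say which measure it uses, and cosine similarity is also common for word vectors.

```ruby
# Euclidean distance between two points with the same number of coordinates.
# The same formula works in 2 dimensions, in 6, or in thousands.
def distance(a, b)
  raise ArgumentError, 'dimension mismatch' unless a.size == b.size
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

p distance([0, 0], [3, 4])                         # prints 5.0
p distance([1, 0, 0, 1, 1, 1], [1, 1, 0, 0, 1, 0]) # prints 1.7320508075688772
```

In the six-dimensional case the two hypothetical article vectors differ in three positions, so the distance is the square root of 3.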
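2. Cluster the data points with the K-means algorithm
Now that we have coordinates for every article, we can look for clusters of similar articles. We use a fairly simple clustering algorithm, K-means, which can be summarized in five steps:
(1) Choose the number of clusters, K.
(2) Pick K random points as the initial cluster centers.
(3) Assign each article to the cluster whose center is nearest.
(4) Move each cluster center to the mean of the articles assigned to it.
(5) Repeat steps (3) and (4) until the assignments stop changing.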
Let's walk through these steps with diagrams. First, we randomly pick two points (K = 2) in the article coordinate space as the initial cluster centers:
Then we assign each article to the cluster whose center is closest to it:
Next, for each cluster we take the mean of its articles' coordinates as the cluster's new center.
That completes the first iteration. We now reassign the articles to clusters based on the new centers.
At this point, every article has found its cluster! Further iterations would leave the centers where they are, so no article would change clusters either.
The Ruby code for the above process is as follows:
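# K = 2. rand_point, distance and average are helpers the article does not show:
# a random starting point, the distance between two points, and the mean of a cluster
@cluster_centers = [rand_point(), rand_point()]
15.times do
  @clusters = [[], []]
  # Assignment step: put each post in the cluster whose center is nearest
  @posts.each do |post|
    min_distance, min_point = Float::INFINITY, nil
    @cluster_centers.each_with_index do |center, i|
      d = distance(center, post)
      if d < min_distance
        min_distance = d
        min_point = i
      end
    end
    @clusters[min_point] << post
  end
  # Update step: move each center to the mean of the posts assigned to it
  @cluster_centers = @clusters.map do |posts|
    average(posts)
  end
end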
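The loop above relies on `rand_point`, `distance` and `average`, which the article never defines. Below is a self-contained sketch under our own assumptions: Euclidean distance, a coordinate-wise mean, starting centers sampled from the data points instead of a hypothetical `rand_point`, and an early stop once no vector changes cluster. It works on plain vectors such as `@vectors`; `assignment[j]` is the cluster of the j-th vector, which maps back to the j-th post.

```ruby
# Hypothetical helper: Euclidean distance between two equal-length vectors.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Hypothetical helper: coordinate-wise mean of a non-empty list of vectors.
def average(vectors)
  (0...vectors.first.size).map do |d|
    vectors.sum { |v| v[d] }.to_f / vectors.size
  end
end

# K-means on plain vectors; returns one cluster index per input vector.
# Centers start at k randomly sampled data points. Each round assigns every
# vector to its nearest center, then moves each center to the mean of its
# members, stopping early once no vector changes cluster.
def k_means(vectors, k, max_iterations: 15, rng: Random.new(0))
  centers = vectors.sample(k, random: rng)
  assignment = nil
  max_iterations.times do
    new_assignment = vectors.map do |v|
      (0...k).min_by { |i| distance(centers[i], v) }
    end
    break if new_assignment == assignment # converged
    assignment = new_assignment
    centers = (0...k).map do |i|
      members = vectors.zip(assignment).select { |_, c| c == i }.map(&:first)
      members.empty? ? centers[i] : average(members) # leave an empty cluster's center in place
    end
  end
  assignment
end

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = k_means(points, 2)
p labels
```

Which cluster gets label 0 depends on the random start, but the three points near the origin end up together and the three near (10, 10) end up together. With a single random start, K-means can settle on a poor clustering on less clear-cut data, which is why practical implementations often run several restarts and keep the best result.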
Here are the top ten articles this method finds for the blog post "How Support Teams Improve Product Quality":
The results speak for themselves.
We implemented this idea with fewer than 40 lines of code and one simple algorithm. Yet if you only read academic papers, you would never guess it could be this simple. Below is the abstract of a paper introducing K-means (I'm not sure who first proposed the algorithm, but this paper is the first to use the term "K-means").
If you like expressing ideas in mathematical notation, academic papers are undoubtedly useful. But there are plenty of high-quality resources that explain the same ideas without dense formulas, and they are more practical and approachable.
Give it a try
Recommending tags in your project management tool? Designing your customer support tools? Grouping users in a social network? All of these can be tackled with simple code and simple algorithms, which makes them good opportunities to practice. So if you think a problem in your project could be solved with machine learning, why hesitate?
Machine learning is actually simpler than you think!
Original: Intercom. Translation: Bole Online (zhibinzeng)
Translation link: http://blog.jobbole.com/53546/