A crude implementation of magnetic lasso in Photoshop (based on Python)
高洛峰
Release: 2017-02-18 13:38:09
Original
3227 people have browsed it
Those who frequently use Photoshop should be familiar with the Magnetic Lasso function, which is a human-guided cutout auxiliary tool. It is generally not called this in the R&D field. This edge extraction method is usually called Intelligent Scissors or Livewire.
It was originally a Python framework for algorithm evaluation of an image segmentation project. I thought it was a bit interesting, so I expanded it a little, added a shell with PyQt, and simulated the magnetic lasso to a very crude degree. Function. Why it is simple: 1) Only the most basic edge search is implemented. There is no path cooling, dynamic training, mouse position correction, let alone curve closing, cutout, Alpha Matting, etc.; 2) No performance specifications are considered, just for the convenience of writing; 3) I have a very shallow understanding of Qt, and I still can’t write Signal-Slot, I don’t know if the GUI is written reasonably; 4) No debugging.
Basic Algorithm
I have not done in-depth research on related algorithms, but I believe that this type of algorithm The most influential algorithm in application comes from [1], which is also the main reference of this article. The basic idea is to regard the picture as an undirected graph, and a local cost can be calculated between adjacent pixels, so it is converted into Now that the shortest path problem has been solved, the next step is to generate a path based on the Dijkstra algorithm, which is the edge that needs to be extracted. There are two main parts of the algorithm involved: 1) cost calculation of adjacent pixels; 2) shortest path algorithm.
Edge detection
The ultimate purpose of calculating the cost of adjacent pixels is to find edges, so the essence is still edge detection. The basic idea is to detect edges through various means, and calculate a weighted value based on the detected intensity as the cost. From the perspective of the shortest path, the more obvious the edge, the smaller the cost value. The suggestion in [1] is to use three indicators for weighting: 1) edge detection operator; 2) gradient strength (Gradient Magnitude); 3) gradient direction (Gradient Direction). The method in this article is somewhat different from [1]. Because of laziness, the Canny operator in OpenCV is used to detect edges instead of the Laplacian Zero-Crossing Operator. The expression is as follows:
The basic idea is to first detect many connected pixels based on gradient information, and then for each Only the largest and connected parts of the connected pixels are taken, and the surrounding areas are set to zero to obtain the initial edges (Edges). This process is called Non-Maximum Suppression. Then use a two-threshold method to divide these detected initial edges into three levels: Strong, Weak, and None. As the name suggests, Strong means that it is definitely an edge, and None is discarded, and then the ones connected to Strong are selected from Weak. As the edge is retained, the final result is obtained. This process is called Hysteresis Thresholding. This algorithm is so classic, I won’t go into details as soon as Google comes up with a lot of details. The formula is as follows:
\[{{f}_{E}}\left( q \right)=\left\{ \begin{matrix} 0;\text{ if }q\text { is on a edge} \\ 1;\text{ if }q\text{ is not on a edge} \\ \end{matrix} \right.\]
In fact The calculation of weights has some overlap with the maximum gradient, because if the path found along the direction of the maximum gradient is basically an edge, my understanding of the role of this item is mainly to 1) avoid the occurrence of separation in areas with large gradients. Obvious edge deviation; 2) Ensure the continuity of extracted edges, and to a certain extent, ensure smoothness.
Gradient strength
is just the modulus of the gradient. The squares of the gradient values in the x and y directions are added to the square root. The formula is as follows :
This item is actually a smoothing item, which will assign a relatively high cost to the edges that change drastically, allowing extraction edges to avoid the influence of noise. The specific formula is as follows:
\[{{l}_{D}}\left( p,q \right)=\left\{ \begin{matrix} q-p;\text{ if }\left\langle {{d}_{\bot }}\left( p \right),q-p \right\rangle \ge 0 \\ p-q;\text{ if }\left\langle {{ d}_{\bot }}\left( p \right),q-p \right\rangle <0 \\ \end{matrix} \right.\]
\({{d}_{\bot }}\left( p \right)\) is to take the vertical direction of p. In addition, please note that the judgment of the sign in the above formula will change \({{d}_{\ The values of bot }}\left( p \right)\) and \({{l}_{D}}\left( p,q \right)\) are limited to π/2 .
\[{{d}_{\bot }}\left( p \right)=\left( {{p}_{y}},-{{p}_{x}} \ right)\]
cost correction in the diagonal direction
In a two-dimensional image, adjacent pixels are usually spaced according to the Euclidean distance There are two types: 1) Adjacent up, down, left and right, the interval is the pixel side length; 2) Diagonally adjacent, the interval is \(\sqrt{2}\) times the pixel side length. When calculating the local cost, the impact of this distance difference should usually be taken into account, such as the following picture:
##2
3
4
##5
6
6
##7
8
9
If the influence of pixel position is not considered, then when finding the minimum cost, the cost=2 in the upper left corner will be considered the smallest. However, if we consider the influence of pixel spacing, let's look at the direction of the upper left corner. The difference from the center is 6-2. If we do a linear interpolation, the value of the upper left corner per unit distance from the center should be\(6-\left( 6-2 \right)\times 1/\sqrt{2}\ =3.17>3\), so the one directly above is the correct direction of the minimum cost.
Shortest path search
In magnetic lasso, the general usage is to click a point first, and then move the mouse, between the mouse and the initially clicked point A line automatically close to the edge will appear between them. Here we define the pixel point clicked at the beginning as the seed point (seed), and the magnetic lasso actually finds the seed point taking into account the edge-related cost mentioned in the previous part. The shortest path of the current mouse. As shown in the picture below, the red ones are seed points, and when you move the mouse, the connection between the seed point closest to the edge and the mouse coordinates will be displayed in real time. This is why the magnetic lasso is also called Livewire.
There are many ways to achieve the shortest path. Generally speaking, it is dynamic programming. What is introduced here is an implementation based on Dijkstra's algorithm. The basic idea is that, given a seed point Finally, Dijkstra's algorithm is executed to traverse all pixels of the image to obtain the shortest path from each pixel to the seed point. Take the picture below as an example. In a cost matrix, after using the Dijkstra algorithm to traverse each element, each element will point to an adjacent element, so that any pixel can find a path to the seed, such as the upper right corner. The pixels corresponding to 42 and 39 follow the arrow to 0.
The algorithm is as follows:
输入:
s // 种子点
l(q,r) // 计算局部cost
数据结构:
L // 当前待处理的像素
N(q) // 当前像素相邻的像素
e(q) // 标记一个像素是否已经做过相邻像素展开的Bool函数
g(q) // 从s到q的总cost
输出:
p // 记录所有路径的map
算法:
g(s)←0; L←s; // 将种子点作为第一点初始化
while L≠Ø: // 遍历尚未结束
q←min(L); // 取出最小cost的像素并从待处理像素中移除
e(q)←TRUE; // 将当前像素记录为已经做过相邻像素展开
for each r∈N(q) and not e(r):
gtemp←g(q)+l(q,r); // 计算相邻像素的总cost
if r∈L and gtemp<g(r): // 找到了更好的路径
r←L; { from list.} // 舍弃较大cost的路径
else if r∉L:
g(r)←gtemp; // 记录当前找到的最小路径
p(r)←q;
L←r; // 加入待处理以试图寻找更短的路径
Copy after login
The traversal process will give priority to the area with the lowest cost, as shown below:
After the shortest paths to the seed pixels corresponding to all pixels are found, just draw the shortest path to the seed directly when moving the mouse.
Python implementation
The algorithm part directly calls OpenCV’s Canny function and Sobel function (for gradient), and the processing of RGB is also very simple. , directly approximated by the maximum value of the gradient. In addition, because of laziness, both cost map and path map directly use dictionaries (dict), while recording expanded pixels directly uses sets (set). The GUI part uses Python's threading because I don't know how to use QThread. Only the image displays the interactive area and the status bar prompt. Left-click to set the seed point, and right-click to end. The extracted edge is a green line, and the edge being extracted is a blue line.
Code
Algorithm part
from __future__ import division
import cv2
import numpy as np
SQRT_0_5 = 0.70710678118654757
class Livewire():
"""
A simple livewire implementation for verification using
1. Canny edge detector + gradient magnitude + gradient direction
2. Dijkstra algorithm
"""
def __init__(self, image):
self.image = image
self.x_lim = image.shape[0]
self.y_lim = image.shape[1]
# The values in cost matrix ranges from 0~1
self.cost_edges = 1 - cv2.Canny(image, 85, 170)/255.0
self.grad_x, self.grad_y, self.grad_mag = self._get_grad(image)
self.cost_grad_mag = 1 - self.grad_mag/np.max(self.grad_mag)
# Weight for (Canny edges, gradient magnitude, gradient direction)
self.weight = (0.425, 0.425, 0.15)
self.n_pixs = self.x_lim * self.y_lim
self.n_processed = 0
@classmethod
def _get_grad(cls, image):
"""
Return the gradient magnitude of the image using Sobel operator
"""
rgb = True if len(image.shape) > 2 else False
grad_x = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=3)
grad_y = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=3)
if rgb:
# A very rough approximation for quick verification...
grad_x = np.max(grad_x, axis=2)
grad_y = np.max(grad_y, axis=2)
grad_mag = np.sqrt(grad_x**2+grad_y**2)
grad_x /= grad_mag
grad_y /= grad_mag
return grad_x, grad_y, grad_mag
def _get_neighbors(self, p):
"""
Return 8 neighbors around the pixel p
"""
x, y = p
x0 = 0 if x == 0 else x - 1
x1 = self.x_lim if x == self.x_lim - 1 else x + 2
y0 = 0 if y == 0 else y - 1
y1 = self.y_lim if y == self.y_lim - 1 else y + 2
return [(x, y) for x in xrange(x0, x1) for y in xrange(y0, y1) if (x, y) != p]
def _get_grad_direction_cost(self, p, q):
"""
Calculate the gradient changes refer to the link direction
"""
dp = (self.grad_y[p[0]][p[1]], -self.grad_x[p[0]][p[1]])
dq = (self.grad_y[q[0]][q[1]], -self.grad_x[q[0]][q[1]])
l = np.array([q[0]-p[0], q[1]-p[1]], np.float)
if 0 not in l:
l *= SQRT_0_5
dp_l = np.dot(dp, l)
l_dq = np.dot(l, dq)
if dp_l < 0:
dp_l = -dp_l
l_dq = -l_dq
# 2/3pi * ...
return 0.212206590789 * (np.arccos(dp_l)+np.arccos(l_dq))
def _local_cost(self, p, q):
"""
1. Calculate the Canny edges & gradient magnitude cost taking into account Euclidean distance
2. Combine with gradient direction
Assumption: p & q are neighbors
"""
diagnol = q[0] == p[0] or q[1] == p[1]
# c0, c1 and c2 are costs from Canny operator, gradient magnitude and gradient direction respectively
if diagnol:
c0 = self.cost_edges[p[0]][p[1]]-SQRT_0_5*(self.cost_edges[p[0]][p[1]]-self.cost_edges[q[0]][q[1]])
c1 = self.cost_grad_mag[p[0]][p[1]]-SQRT_0_5*(self.cost_grad_mag[p[0]][p[1]]-self.cost_grad_mag[q[0]][q[1]])
c2 = SQRT_0_5 * self._get_grad_direction_cost(p, q)
else:
c0 = self.cost_edges[q[0]][q[1]]
c1 = self.cost_grad_mag[q[0]][q[1]]
c2 = self._get_grad_direction_cost(p, q)
if np.isnan(c2):
c2 = 0.0
w0, w1, w2 = self.weight
cost_pq = w0*c0 + w1*c1 + w2*c2
return cost_pq * cost_pq
def get_path_matrix(self, seed):
"""
Get the back tracking matrix of the whole image from the cost matrix
"""
neighbors = [] # 8 neighbors of the pixel being processed
processed = set() # Processed point
cost = {seed: 0.0} # Accumulated cost, initialized with seed to itself
paths = {}
self.n_processed = 0
while cost:
# Expand the minimum cost point
p = min(cost, key=cost.get)
neighbors = self._get_neighbors(p)
processed.add(p)
# Record accumulated costs and back tracking point for newly expanded points
for q in [x for x in neighbors if x not in processed]:
temp_cost = cost[p] + self._local_cost(p, q)
if q in cost:
if temp_cost < cost[q]:
cost.pop(q)
else:
cost[q] = temp_cost
processed.add(q)
paths[q] = p
# Pop traversed points
cost.pop(p)
self.n_processed += 1
return paths
livewire.py
Copy after login
livewire.py
GUI part
from __future__ import division
import time
import cv2
from PyQt4 import QtGui, QtCore
from threading import Thread
from livewire import Livewire
class ImageWin(QtGui.QWidget):
def __init__(self):
super(ImageWin, self).__init__()
self.setupUi()
self.active = False
self.seed_enabled = True
self.seed = None
self.path_map = {}
self.path = []
def setupUi(self):
self.hbox = QtGui.QVBoxLayout(self)
# Load and initialize image
self.image_path = ''
while self.image_path == '':
self.image_path = QtGui.QFileDialog.getOpenFileName(self, '', '', '(*.bmp *.jpg *.png)')
self.image = QtGui.QPixmap(self.image_path)
self.cv2_image = cv2.imread(str(self.image_path))
self.lw = Livewire(self.cv2_image)
self.w, self.h = self.image.width(), self.image.height()
self.canvas = QtGui.QLabel(self)
self.canvas.setMouseTracking(True)
self.canvas.setPixmap(self.image)
self.status_bar = QtGui.QStatusBar(self)
self.status_bar.showMessage('Left click to set a seed')
self.hbox.addWidget(self.canvas)
self.hbox.addWidget(self.status_bar)
self.setLayout(self.hbox)
def mousePressEvent(self, event):
if self.seed_enabled:
pos = event.pos()
x, y = pos.x()-self.canvas.x(), pos.y()-self.canvas.y()
if x < 0:
x = 0
if x >= self.w:
x = self.w - 1
if y < 0:
y = 0
if y >= self.h:
y = self.h - 1
# Get the mouse cursor position
p = y, x
seed = self.seed
# Export bitmap
if event.buttons() == QtCore.Qt.MidButton:
filepath = QtGui.QFileDialog.getSaveFileName(self, 'Save image audio to', '', '*.bmp\n*.jpg\n*.png')
image = self.image.copy()
draw = QtGui.QPainter()
draw.begin(image)
draw.setPen(QtCore.Qt.blue)
if self.path_map:
while p != seed:
draw.drawPoint(p[1], p[0])
for q in self.lw._get_neighbors(p):
draw.drawPoint(q[1], q[0])
p = self.path_map[p]
if self.path:
draw.setPen(QtCore.Qt.green)
for p in self.path:
draw.drawPoint(p[1], p[0])
for q in self.lw._get_neighbors(p):
draw.drawPoint(q[1], q[0])
draw.end()
image.save(filepath, quality=100)
else:
self.seed = p
if self.path_map:
while p != seed:
p = self.path_map[p]
self.path.append(p)
# Calculate path map
if event.buttons() == QtCore.Qt.LeftButton:
Thread(target=self._cal_path_matrix).start()
Thread(target=self._update_path_map_progress).start()
# Finish current task and reset
elif event.buttons() == QtCore.Qt.RightButton:
self.path_map = {}
self.status_bar.showMessage('Left click to set a seed')
self.active = False
def mouseMoveEvent(self, event):
if self.active and event.buttons() == QtCore.Qt.NoButton:
pos = event.pos()
x, y = pos.x()-self.canvas.x(), pos.y()-self.canvas.y()
if x < 0 or x >= self.w or y < 0 or y >= self.h:
pass
else:
# Draw livewire
p = y, x
path = []
while p != self.seed:
p = self.path_map[p]
path.append(p)
image = self.image.copy()
draw = QtGui.QPainter()
draw.begin(image)
draw.setPen(QtCore.Qt.blue)
for p in path:
draw.drawPoint(p[1], p[0])
if self.path:
draw.setPen(QtCore.Qt.green)
for p in self.path:
draw.drawPoint(p[1], p[0])
draw.end()
self.canvas.setPixmap(image)
def _cal_path_matrix(self):
self.seed_enabled = False
self.active = False
self.status_bar.showMessage('Calculating path map...')
path_matrix = self.lw.get_path_matrix(self.seed)
self.status_bar.showMessage(r'Left: new seed / Right: finish')
self.seed_enabled = True
self.active = True
self.path_map = path_matrix
def _update_path_map_progress(self):
while not self.seed_enabled:
time.sleep(0.1)
message = 'Calculating path map... {:.1f}%'.format(self.lw.n_processed/self.lw.n_pixs*100.0)
self.status_bar.showMessage(message)
self.status_bar.showMessage(r'Left: new seed / Right: finish')
gui.py
Copy after login
gui.py
Main function
import sys
from PyQt4 import QtGui
from gui import ImageWin
def main():
app = QtGui.QApplication(sys.argv)
window = ImageWin()
window.setMouseTracking(True)
window.setWindowTitle('Livewire Demo')
window.show()
window.setFixedSize(window.size())
sys.exit(app.exec_())
if __name__ == '__main__':
main()
main.py
Copy after login
main.py
It was painfully uploaded to Github (portal), welcome to fork.
Improvements in efficiency
Because the prototype of this code is only for Python evaluation and verification before development in C++, efficiency is not considered at all, and the execution speed is completely unacceptable. , basically 400x400 pictures cannot be tolerated... As for the efficiency improvement based on the Python version, I have not thought about it carefully, but it seems that there are a few obvious places:
1) Take out the current minimum cost Pixel operation
p = min(cost, key=cost.get)
Although this is fun to write, it is obviously not possible. At least a min heap must be used. What. Because I used dict to represent both the pixels to be processed and the cost, and I was too lazy to think about how to combine it with Python's heapq, so I just used the crude and trouble-free min().
2) Calculation of gradient direction
The calculation of trigonometric functions should be avoided as much as possible. In addition, the original article may be to extend the value range to >π, so q-p is also used. In fact, this item has a small weight, so I'm afraid it's okay to just do a dot product of the gradient direction vectors of the two pixels and then normalize the result. Even if you want to use arccos, you can consider writing a look-up table approximation. Of course, the last thing I want to say is that I personally feel that this is really not necessary. Direct adaptive spilne or even three-point mean smoothing and denoising effect will be good.
3) Calculate the position of adjacent pixels
If two pixels are adjacent, the 8 adjacent pixels around them will also coincide. My method is relatively primitive and can calculate the rate directly without modularization.
4) Replace some data structures
For example, path map is essentially a matrix that gives the previous pixel on the shortest path for each pixel. In fact, you can consider using a linear data structure instead, but if you do this, it will generally be in C/C++ code.
5) numpy
In my impression, the order in which numpy is called will also affect efficiency. Continuously calling numpy's built-in methods seems to bring about an overall improvement in efficiency, but then again, If it reaches this point in actual application, it should also fall into the category of C/C++ code.
6) Improvements at the algorithm level
This area has not been studied in depth. The first impression is that in practical applications, it is not necessary to calculate the entire image at the beginning. Some blocks can be divided according to the seed position. , the mouse itself will also leave a trajectory, and you may also consider performing a heuristic search only in the direction of the mouse trajectory. In addition, when calculating the path, you may consider borrowing ideas similar to Image Pyramid. There is no need to search for the path at full resolution right from the start. Since the projects I did later did not use this algorithm, I did not continue to study it. Although I was very curious, there are actually many ready-made codes, such as GIMP, but I didn't have the energy to look at it.
More improvements
Although none of them have been done, I will briefly introduce them and consider the practical improvements.
Path Cooling
Anyone who has used Photoshop and GIMP Magnetic Lasso knows that even if the mouse does not click on the image, it will automatically move during the movement. Generate some points that fix the cutout trajectory. These points are actually new seed points. This method of automatically generating new seed points during use is called Path cooling. The basic idea of this method is as follows: as the mouse moves, if a path remains fixed for a certain period of time, then the point farthest from the seed in this path is set as the new seed. In fact, what is hidden behind it is dynamic. The idea of planning, Bellman optimality. The name is also quite vivid, path cooling.
Dynamic Training(Interactive Dynamic Training)
When using simple shortest path search, the edges often found are not what you want. For example, in the picture above, the green line is the edge extracted in the previous section, and the blue line is the edge currently being extracted. In the picture on the left, the edge of Lena's hat outside the mirror is what we want to extract. However, since the edge of Lena's hat in the mirror has a lower cost, the actual extracted blue line segment is pasted to the right as in the picture on the right. So the idea of Interactive Dynamic Training is to consider the green line segment as the correctly extracted edge, and then use the green line segment as training data to add a correction value to the cost function of the current edge extraction.
The method used in
[1] is to count the histogram of the gradient intensity of the points on the edge of the previous segment, and then weight the pixels in the current image according to the frequency of gradient occurrence. For example, if the gradient intensity corresponding to all pixels in the green line segment is between 50 and 100, then 50 to 100 can be divided into 5 bins in units of 10, and the frequency of occurrence in each bin can be counted. It is a histogram, and then linearly weights the currently detected gradient intensity. For example, if there are at most 10 corresponding pixels in the range of 50~60, then 10 is regarded as the maximum value, and the pixels with currently detected gradient strengths between 50 and 60 are multiplied by a coefficient of 1.0; if the training data is 70~ There are 5 between 80, then the cost weighting coefficient is 5/10=0.5, then all pixels with currently detected gradient strengths between 70 and 80 are multiplied by the coefficient 0.5; if there are none above 100 in the training data, so The cost is added to 0/10=0, and the weighting coefficient is 0, so that even if a stronger edge is detected, it will not deviate from the previous edge. This is the basic idea. Of course, the actual implementation is not that simple. In addition to the pixels on the edge, the two pixels on the left and right on the vertical edge must also be considered, thus ensuring the edge pattern. In addition, as the mouse gets farther and farther away from the training edge, the pattern of the detected edge may appear different, so Training may have a counterproductive effect. Therefore, the scope of this Training also needs to take into account the distance between the mouse and the seed point. Finally, There needs to be some smoothing and denoising processing. The details are mentioned in [1]. It is quite cumbersome (it seems that there was no SIFT at that time), so I won’t go into details.
Correction of seed point position (Cursor Snap)
Although this algorithm can automatically find the path closest to the edge between the seed point and the mouse, however, human My hands often shake, so the seed points may not be well set to the edge. Therefore, after the user sets the seed point position, the pixel with the lowest cost can be automatically searched within a small range around its coordinates, such as a 7x7 area, as the real seed point position. This process is called Cursor snap.
For more about a simple implementation of magnetic lasso in Photoshop (based on Python), please pay attention to the PHP Chinese website for related articles!
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn