Refactoring Legacy Code, Part 10: Dissecting Long Methods with Extractions

In part six of our series, we discussed attacking long methods by leveraging pair programming and viewing the code from different levels. We continuously zoomed in and out, looking at small things like naming as well as form and indentation.

Today we will take another approach: we will assume that we are alone, with no colleague or pair to help us. We will use a technique called "Extract Until You Drop" to break the code into very small pieces. We will make every effort to make these pieces as easy to understand as possible, so that our future selves, or any other programmer, can understand them easily.

Extract Until You Drop

I first heard about this concept from Robert C. Martin. He proposed this idea in one of his videos as a simple way to refactor difficult-to-understand code.

The basic idea is to take small, understandable code snippets and extract them. It doesn't matter if you identify four lines or four characters that you can extract. When you identify content that can be encapsulated in a clearer concept, you can proceed with extraction. You continue this process on the original method and the newly extracted fragments until you can't find a piece of code that can be encapsulated as a concept.

This technique is especially useful when you are working alone. It forces you to think about both small and large chunks of code. It also has another nice effect: it makes you think about the code, a lot! Besides the Extract Method and Extract Variable refactorings mentioned above, you will also find yourself renaming variables, functions, classes, and more.

Let's look at an example of random code from the internet. Stack Overflow is a great place to find small code snippets. Here is one that determines whether a number is prime:

//Check if a number is prime
function isPrime($num, $pf = null)
{
    if(!is_array($pf))
    {
        for($i=2;$i<intval(sqrt($num));$i++) {
            if($num % $i==0) {
                return false;
            }
        }
        return true;
    } else {
        $pfCount = count($pf);
        for($i=0;$i<$pfCount;$i++) {
            if($num % $pf[$i] == 0) {
                return false;
            }
        }
        return true;
    }
}

At this point, I have no idea how this code works. I just found it online while writing this, and I will discover it together with you. The process that follows may not be the cleanest. Instead, it will reflect my reasoning and refactoring as they happen, without up-front planning.

Refactoring the prime number checker

According to Wikipedia:

A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself.

As you can see, this is a simple method for a simple mathematical problem. It returns true or false, so it should be easy to test as well.

class IsPrimeTest extends PHPUnit_Framework_TestCase {

    function testItCanRecognizePrimeNumbers() {
        $this->assertTrue(isPrime(1));
    }

}

// Check if a number is prime
function isPrime($num, $pf = null)
{
    // ... the content of the method as seen above
}

When we are just playing with example code, the simplest approach is to put everything into a test file. This way we don't have to think about which files to create, which directories they belong to, or how to include one in the other. This is only a simple example to familiarize ourselves with the technique before we apply it to one of the trivia game methods. So everything goes into one test file, which you can name whatever you want. I chose IsPrimeTest.php.

The test passed. My next instinct was to add more primes instead of writing another test with non-primes.

function testItCanRecognizePrimeNumbers() {
    $this->assertTrue(isPrime(1));
    $this->assertTrue(isPrime(2));
    $this->assertTrue(isPrime(3));
    $this->assertTrue(isPrime(5));
    $this->assertTrue(isPrime(7));
    $this->assertTrue(isPrime(11));
}

That passes as well. But what about this?

function testItCanRecognizeNonPrimes() {
    $this->assertFalse(isPrime(6));
}

This fails unexpectedly: 6 is not a prime number, so I expected the method to return false. I have no idea how the method works, nor what the $pf parameter is for; I simply expected it to return false based on its name and description. I do not know why it doesn't, or how to fix it.

This is quite a dilemma. What should we do? The best answer is to write tests that pass for a decent volume of numbers. We may have to try and guess, but at least we will get some idea of what the method does. Then we can start refactoring it.

function testFirst20NaturalNumbers() {
    for ($i=1;$i<20;$i++) {
        echo $i . ' - ' . (isPrime($i) ? 'true' : 'false') . "\n";
    }
}

That outputs something interesting:

1 - true
2 - true
3 - true
4 - true
5 - true
6 - true
7 - true
8 - true
9 - true
10 - false
11 - true
12 - false
13 - true
14 - false
15 - true
16 - false
17 - true
18 - false
19 - true

A pattern is starting to emerge here. Everything is true up to 9, then the results alternate up to 19. But will this pattern repeat? Try running it for 100 numbers and you will immediately see that it does not. In fact, it seems to work for most numbers between 40 and 99; the one exception there is 49, which is reported as prime. In the 30-39 range it fails once, reporting 35 as prime. The same is true in the 20-29 range, where 25 is considered prime.
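
Rather than eyeballing a hundred lines of output, the comparison can be automated. The following is only a sketch of that exploration, not part of the original snippet: referenceIsPrime() and misclassifiedBelow() are hypothetical helpers I am introducing here, and the reference check is plain trial division. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Copy of the snippet under study, so this sketch runs on its own.
// Leave it out when pasting into a file that already defines isPrime().
function isPrime($num, $pf = null)
{
    if (!is_array($pf)) {
        for ($i = 2; $i < intval(sqrt($num)); $i++) {
            if ($num % $i == 0) {
                return false;
            }
        }
        return true;
    } else {
        $pfCount = count($pf);
        for ($i = 0; $i < $pfCount; $i++) {
            if ($num % $pf[$i] == 0) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical reference check (not from the original snippet): plain trial
// division by every candidate whose square does not exceed $n.
function referenceIsPrime(int $n): bool
{
    if ($n < 2) {
        return false;
    }
    for ($d = 2; $d * $d <= $n; $d++) {
        if ($n % $d === 0) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: every number below $limit on which the two disagree.
function misclassifiedBelow(int $limit): array
{
    $wrong = [];
    for ($n = 1; $n < $limit; $n++) {
        if (isPrime($n) !== referenceIsPrime($n)) {
            $wrong[] = $n;
        }
    }
    return $wrong;
}

echo implode(', ', misclassifiedBelow(100)), "\n";
```

Run as written, this prints 1, 4, 6, 8, 9, 15, 25, 35, 49, which agrees with the listing above for the first 19 numbers. It only tells us where the method is wrong, not why, which is all we need before refactoring.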

This exercise started out as a simple code demonstration of a technique, but turned out to be much more difficult than expected. I decided to keep it because it reflects real life in a typical way.

How many times have you started a seemingly simple task, only to find it extremely difficult?

We don't want to fix the code. Whatever the method does, it should continue to do so. We hope to refactor it so others can understand it better.

Since it does not identify primes correctly, we will use the same Golden Master approach we learned in the first lesson.

function testGenerateGoldenMaster() {
    for ($i=1;$i<10000;$i++) {
        file_put_contents(__DIR__ . '/IsPrimeGoldenMaster.txt', $i . ' - ' . (isPrime($i) ? 'true' : 'false') . "\n", FILE_APPEND);
    }
}

Run it once to generate the golden master. It should run fast. If you need to rerun it, don't forget to delete the file before executing the test; otherwise the output will be appended to the previous content.

function testMatchesGoldenMaster() {
    $goldenMaster = file(__DIR__ . '/IsPrimeGoldenMaster.txt');
    for ($i=1;$i<10000;$i++) {
        $actualResult = $i . ' - ' . (isPrime($i) ? 'true' : 'false'). "\n";
        $this->assertTrue(in_array($actualResult, $goldenMaster), 'The value ' . $actualResult . ' is not in the golden master.');
    }
}

Now we write the test for the golden master. This solution may not be the fastest, but it is easy to understand, and if we break something it will tell us exactly which number does not match. There is, however, a little duplication in the two test methods that we can extract into a private method.

class IsPrimeTest extends PHPUnit_Framework_TestCase {

    function testGenerateGoldenMaster() {
        $this->markTestSkipped();
        for ($i=1;$i<10000;$i++) {
            file_put_contents(__DIR__ . '/IsPrimeGoldenMaster.txt', $this->getPrimeResultAsString($i), FILE_APPEND);
        }
    }

    function testMatchesGoldenMaster() {
        $goldenMaster = file(__DIR__ . '/IsPrimeGoldenMaster.txt');
        for ($i=1;$i<10000;$i++) {
            $actualResult = $this->getPrimeResultAsString($i);
            $this->assertTrue(in_array($actualResult, $goldenMaster), 'The value ' . $actualResult . ' is not in the golden master.');
        }
    }

    private function getPrimeResultAsString($i) {
        return $i . ' - ' . (isPrime($i) ? 'true' : 'false') . "\n";
    }
}

Now we can move on to the production code. The test runs in about two seconds on my computer, so it is manageable.
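
If those two seconds, or the manual file deletion, ever become a nuisance, here is one possible variation. It is a sketch outside PHPUnit, assuming PHP 7 or later; writeGoldenMaster(), findMismatches() and the $describe callable are hypothetical names of mine, not part of the test class above. It writes the file in a single call without FILE_APPEND, so a rerun overwrites it, and it compares by line position instead of searching with in_array().

```php
<?php
// Hypothetical helpers. $describe is any callable that turns a number
// into one line of output, ending in "\n".

function writeGoldenMaster(string $path, callable $describe, int $limit): void
{
    $lines = [];
    for ($i = 1; $i < $limit; $i++) {
        $lines[] = $describe($i);
    }
    // No FILE_APPEND here, so an existing file is overwritten on rerun.
    file_put_contents($path, implode('', $lines));
}

function findMismatches(string $path, callable $describe, int $limit): array
{
    $goldenMaster = file($path); // each element keeps its trailing "\n"
    $mismatches = [];
    for ($i = 1; $i < $limit; $i++) {
        // The result for $i sits at index $i - 1, so no search is needed.
        if (($goldenMaster[$i - 1] ?? null) !== $describe($i)) {
            $mismatches[] = $i;
        }
    }
    return $mismatches;
}

// Demo with a stand-in check; in the test file, $describe would call isPrime().
$describe = function (int $i): string {
    return $i . ' - ' . ($i % 2 === 1 ? 'true' : 'false') . "\n";
};
$path = tempnam(sys_get_temp_dir(), 'golden');
writeGoldenMaster($path, $describe, 10000);
echo count(findMismatches($path, $describe, 10000)), " mismatches\n"; // prints "0 mismatches"
unlink($path);
```

The trade-off is that the positional comparison depends on the file being generated with the same range and order as the check, which the in_array() version does not.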

Extracting All We Can

First, we can extract an isDivisible() method in the first part of the code.

if(!is_array($pf))
{
    for($i=2;$i<intval(sqrt($num));$i++) {
		if(isDivisible($num, $i)) {
			return false;
		}
	}
	return true;
}

This will let us reuse the code in the second part, like this:

} else {
    $pfCount = count($pf);
	for($i=0;$i<$pfCount;$i++) {
		if(isDivisible($num, $pf[$i])) {
			return false;
		}
	}
	return true;
}
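
The two fragments above call isDivisible(), but its body is never listed. A minimal sketch, assuming it simply wraps the modulo check that the calls replaced:

```php
<?php
// Assumed body: the extracted check is the same `$num % $x == 0`
// expression that appeared inline in both loops.
function isDivisible($num, $divisor)
{
    return $num % $divisor == 0;
}

var_dump(isDivisible(10, 2)); // bool(true)
var_dump(isDivisible(10, 3)); // bool(false)
```

It is only one line, but it gives the condition a name, and that is the whole point of the technique.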
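
When we started working with this code, we noticed that it is carelessly aligned. Braces are sometimes at the beginning of the line, sometimes at the end.

Sometimes tabs are used for indentation, sometimes spaces. Sometimes there are spaces between operands and operators, sometimes not. And no, this is not specially created code. This is real life. Real code, not some artificial exercise.
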
//Check if a number is prime
function isPrime($num, $pf = null) {
    if (!is_array($pf)) {
        for ($i = 2; $i < intval(sqrt($num)); $i++) {
            if (isDivisible($num, $i)) {
                return false;
            }
        }
        return true;
    } else {
        $pfCount = count($pf);
        for ($i = 0; $i < $pfCount; $i++) {
            if (isDivisible($num, $pf[$i])) {
                return false;
            }
        }
        return true;
    }
}
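
That looks much better. Right away, the two if statements look very similar. But we cannot extract them because of the return statements. If we do not return, we will break the logic.

If the extracted method returned a boolean and we compared it to decide whether we should return from isPrime(), that would not help at all. There may be a way to extract it by using some functional programming concepts in PHP, but maybe later. We can do something simpler first.
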
function isPrime($num, $pf = null) {
    if (!is_array($pf)) {
        return checkDivisorsBetween(2, intval(sqrt($num)), $num);
    } else {
        $pfCount = count($pf);
        for ($i = 0; $i < $pfCount; $i++) {
            if (isDivisible($num, $pf[$i])) {
                return false;
            }
        }
        return true;
    }
}

function checkDivisorsBetween($start, $end, $num) {
    for ($i = $start; $i < $end; $i++) {
        if (isDivisible($num, $i)) {
            return false;
        }
    }
    return true;
}
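
Extracting the whole for loop is a bit easier, but when we try to reuse the extracted method in the second part of the if, we can see that it will not work. There is this mysterious $pf variable about which we know almost nothing.

It seems to check whether the number is divisible by a set of specific divisors, instead of trying all numbers up to the other magic value determined by intval(sqrt($num)). Maybe we could rename $pf to $divisors.
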
function isPrime($num, $divisors = null) {
    if (!is_array($divisors)) {
        return checkDivisorsBetween(2, intval(sqrt($num)), $num);
    } else {
        return checkDivisorsBetween(0, count($divisors), $num, $divisors);
    }
}

function checkDivisorsBetween($start, $end, $num, $divisors = null) {
    for ($i = $start; $i < $end; $i++) {
        if (isDivisible($num, $divisors ? $divisors[$i] : $i)) {
            return false;
        }
    }
    return true;
}
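
This is one way to do it. We added a fourth, optional parameter to our checking method. If it has a value we use it; otherwise we use $i.

Can we extract anything else? What about this piece of code: intval(sqrt($num))?
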
function isPrime($num, $divisors = null) {
    if (!is_array($divisors)) {
        return checkDivisorsBetween(2, integerRootOf($num), $num);
    } else {
        return checkDivisorsBetween(0, count($divisors), $num, $divisors);
    }
}

function integerRootOf($num) {
    return intval(sqrt($num));
}
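
Isn't that better? Somewhat. It is better if the person coming after us does not know what intval() and sqrt() do, but it does not help make the logic easier to understand. Why do we end our for loop at that specific number? Maybe this is the question our function name should answer.
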
//Check if a number is prime
function isPrime($num, $divisors = null) {
    if (!is_array($divisors)) {
        return checkDivisorsBetween(2, highestPossibleFactor($num), $num);
    } else {
        return checkDivisorsBetween(0, count($divisors), $num, $divisors);
    }
}

function highestPossibleFactor($num) {
    return intval(sqrt($num));
}
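
That is better, as it explains why we stop there. Maybe in the future we can invent a different formula to determine that number. The naming also introduced a little inconsistency. We are calling the numbers factors, which is a synonym of divisors. Maybe we should pick one and use only that. I will leave the rename refactoring to you as an exercise.

The question is, can we extract any further? Well, we must try until we drop. I mentioned the functional programming side of PHP a few paragraphs above. There are two main functional programming characteristics we can easily apply in PHP: first-class functions and recursion. Whenever I see an if statement with a return inside a for loop, as in our checkDivisorsBetween() method, I think about applying one or both of these techniques.
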
function checkDivisorsBetween($start, $end, $num, $divisors = null) {
    for ($i = $start; $i < $end; $i++) {
        if (isDivisible($num, $divisors ? $divisors[$i] : $i)) {
            return false;
        }
    }
    return true;
}
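
But why should we go through such a complex thought process? The most annoying reason is that this method does two distinct things: it loops and it decides. I want it only to loop and to leave the decision to another method. A method should always do one thing only, and do it well.
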
function checkDivisorsBetween($start, $end, $num, $divisors = null) {
    $numberIsNotPrime = function ($num, $divisor) {
        if (isDivisible($num, $divisor)) {
            return false;
        }
    };
    for ($i = $start; $i < $end; $i++) {
        $numberIsNotPrime($num, $divisors ? $divisors[$i] : $i);
    }
    return true;
}
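
Our first attempt was to extract the condition and the return statement into a variable. This is local, for the moment. But the code does not work. Actually, the for loop complicates things quite a lot. I have a feeling that a little bit of recursion will help.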
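
Before turning to recursion, a side note on why the closure attempt fails: the value the closure returns is thrown away inside the loop, so the method always ends at its final return true. One way the first-class function idea could be kept alive is to let the loop ask a predicate and report only whether any candidate matched. This is a sketch under my own assumptions: anySatisfies() and $dividesNum are hypothetical names, and the isDivisible() body is assumed, since it is not listed in this tutorial.

```php
<?php
function isDivisible($num, $divisor)
{
    return $num % $divisor == 0; // assumed body
}

// Loops only: reports whether $predicate holds for any candidate.
function anySatisfies(array $candidates, callable $predicate)
{
    foreach ($candidates as $candidate) {
        if ($predicate($candidate)) {
            return true;
        }
    }
    return false;
}

// Same contract as checkDivisorsBetween() above:
// true when none of the candidates divides $num.
function checkDivisorsBetween($start, $end, $num, $divisors = null)
{
    $candidates = [];
    for ($i = $start; $i < $end; $i++) {
        $candidates[] = $divisors ? $divisors[$i] : $i;
    }
    // The decision lives in the closure; the loop above knows nothing about it.
    $dividesNum = function ($divisor) use ($num) {
        return isDivisible($num, $divisor);
    };
    return !anySatisfies($candidates, $dividesNum);
}

var_dump(checkDivisorsBetween(2, 3, 10)); // bool(false): 2 divides 10
var_dump(checkDivisorsBetween(2, 3, 9));  // bool(true): only 2 is tried
```

Whether this reads better than the plain for loop is debatable; it trades one loop for two small functions and a closure.
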
function checkRecursiveDivisibility($current, $end, $num, $divisor) {
    if ($current == $end) {
        return true;
    }
}

When we think about recursion, we must always start with the exceptional cases. Our first exception is when we reach the end of our recursion.

function checkRecursiveDivisibility($current, $end, $num, $divisor) {
    if ($current == $end) {
        return true;
    }

    if (isDivisible($num, $divisor)) {
        return false;
    }
}
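
Our second exceptional case, the one that will break the recursion, is when the number is divisible. We do not want to continue. And those are all the exceptional cases.
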
ini_set('xdebug.max_nesting_level', 10000);
function checkDivisorsBetween($start, $end, $num, $divisors = null) {
    return checkRecursiveDivisibility($start, $end, $num, $divisors);
}

function checkRecursiveDivisibility($current, $end, $num, $divisors) {
    if ($current == $end) {
        return true;
    }

    if (isDivisible($num, $divisors ? $divisors[$current] : $current)) {
        return false;
    }

    checkRecursiveDivisibility($current++, $end, $num, $divisors);
}
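
This is another attempt to use recursion for our problem, but unfortunately, recursing 10,000 times in PHP leads to a crash of PHP or PHPUnit on my system. So this seems to be another dead end. But if it had worked, it would have been a nice replacement for the original logic.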
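
Two details in the snippet above may explain the crash independently of the recursion limit. First, $current++ passes the old value of $current to the next call, so the recursion never advances. Second, the result of the recursive call is never returned. The sketch below changes those two points and also stops on >= rather than ==, so that a start beyond the end stops immediately, just as the original for loop would. The isDivisible() body is my assumption, and I have not run this version against the golden master.

```php
<?php
function isDivisible($num, $divisor)
{
    return $num % $divisor == 0; // assumed body
}

function checkRecursiveDivisibility($current, $end, $num, $divisors = null)
{
    // First exceptional case: nothing left to check.
    if ($current >= $end) {
        return true;
    }

    // Second exceptional case: a divisor was found.
    if (isDivisible($num, $divisors ? $divisors[$current] : $current)) {
        return false;
    }

    // Advance with $current + 1 and hand the result back to the caller.
    return checkRecursiveDivisibility($current + 1, $end, $num, $divisors);
}

var_dump(checkRecursiveDivisibility(2, 3, 10)); // bool(false)
var_dump(checkRecursiveDivisibility(2, 1, 1));  // bool(true)
```

For numbers below 10,000, the end value produced by highestPossibleFactor() is at most 99, so this version never goes more than about a hundred calls deep.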

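Challenge

When I wrote the golden master, I intentionally overlooked something. Let's just say the tests do not cover as much code as they should. Can you figure out the problem? If so, how would you approach it?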

Final Thoughts

"Extract until you drop" is a good way to dissect long methods. It forces you to think about small pieces of code and to give those pieces a purpose by extracting them into methods. I find it amazing how this simple process, coupled with frequent renaming, helps me discover that some code does things I never thought it would.

In our next and final tutorial on refactoring, we will apply this technique to the trivia game. I hope you enjoyed this tutorial, which turned out a little different. We did not talk about textbook examples; we took some real code and had to struggle with the real problems we face every day.