Weird precision diff tracking


WBOY

Jul 12, 2016, 09:01 AM


1. Problem found in the query-diff test

Query-diff is a testing method commonly used on retrieval systems. The idea is to send the same set of queries to both the baseline version and the version under test of a system or module. Typically the two versions differ only slightly (in program functionality, configuration, and so on). After the requests are sent, the results returned by the two versions are compared to verify whether the difference affects the final computed result.
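The comparison step of a query-diff run can be sketched in C roughly as follows. This is only an illustration, assuming the results of both versions have already been collected into two float arrays in the same request order; the names diff_report and query_diff are hypothetical and do not come from the original test tooling.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of one query-diff run (hypothetical structure for illustration). */
typedef struct {
    size_t differing;  /* results whose absolute difference exceeds tol */
    float  max_abs;    /* largest absolute difference seen */
} diff_report;

/* Compare the values returned by the baseline and by the version under
 * test for the same n requests. */
diff_report query_diff(const float *baseline, const float *candidate,
                       size_t n, float tol)
{
    assert((baseline != NULL && candidate != NULL) || n == 0);
    diff_report r = {0, 0.0f};
    for (size_t i = 0; i < n; ++i) {
        float d = baseline[i] > candidate[i] ? baseline[i] - candidate[i]
                                             : candidate[i] - baseline[i];
        if (d > tol)
            r.differing++;
        if (d > r.max_abs)
            r.max_abs = d;
    }
    return r;
}
```

With tol set to 0 every bit-level difference is counted, which suits an upgrade that is expected to be diff-free.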


The module under test in this case, module A, is written in C, and its core output is a single-precision floating-point number, referred to here as Q.

When the query-diff test was run after an upgrade of module A, a precision diff showed up in the Q value. About 1% of the results differed, and the largest diff was confined to the decimal places, even though this upgrade was expected to be diff-free.

2. In-depth investigation

When a diff appears, the first task is to settle the direction of the investigation. If the cause is not obvious at a glance, the suspects have to be verified one by one by elimination, narrowing the scope and avoiding wasted effort. Two major directions were listed: the environment and the program.

Environment first:

- The configuration and dictionaries of the old and new environments were carefully checked on site and matched expectations, which ruled out the environment setup tools.

- Since this upgrade is forward compatible, the configuration and dictionaries of the old and new environments were unified and the test was rerun. The diff reproduced, which ruled out configuration differences.

The environment seemed fine, so attention turned back to the program:

- Several rounds of tests gave the same comparison results, which ruled out a diff caused by a randomized strategy.

- Debug logs were printed and the intermediate results of every processing step were checked. Nothing was wrong until the last step, the calculation of the Q value, which is the only place the diff appears. Risk points such as dirty data shared between threads, dirty process-level cache data, and variable type conversion were ruled out one after another.

- To be completely sure, the programs in both the old and new environments were replaced with the new version and retested. If the program were the cause, there should be no diff. Yet the diff reappeared, even though random diffs had already been ruled out!

At this point the investigation had hit a wall: neither the environment nor the program seemed to be the cause.

Calming down and thinking again: the earlier investigation had treated "environment" as just the configuration and dictionaries in use, assuming that if those two matched, the environments matched. That view is one-sided. The environment also includes the compilation environment and the runtime environment, down to the operating system and hardware. This led to new ideas for verification:

- Both the old and new versions of the program were built on the company's cloud compilation cluster, so there should be no problem. To avoid taking that for granted, we carefully checked the compilation parameters and recompiled both versions on the same local machine. The diff still reproduced, which ruled out compilation factors;

- The old and new environments were copied to the same machine and the requests were replayed: the diff disappeared! This confirmed the runtime environment as the factor.

The runtime environment covers both the operating system and the hardware. Striking while the iron was hot, the investigation continued:

- The operating systems of the two machines showing the diff were confirmed to be identical, both CentOS 4.3, which ruled out the operating system;

- Differences in hard disk and memory models are unlikely to cause a diff, so they were not verified for the time being;

- The CPU of the machine hosting the new environment is a Xeon E5645, while the machine hosting the old environment has a Xeon E5-2620. Suspecting the CPU model, we deployed the new environment on another machine with the same CPU as the old environment and retested. The diff disappeared, which pinned the problem on the CPU.


3. Revealing the truth

Comparing the two CPUs, and after quickly ruling out the number of cores, the maximum number of threads, and the L1, L2 and L3 caches, the difference in instruction sets in the CPU feature lists caught my attention.

Supplementary knowledge 1: The role of the CPU instruction set

An instruction set is the collection of operations that a CPU implements in hardware and exposes to programs. Extensions to the instruction set let the CPU carry out certain kinds of work more efficiently. To explain how they do so, two models have to be mentioned: SISD (Single Instruction Single Data) and SIMD (Single Instruction Multiple Data).

Take the addition instruction as an example. On a SISD CPU, after the addition instruction is decoded, the execution unit first accesses memory to fetch the first operand and then accesses memory again to fetch the second operand. Only then can the sum be computed. On a SIMD CPU, after the instruction is decoded, several execution units access memory at the same time and fetch all the operands at once. This makes SIMD particularly suitable for data-intensive computation.

The SSE family and AVX are the instruction set extensions used for floating-point operations, and AVX is one of the differences between the two CPUs, which made it highly suspicious. The next step was to find evidence that the program is optimized with AVX.

However, the ASQ module itself contains no directly optimized code. The code that computes the Q value calls the interface of the static library libA, but the libA code does not use these instruction sets either. libA, however, is built on the static library libB, so we traced the dependencies all the way down and found that the fourth layer of compilation dependencies is libX, provided by IDL. Its code is confidential and could not be inspected.

I had to ask the responsible RD. The RD said that libX does use SSE instruction optimization and Intel's math library MKL, but does not use AVX.

Another dead end? With a last bit of hope I checked Intel's official introduction to MKL and found something unexpected: AVX optimizations had been introduced in MKL! [1]

One last step remained: confirming that AVX is the source of the diff. Further evidence soon turned up in Intel's product documentation [2].

The FMA (fused multiply-add) instructions in AVX2 improve both the efficiency and the accuracy of floating-point operations in matrix multiplication, dot products, polynomial evaluation, and so on compared with earlier instruction sets, because FMA completes the multiplication and the accumulation in a single operation. A post by technical staff in Intel's official forum supports this [3].

Supplementary knowledge 2: How floating-point numbers are stored in a computer

Both float and double follow the IEEE 754 specification in how they are stored. float uses the single-precision format (32 bits with 24 bits of precision, sometimes written R32.24), and double uses the double-precision format (64 bits with 53 bits of precision, R64.53).

Whether single or double precision, the storage is divided into three parts:

1. Sign bit (Sign): 0 represents positive, 1 represents negative

2. Exponent bits (Exponent): store the exponent of the number in scientific notation, using a biased (offset) encoding

3. Mantissa bits (Mantissa): store the fractional part of the significand

The bit positions of each part are shown in the following table:

Type	Total length	Mantissa bits	Exponent bits	Sign bit
Single precision	32bit	0-22	23-30	31
Double precision	64bit	0-51	52-62	63
Extended Double	80bit	0-63	64-78	79
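
The single-precision row of the table can be checked with a small C sketch that splits a float into its three fields. It assumes that float is the 32-bit IEEE 754 format on the target platform; the names float_fields and split_float are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three fields of a single-precision value (illustrative type). */
typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 23-30, stored with a bias of 127 */
    uint32_t mantissa;  /* bits 0-22 */
} float_fields;

float_fields split_float(float value)
{
    uint32_t bits;
    float_fields f;

    /* The layout below only holds when float is 32 bits wide. */
    assert(sizeof bits == sizeof value);
    /* memcpy copies the raw bits without violating aliasing rules. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, -6.5 is -1.625 x 2^2, so split_float(-6.5f) yields sign 1, exponent 129 (2 plus the bias of 127) and mantissa 0x500000 (the fraction 0.625).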

At the hardware level, floating-point arithmetic is carried out by the CPU's floating-point units. The legacy x87 FPU (Floating Point Unit) computes in 80-bit extended precision by default, whereas SSE and AVX instructions operate directly on 32-bit single-precision values held in vector registers. Whenever two code paths keep different intermediate precision, or round at different points, and the result is finally rounded to 32 bits and stored in memory, the stored values can differ because of that rounding.

Since Intel's underlying algorithms are confidential, we can only guess that the AVX and SSE implementations of the optimized functions differ in their intermediate precision, but the existence of a precision difference is certain.

The truth has now emerged: the FMA used on the AVX path rounds once where a separate multiply and add on the SSE path round twice, so a result can differ in its last bit. In iterative calculations the difference accumulates. The Q value is produced by complex matrix operations, and this tiny 1-bit difference is magnified to the order of one ten-thousandth. In addition, to stay compatible across machines, MKL falls back to its SSE code path when it runs on a CPU that does not support AVX.
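
The one-bit effect of fusing the multiply and the add can be reproduced in portable C. The sketch below is an assumption-laden illustration, not MKL's code: mul_then_add rounds the product to float before adding, while fused_like mimics a fused multiply-add by carrying the product in double, which holds the product of two floats exactly (the C99 function fmaf is the real fused operation). All function names are hypothetical.

```c
#include <assert.h>

/* Two roundings: the product is rounded to float, then the sum is.
 * volatile forces the intermediate rounding even if the compiler
 * would otherwise fuse the two operations. */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Emulated fused multiply-add: the product of two floats is exact in
 * double, so only the final result is rounded to float. */
float fused_like(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* Worked example: a = b = 1 + 2^-12, c = -(1 + 2^-11).
 * Exactly, a*b + c = 2^-24. Rounding a*b to float first drops the
 * 2^-24 term, so the two-step result is 0. */
void check_example(void)
{
    const float a = 0x1.001p0f, b = 0x1.001p0f, c = -0x1.002p0f;
    assert(mul_then_add(a, b, c) == 0.0f);
    assert(fused_like(a, b, c) == 0x1p-24f);
}
```

A single operation differs only in the last bit of the product, but once such results feed further multiplications and additions, as in a long matrix computation, the two paths can drift apart visibly.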

Supplementary knowledge 3: Using SSE and AVX to optimize programs

Addition is again used as the example. The required header files and compiler options are not covered here; please refer to the related material.

Basic version:

A simple loop that accumulates the sum.
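
The original listing is not available here, so the following is only a minimal sketch of what such a loop might look like; the name sum_basic is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar baseline: add the elements one by one, in index order. */
float sum_basic(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```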


SSE optimized version:

An SSE register is 128 bits (16 bytes) wide and holds 4 single-precision floats at a time. The data is loaded into the register in groups of 4 and accumulated with the built-in addition function. The 4 partial sums are then added together, and finally the leftover items that did not fill a group are added to give the final result.
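
The original listing is not available here; a sketch of the steps just described, using the standard SSE intrinsics from xmmintrin.h, might look as follows. The name sum_sse is hypothetical, and the scalar fallback branch is an addition of this sketch so that the code also builds on targets without SSE.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

float sum_sse(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

#if defined(__SSE__)
    /* Four running sums, one per 32-bit lane of the 128-bit register. */
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));
    _mm_storeu_ps(lanes, acc);
#else
    /* Fallback without SSE: same lane-wise order of additions. */
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            lanes[k] += data[i + k];
#endif

    /* Add the 4 partial sums, then the items that did not fill a group. */
    float total = ((lanes[0] + lanes[1]) + lanes[2]) + lanes[3];
    for (; i < n; ++i)
        total += data[i];
    return total;
}
```

The unaligned load _mm_loadu_ps is used so the input array needs no special alignment.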


AVX optimized version:

The AVX approach is similar to SSE, but an AVX register is 256 bits (32 bytes) wide and holds 8 single-precision floats, so the data is loaded into the register in groups of 8.
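
Again as a sketch rather than the original listing, the AVX variant might look as follows, using the standard AVX intrinsics from immintrin.h. The name sum_avx is hypothetical. The intrinsic branch is compiled only when AVX is enabled at build time (for example with GCC's -mavx option); the scalar fallback is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

float sum_avx(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float lanes[8] = {0.0f};
    size_t i = 0;

#if defined(__AVX__)
    /* Eight running sums, one per 32-bit lane of the 256-bit register. */
    __m256 acc = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    _mm256_storeu_ps(lanes, acc);
#else
    /* Fallback without AVX: same lane-wise order of additions. */
    for (; i + 8 <= n; i += 8)
        for (int k = 0; k < 8; ++k)
            lanes[k] += data[i + k];
#endif

    /* Add the 8 partial sums, then the items that did not fill a group. */
    float total = 0.0f;
    for (int k = 0; k < 8; ++k)
        total += lanes[k];
    for (; i < n; ++i)
        total += data[i];
    return total;
}
```

A binary built with AVX enabled must only run on CPUs that support AVX, which is why libraries such as MKL choose the code path at run time instead.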


Now randomly generate an input array and write a simple test case to measure the effect of the optimization. Comparing the three algorithms by the number of floats accumulated per second, the SSE version is about 4 times as fast as the basic version, and the AVX version about 8 times! [4]
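
Besides speed, the three versions also add the floats in a different order, and floating-point addition is not associative, so their results can differ. The portable sketch below imitates a 1-lane, 4-lane or 8-lane accumulation in plain C. The function name sum_in_lanes and the sample data are hypothetical, chosen only to make the effect visible.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LANES 8

/* Accumulate data into `lanes` interleaved partial sums (element i goes
 * to lane i % lanes), then add the partial sums in lane order. With
 * lanes == 1 this is the plain sequential loop. */
float sum_in_lanes(const float *data, size_t n, size_t lanes)
{
    assert(lanes >= 1 && lanes <= MAX_LANES);
    float acc[MAX_LANES] = {0.0f};
    for (size_t i = 0; i < n; ++i)
        acc[i % lanes] += data[i];

    float total = 0.0f;
    for (size_t k = 0; k < lanes; ++k)
        total += acc[k];
    return total;
}
```

Take the 8 values 1e8, 1, 1, 1, -1e8, 1, 1, 1, whose exact sum is 6. With strict single-precision arithmetic, the sequential order gives 3, because each 1 added to 1e8 is lost to rounding, while the 4-lane order cancels 1e8 against -1e8 first and gives 6. Neither order is wrong as far as the hardware is concerned; they are simply different roundings of the same sum.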


4. Summary and takeaways

Problem summary:

- During the query-diff compatibility test, a diff was found between the Q values computed by the old and new versions of module A;

- The investigation determined that the precision diff comes from the difference in floating-point instruction sets (AVX/SSE) supported by the CPUs in the program's runtime environment;

- In this case both the proportion and the absolute value of the diff are small. It does not currently affect online services, but if the algorithm becomes more complex and the diff accumulates to the percent level, it could cause the strategy to fail;

- If the floating-point operations of other modules use instruction set optimization, they also need to be checked for the same problem.

Solution:

- When allocating test resources, make sure the machines hosting the old and new environments have the same CPU;

- Add an environment check before running query-diff to confirm once more that there is no hardware difference;

- When deploying services online, also make sure the machines support the AVX instruction set, to get the best performance and accuracy;

- Check whether other modules use instruction set optimization in a similar way, to avoid the risk in advance.
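
The environment check suggested above could, for instance, compare the CPU feature flags of the two machines before a query-diff run. The sketch below assumes Linux on x86, where each machine's flags can be read from the "flags" line of /proc/cpuinfo and passed in as a string. The function names and the list of watched flags are hypothetical choices for illustration.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` appears as a whole whitespace-separated token in
 * `flags`, so that "avx" does not match inside "avx2". */
int has_cpu_flag(const char *flags, const char *name)
{
    assert(flags != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = flags;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[len];
        int ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts && ends)
            return 1;
        p += 1;  /* keep searching after this partial match */
    }
    return 0;
}

/* Two machines count as comparable only if they agree on every
 * SIMD-related flag in the watched list. */
int same_simd_support(const char *flags_a, const char *flags_b)
{
    static const char *const watched[] = {"sse4_2", "avx", "avx2", "fma"};
    for (size_t i = 0; i < sizeof watched / sizeof watched[0]; ++i)
        if (has_cpu_flag(flags_a, watched[i]) !=
            has_cpu_flag(flags_b, watched[i]))
            return 0;
    return 1;
}
```

Comparing feature flags rather than model names also catches the case where two differently named CPUs are in fact equivalent for this purpose.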

Takeaways and suggestions:

- Programs that are heavy on floating-point operations can consider using instruction set functions such as SSE/AVX to optimize performance, which can usually improve efficiency significantly (SSE: about 4 times, AVX: about 8 times);

- When using these instruction sets, keep the number of iterations under control (that is, how often the output of an instruction set function is fed back in as the input of another) so that precision diffs do not accumulate to a level that can no longer be ignored;

- Query-diff testing can be applied to more compatibility testing scenarios, for example comparing how differences in the underlying system and hardware, such as the CPU, operating system, and base libraries, affect applications.

Software engineering is inseparable from hardware support. Differences in compilation and runtime environments can cause differences in service performance and in final calculation results. Such issues deserve special attention at every stage of development, testing, and launch. It pays to be a programmer who understands both software and hardware!

Reference materials:

[1] https://software.intel.com/zh-cn/articles/whats-new-in-intel-mkl

[2] https://software.intel.com/zh-cn/articles/intel-xeon-processor-e7-88004800-v3-product-family-technical-overview

[3] https://software.intel.com/en-us/forums/topic/507004

[4] http://www.cnblogs.com/zyl910/archive/2012/10/22/simdsumfloat.html

Baidu MTC is the industry's leading mobile application testing service platform, providing solutions to the cost, technology and efficiency issues faced by developers in mobile application testing. At the same time, industry-leading Baidu technology is shared, and the authors come from Baidu employees and industry leaders.
