
Why do double floating point operations lose precision?

Jun 26, 2019 am 09:15 AM
double floating point number


Preface: At work, whenever addition, subtraction, multiplication or division on decimals comes up, people think of using BigDecimal. But many are unsure why double or float lose precision, and how BigDecimal solves the problem. Without further ado, let's get started.

1. What is a floating point number?

Floating-point numbers are a data type that computers use to represent numbers with a fractional part, using a binary form of scientific notation. In Java, double is a double-precision, 64-bit floating-point type whose default value is 0.0d; float is a single-precision, 32-bit floating-point type whose default value is 0.0f.


Storage in memory

Type    Sign bit  Exponent  Mantissa
float   1 bit     8 bits    23 bits
double  1 bit     11 bits   52 bits
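The layout above can be inspected directly from Java. The following is a minimal sketch, not part of the original article: the class and method names (FloatFields, doubleFields, floatFields) are made up for illustration, while Double.doubleToLongBits and Float.floatToIntBits are standard JDK methods.

```java
// Illustrative sketch: split a value into the three IEEE 754 fields listed above.
class FloatFields {
    /** Returns {sign, stored exponent, mantissa} for a double: 1 + 11 + 52 bits. */
    static long[] doubleFields(double value) {
        long bits = Double.doubleToLongBits(value);
        long sign = bits >>> 63;                  // top bit
        long exponent = (bits >>> 52) & 0x7FFL;   // next 11 bits (still biased)
        long mantissa = bits & ((1L << 52) - 1);  // low 52 bits
        return new long[] {sign, exponent, mantissa};
    }

    /** Returns {sign, stored exponent, mantissa} for a float: 1 + 8 + 23 bits. */
    static int[] floatFields(float value) {
        int bits = Float.floatToIntBits(value);
        int sign = bits >>> 31;                   // top bit
        int exponent = (bits >>> 23) & 0xFF;      // next 8 bits (still biased)
        int mantissa = bits & ((1 << 23) - 1);    // low 23 bits
        return new int[] {sign, exponent, mantissa};
    }

    public static void main(String[] args) {
        long[] d = doubleFields(-0.3);
        System.out.printf("double -0.3: sign=%d exponent=%d mantissa=0x%013X%n", d[0], d[1], d[2]);
        int[] f = floatFields(1.0f);
        System.out.printf("float   1.0: sign=%d exponent=%d mantissa=0x%06X%n", f[0], f[1], f[2]);
    }
}
```

The exponent printed here is the stored (biased) value, so 1.0f shows 127 rather than 0.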


The exponent field of a float occupies 8 bits. What is stored is not the exponent itself but a biased (offset) exponent: if the true exponent is e and the stored exponent code is E, then E = e + (2^(n-1) - 1), where n is the width of the exponent field and 2^(n-1) - 1 is the exponent bias specified by the IEEE 754 standard. For float the bias is 2^(8-1) - 1 = 127, and for double it is 2^(11-1) - 1 = 1023. The stored codes of all zeros and all ones are reserved (for zero and subnormal numbers, and for infinity and NaN), so the exponent of a normal float ranges from -126 to 127, and that of a normal double from -1022 to 1023. The most negative exponent (together with the mantissa width, for subnormal numbers) determines the non-zero number with the smallest absolute value that a floating-point number can express, while the largest positive exponent determines the number with the largest absolute value, and therefore the value range of the type.
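As a rough check of the bias formula, the sketch below computes the bias from the exponent width and recovers the true exponent of a normal double. The names ExponentBias, bias and trueExponent are hypothetical; Math.getExponent and the MIN_EXPONENT / MAX_EXPONENT constants are standard JDK members and are printed for comparison.

```java
// Illustrative sketch of E = e + (2^(n-1) - 1) for an n-bit exponent field.
class ExponentBias {
    /** Bias for an exponent field that is exponentBits wide: 2^(n-1) - 1. */
    static int bias(int exponentBits) {
        return (1 << (exponentBits - 1)) - 1;
    }

    /** True exponent e of a normal double, recovered as e = E - bias. */
    static int trueExponent(double value) {
        long stored = (Double.doubleToLongBits(value) >>> 52) & 0x7FFL;
        return (int) stored - bias(11);
    }

    public static void main(String[] args) {
        System.out.println(bias(8));                // 127
        System.out.println(bias(11));               // 1023
        System.out.println(trueExponent(8.0));      // 3, because 8.0 = 1.0 * 2^3
        System.out.println(Math.getExponent(8.0));  // 3, the JDK's own answer
        // Exponent limits of normal numbers as defined by the JDK
        System.out.println(Float.MIN_EXPONENT + " .. " + Float.MAX_EXPONENT);
        System.out.println(Double.MIN_EXPONENT + " .. " + Double.MAX_EXPONENT);
    }
}
```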


The range of float is just under -2^128 ~ 2^128, that is, about -3.40E+38 ~ 3.40E+38;
The range of double is just under -2^1024 ~ 2^1024, that is, about -1.79E+308 ~ 1.79E+308.
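These limits can be reproduced from the field widths. The sketch below is an illustration only (FloatRange, largestFloat and largestDouble are made-up names); it builds the largest finite value as an all-ones significand times the largest power of two and compares it with the JDK constants.

```java
// Illustrative sketch: the largest finite values follow from the field widths.
class FloatRange {
    /** (2 - 2^-23) * 2^127: all 23 mantissa bits set, largest normal exponent. */
    static float largestFloat() {
        return Math.scalb(2f - Math.ulp(1.0f), 127);
    }

    /** (2 - 2^-52) * 2^1023: all 52 mantissa bits set, largest normal exponent. */
    static double largestDouble() {
        return Math.scalb(2.0 - Math.ulp(1.0), 1023);
    }

    public static void main(String[] args) {
        System.out.println(largestFloat() == Float.MAX_VALUE);    // true
        System.out.println(largestDouble() == Double.MAX_VALUE);  // true
        System.out.println(Float.MAX_VALUE);
        System.out.println(Double.MAX_VALUE);
        // Going past the range does not wrap around; it overflows to Infinity.
        System.out.println(Double.MAX_VALUE * 2);                 // Infinity
    }
}
```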

2. Into distortion: scientific notation

Let's talk about scientific notation first. Scientific notation is a compact way of writing numbers, used to represent, often approximately, a very large or very small number with many digits. It offers no advantage for values with few digits, but for values with many digits its advantages are obvious. For example, the speed of light is 300000000 meters/second, and the world's population is approximately 6100000000. Numbers like these are inconvenient to read and write, so the speed of light can be written as 3*10^8 and the world's population as 6.1*10^9. A calculator therefore shows the speed of light as 3E8 and the world's population as approximately 6.1E9.

When we were kids, we used to play with calculators, adding and subtracting like crazy, until the calculator finally displayed a result in scientific notation.

The real value in that display is -4.86*10^11 = -486000000000. Decimal scientific notation requires the integer part of the significand to be a single digit from 1 to 9, that is, the significand must lie in the interval [1, 10).
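For readers who want to try this notation in Java, here is a small sketch. The class and method names (SciNotation, toScientific) are assumptions made for the example; String.format with the %e conversion, Double.parseDouble and Double.toString are standard JDK calls.

```java
import java.util.Locale;

// Illustrative sketch: writing and reading numbers in scientific notation.
class SciNotation {
    /** Formats value as d.ddd...e+XX with the given number of fraction digits. */
    static String toScientific(double value, int fractionDigits) {
        return String.format(Locale.ROOT, "%." + fractionDigits + "e", value);
    }

    public static void main(String[] args) {
        System.out.println(toScientific(6_100_000_000.0, 1));     // 6.1e+09
        System.out.println(toScientific(-486_000_000_000.0, 2));  // -4.86e+11
        // The E form is also accepted as input.
        System.out.println(Double.parseDouble("3E8") == 300_000_000.0);  // true
        // Double.toString switches to the E form by itself for large values.
        System.out.println(Double.toString(300_000_000.0));       // 3.0E8
    }
}
```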

3. Into distortion: precision

When computers process data, they perform conversions and various complex operations, such as conversions between units and between number bases (for example binary and decimal). Many divisions never terminate, such as 10 ÷ 3 = 3.3333..., and precision is limited: 3.3333333 × 3 is not equal to 10. Data obtained after such processing is therefore inexact, and the higher the precision, the closer the result is to the true value. The precision of float and double is determined by the number of mantissa bits. For normal numbers the integer part of the significand is always an implicit "1"; since it never changes, it is not stored and does not cost any precision.

float: 2^23 = 8388608, seven decimal digits. Counting the implicit leading bit, the significand has 24 bits: 2 × 8388608 = 2^24 = 16777216, eight decimal digits. Not every 8-digit number can be represented, so the precision of float is roughly 7 significant decimal digits (24 × log10(2) ≈ 7.2).

double: 2^52 = 4503599627370496, 16 decimal digits; with the implicit bit, 2^53 = 9007199254740992. The precision of double is roughly 16 significant decimal digits (53 × log10(2) ≈ 15.95), which is why a range of 15 to 17 digits is usually quoted.
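The limits 2^24 and 2^53 can be observed directly: beyond them, adding 1 no longer changes the value. The sketch below is illustrative only, and the names PrecisionLimits, decimalDigits, floatAbsorbsOne and doubleAbsorbsOne are invented for it.

```java
// Illustrative sketch: where float and double run out of significand bits.
class PrecisionLimits {
    /** Approximate decimal digits carried by a significand of the given width. */
    static double decimalDigits(int significandBits) {
        return significandBits * Math.log10(2);
    }

    /** True if x + 1 rounds back to x, i.e. the spacing between floats exceeds 1. */
    static boolean floatAbsorbsOne(float x) {
        return x + 1f == x;
    }

    /** Same check for double. */
    static boolean doubleAbsorbsOne(double x) {
        return x + 1.0 == x;
    }

    public static void main(String[] args) {
        System.out.println(floatAbsorbsOne(8_388_608f));                // false: 2^23 + 1 still fits
        System.out.println(floatAbsorbsOne(16_777_216f));               // true:  2^24 + 1 does not
        System.out.println(doubleAbsorbsOne(9_007_199_254_740_992.0));  // true:  2^53 + 1 does not
        System.out.printf("float:  about %.2f decimal digits%n", decimalDigits(24));
        System.out.printf("double: about %.2f decimal digits%n", decimalDigits(53));
    }
}
```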


Beyond a certain magnitude, values are displayed in scientific notation, keeping only as many significant digits as the precision allows, so the result is an approximation; the exponent is always an integer. In addition, some decimal fractions cannot be expressed exactly in binary. Because only a limited number of bits is available, an error may be introduced as soon as the value is stored. To convert a decimal fraction to binary, use the multiply-by-2 method: multiply the fraction by 2, take the integer part as the next binary digit, and repeat with the remaining fraction until it becomes 0.
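The storage error described above shows up in ordinary arithmetic. The following sketch (the class StorageError and the method sumOfTenths are hypothetical names) uses only plain double operations.

```java
// Illustrative sketch: small storage errors become visible in simple sums.
class StorageError {
    /** Adds 0.1 to itself count times using double arithmetic. */
    static double sumOfTenths(int count) {
        double total = 0.0;
        for (int i = 0; i < count; i++) {
            total += 0.1;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(0.3 - 0.1);               // 0.19999999999999998
        System.out.println(0.3 - 0.1 == 0.2);        // false
        System.out.println(sumOfTenths(10) == 1.0);  // false
    }
}
```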

For example, evaluating 0.3 - 0.1 with the double type outputs 0.19999999999999998. To perform the subtraction, 0.3 first has to be converted to binary:


0.3 * 2 = 0.6 => .0     (.6)  take 0, leave 0.6
0.6 * 2 = 1.2 => .01    (.2)  take 1, leave 0.2
0.2 * 2 = 0.4 => .010   (.4)  take 0, leave 0.4
0.4 * 2 = 0.8 => .0100  (.8)  take 0, leave 0.8
0.8 * 2 = 1.6 => .01001 (.6)  take 1, leave 0.6
.............
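The steps above can be sketched as a short loop. This is an illustration, not code from the original article: the names BinaryFraction and toBinary are made up, and BigDecimal is used for the intermediate values only so that the doubling itself is exact.

```java
import java.math.BigDecimal;

// Illustrative sketch of the multiply-by-2 method for a decimal fraction in [0, 1).
class BinaryFraction {
    /** Returns at most maxDigits binary digits of the given decimal fraction. */
    static String toBinary(String decimalFraction, int maxDigits) {
        BigDecimal two = BigDecimal.valueOf(2);
        BigDecimal rest = new BigDecimal(decimalFraction);
        StringBuilder out = new StringBuilder("0.");
        // Stop early if the remaining fraction reaches 0 (the expansion terminates).
        for (int i = 0; i < maxDigits && rest.signum() != 0; i++) {
            rest = rest.multiply(two);
            if (rest.compareTo(BigDecimal.ONE) >= 0) {
                out.append('1');                    // take 1, keep the fraction
                rest = rest.subtract(BigDecimal.ONE);
            } else {
                out.append('0');                    // take 0
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary("0.3", 5));     // 0.01001
        System.out.println(toBinary("0.3", 20));    // 0.01001100110011001100
        System.out.println(toBinary("0.625", 20));  // 0.101
    }
}
```

For 0.3 the remainder returns to 0.6 after every four steps, so the digits 1001 repeat forever and the loop only ends because of maxDigits; 0.625 terminates after three digits.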


4. Summary

After reading the above, it should be clear why floating-point numbers have precision problems. Simply put, the float and double types are designed mainly for scientific and engineering calculations. They perform binary floating-point arithmetic, which is carefully designed to provide fast, close approximations over a wide range of magnitudes. However, they do not provide exact results and should not be used where exact results are required. Floating-point values beyond a certain magnitude are shown in scientific notation, and such a representation is only an approximation of the real number, not equal to it. Converting a decimal fraction to binary may also produce an infinitely repeating expansion, or one longer than the floating-point mantissa, which then has to be rounded.

5. So how do we use BigDecimal to solve it?

Look at the two outputs below, the first from a BigDecimal created with the double value 0.3 and the second from one created with the String "0.3".

Output results:

0.29999999999999988897769753748434595763683319091796875

0.3
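The original code was shown only as a screenshot, so the following is a reconstruction under that assumption rather than the author's exact code. The class and method names are invented; new BigDecimal(double), new BigDecimal(String) and BigDecimal.valueOf(double) are standard JDK APIs.

```java
import java.math.BigDecimal;

// Illustrative sketch: the constructor argument decides what value is captured.
class BigDecimalConstructors {
    /** Captures the exact binary value stored in the double, error included. */
    static BigDecimal fromDouble(double value) {
        return new BigDecimal(value);
    }

    /** Captures exactly the decimal digits written in the string. */
    static BigDecimal fromString(String text) {
        return new BigDecimal(text);
    }

    public static void main(String[] args) {
        System.out.println(fromDouble(0.3));    // long expansion slightly below 0.3
        System.out.println(fromString("0.3"));  // 0.3
        // valueOf goes through Double.toString, so it also yields 0.3 here.
        System.out.println(BigDecimal.valueOf(0.3));
        System.out.println(fromDouble(0.3).compareTo(fromString("0.3")) < 0);  // true
    }
}
```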

In the original screenshot, Alibaba's coding guidelines plug-in flags the double constructor with a warning and asks for the String constructor to be used to create the BigDecimal. Because 0.3 cannot be represented exactly as a double (or as any finite-length binary fraction), the value passed to the constructor is not exactly equal to 0.3. When using BigDecimal, you should therefore create it with the String constructor.

At this point curious readers may ask: what is the principle behind BigDecimal, and why does it not have this problem? In fact, the principle is simple. BigDecimal is immutable and represents signed decimal numbers of arbitrary precision. The problem with double arises because the decimal fraction is converted to binary and precision is lost. BigDecimal instead scales the decimal number up by a power of ten so that the arithmetic is done on integers, and keeps the scale alongside as the precision information. For the details of how BigDecimal stores its value, you can read the source code.
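The integer-plus-scale idea can be seen through BigDecimal's public accessors. This is a minimal sketch with made-up names (BigDecimalInternals, exactDifference); unscaledValue(), scale() and subtract() are standard BigDecimal methods.

```java
import java.math.BigDecimal;

// Illustrative sketch: BigDecimal keeps an integer plus a decimal scale.
class BigDecimalInternals {
    /** Subtracts two decimal strings exactly, without going through double. */
    static BigDecimal exactDifference(String a, String b) {
        return new BigDecimal(a).subtract(new BigDecimal(b));
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("12.345");
        System.out.println(price.unscaledValue());  // 12345
        System.out.println(price.scale());          // 3, so the value is 12345 * 10^-3

        System.out.println(exactDifference("0.3", "0.1"));  // 0.2

        // Immutable: subtract returns a new object and leaves price unchanged.
        BigDecimal reduced = price.subtract(new BigDecimal("0.345"));
        System.out.println(price);    // 12.345
        System.out.println(reduced);  // 12.000
    }
}
```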
