What are float, double, and long double?
To store numbers in a computer, an encoding algorithm must be used. The C
standard does not specify the algorithm, or the encoding, used to store any kind
of number, be it rational such as 1/2, integer such as 5, or irrational such
as pi.
It only specifies the names of the numerical data types, such as int or
float; their meaning, for example int is used to store signed integers
like -1 or 1, and float is used to store approximations of real
numbers such as 1.2 or -12.5; and their minimum range, for example the
int type must be able to represent at least the values between -32767 and +32767. The
algorithms that encode numbers are specified by the computer manufacturer.
The real types in C are float, double, and long double. The C standard defines the model of real numbers that
must be encoded. This model is called the floating point model, and it has the following format:
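For reference, the model as given in the C standard's description of float.h can be written as below. Here s is the sign (+1 or -1), b is the base (radix) of the exponent, e is an integer exponent between a minimum e_min and a maximum e_max, p is the precision (the number of base-b digits in the significand), and each f_k is a nonnegative integer digit smaller than b.

```latex
x = s \, b^{e} \sum_{k=1}^{p} f_k \, b^{-k}, \qquad e_{\min} \le e \le e_{\max}
```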

Multiple algorithms exist for encoding floating point numbers. The most commonly used one is the IEEE 754 floating point format.
On computers that use the IEEE floating point format,
the float type maps to the IEEE single precision format, and the
double type maps to the IEEE double precision format. The long double type
usually maps either to the IEEE quadruple precision format or to the 80 bit extended precision
format, although some implementations make it the same as double.
The ranges of the C real types, when using the IEEE floating point format, are as follows.
| Floating point type | Number of bits | Min value | Max value | Smallest normalized value |
|---|---|---|---|---|
| float | 32 bits | -3.40282 E+38 | +3.40282 E+38 | ±1.17549 E-38 |
| double | 64 bits | -1.79769 E+308 | +1.79769 E+308 | ±2.22507 E-308 |
| long double | 80 bits | -1.18973 E+4932 | +1.18973 E+4932 | ±3.36210 E-4932 |
The float.h header contains information about the
floating point implementation, such as the largest finite value of each
of the floating types, which bounds the range [-max, +max], and the smallest positive normalized value, which is the closest normalized value to 0.
```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* Print the largest finite value of each floating type;
       the range of each type is [-max, +max]. */
    printf("float absolute value of range : %e\n", FLT_MAX);
    printf("double absolute value of range : %e\n", DBL_MAX);
    printf("long double absolute value of range : %Le\n", LDBL_MAX);

    /* Print the smallest positive normalized value of each
       of the floating types. */
    printf("closest to 0 absolute value , float : %e\n", FLT_MIN);
    printf("closest to 0 absolute value , double : %e\n", DBL_MIN);
    printf("closest to 0 absolute value , long double : %Le\n", LDBL_MIN);
    return 0;
}
/* Output :
float absolute value of range : 3.402823e+38
double absolute value of range : 1.797693e+308
long double absolute value of range : 1.189731e+4932
closest to 0 absolute value , float : 1.175494e-38
closest to 0 absolute value , double : 2.225074e-308
closest to 0 absolute value , long double : 3.362103e-4932 */
```
The type in which floating point arithmetic operations are
performed is indicated by the macro FLT_EVAL_METHOD, defined in the header
float.h.
If FLT_EVAL_METHOD is 2, arithmetic operations are performed
by promoting the operands to the long double type. If FLT_EVAL_METHOD is
1, arithmetic operations are performed by promoting the operands to
long double if any operand is of the long double type; otherwise the operands
are promoted to the double type, even if both operands are of the float type.
If FLT_EVAL_METHOD is 0, arithmetic operations are done in the type
of the widest operand. If FLT_EVAL_METHOD is -1, the evaluation type is
indeterminable.
```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("FLT_EVAL_METHOD : %d\n", FLT_EVAL_METHOD);
    return 0;
}
/* Output :
FLT_EVAL_METHOD : 0 */
```
Floating point literal
A floating point literal in C can be written in decimal, in one of the following formats:
```
d+.d*[e[sign]d+]
d*.d+[e[sign]d+]
d+e[sign]d+
```
where d is any digit between 0 and 9, + means one or more,
* means zero or more, what is between [] is optional, sign is + or -, and e is
case insensitive and means multiplication by 10 raised to the exponent that follows. As an example:
```c
double x;
x = 1.;
x = .1;
x = 1.0;
x = 1e1;  // 10.0
x = 1.E1; // 10.0
```
By default, the type of a floating point literal in C is
double. If the literal is suffixed with f, case insensitive, it is
of the float type, and if it is suffixed with l, case insensitive, it
is of the long double type. As an example:
```c
float aFloat = 1.0f;
double aDouble = 1.0;
long double alongDouble = 1.0L;
```
A floating point literal can also be written in hexadecimal notation, in one of the following formats:
```
0xh+[.]h*P[sign]d+
0xh*.h+P[sign]d+
```
where 0x is case insensitive and stands for hexadecimal, h is a hexadecimal
digit between 0 and F, + means one or more, what is between [] is
optional, * means zero or more, sign is + or -, d is a decimal digit between
0 and 9, and P is case insensitive and means multiplication by
2 raised to the power given by the digits that follow. Unlike in decimal notation, the exponent is mandatory. As an example:
```c
double x;
x = 0xfP0;   // 15.0
x = 0Xf.P0;  // 15.0
x = 0xf.0P0; // 15.0
x = 0X.1P0;  // 1/16 = 0.062500
x = 0x.1p1;  // (1/16) * 2 = 0.125000
```
As with a decimal floating point constant, a hexadecimal floating point constant
has a default type of double. To give a hexadecimal floating point
constant the type float, use the suffix f, case insensitive, and to give
it the type long double, use the suffix l, case insensitive. As an example:
```c
float aFloat = 0x1P2f;             // 4.0f
double aDouble = 0x.1p3;           // 0.5
long double alongDouble = 0X.3p2L; // 0.75L
```