 # What Is float In C Programming

C programs often work with numerical data, which they manipulate using mathematical operators. Before numerical data can be manipulated, it has to be stored in memory.

Numerical data can be categorized as follows:

1. Natural numbers: 1, 2, 3, 4, 5, …
2. Whole numbers: 0, 1, 2, 3, …
3. Integers: …, -2, -1, 0, 1, 2, …
4. Rational numbers: numbers expressed as the ratio of two integers, such as 1/2 or -7/4

To use these numbers in a C program, they must be stored in memory, and different kinds of numerical data consume different amounts of memory.

C defines different data types based on the amount of memory consumed. It supports the following data types for storing numerical data:

• int
• float
• double

These data types have variations: int comes in short, long, and long long forms, and double has a long double form.

The int data type stores integral values, that is, values without decimal places, which can be negative or positive. Calculations that need a fractional part require decimal places.

To store non-integral values, that is, numbers with decimal places, the float data type is used. The double data type works like float, except that it offers roughly twice the precision: typically about 15 significant decimal digits, compared with about 6 for float.

## float in C

float is a data type in the C language. Data types have fixed definitions that cannot be changed. In printf and scanf format strings, the conversion specifier %f is used for floating point values.

Floating point numbers can be expressed in scientific notation. For example, 1.5e3 means $1.5 \times 10^{3}$. 1.5e3 is read as "1.5 exponent 3": 1.5 is the mantissa, the letter e separates the two parts, and the number after e is the exponent. Exponents can be positive or negative, so 1.5e-3 means $1.5 \times 10^{-3}$, or 0.0015.

A float typically occupies 4 bytes (32 bits) in memory: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the significand. This is the IEEE 754 single-precision layout. To store a floating point number, the following procedure is used:

Convert the floating point number into its equivalent binary number.

For example, $(10.5)_{10} = (1010.1)_{2}$.

Normalize the binary number.

1010.1 is normalized as $1.0101 \times 2^{3}$, because the binary point is shifted 3 places to the left.

In $1.0101 \times 2^{3}$, 3 is the exponent and 1.0101 is the significand.

Managing negative exponents.

A positive value, called the bias, is added to every exponent so that the stored exponent is never negative. The bias is calculated using the following formula:

$$\text{bias}_n = 2^{n-1} - 1$$

A float uses 8 bits to store the exponent, so the value of n is 8:

$$\text{bias}_8 = 2^{8-1} - 1 = 2^{7} - 1 = 127$$

Thus the stored exponent for $1.0101 \times 2^{3}$ is:

$$\text{actual exponent} + \text{bias} = 3 + 127 = 130$$

The binary equivalent of 130 is $(10000010)_{2}$.

Since 10.5 is not a negative number, the sign bit is 0.

To store the decimal number 10.5 as a float value in computer memory, it is broken into three parts:

• Sign bit: 0
• Exponent part: $(10000010)_{2}$
• Significand part: 1.0101; dropping the leading 1 leaves 0101

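Thus the floating point number 10.5 will be stored in memory as given below:

| Sign (1 bit) | Exponent (8 bits) | Significand (23 bits) |
|---|---|---|
| 0 | 10000010 | 01010000000000000000000 |

## Criticals of float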

### A floating point number can also be represented using the following equation:

$$(-1)^{\text{sign}} \times 1.\text{fraction} \times 2^{\text{exponent} - 127}$$

Apart from normalized floating point numbers, there are also subnormal floating point numbers and unnormalized floating point numbers.

The float.h header file defines constants associated with floating point values. These constants are implementation specific and are defined as macros with the #define directive. They are explained in Table 1 below, where FLT refers to float, DBL to double, and LDBL to long double.

## Table 1. Details of constants used in float.h

| Macro | Limit | Description |
|---|---|---|
| FLT_ROUNDS | | Rounding mode for floating point addition: -1 is indeterminable, 0 is toward zero, 1 is to nearest, 2 is toward positive infinity, 3 is toward negative infinity. |
| FLT_RADIX | | The base (radix) of the exponent: 2 is binary, 10 is ordinary decimal. |
| FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG | | The number of base-FLT_RADIX digits in the significand. |
| FLT_DIG, DBL_DIG, LDBL_DIG | 6, 10, 10 | The number of decimal digits that can be represented without loss. |
| FLT_MIN_EXP, DBL_MIN_EXP, LDBL_MIN_EXP | | The smallest negative integer value of the exponent in base FLT_RADIX. |
| FLT_MIN_10_EXP, DBL_MIN_10_EXP, LDBL_MIN_10_EXP | -37, -37, -37 | The smallest negative integer value of the exponent in base 10. |
| FLT_MAX_EXP, DBL_MAX_EXP, LDBL_MAX_EXP | | The largest integer value of the exponent in base FLT_RADIX. |
| FLT_MAX_10_EXP, DBL_MAX_10_EXP, LDBL_MAX_10_EXP | +37, +37, +37 | The largest integer value of the exponent in base 10. |
| FLT_MAX, DBL_MAX, LDBL_MAX | 1E+37, 1E+37, 1E+37 | The largest finite floating point value. |
| FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON | 1E-5, 1E-9, 1E-9 | The difference between 1 and the next larger representable value. |
| FLT_MIN, DBL_MIN, LDBL_MIN | 1E-37, 1E-37, 1E-37 | The smallest positive normalized floating point value. |

Where a limit is listed, it is the bound the C standard requires of every implementation, not the value a particular compiler defines. Actual values are usually larger in magnitude, or smaller in the case of the EPSILON and MIN macros.

### The <float.h> header file shipped with a Borland compiler is given below:

```c
/*  float.h

Defines implementation specific macros for dealing with
floating point.

Copyright (c) 1987, 1991 by Borland International
*/

#ifndef __FLOAT_H
#define __FLOAT_H

#if !defined( __DEFS_H )
#include <_defs.h>
#endif

#define FLT_ROUNDS          1
#define FLT_GUARD           1
#define FLT_NORMALIZE       1

#define DBL_DIG             15
#define FLT_DIG             6
#define LDBL_DIG            19

#define DBL_MANT_DIG        53
#define FLT_MANT_DIG        24
#define LDBL_MANT_DIG       64

#define DBL_EPSILON         2.2204460492503131E-16
#define FLT_EPSILON         1.19209290E-07F
#define LDBL_EPSILON        1.084202172485504E-19

/* smallest positive IEEE normal numbers */
#define DBL_MIN             2.2250738585072014E-308
#define FLT_MIN             1.17549435E-38F
#define LDBL_MIN            _tiny_ldble

#define DBL_MAX             _huge_dble
#define FLT_MAX             _huge_flt
#define LDBL_MAX            _huge_ldble

#define DBL_MAX_EXP         +1024
#define FLT_MAX_EXP         +128
#define LDBL_MAX_EXP        +16384

#define DBL_MAX_10_EXP      +308
#define FLT_MAX_10_EXP      +38
#define LDBL_MAX_10_EXP     +4932

#define DBL_MIN_10_EXP      -307
#define FLT_MIN_10_EXP      -37
#define LDBL_MIN_10_EXP     -4931

#define DBL_MIN_EXP         -1021
#define FLT_MIN_EXP         -125
#define LDBL_MIN_EXP        -16381

extern float        _Cdecl _huge_flt;
extern double       _Cdecl _huge_dble;
extern long double  _Cdecl _huge_ldble;
extern long double  _Cdecl _tiny_ldble;

#ifdef __cplusplus
extern "C" {
#endif
unsigned int _Cdecl _clear87(void);
unsigned int _Cdecl _control87(unsigned int __newcw, unsigned int __mask);
void         _Cdecl _fpreset(void);
unsigned int _Cdecl _status87(void);
#ifdef __cplusplus
}
#endif

#if !__STDC__

/* 8087/80287 Status Word format   */

#define SW_INVALID      0x0001  /* Invalid operation            */
#define SW_DENORMAL     0x0002  /* Denormalized operand         */
#define SW_ZERODIVIDE   0x0004  /* Zero divide                  */
#define SW_OVERFLOW     0x0008  /* Overflow                     */
#define SW_UNDERFLOW    0x0010  /* Underflow                    */
#define SW_INEXACT      0x0020  /* Precision (Inexact result)   */

/* 8087/80287 Control Word format */

#define MCW_EM              0x003f  /* interrupt Exception Masks*/
#define     EM_INVALID      0x0001  /*   invalid                */
#define     EM_DENORMAL     0x0002  /*   denormal               */
#define     EM_ZERODIVIDE   0x0004  /*   zero divide            */
#define     EM_OVERFLOW     0x0008  /*   overflow               */
#define     EM_UNDERFLOW    0x0010  /*   underflow              */
#define     EM_INEXACT      0x0020  /*   inexact (precision)    */

#define MCW_IC              0x1000  /* Infinity Control */
#define     IC_AFFINE       0x1000  /*   affine         */
#define     IC_PROJECTIVE   0x0000  /*   projective     */

#define MCW_RC          0x0c00  /* Rounding Control     */
#define     RC_CHOP     0x0c00  /*   chop               */
#define     RC_UP       0x0800  /*   up                 */
#define     RC_DOWN     0x0400  /*   down               */
#define     RC_NEAR     0x0000  /*   near               */

#define MCW_PC          0x0300  /* Precision Control    */
#define     PC_24       0x0000  /*    24 bits           */
#define     PC_53       0x0200  /*    53 bits           */
#define     PC_64       0x0300  /*    64 bits           */

/* 8087/80287 Initial Control Word */
/* use affine infinity, mask underflow and precision exceptions */

#define CW_DEFAULT  _default87
extern unsigned int _Cdecl _default87;

/*
SIGFPE signal error types (for integer & float exceptions).
*/
#define FPE_INTOVFLOW       126 /* 80x86 Interrupt on overflow  */
#define FPE_INTDIV0         127 /* 80x86 Integer divide by zero */

#define FPE_INVALID         129 /* 80x87 invalid operation      */
#define FPE_ZERODIVIDE      131 /* 80x87 divide by zero         */
#define FPE_OVERFLOW        132 /* 80x87 arithmetic overflow    */
#define FPE_UNDERFLOW       133 /* 80x87 arithmetic underflow   */
#define FPE_INEXACT         134 /* 80x87 precision loss         */
#define FPE_STACKFAULT      135 /* 80x87 stack overflow         */
#define FPE_EXPLICITGEN     140 /* When SIGFPE is raise()'d     */

/*
SIGSEGV signal error types.
*/
#define SEGV_BOUND          10  /* A BOUND violation (SIGSEGV)  */
#define SEGV_EXPLICITGEN    11  /* When SIGSEGV is raise()'d    */

/*
SIGILL signal error types.
*/
#define ILL_EXECUTION       20  /* Illegal operation exception  */
#define ILL_EXPLICITGEN     21  /* When SIGILL is raise()'d     */

#endif  /* !__STDC__ */

#endif
```

## Program to illustrate the use of float.h

```c
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* FLT_MAX is the largest finite float value. */
    printf("The maximum value of float = %.10e\n", FLT_MAX);

    /* FLT_MIN is the smallest positive normalized value, not the most negative one. */
    printf("The minimum value of float = %.10e\n", FLT_MIN);

    /* FLT_MANT_DIG is an integer constant, so it is printed with %d, not %e. */
    printf("The number of digits in the number = %d\n", FLT_MANT_DIG);

    return 0;
}
```
Output:

```
The maximum value of float = 3.4028234664e+38
The minimum value of float = 1.1754943508e-38
The number of digits in the number = 24
```
### Code Analysis

In the above code, three printf statements are used to display the values of the macros FLT_MAX, FLT_MIN, and FLT_MANT_DIG. These macros are defined in the header file <float.h>.

## Program to convert temperature from fahrenheit to celsius

```c
#include <stdio.h>

int main(void)
{
    int chh;             /* menu choice */
    float aa, bc, c, f;  /* aa: Fahrenheit input, bc: Celsius input, c and f: results */

    printf("\n");
    printf("1. Press 1 for Fahrenheit to Celsius conversion\n");
    printf("2. Press 2 for Celsius to Fahrenheit conversion\n");

    if (scanf("%d", &chh) != 1)
        return 1;        /* no valid menu choice was entered */

    switch (chh)
    {
    case 1:
        printf("\nEnter the temperature in Fahrenheit : ");
        if (scanf("%f", &aa) != 1)
            return 1;
        c = 5 * (aa - 32) / 9;
        printf("\nTemperature in Celsius is: %f\n", c);
        break;
    case 2:
        printf("\nEnter the temperature in Celsius : ");
        if (scanf("%f", &bc) != 1)
            return 1;
        f = ((9 * bc) / 5) + 32;
        printf("\nTemperature in Fahrenheit is: %f\n", f);
        break;
    default:
        printf("\nThis is Wrong Choice.....Try Again later!!!\n");
    }

    return 0;
}
```
Output:

```
1. Press 1 for Fahrenheit to Celsius conversion
2. Press 2 for Celsius to Fahrenheit conversion
1
Enter the temperature in Fahrenheit : 97
Temperature in Celsius is: 36.111111
```
### Code Analysis

To convert a temperature from Celsius to Fahrenheit, the following formula is used:

f = ( ( 9 * bc ) / 5 ) + 32

where bc is the temperature in Celsius.

To convert a temperature from Fahrenheit to Celsius, the following formula is used:

c = 5 * ( aa - 32 ) / 9

where aa is the temperature in Fahrenheit.

In the above code, the program gives the user two choices:

• Press 1 for Fahrenheit to Celsius conversion
• Press 2 for Celsius to Fahrenheit conversion

Any other input prints the wrong-choice message, and the program exits.

## Conclusion

float is short for "floating point". It is a basic data type built into the C language and is used to store values that have a decimal point. float is used heavily in computer graphics, which requires calculations with fractional values. A float is precise to about 6 significant decimal digits.