IEEE754 Floating point conversion in C and 68k assembler

Author: Andre Adrian
Date: 2025-01-25

Introduction

The very early computers worked with integer numbers and used a "fixed point" interpretation of the bit pattern. No bit set was the real number 0.000.. . All bit set was the real number 0.999.. . Today, a typical Digital Signal Processor (DSP) uses fixed point calculations. Many calculations need very small and very big numbers. One solution is to increase the number of digits in the fixed number. This is important in finance. If you are a multi billionaire, you still want a cent accurate account balance. Scientific computation uses "floating point". A larger part of a floating point bit pattern is used as "fixed point" mantissa (or fraction). A smaller part is used as exponent. The well known 32bit IEEE754 floating point number uses 24 bits for mantissa with mantissa sign and 8 bits for exponent with exponent sign.

A computer with floating point (FP) hardware can directly add, subtract, multiply IEEE754 numbers. The traditional approach for a computer without FP hardware was using one "fixed point" binary number for mantissa with mantissa sign and another integer binary number for exponent with exponent sign. The computer is totally happy with these binary, or radix 2, numbers. But, input and output needs to be done in the decimal number, or radix 10, system. The conversion of floating point numbers between radix 2 and radix 10 is done in software, even if the computer has a hardware floating point unit.

The details of FP radix conversion are tricky. The range 0 to 0.999.. for mantissa has no bit pattern for 1. The range 1 to 1.999.. has no bit pattern for 0. The IEEE754 people (the standards body) decided that a special bit pattern for 0 is better then a special bit pattern for 1. Let's convert the radix 10 FP number 1e2 to a radix 2 number. 1e2 is short notation for 10 to the power of 2 or 10^2. 10^2 is 100. We can not express the radix 10 exponent 10^2 as an radix 2 exponent integer number. The next radix 2 exponent is 2^6 or 64. Now we have the following equation: 10^2 = x*2^6. To calculate x, we divide the two exponents: 10^2 / 2^6 = 100 / 64 = 1.5625. The radix 2 representation of 10^2 is 1.5625*2^6.

If the mantissa is 1.5, the equation becomes 1.5*10^2 = (1.5*x)*2^6. Value x is again the division result of 10^2 / 2^6. We just have to multiply the radix 10 mantissa by the "mantissa multiplier" to get the radix 2 mantissa: 1.5*1.5625=2.34375. Now our mantissa is out of the range 1 to 1.999.. . We avoid this problem with radix 2 exponent 2^7 instead of 2^6. 1.5*10^2 = (1.5*x)*2^7. Exponent division result is 10^2 / 2^7 = 0.78125. Radix 2 mantissa is 1.5*0.78125=1.171875. The radix 2 representation of 1.5*10^2 is 1.171875*2^7. Please check with your calculator.

My floating point conversion or radix conversion subroutines use an exponent tabel (array) and a mantissa multiplier tabel, the mantissa multiplication and some specialties. For conversion from radix 10 to radix 2, I use the radix 10 exponent as index into the exponent tabel and mantissa tabel. Lets convert radix 10 FP number -1.3*10^-3. For radix 10 exponent -3 I get radix 2 exponent -10 and radix 2 mantissa multiplier 1.024. With radix 10 mantissa -1.3 the mantissa conversion calculation for radix 2 is -1.3*1.024=-1.3312. The radix 2 exponent is a look-up. Together we get -1.3*10^-3=-1.3312*2^-10. Check again.

One specialty of radix conversion is mantissa underrun or overrun. Underrun is mantissa less then 1, overrun is mantissa larger then 1.999.. . We got mantissa overrun with radix 10 FP number 1.5*10^2 and radix 2 exponent 2^6 and radix 2 mantissa multipilier 1.5625: 1.5*10^2 = 2.34375*2^6. To repair overrun, we divide the mantissa by 2 and add one to the exponent: 2.34375*2^6 = 1.171875*2^7. To repair underrun, we multiply the mantissa by 2 and subtract one from the exponent. Divide by 2 and multiply by 2 of binary numbers is done by shifting, a simple operating for a computer. "Repair the mantissa" is often called "normalize the mantissa".

Traditional Microprocessor approach

The book "Microprocessor programming for computer hobbyists" by Neill Graham from 1977 describes the traditional microprocessor approach for radix conversion. The traditional approach uses multiple multiplication or division instead of mantissa multiplication table and exponent constant table. I wrote a web page about Traditional floating point conversion.

Development Environment

I use Code::Blocks as C language IDE on MS-Windows. Code::Blocks uses gcc as C compiler. EASy68K is my 68000 assembler/simulator IDE. My "real" 68000 computer is the 68k-MBC from Just4Fun. This is a Single board computer (SBC) with minimum 3 IC: the CPU 68008, a SRAM and a 40 pin PIC microcontroller as GLUE chip. Connection to the Host computer, my PC, is via an USB to UART (RS232) converter board (e.g. CP2102). Here is an introduction in C programming on the 68k-MBC.

Picture: My 68k-MBC. Left module is microSD, right module is USB to UART. This is full system with 1MByte RAM and CP/M 68K from microSD card without GPIO chip.

Test program

The "proof of the pudding" of every piece of software is a test program. The EASy68K simulator simulates a BIOS, too. There are BIOS calls for ASCII string print, hexadecimal number print and more. This is the output of the test program:

The first five rows are output of asc2int and int2asc testing. The first column is the input ASCII string to the asc2int subroutine. The second column is the output of asc2int using the BIOS "print signed decimal number" subroutine. The asc2int output is also the input to the int2asc subroutine. The third column is the output of int2asc, again an ASCII string.
The other rows are output of asc2fp and fp2asc testing. First column is input to asc2fp. Second column is output of asc2fp as IEEE754 number in hexadecimal by the BIOS. Third column is output of fp2asc as an ASCII string.
Note: 7 digits of radix 10 can represent 10 million different numbers. 23 bits of radix 2 can only represent 8,388,608 different numbers. Therefore not every 7 digit radix 10 mantissa can be represented as radix 2 mantissa. You see this for example with number 9.876543e21

*test asc2int, int2asc lea cascint, A2 .loop: move.b (A2), D0 beq.s .endl move.l A2, A1 * print input data ASCII string moveq #14, D0 * Display NULL terminated string (A1) trap #15 move.l A2, A0 * convert ASCII to floating point bsr.w asc2int move.w D0, D7 * save result move.w D0, D1 * print floating point as hex number ext.l D1 moveq #8, D2 moveq #20, D0 * Display signed number in D1.L, width in D2.B trap #15 moveq #' ', D1 moveq #6, D0 * Display single ASCII char (D1.B) trap #15 move.w D7, D0 * restore lea buf, A0 * convert floating point to ASCII bsr.w int2asc lea buf, A1 * print output data ASCII string moveq #13, D0 * Display NULL terminated string (A1) with CR, LF trap #15 addq.w #8, A2 bra.s .loop .endl: * test asc2fp, fp2asc lea cascfp, A2 .loop2: move.b (A2), D0 beq.s .endl2 move.l A2, A1 * print input data ASCII string moveq #14, D0 * Display NULL terminated string (A1) trap #15 move.l A2, A0 * convert ASCII to floating point bsr.w asc2fp move.l D6, D1 * print floating point as hex number moveq #16, D2 moveq #15, D0 * Display unsigned number in D1.L in number base of D2.B trap #15 moveq #' ', D1 moveq #6, D0 * Display single ASCII char (D1.B) trap #15 lea buf, A0 * convert floating point to ASCII bsr.w fp2asc lea buf, A1 * print output data ASCII string moveq #13, D0 * Display NULL terminated string (A1) with CR, LF trap #15 add.w #16, A2 bra.s .loop2 .endl2: moveq #9, D0 * Terminate the program, halt simulator trap #15 SIMHALT ; halt simulator cascint: dc.b '-32768 ',0 dc.b '32767 ',0 dc.b '-1 ',0 dc.b '1 ',0 dc.b '0 ',0 dc.b 0 * last entry ds.w 0 * align to even address cascfp: dc.b '-9.876543e-21 ',0 dc.b '9.876543e-21 ',0 dc.b '-9.876543e21 ',0 dc.b '9.876543e21 ',0 dc.b '1.000001e0 ',0 dc.b '1.000000e0 ',0 dc.b '9.999999e-1 ',0 dc.b '9.999998e-1 ',0 dc.b '3.000003e-1 ',0 dc.b '3.e-1 ',0 dc.b '-9.999999e32 ',0 dc.b '9.999999e32 ',0 dc.b '1e32 ',0 dc.b '-9.999999e-31 ',0 dc.b '9.999999e-31 ',0 dc.b '1e-31 ',0 dc.b '-1e-1 ',0 dc.b '1e-1 ',0 dc.b '-1e1 ',0 dc.b '1e1 ',0 dc.b '9.999999 ',0 dc.b '9 ',0 dc.b '-1 ',0 dc.b '1 ',0 dc.b '-0 ',0 dc.b '0 ',0 dc.b 0 * last entry ds.w 0 * align to even address buf: ds.b 16

Floating Point Bit pattern

IEEE 754 single precision:

7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 +-+---------------+---------------------------------------------+ |S| bias 127 exp. | positive mantissa, MSB = 2^0, hidden bit | +-+---------------+---------------------------------------------+
Mantissa range: 1 to 1.999..., mantissa has hidden leading bit

Motorola MC68343 FLOATING POINT FIRMWARE:

7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 +-----------------------------------------------+-+-------------+ | positive mantissa, MSB = 2^0 |S| bias 65 exp.| +-----------------------------------------------+-+-------------+
Mantissa range: 1 to 1.999...

The source code

I designed my FP conversion source code in the programming language C. Later I translated this source code by hand from C to Motorola 68000 assembler. The hand translation was iterative: I changed the C source code to make good use of the 68000 opcodes.

Subroutine asc2int

This subroutine is used in subroutine asc2fp. It is an universal ASCII to signed integer conversion. The string contains the ASCII representation of a radix 10 integer number in the range -32768 to 32767, that is 16bit signed integer. The algorithm is the traditional "multiply preliminary result by 10 and add the next digit" idea. Multiply by 10 is done with shift and addition: First, the integer is shifted one bit, that is multiply by two. Second a copy of the integer is created. The copy is shifted by 2 bits or multiplied by 4. Third the integer*2 and the integer*2*4 numbers are added. This multiply by 10 through shift and add is faster then the MULU multiply opcode of the 68000.

The 68000 has a powerful "get pointer value and increment pointer" opcode. The "move.b (A0)+, D2" opcode is 16bits long. This is good code density. The opcodes that need an 8bit or a 16bit constant like "cmp.b #'-', D2" are 32bits long. In average, the Motorola 68000 and Intel 8086 have the same code density. The 68000 processor has 8 32bit data registers, the 8086 has only 4 16bit data registers.

*// converts ASCII string to integer [-32768 .. 32767] asc2int: *int16_t asc2int(char* A0buf) * ===================================================================== * in: A0=pointer to char * out: D0=signed 16bit integer * uses D0 to D3, A0 *{ clr.w D0 * uint16_t D0v = 0; clr.b D1 * unsigned char D1sgn = 0; move.b (A0)+, D2 * unsigned char D2la = *A0buf++; // la = look ahead cmp.b #'-', D2 * if ('-' == D2la) { bne.s .endi moveq #1, D1 * D1sgn = 1; move.b (A0)+, D2 * D2la = *A0buf++; .endi: * } .while: * while (D2la >= '0' && D2la <= '9') { cmp.b #'0', D2 bcs.s .endw cmp.b #'9', D2 bhi.s .endw add.w D0, D0 * D0v <<= 1; // do D0v *= 10; with shifts move.w D0, D3 * uint16_t D3v = D0v << 2; asl.w #2, D3 add.w D3, D0 * D0v += D3v; sub.b #'0', D2 * D0v += (D2la - '0'); ext.w D2 add.w D2, D0 move.b (A0)+, D2 * D2la = *A0buf++; bra.s .while .endw: * } * int16_t D0w = (int16_t)D0v; // D0v and D0w can be a union and.b D1, D1 * if (D1sgn) { beq.s .endi2 neg.w D0 * D0w = 0 - D0w; .endi2: * } * return D0w; rts *}

Subroutine int2asc

This subroutine is used in subroutine fp2asc. It is an universal signed integer to ASCII conversion for the exponent. The IEEE754 allows a radix 10 exponent from -38 to +38. My implementation only allows radix 10 exponent from -31 to +32. This reduces the conversion constant tables a little. Every tabel is maximum 256 bytes long. The C language does not know about carry flag, but the CPU has a carry flag. Therefore I wrote C functions like "sub16()" to simulate the carry flag in C. This was part of the iterations between C and assembler to get fast and correct implementation. The traditional integer to ASCII algorithm uses divide to build the output ASCII string from last digit to first digit. My subroutine uses the tabel approach: tabel (array) c0 has the digit values 10000, 1000, 100, ... as binary numbers. Instead of division, multiple subtraction is used. The same tabel approach is used in the subroutine fp2asc for the mantissa. The variable D1flag or register D0 and an if() statement implement suppressing of leading zeros.

*// converts integer [-32768 .. 32767] to ASCII string int2asc: *void int2asc(char* A0buf, int16_t D0e) * ===================================================================== * in: D0=signed integer [-32768 .. 32767], A0=pointer to char * out: nothing * uses D0 to D4, A0, A1 *{ and.w D0, D0 * if (D0e < 0) { bpl.s .endi move.b #'-', (A0)+ * *A0buf++ = '-'; neg.w D0 * D0e = 0 - D0e; .endi: * } * uint16_t D0n = (uint16_t)D0e; lea c0, A1 * uint16_t* A1c0ptr = c0; clr.b D1 * uint8_t D1flag = 0; moveq #4, D2 * int8_t D2i = 4; .do: * do { moveq #'0', D3 * uint8_t D3digit = '0'; move.w (A1)+, D4 * uint16_t D4c0val = *A1c0ptr++; * // Assembler style unsigned division 16bit/16bit, result is D3digit .while: sub.w D4, D0 * while (0 == sub16(D0n, D4c0val)) { bcs.s .endw addq.b #1, D3 * ++D3digit; bra.s .while .endw: * } add.w D4, D0 * D0n += D4c0val; // make result positive again cmp.b #'0', D3 * if ('0' == D3digit && 0 == D1flag && D2i > 0) { bne.s .endi2 and.b D1, D1 bne.s .endi2 and.b D2, D2 ble.s .endi2 bra.s .dol * continue; // suppress leading 0 .endi2: * } moveq #1, D1 * D1flag = 1; move.b D3, (A0)+ * *A0buf++ = D3digit; .dol: dbra D2, .do * } while (--D2i > -1); clr.b (A0) * *A0buf = '\0'; rts *}

Subroutine fp2regs

The 32bit IEEE754 FP number starts with one bit for mantissa sign (1 is negative mantissa), 8 bits for the exponent and 23 bits for the mantissa. The exponent is in bias 127 coding. The mantissa coding uses "hidden leading bit". The mantissa is always normalized in the range 1 to 1.999.. . The leading bit is always 1 and not in the 32bit IEEE754 number. The subroutine splits the 32bit IEEE754 number into three parts: mantissa with leading bit in register D6, two's complement exponent in register D5 and mantissa sign in register D4. A register is a variable that is not in the main memory, but in the CPU. Register operations are faster then memory operations.
The traditional approach is to have a signed mantissa in one register. This approach has pros and cons. Floating point add and subtract is easy with a two's complement mantissa. But signed multiply and signed divide are slower then their unsigned counterparts. Furthermore, the sign bit in signed mantissa looses one bit of accuracy. Last but not least, shift as replacement for divide works with unsigned numbers, but not with signed numbers: -1/2 = 0 but -1 shift right is 0xFFFFFFFF if you use "arithmetic shift right", 32bit numbers and two's complement.
A "middle" ground is to use one's complement: the number is unsigned, the sign bit is just "tacked" on. Before multiply and divide you change one's complement into unsigned number and separate sign. Before add and subtract you change one's complement into two's complement, after you change it back to one's complement.

* IEEE754 32bit to 3 registers fp2regs: * ===================================================================== * in: D6=IEEE754 * out: D6=mantissa, D5=exponent, D4=sign move.l D6, D5 moveq #0, D4 and.l #$0007FFFFF, D6 * mask mantissa bset.l #23, D6 * set leading mantissa bit and.l #$0FF800000, D5 * mask sign, exp add.l D5, D5 * shift sign bit to X flag addx.b D4, D4 * shift X flag to bit 0 rol.l #8, D5 * rotate top 8 bits to bottom 8 bits sub.b #127, D5 * change bias 127 to two's complement rts

Subroutine regs2fp

This subroutine combines the three parts mantissa, exponent and mantissa sign into one 32bit IEEE754 FP number.
* 3 registers to IEEE754 32bit regs2fp: * ===================================================================== * in: D6=mantissa, D5=exponent, D4=sign * out: D6=IEEE754 * uses D4 to D6 bclr.l #23, D6 * clear leading mantissa bit add.b #127, D5 * change two's complement to bias 127 ror.l #8, D5 * rotate bottom 8 bits to top 8 bits lsr.b #1,D4 * shift sign bit to X flag roxr.l #1, D5 * rotate X flag to bit 31 or.l D5, D6 * combine sign, exp with mantissa rts

Subroutine qumul32

The Q notation is helpful for fixed point numbers. A Q 5.27 number has 5 bits for the integer part and 27 bits for the fraction part. We can express Radix 10 numbers from 0 to 31.999.. with Q 5.27. The maximum mantissa value is 9.999... and the maximum mantissa multiplier is 3.094850. The multiplication for Q numbers is like the integer multiplication. Two 32bit values result in a 64bit multiplication result. After the integer multiplication, the result is normalized (shifted) to a Q 5.27 result. The constant Ibits is 5, the constant Fbits is 27.

* unsigned 5.27 Q mul 32bit = 32bit * 32bit; D6 = D6 * D3 qumul32: * ===================================================================== * in: D6=unsigned Q 5.27, D3=unsigned Q 5.27 * out: D6=unsigned Q 5.27 * uses D2, D3, D6, D7 * See Logan, O'Hara; The complete Timex TS1000 & Sinclair ZX81 ROM Disassembly; Part B, page 41 moveq #0, D7 moveq #32, D2 bra.s .start .loop: bcc.s .noadd add.l D6, D7 * HLHL += DEDE .noadd: roxr.l #1, D7 * RR HLHL .start: roxr.l #1, D3 * RR BCBC dbra D2, .loop * result D7=high 32bit, D6= low 32bit * D6frac = D76facc >> Fbits; exg d7, d6 * or swap D7, D6 and D6frac = D76facc << Ibits; moveq #Ibits-1, D2 .loop2: add.l D7, D7 addx.l D6, D6 dbra D2, .loop2 * result D6 rts

Subroutine asc2fp

The first "tricky" subroutine. The tabel c1 or constant values array c1 contains the Q 5.27 binary values of the radix 10 digits, that is the binary value of 1.0, 0.1, 0.01, .. . As every radix 10 digit is counted down to zero, the radix 2 mantissa is counted up. Again, doing multiplication "by hand" is faster then using the 68000 MULU opcode. This opcode can only multiply 16bit numbers, but the mantissa is 32bit. If the radix 10 mantissa ASCII string is 2 or greater, the radix 2 mantissa is not normalized. The "magic constants" Imask1 and Imask2 help to shift the mantissa right or left until normalized. Having a not-normalized mantissa of maximum value 9.999... is the reason for the Q 5.27 fixed point format.

*// convert ASCII string -?[0-9](.[0-9]*(e-?[0-9]+)?)? to fp *// RE: ?= 0 to 1 repetition, *=0 to N repetition, +=1 to N repetition *// () block asc2fp: *FPU asc2fp(char* A0buf) * ===================================================================== * in: A0=pointer to char * out: D6=IEEE754 floating point * uses D0 to D7, A0, A1 *{ * // input mantissa part -?[1-9].[0-9]* move.b (A0)+, D1 * unsigned char D1la = *A0buf++; // la = look ahead clr.b D4 * unsigned char D4sgn = 0; cmp.b #'-', D1 * if ('-' == D1la) { bne.s .endi moveq #1, D4 * D4sgn = 1; move.b (A0)+, D1 * D1la = *A0buf++; .endi: * } lea c1+4, A1 * unsigned long *A1c1ptr = &c1[1]; moveq #0, D6 * unsigned long D6frac = 0; moveq #7, D3 * int8_t D3i = sizeof c1 / sizeof c1[0] - 2; .do: * do { move.l (A1)+, D2 * unsigned long D2c1val = *A1c1ptr++; cmp.b #'.', D1 * if ('.' == D1la) { bne.s .endi2 move.b (A0)+, D1 * D1la = *A0buf++; .endi2: * } cmp.b #'0', D1 * if (D1la < '0') { bcs.s .endw * break; * } cmp.b #'9', D1 * if (D1la > '9') { bhi.s .endw * break; * } sub.b #'0', D1 * int8_t D1digit = D1la - '0'; .while2: and.b D1, D1 * while (D1digit != 0) { beq.s .endw2 add.l D2, D6 * add32(D6frac, D2c1val); subq.b #1, D1 * --D1digit; bra.s .while2 .endw2: * } move.b (A0)+, D1 * D1la = *A0buf++; dbra D3, .do * } while (--D3i > -1); .endw: * // input exponent part e-?[0-9]+ clr.w D0 * int16_t D0exp10 = 0; cmp.b #'e', D1 * if ('e' == D1la) { bne.s .endi3 jsr asc2int * D0exp10 = asc2int(A0buf); .endi3: * } * FPU f; and.l D6, D6 * if (0 == D6frac && 0 == D0exp10) { // zero 0, 0., 0.e0 bne.s .endi4 and.w D0, D0 bne.s .endi4 move.b #-Ebias, D5 * f.e = -Ebias; * f.f = D6frac; * f.s = D4sgn; bra.s .ret * return f; .endi4: * } sub.w #Emin10, D0 * uint8_t D0ndx = D0exp10 - Emin10; lea c3, A0 * signed char D5exp = c3[D0ndx]; move.b 0(A0, D0), D5 lea c2, A0 * unsigned long D3c2val = c2[D0ndx]; asl.w #2, D0 move.l 0(A0, D0), D3 * unsigned long long D76facc = D6frac; jsr qumul32 * qumul32(D76facc, D3c2val); add.l #Rvalue, D6 * D6frac += Rvalue; * // normalize .while3: move.l D6, D0 * while (D6frac & Imask1) { // mantissa to large and.l #Imask1, D0 beq.s .endw3 lsr.l #1, D6 * D6frac >>= 1; addq.b #1, D5 * ++D5exp; bra.s .while3 .endw3: * } .while4: move.l D6, D0 * while (0 == (D6frac & Imask2)) { // mantissa to small and.l #Imask2, D0 bne.s .endw4 add.l D6, D6 * D6frac <<= 1; subq.b #1, D5 * --D5exp; bra.s .while4 .endw4: * } lsr.l #Rbits, D6 * D6frac >>= Rbits; // remove "rounding" bits .ret: jmp regs2fp * f.f = D6frac; * f.e = D5exp; * f.s = D4sgn; * return f; *}

Subroutine fp2asc

The second "tricky" subroutine. This time, the mantissa multiplication can give radix 10 mantissa underrun or overrun. Underrun is an ASCII string starting with 00.xxxx, overrun is an ASCII string starting with Xx.xxxx where X is a digit between 1 and 9. The subroutine performs a mantissa multiplication by suppressing leading zero, a mantissa division by moving the decimal point. The radix 10 exponent gets decreased or increased accordingly.
Another trick is in the c4 and c5 tables. Only every fourth radix 2 exponent value has an entry to reduce tables size. I call this "pseudo radix 16". Before mantissa multiplication, mantissa and exponent are shifted to "pseudo radix 16" exponents. The "pseudo radix 16" is another reason for Q 5.27 size of the mantissa.

*// convert floating point to ASCII string fp2asc: *void fp2asc(char* A0buf, FPU D6f) * ===================================================================== * in: A0=pointer to char, D6=IEEE754 floating point * out: nothing * uses D0 to D7, A0, A1 *{ jsr fp2regs * unsigned long D6frac = f.f; * signed char D5exp = f.e; * unsigned char D4sgn = f.s; and.b D4, D4 * if (D4sgn) { // 1 beq.s .endi move.b #'-', (A0)+ * *A0buf++ = '-'; .endi: * } // 1 cmp.b #-127, D5 * if (-127 == D5exp && 0x800000 == D6frac) { // 2 bne.s .endi2 cmp.l #$0800000, D6 bne.s .endi2 move.b #'0', (A0)+ * *A0buf++ = '0'; // output FP zero clr.b (A0) * *A0buf = '\0'; rts * return; .endi2: * } // 2 .while: * while ((unsigned char)D5exp & 3) { // change radix 2 exponent to pseudo radix 16 move.b D5, D4 * don't destroy D5 and.b #3, D4 beq.s .endw add.l D6, D6 * D6frac <<= 1; sub.b #1, D5 * --D5exp; bra.s .while .endw: * } asl.l #Rbits, D6 * D6frac <<= Rbits; // add "rounding" bits sub.b #Emin2, D5 * D5ndx = (D5exp - Emin2) >> 2; // get index of radix 2 to radix 10 constants lea c4, A1 * unsigned long D3cfrac = c4[D5ndx]; // get radix 2 to radix 10 mantissa multiplier move.l 0(A1, D5), D3 lsr.b #2, D5 * int8_t D0cexp = c5[D5ndx]; // get radix 10 exponent lea c5, A1 move.b 0(A1, D5), D0 * unsigned long long D76facc = D6frac; jsr qumul32 * qumul32(D76facc, D3cfrac); // change mantissa from radix 2 to radix 10 add.l #Rvalue2, D6 * D6frac += Rvalue2; * // output mantissa part moveq #Digits, D2 * int8_t D2digits = Digits; clr.b D1 * uint8_t D1flag = 0; lea c1, A1 * unsigned long* A1c1p = c1; moveq #-1, D4 * for (int8_t D4i = -1; D4i < D2digits; ++D4i) { // start at -1 because "real" output is 0X.XXXXXX .for: cmp.b D2, D4 bge.s .endf move.b #'0', D3 * int8_t D3digit = '0'; move.l (A1)+, D5 * unsigned long D5c1val = *A1c1p++; .while2: sub.l D5, D6 * while (0 == sub32(D6frac, D5c1val) && D3digit < '9') { // fix 5.04:0000e-1 output bcs.s .endw2 cmp.b #'9', D3 bcc.s .endw2 addq.b #1, D3 * ++D3digit; bra.s .while2 .endw2: * } add.l D5, D6 * add32(D6frac, D5c1val); // make result positive again * // without D1flag, output can be 00.Xxxxxxx, 0X.xxxxxx or Xx.xxxxx * // pseudo mul10, div10 by suppress leading zero or move dot. Correct radix 10 exponent move.b D3, (A0) * *A0buf = D3digit; and.b D1, D1 * if (0 == D1flag) { // 3 bne.s .endi3 cmp.b #'0', D3 * if (D3digit != '0') { // 4 beq.s .endi4 moveq #1, D1 * D1flag = 1; addq.w #1, A0 * ++A0buf; move.b #'.', (A0)+ * *A0buf++ = '.'; // output dot after first non zero D3digit sub.b D4, D0 * D0cexp -= D4i; // D4i=-1 pseudo div10, D4i=1 pseudo mul10 add.b D4, D2 * D2digits += D4i; // D4i=-1 one mantissa digit less, D4i=1 one mantissa digit more .endi4: * } // 4 bra.s .forl * continue; .endi3: * } // 3 addq.w #1, A0 * ++A0buf; .forl: addq.b #1, D4 bra.s .for .endf: * } and.b D0, D0 * if (D0c5val) { // 5 beq.s .endi5 * // output exponent part move.b #'e', (A0)+ * *A0buf++ = 'e'; ext.w D0 jmp int2asc * int2asc(A0buf, D0c5val); * return; .endi5: * } // 5 .ret: clr.b (A0) * *A0buf = '\0'; rts *}

Structured Programming and never nester

My C source code uses "break", "continue" and "multiple return". Some say, this is no more structured programming, specially multiple return is evil. I say, as long as every block has one entry and one exit, it is structured. I agree, break, continue and multiple return are C language features that are seldom used. But they are useful, specially if you are a "never nester" programmer. By the way, there is no "else" in a never nester programmers source code.

Little assembler tricks

The ASP68K PROJECT collected assembler tricks to save program space or execution time. If a "jump subroutine" is directly followed by a "return subroutine", you can replace the two opcodes with a "jump":

jsr     int2asc             jmp     int2asc rts

You can save space by replacing "jsr" with "bra.w" and "jmp" by "bra.w", if the target is in 16bit signed distance:
Note: the EASy68K assembler does this itself, if the jsr or jmp goes to lower address.

jsr     asc2int             bsr.w     asc2int jmp     regs2fp             bra.w     regs2fp

If only one bit is to set or clear, the "bset" or "bclr" opcodes are shorter then "or.l" or "and.l" opcodes:

or.l    #$000800000, D6     bset.l #23, D6 and.l   #$0007FFFFF, D6     bclr.l #23, D6

If the immediate value is 16bit signed, "add.w #16, A2" or "lea 16(A2), A2" opcode is shorter then "add.l   #16, A2":

add.l   #16, A2             add.w   #16, A2                             lea     16(A2), A2

The "add Dx, Dx" opcode executes faster then the "asl #1, Dx" opcode:

asl.w   #1, D0              add.w   D0, D0 asl.l   #1, D5              add.l   D5, D5 asl.l   #1, D7              add.l   D7, D7

The "addx Dx, Dx" opcode executes faster then the "roxl #1, Dx" opcode:

roxl.b #1, D4              addx.b D4, D4 roxl.l #1, D6              addx.l D6, D6

The "addq.w #n, Ax" opcode executes faster then the "addq.l #n, Ax" opcode:

addq.l #8, A2              addq.w #8, A2 addq.l #1, A0              addq.w #1, A0

The "moveq #0, Dx" opcode executes faster then the "clr.l Dx" opcode:

clr.l   D4                  moveq   #0, D4 clr.l   D7                  moveq   #0, D7

These assembler tricks saved 8 bytes program size and 2 or 4 clock cycles per faster opcode on a 68000. A 68000 NOP needs 4 clock cycles.

Data Structures

"Algorithms + Data Structures = Programs" was Niklaus Wirth's mantra. The data structures for the FP conversion subroutines are primitive. We have the tables (arrays) c0 to c5. The exponent value plus some offset (Emin10, Emin2) is used as index in these tables. Only "FORTRAN" style data structures, simple and fast!

c0 Integer constants 10000, 1000, ... for integer conversion
c1 Q constants 10.0, 1.0, ... 0.0000001 for mantissa conversion
c2 Q mantissa constants to convert exponent 10^x to exponent 2^y, values 2^Fbits * 10^x / 2^y
c3 integer exponent constants, radix 10 to radix 2, e.g. 10^0 converts to 2^0 .. 2^3
c4 Q mantissa constants to convert exponent 2^x to exponent 10^y, values 2^Fbits * 2^x / 10^y
c5 integer exponent constants, radix 2 to radix 10, e.g. 2^3 converts to 10^0 .. 10^1

The "equ" constants do not use memory, the "dc" constants do.

*enum { Digits equ 7 * Digits = 7, Ibits equ 5 * Ibits = 5, Emin10 equ -31 * Emin10 = -31, Emin2 equ -108 * Emin2 = -108, *}; *enum { Ebias equ 127 * Ebias = 127, // Exponent bias Leadbit equ $0800000 * Leadbit = 0x800000, // implicit/explicit leading bit Fbits equ 32-Ibits * Fbits = 32 - Ibits, // proper mantissa bits Rbits equ Fbits-23 * Rbits = Fbits - 23, // rounding bits Rvalue equ $086/2 * c1[7]/2-1 Rvalue2 equ $0D/2 * c1[8]/2 Imask1 equ $0F0000000 Imask2 equ $0F8000000 Smax equ 14 * Smax = 14, // max DCM String len *}; c1: *unsigned long c1[] = { dc.l $050000000, $08000000, $0CCCCCD, $0147AE1, $020C4A, $0346E dc.l $053E, $086, $0D *}; c2: *unsigned long c2[] = { dc.l $081CEB33, $0A2425FF, $065697C0, $07EC3DB0, $09E74D1B, $06309031 dc.l $07BCB43D, $09ABE14D, $060B6CD0, $078E4804, $0971DA05, $05E72843 dc.l $0760F254, $09392EE9, $05C3BD52, $0734ACA6, $0901D7CF, $0B424DC3 dc.l $0709709A, $08CBCCC1, $0AFEBFF1, $06DF37F6, $089705F4, $0ABCC771 dc.l $06B5FCA7, $08637BD0, $0A7C5AC4, $068DB8BB, $083126E9, $0A3D70A4 dc.l $06666666, $08000000, $0A000000, $06400000, $07D00000, $09C40000 dc.l $061A8000, $07A12000, $09896800, $05F5E100, $07735940, $09502F90 dc.l $05D21DBA, $0746A529, $09184E73, $05AF3108, $071AFD4A, $08E1BC9C dc.l $0B1A2BC3, $06F05B5A, $08AC7230, $0AD78EBC, $06C6B936, $08786783 dc.l $0A968164, $069E10DE, $08459516, $0A56FA5C, $06765C79, $0813F398 dc.l $0A18F07D, $064F964E, $07E37BE2, $09DC5ADB *}; c4: *unsigned long c4[] = { dc.l $018A6E322, $03F1BDF1, $064F964E, $0A18F07D, $01027E72F, $0295BE97 dc.l $0422CA8B, $069E10DE, $0A968164, $010F0CF06, $02B5E3AF, $04563918 dc.l $06F05B5A, $0B1A2BC3, $011C37938, $02D79884, $048C2739, $0746A528 dc.l $0BA43B74, $012A05F20, $02FAF080, $04C4B400, $07A12000, $0C350000 dc.l $013880000, $03200000, $05000000, $08000000, $0CCCCCCD, $0147AE148 dc.l $0346DC5D, $053E2D62, $08637BD0, $0D6BF94D, $015798EE2, $036F9BFB dc.l $057F5FF8, $08CBCCC1, $0E12E134, $016849B87, $039A5653, $05C3BD52 dc.l $09392EE9, $0EC1E4A8, $0179CA10D, $03C72402, $060B6CD0, $09ABE14D dc.l $0F79687B, $018C240C5, $03F61ED8, $065697C0, $0A2425FF, $01039D666 dc.l $02989D2F, $042761E5 *}; c0: *uint16_t c0[] = { dc.w 10000, 1000, 100, 10, 1 *}; c3: *signed char c3[] = { dc.b -103, -100, -96, -93, -90, -86, -83, -80, -76, -73, -70, -66 dc.b -63, -60, -56, -53, -50, -47, -43, -40, -37, -33, -30, -27 dc.b -23, -20, -17, -13, -10, -7, -3, 0, 3, 7, 10, 13 dc.b 17, 20, 23, 27, 30, 33, 37, 40, 43, 47, 50, 53 dc.b 56, 60, 63, 66, 70, 73, 76, 80, 83, 86, 90, 93 dc.b 96, 100, 103, 106 *}; c5: *signed char c5[] = { dc.b -33, -31, -30, -29, -28, -26, -25, -24, -23, -22, -20, -19 dc.b -18, -17, -16, -14, -13, -12, -11, -10, -8, -7, -6, -5 dc.b -4, -2, -1, 0, 1, 2, 4, 5, 6, 7, 8, 10 dc.b 11, 12, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24 dc.b 25, 26, 28, 29, 30, 31, 33, 34 *};

Summary

My floating point conversion subroutines for Motorola 68000 come forty years late. In 1986, each of my friends had an ATARI ST 520. We enhanced the 520 from 512KByte RAM to 1MByte, added a floppy disk and later 20MByte hard disk. My hard disk had even 30 MByte, thanks to a RLL hard disk controller. I used the very fine Lattice C compiler. Only after programming C on 8088 MS-DOS "boxes" I realized how nice 32bit address pointers are and how fine the "real time" behavior of TOS was compared to MS-DOS. Atari ST can do real time MIDI ...

FP conversion Algorithms (program) size is 540 Bytes
FP conversion Data Structures (constants) size is 646 Bytes

Further work

Next steps for a full 68000 FP package are implementation of fpadd, fpsub, fpmul and fpdiv. The 1981 "MC68343 FLOATING POINT FIRMWARE" assembler source code is again available. This FP package was used in Commodore Amiga ROM. The DTACK GROUNDED newsletter, beginning in 1981, has a lot of information about this topic, too. The transcendental functions like sin and cos can be approximated by series expansion (Microsoft Altair BASIC, Sinclair ZX81 BASIC) or by CORDIC (HP35, Motorola FFP). See my CORDIC implementation in Z80 assembler document. See also my square root implementation in C language document.

Download

File fp_conversion.zip contains the C version of the program, the 68000 assembler version and a C program to create the tables c1 to c5.

Contact

Author contact E-mail is: