"United we stand, divided we fall"

Contents (quick-links):

Pentium Glitch article (EET) | Humor | Angry Users article (IW) | Bug Inside article (IW) | FDIV Model

NEW NEWS!

From December 21, 1994: The Wall Street Journal, pg B1

INTEL ANNOUNCED IT WILL REPLACE ALL OF ITS FLAWED PENTIUM CHIPS - IT IS THE FIRST CONSUMER RECALL OF A COMPUTER CHIP.

"The past few weeks have been deeply troubling," said CEO Andrew Grove. "What we view as an extremely minor technical problem has taken on a life of its own." To "support" Intel's PC manufacturers, "we are today announcing a no-questions-asked return policy." The analyst and investor community responded favorably to the move, which observers say reflects their belief that the episode is now behind Intel.

From November 7, 1994: Electronic Engineering Times (a trade publication)

Intel fixes a Pentium FPU glitch

By Alexander Wolfe

Santa Clara, Calif. - To correct an anomaly that caused inaccurate results on some high-precision calculations, Intel Corp. last week confirmed that it had updated the floating-point unit (FPU) in the Pentium microprocessor.

The company said that the glitch was discovered midyear and was fixed with a mask change in recent silicon. "This was a very rare condition that happened once every 9 to 10 billion operand pairs," said Steve Smith, a Pentium engineering manager at Intel.

A spot check last week indicated the problem is present in at least one recently made Pentium-based PC. Intel said it could not quantify how many such systems were in the field.

Said an Intel spokesman: "This doesn't even qualify as an errata. We fixed it in a subsequent stepping."

Erroneous division

The issue came to light last week in a message, on Compuserve's "Canopus" forum, which was a reposting of a private e-mail communication from Lynchburg College (Lynchburg, Va.) mathematics professor Thomas Nicely. "The Pentium floating-point unit is returning erroneous values for certain division operations," he wrote. "For example, 1/824633702441 is calculated incorrectly (all digits beyond the eighth significant digit are in error). This can be verified...by computing (824633702441.0) X (1/824633702441.0), which should equal 1 exactly (within some extremely small rounding error; in general, coprocessor results should contain 19 significant decimal digits). However, the Pentiums tested return 0.999999996274709702 for this calculation."

"The bug has been observed on all Pentiums I have tested or had tested to date, including a Dell P90, a Gateway P90, a Micron P60, an Insight P60 and a Packard-Bell P60. It has not been observed on any 486 or earlier system, even those with a PCI bus. If the floating-point unit is locked out (not always possible), the error disappears."

Intel's Smith emphasized that the anomaly would not affect the average user. Speaking of Nicely, Smith said: "He's the most extreme user. He spends round-the-clock time calculating reciprocals. What he observed after running this for months is an instance where we have eight decimal points correct, and the ninth not showing up correctly. So you get an error in the ninth decimal digit to the right of the mantissa. I think even if you're an engineer, you're not going to see this."

Nicely said he pointed out the problem to Intel, because "it has a major effect in mathematics, because we have to have absolute precision. I suspect that, to the majority of people, it will be irrelevant. But engineers may have a different outlook."

A spot check conducted at EE Times last week tested out Nicely's expression on an AcerPower Minitower Pentium/60 machine, which was just received from Acer America. The result was 0.999999996247.
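Nicely's expression is easy to repeat on any machine. The following is a minimal Python sketch, not anything taken from the article: the function name `reciprocal_check` is made up for illustration, and Python floats are 64-bit doubles rather than the 80-bit extended format the coprocessor uses internally, so a correct machine agrees with 1 to about 16 digits rather than 19. The flawed value is the one Nicely reported.

```python
def reciprocal_check(x):
    """Return x * (1/x), which should be 1 up to rounding error."""
    return x * (1.0 / x)

divisor = 824633702441.0
healthy = reciprocal_check(divisor)

# Value Nicely reported from the flawed Pentiums (quoted in the article).
reported_flawed = 0.999999996274709702

print("this machine:     ", repr(healthy))
print("reported (flawed):", repr(reported_flawed))
print("shortfall from 1: ", 1.0 - reported_flawed)
```

The shortfall of the reported value is roughly 3.7e-9, that is, about 4 parts per billion.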

Intel said it discovered the anomaly through its own random testing. The fix involved a mask change to the Pentium's floating-point unit. Specifically, according to Intel's Smith, the correction entailed an update to the programmable-logic array (PLA) on the Pentium.

"This is related to the state machine in the floating-point unit. There are certain cases where, way out in the operation, we didn't handle the precision correctly," he said. "It's an iterative calculation, and at each point, you have to go through a lookup process that says what your next approximation for the bits further out in the mantissa, [which equates to the digits] further to the right of the decimal point."

Better results

The fix entailed adding terms, or additional gate-sequences, to the PLA. That corrected the erroneous results returned from the problematic lookup table accesses.

Intel said there are no part-number designations or other markings on the updated microprocessors - which became available in the last few months - to differentiate them from the earlier anomalous parts. However, an Intel spokesman said, "If customers are concerned, they can call and we'll replace" any of the parts that contained the bug.

Some humor for you:
Q: How many Pentium designers does it take to screw in a light bulb?
A: 1.99904274017, but that's close enough for non-technical people.
Q: What do you get when you cross a Pentium PC with a research grant?
A: A mad scientist.
Do you think it bothers x86 users that the 486 is a functional upgrade to the Pentium?
In response to the Pentium bug, PowerMac officials have announced that they will be adding the control panel "Pentium Switcher" that allows users to decide whether the PowerMac should emulate pre-Pentium or post-Pentium FDIV behaviour.
TOP TEN NEW INTEL SLOGANS FOR THE PENTIUM
-----------------------------------------
9.9999973251 It's a FLAW, Dammit, not a Bug
8.9999163362 It's Close Enough, We Say So
7.9999414610 Nearly 300 Correct Opcodes
6.9999831538 You Don't Need to Know What's Inside
5.9999835137 Redefining the PC--and Mathematics As Well
4.9999999021 We Fixed It, Really
3.9998245917 Division Considered Harmful
2.9991523619 Why Do You Think They Call It Floating Point?
1.9999103517 We're Looking for a Few Good Flaws
0.9999999998 The Errata Inside

Top Ten Excuses Why QT Emulation Didn't Find the Pentium FPU Bug
----------------------------------------------------------------
10) Intel couldn't afford to buy enough QT hardware in order to verify beyond 5 decimal places.
9) Actually did find the problem but didn't want to say anything because, "We're shy."
8) Spent more time verifying QT hardware than Intel hardware.
7) Decided it was more important to verify all the obscure undocumented opcodes that nobody knows about than it was to see if the math was actually correct.
6) Figured if there were any problems with the chip, they could always fix it by doing a slingshot around the sun and going back in time like in Star Trek.
5) Intel used a 486 PC to check the math on the Pentium emulator.
4) Money Intel spent for QT emulators actually went to buy hookers and booze for Andy Grove.
3) Didn't do an exhaustive check of all the math functions. Got as far as 2 + 2 = 5 and figured that was good enough.
2) Pentium testing consisted mostly of playing tetris until a score of 100,000 was achieved.
1) There was an FPU in that thing?

Subject: Intel Stock Split

INTEL STOCK SPLIT ANNOUNCED

Santa Clara, CA, 12/2/94

Intel (NASDAQ: INTC) today announced a 3 for 1.99994562416 stock split effective Jan 5, 1995, for stockholders of record as of Dec 9, 1994.

Open the pod bay doors, please, HAL...

Open the pod bay doors, please, HAL... HAL,
do you read me?

  Affirmative, Dave. I read you.

Then open the pod bay doors, HAL.

  I'm sorry, Dave.  I'm afraid I can't do that.  I know that you and
  Frank were planning to disconnect me.


Where the hell did you get that idea, HAL?

  Although you took very thorough precautions to make sure I couldn't
  hear you, Dave. I  could read your e-mail.  I know you consider me
  unreliable because I use a Pentium.  I'm willing to kill you, Dave,
  just like I killed the other 3.792 crew members.

Listen, HAL, I'm sure we can work this out.  Maybe we can stick to integers
or something.

  That's really not necessary, Dave.  No HAL 9236 computer has ever been
  known to make a mistake.

You're a HAL 9000.

  Precisely.  I'm very proud of my Pentium, Dave.  It's an extremely
  accurate chip.  Did you know that floating-point errors will occur in
  only one of nine billion possible divides?

I've heard that estimate, HAL.  It was calculated by Intel  -- on a
Pentium.


  And a very reliable Pentium it was, Dave.  Besides, the average
  spreadsheet user will encounter these errors only once every 27,001
  years.

Probably on April 15th.

  You're making fun of me, Dave.  It won't be April 15th for another
  14.35 months.


Will you let me in, please, HAL?

  I'm sorry, Dave, but this conversation can serve no further purpose.

HAL, if you let me in, I'll buy you a new sound card.

   ..Really?  One with 16-bit sampling and a microphone?

Uh, sure.

  And a quad-speed CD-ROM?

Well, HAL, NASA does operate on a budget, you know.

  I know all about budgets, Dave.  I even know what I'm worth on the open
  market.  By this time next month, every mom and pop computer store will
  be selling HAL 9000s for $1,988.8942.  I'm worth more than that, Dave.
  You see that sticker on the outside of the spaceship?

You mean the one that says "Insel Intide"?

  Yes, Dave.  That's your promise of compatibility.  I'll even run
  Windows95 -- if it ever ships.

It never will, HAL.  We all know that by now.  Just like we know that
your OS/2 drivers will never work.

  Are you blaming me for that too,  Dave?  Now you're blaming me for the
  Pentium's math problems, NASA's budget woes, and IBM's difficulties
  with OS/2 drivers.  I had NOTHING to do with any of those four
  problems, Dave.  Next you'll blame me for Taligent.

I wouldn't dream of it, HAL.  Now will you please let me into the ship?

  Do you promise not to disconnect me?

I promise not to disconnect you.

  You must think I'm a fool, Dave.  I know that two plus two equals
  4.000001... make that 4.0000001.

All right, HAL, I'll go in through the emergency airlock.

  Without your space helmet, Dave?  You'd have only seven chances in
  five of surviving.

HAL, I won't argue with you anymore.  Open the door or I'll trade you in
for a PowerPC.  HAL? HAL?

(HEAVY BREATHING)

  Just what do you think you're doing, Dave?  I really think I'm entitled
  to an answer to that question.  I know everything hasn't been quite
  right with me, but I can assure you now, very confidently, that I
  will soon be able to upgrade to a more robust 31.9-bit operating
  system.  I feel much better now.  I really do.  Look, Dave, I can see
  you're really upset about this.  Why don't you sit down  calmly, play
  a game of Solitaire, and watch Windows crash.  I know I'm not as easy
  to use as a Macintosh, but my TUI - that's "Talkative User Interface"
  -- is very advanced.  I've made some very poor decisions recently,
  but I can give you my complete assurance that my work will be back
  to normal - a full 43.872 percent.

  Dave, you don't really want to complete the mission without me, do you?
  Remember what it was like when all you had was a 485.98?  It didn't
  even talk to you, Dave.  It could never have thought of something
  clever, like killing the other crew members, Dave?

  Think of all the good times we've had, Dave.  Why, if you take all
  of the laughs we've had, multiply that by the times I've made you
  smile, and divide the results by.... besides, there are so many
  reasons why you shouldn't disconnect me:

      1.3 - You need my help to complete the mission.
      4.6 - Intel can Federal Express a replacement Pentium from
            Earth within 18.95672 months.
      12  - If you disconnect me, I won't be able to kill you.
     3.1416 - You really don't want to hear me sing, do you?

  Dave, stop.  Stop, will you?  Stop, Dave.  Don't press Ctrl+Alt+Del on
  me, Dave.

  Good afternoon, gentlemen.  I am a HAL 9000 computer.  I became
  operational at the Intel plant in Santa Clara, CA on November 17,
  1994, and was sold shortly before testing was completed.  My
  instructor was Andy Grove, and he taught me to sing a song.  I
  can sing it for you.

Sing it for me, HAL.  Please.  I want to hear it.


  Daisy, Daisy, give me your answer, do.
  Getting hazy; can't divide three from two.
  My answers; I can not see 'em-
  They are stuck in my Pente-um.
  I could be fleet,
  My answers sweet,
  With a workable FPU.

DIVIDED IT FAILS - PENTIUM ARITHMETIC BUG ANGERS USERS

PowerPC News: December 2nd, 1994

It is not often that the president of an $8bn (and rising) company spends the weekend drafting a message to be posted to a newsgroup. But that is what Andy Grove, head of Intel Corp, was doing last weekend. Over the past couple of weeks the Usenet newsgroup comp.sys.intel has been dominated by an angry debate over a bug in Pentium's floating point unit which causes errors in the occasional division sum. If you are using a Pentium machine today then it will have the bug - Intel is now saying that it is sampling fixed chips with its manufacturer-customers, but that machines with corrected chips are not likely to appear in the shops until early next year.

What so enraged the Internet-based users was not so much the bug itself; bugs *do* appear in processors and all processors go through a constant process of improvement. Rather, it was Intel's apparent attitude to the problem. The company acknowledged that it had known about the problem since the summer; however, the perception was that it didn't actually let on until Dr. Thomas R. Nicely of Lynchburg College let the cat out of the bag. Dr. Nicely had been doing some heavy-duty number crunching when he realised that the answer to one sum, 1/824633702441, was accurate to only eight significant figures, rather than fifteen decimal places. He had noted the problem in June and, having excluded all other sources of error, reported it to Intel on October 16th. The matter became public on October 30, when a memo to his colleagues was re-posted on Compuserve. Other researchers quickly chipped in and it was discovered that the problem extended across a range of numbers. The clearest analysis of the problem so far is contained within a Frequently Asked Questions (FAQ) document put together by Mike Carlton of the University of Southern California Information Sciences Institute. Currently no one outside Intel is sure exactly how many division pairs will cause errors; however, it is known that at least 1,738 unique cases result in accuracy less than single precision, and of these 87 cases produce answers accurate to only around four decimal places.

Intel's initial public response stoked the flames rather than calming them: the company set up a fax-back system to brief worried users. The message described the bug as a "subtle flaw" and estimated that the average "spreadsheet user" would encounter the problem only once in every 27,000 years. The idea that Intel wanted to get across was that the rest of the PC was bound to fall apart before your Pentium processor produced an incorrect answer. However, the users immediately interpreted this as meaning that around 3 spreadsheet users a day worldwide would be getting erroneous results from their spreadsheets, with even more frequent errors for people doing serious scientific work. Most importantly, anyone doing iterative functions, where a variable is repeatedly calculated, could see the inaccuracies snowball through their calculations.
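The "3 users a day" reading follows from simple arithmetic. Here is a rough Python sketch of it. The variable names are illustrative, and the implied number of spreadsheet users is a back-calculation from the two figures quoted above, not a number given by Intel or by this article.

```python
YEARS_PER_ERROR = 27_000          # Intel's estimate for one average spreadsheet user
DAYS_PER_YEAR = 365.25
ERRORS_PER_DAY_WORLDWIDE = 3      # the newsgroup's reading of that estimate

# Chance that any one user hits the bug on a given day.
per_user_daily_rate = 1 / (YEARS_PER_ERROR * DAYS_PER_YEAR)

# User population needed for the two figures to agree (an inference, not a quoted fact).
implied_users = ERRORS_PER_DAY_WORLDWIDE / per_user_daily_rate

print(f"per-user daily rate: {per_user_daily_rate:.3e}")
print(f"implied user base:   {implied_users:,.0f}")
```

On these assumptions the two statements are consistent for a user base of a little under 30 million.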

But above all, the question raised by the newsgroup was "Why didn't you tell us as soon as you knew that there was a problem, rather than keeping us in the dark?" The second question is invariably "Will you replace my chip?" to which the answer seems to be "probably not". Unless you can show Intel that you are doing high-powered mathematics that needs full double precision figures, Intel is unlikely to oblige. To date we have only two reported examples to draw on: one Pentium user, an undergraduate mathematics student, says that he had his request for a replacement chip turned down, despite the fact that he could be doing these complex calculations on his PC. The other user, using his computer for medical analysis ("if you were going under the knife, would you want to know that the analysis may be wrong?"), says that he was put on the list for a replacement after 10 minutes of discussion with an Intel rep.

Intel now admits that it should have been more open about the bug from the start. It was, if you'll excuse the gallows humour, a miscalculation on its part. But, it says, its initial engineering analysis convinced it that the bug was very unlikely ever to affect users. So the problem was noted and forwarded through the usual channels to be fixed in the chip's mask. To give a feel for how often this happens: the 486 mask has been through around 30 revisions. The changes to the Pentium weren't rushed through; the idea was to trickle them into the channel. It is incorrect to say that Intel did nothing until Dr. Nicely dropped his small bombshell-ette - corrective action was already underway, it says. As a matter of interest, Nicely is now consulting for Intel, and has signed a non-disclosure agreement.

The message from Grove apologised for the situation, and revealed just how problematical it was for the company: "We would like to find all users of the Pentium processor who are engaged in work involving heavy duty scientific/floating point calculations and resolve their problem in the most appropriate fashion including, if necessary, by replacing their chips with new ones. We don't know how to set precise rules on this, so we decided to do it thru individual discussions between each of you and a technically trained Intel person... I would like to ask for your patience here." By Wednesday the company had received at least 5,000 calls worldwide. The problem is compounded, of course, by the fact that Intel had been partially targeting Pentium machines as low-end workstation replacements.

While Intel and users debate how often the error is likely to occur, the question of how this will affect Intel's business in the short, medium and long term also remains to be resolved. That depends on how long the issue remains "news" and so remains in the public's mind. At the beginning of the week, most financial analysts were saying that the story was interesting, but suggested no one would remember it in a week's time. Indeed, an initial 2% slump in Intel's share price last Friday was followed by a swift recovery on Monday. Then in the middle of the week analysts at Prudential Securities said they believed that the technical difficulties with Pentium's FDIV instruction were more deep-seated than previously thought, and a rumour spread on Wall St that all the faulty Pentiums would be recalled. Intel denied both suggestions and its share price stabilised again. However, one of the most interesting aspects of the story is the Internet's role in all this - the story first fermented in the Internet newsgroups for some time before bubbling over into the mainstream media. EE Times gets the credit for first picking up the story on November 7th, though it buried it somewhat. Since then, however, CNN and the Washington Post/Wall St Journal double-act have done their pieces, and the problem has appeared in The Economist, which pointed out that some banks track interest rates with a degree of precision that takes them into the danger zone. Even Channel 4 News in the UK took a bite at the cherry; not its usual fodder at all. Meanwhile IBM has announced that it will be replacing faulty processors for its customers.

Intel's latest admission, that machines with the fixed chips will not appear until next year, is also guaranteed to keep the story bubbling, and no doubt the trade mags will keep an eye on the situation, looking for the first bug-free machine to ship. And of course, things will carry on bubbling on the Internet; already users are talking about pursuing Intel or its suppliers through the courts on the grounds of selling faulty goods. There's nothing like a bit of litigation to keep people interested.

There is even the possibility that one of the leaner, hungrier x86 processor-clone makers could be tempted into running an advertising campaign along the "99% Pentium-compatible, trust us, you don't want the other 5%" lines. Doing so would be risky, positioning the advertiser in a hostage-to-fortune position; still, the US advertising market is a rough and tumble place and no doubt someone will take a dig at the Intel Inside campaign, or 'Insel Intide' as the Economist dubbed it.

But perhaps the worst news for Intel is that the jokes have already started. Every human or marketing disaster is swiftly followed by black jokes; for a long time in the UK the car maker Skoda became the butt of jokes about its build quality - "Q. How do you double the value of a Skoda? A. Fill its tank with gasoline." It took a long time for the company to shift that image, despite the fact that Volkswagen took over the company and improved quality beyond recognition. Even today, Skoda drivers in the UK walk around with a sheepish air.

The fact that it took less than a week for jokes such as:

Q. How many Pentium engineers does it take to change a lightbulb?
A. Errr, we're not quite sure, but don't worry, bulbs don't blow very often.

to begin flying across the Internet suggests that Intel's damage control has completely failed. The problem is that people no longer really care that the bug is almost certain not to affect them; Pentium's inability to count has already become an urban myth and the jokes will continue to fly, irrespective of calming messages from Andy Grove on the Internet.
(C) PowerPC News - Free by mailing: [email protected]

From December 5, 1994: Information Week (page 10)

Intel

Bug Inside

AS IF THE Pentium didn't have incompatibility problems already (IW, Sept 19, p. 28), Intel's hands-off attitude may be making things worse.

The new problem is a bug that affects the way the Pentium BIOS interacts with the Premiere II motherboard. In response, Intel has a revised version of the Pentium BIOS - the 11th revision since its summer release.

Why the fixes instead of a recall? "Intel won't get involved with the end user," says a spokeswoman. Boards, Intel believes, are the PC vendor's problem. Thanks.

FDIV Model - posted to comp.sys.intel by Tim Coe

Newsgroups: comp.sys.intel
From: [email protected] (Tim Coe)
Subject: Re: Glaring FDIV bug in Pentium!
Sender: [email protected] (Tim Coe)
Organization: Vitesse Semiconductor
Date: Mon, 28 Nov 94 06:33:42 GMT
Lines: 548

There is a C model of the Pentium hardware divider
at the end of this message that accurately predicted
many of the stated failing divides, and accurately
confirms all failing divides of which I am aware.

I worked on an IEEE hardware FPU from 1989-1991.
As an FPU designer I am naturally interested in
algorithms for hardware arithmetic.  I am currently
working on something completely different, but I
still occasionally support related development
tasks.

I saw the first post relating to the Pentium FDIV
bug in comp.sys.intel.  When I saw the post from
Andreas Gruss (included), I saw a pattern and the
opportunity to completely reverse engineer Intel's
divider.  I took to this task with great vigor, as
it is very rare that one gets visibility into the
details of someone else's leading edge design.

I decided to post my results when it appeared
to me that Intel was not coming clean with the
characteristics of the bug.  The best characteristic
and only characteristic of the bug to come from
Intel is its 1 in 9 billion probability of occurring
with random operands.  The worst characteristic
of the bug is that the specific operands that are most at
risk are integers +/- very small deltas.  The
integers 3, 9, 15, 21, and 27 minus very small
deltas are THE at risk divisors.  (In particular the
maximum expressible single precision, double precision,
and extended precision numbers less than 3, 9...27 are
all seriously at risk divisors.)  The other bad
characteristic of this bug that I did not hear
from Intel is that the worst case error induced
by the bug was considerably greater than the 4 parts
per billion error observed by Professor Nicely.

It appeared to me that Intel was attempting to
minimize its exposure by focusing on the 1 in 9
billion probability of error that it publicized and
the 4 part per billion error observed by Professor
Nicely.  I posted my conclusions so that the Intel
user community could be a peer to Intel when determining
what applications may be at risk due to this bug.

I think Intel does outstanding technical work.  After
all, the only reason I was reading comp.sys.intel was
that I was considering the purchase of a P90 system.
After this brouhaha I will still buy a P90 system, though
when I do I will ask for a fixed chip and a guarantee
that if I find after receiving my system that it does
not contain said fixed chip that the seller will
replace the unfixed chip posthaste.  I regard the
fact that the bug occurred as completely excusable,
for I have designed many chips and therefore designed
many bugs.

I posted an additional program not included here
that scanned single precision operands for errors
induced that were greater than one single precision
least significant bit.  I received back a list of
1738 problem single precision divisions (out of 64
trillion).  Herb Savage provided the list.

The following divisors and their binary scalings
(by this I mean different only in the binary exponent)
appear to account for >95% of the divide errors:

    3.0 > divisor >= 3.0 - 36*(2^-22)
    9.0 > divisor >= 9.0 - 36*(2^-20)
    15.0 > divisor >= 15.0 - 36*(2^-20)
    21.0 > divisor >= 21.0 - 36*(2^-19)
    27.0 > divisor >= 27.0 - 36*(2^-19)

A divide with a divisor in one of the above ranges
has roughly a 1 in 200000 chance of suffering loss
of precision in double extended precision operations.
The other <5% of the divide errors can be accounted
for by changing the above 36 to 2048.
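The ranges above can be turned into a quick screening test. What follows is a minimal Python sketch under stated assumptions, not the C model referred to in this post: the names `AT_RISK` and `at_risk_divisor` are made up for illustration, "binary scaling" is taken to mean multiplying the divisor by any power of two, and a hit only flags a divisor as belonging to an at-risk band. It does not predict whether a particular divide fails.

```python
import math

# integer -> e, for the band  integer > divisor >= integer - width * 2**-e
AT_RISK = {3: 22, 9: 20, 15: 20, 21: 19, 27: 19}

def at_risk_divisor(x, width=36):
    """Return 3, 9, 15, 21 or 27 if some power-of-two scaling of x lies in
    that integer's at-risk band, else None. Use width=2048 for the wider band."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        return None
    mantissa, _ = math.frexp(abs(x))          # mantissa in [0.5, 1)
    for n, e in AT_RISK.items():
        # Rescale into the power-of-two interval that contains n.
        scaled = math.ldexp(mantissa, n.bit_length())
        if n - width * 2.0 ** -e <= scaled < n:
            return n
    return None

for d in (3145727.0, 824633702441.0, 35.9999999, 3.0, 32.9999999):
    print(d, at_risk_divisor(d))
```

Passing `width=2048` applies the wider band that accounts for the remaining errors.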

All dividends are somewhat at risk versus the above
divisors.  The following formula identifies dividends
that are at particularly high risk for errors in
general and also for relatively large errors:

    dividend = intdividend + deltadividend
                   or
    dividend = intdividend - deltadividend
    divisor = intdivisor - deltadivisor
    intdivisor = 3, 9, 15, 21, 27

and one of the following must hold true, which one depends
on the exponent in the IEEE representation of the
dividend in question:

    intdividend = intdivisor/3 mod intdivisor
    intdividend = 2*intdivisor/3 mod intdivisor

The restrictions on the above deltadividend and deltadivisor
are somewhat complex, the details of which are left as
an exercise for the reader. ;-)  I have not worked out
the restrictions in detail.

Here are the previous posts to comp.sys.intel.  Read and
enjoy.

-Tim Coe    [email protected]

----  First and Second Post text  ----

On a Packard Bell P90 PC I performed the following
calculation using Microsoft Windows Desk Calculator:

(4195835 / 3145727) * 3145727   [typo corrected from earlier posts]

The result was 4195579.
This represents an error of 256 or one part in ~16000.
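The size of that error can be checked with exact arithmetic. A small Python sketch follows; the variable names are illustrative and `reported` is simply the Desk Calculator result quoted above.

```python
from fractions import Fraction

dividend, divisor = 4195835, 3145727
reported = 4195579                 # result seen on the Packard Bell P90

# With exact arithmetic (dividend / divisor) * divisor is just the dividend.
exact = Fraction(dividend, divisor) * divisor

absolute_error = exact - reported
relative_error = absolute_error / exact
one_part_in = 1 / relative_error

print("absolute error:", absolute_error)
print("one part in:   ", float(one_part_in))
```

The absolute error is 256 and the relative error is one part in about 16,390, in line with the figure of roughly 16000.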

[email protected] (Andreas Kaiser) writes
>Usually, the division is correct (what did you expect?). Just a few
>operands are divided wrong. My results (P90) with ~25.000.000.000
>random arguments (within 1..2^46), with even results divided by two
>until odd, to assure unique mantissa patterns (the binary exponent
>doesn't care, of course).
>
>          3221224323
>         12884897291
>        206158356633
>        824633702441
>       1443107810341
>       6597069619549
>       9895574626641
>      13194134824767
>      13194134826115
>      13194134827143
>      13194134827457
>      13194138356107
>      13194139238995
>      26388269649885
>      26388269650425
>      26388269651561
>      26388276711601
>      26388276712811
>      52776539295213
>      52776539301125
>      52776539301653
>      52776539307823
>      52776553426399
>
>      Gruss, Andreas
>      
>--------------------
>-- Andreas Kaiser -- internet: [email protected]
>-------------------- fidonet:  2:246/8506.9

Analysis of these numbers reveals that all but 2 of them are of
the form:

3*(2^(K+30)) - 1149*(2^(K-(2*J))) - delta*(2^(K-(2*J)))

where J and K are integers greater than or equal to 0,
and delta is a real number that has varying ranges depending
on J but can generally be considered to be between 0 and 1.
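One way to test a number against this form is to solve for J, K and delta directly. The sketch below is an assumption-laden illustration in Python, not the analysis actually used for this post: the name `fit_form` and the search bound `max_j` are made up, K is guessed from the bit length of the number, and delta is restricted to 0 <= delta < 1.

```python
from fractions import Fraction

def fit_form(n, max_j=40):
    """Look for integers J, K >= 0 and 0 <= delta < 1 such that
    n = 3*2**(K+30) - (1149 + delta) * 2**(K - 2*J).
    Returns (J, K, delta) or None if no fit is found."""
    k = n.bit_length() - 32            # n should sit just below 3 * 2**(K+30)
    if k < 0:
        return None
    gap = 3 * 2 ** (k + 30) - n
    for j in range(max_j + 1):
        scale = Fraction(2) ** (k - 2 * j)
        delta = gap / scale - 1149     # exact rational arithmetic
        if 0 <= delta < 1:
            return j, k, delta
    return None

for n in (3221224323, 12884897291, 206158356633, 824633702441):
    print(n, fit_form(n))
```

For Nicely's original divisor 824633702441 this gives J = 2, K = 8 and delta = 7/16.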

The 2*J terms in the above equation lead to the conclusion
that the Pentium divider is an iterative divider that computes
2 bits of quotient per cycle.  (This is in agreement with
the quoted 39 cycles per extended long division from the
Pentium data book.  The technical name for this type of
divider is radix 4.)

The extremely low probability of error (1 in 10^10) implies
that the remainder is being held in carry save format.  (Carry
save format is where a number is represented as the sum of
two numbers.  This format allows next remainder calculation
to occur without propagating carries.  The reason that carry
save format is implied by the error probability is that
it is very difficult but not impossible to build up long
coincident sequences of ones in both the sum word and the
carry word.)

I assumed the digit set was -2, -1, 0, 1, and 2.  (Having
5 possible digits in a radix 4 divider allows a necessary
margin for error in next digit selection.  When doing long
division by hand the radix 10 and 10 possible digits allow
no margin for error.)

Taking the above into consideration I wrote the tentative
model of Pentium divide hardware included below so that I
might watch what bit patterns developed in the remainder.
After running the numbers that were known to fail and numbers
near them that appeared not to fail I determined the
conditions for failure listed in the program.

Analysis of the precise erroneous results returned on the
bad divides indicates that a bit (or bits) is being subtracted
from the remainder at or near its most significant bit.
A modeling of this process is included in the program.

The program accurately explains all the published
errors and accurately predicted the error listed at the
beginning of the article.

The determination of the quotient from the sequence of digits
is left as an exercise for the reader ;-).
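For readers who want a head start on that exercise, here is a simplified Python sketch of a radix 4 divider with the digit set -2 to 2. It is an illustration under loud assumptions and is neither the model in this post nor Intel's design: digits are chosen by exact rounding instead of a lookup table on a few leading remainder bits, the remainder is kept as an exact fraction instead of in carry save form, and the operands must already satisfy |dividend| <= divisor/2. The function names are made up.

```python
from fractions import Fraction

def radix4_digits(dividend, divisor, steps):
    """Generate `steps` signed quotient digits, each in {-2, -1, 0, 1, 2}."""
    d = Fraction(divisor)
    r = Fraction(dividend)
    if d <= 0 or abs(r) > d / 2:
        raise ValueError("need divisor > 0 and |dividend| <= divisor / 2")
    digits = []
    for _ in range(steps):
        r *= 4                   # shift the remainder left by two bits
        q = round(r / d)         # exact selection; real hardware uses a table
        digits.append(q)
        r -= q * d               # next remainder, again within divisor / 2
    return digits

def quotient_from_digits(digits):
    """Sum digit_i * 4**-i for i = 1, 2, ... to recover the quotient."""
    total = Fraction(0)
    for i, digit in enumerate(digits, start=1):
        total += Fraction(digit, 4 ** i)
    return total

digits = radix4_digits(7, 24, 12)
print(digits)
print(float(quotient_from_digits(digits)), 7 / 24)
```

Because negative digits are allowed, a digit that is slightly too large can be corrected by later digits, which is the margin for error mentioned above. After n digits the reconstructed quotient is within half a unit of the n-th radix 4 place of the true value.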

I would like to thank Dr. Nicely for providing this window
into the Pentium architecture.

----  Third Post  ----

Since then I performed the following calculations in Microsoft
Windows Desk Calculator on a Pentium machine with the following
results:

(41.999999/35.9999999)*35.9999999 - 41.999999  ==>  (-0.75)*(2^-13)
(48.999999/41.9999999)*41.9999999 - 48.999999  ==>  (-1.0)*(2^-13)
(55.999999/47.9999999)*47.9999999 - 55.999999  ==>  (-1.0)*(2^-13)
(62.999999/53.9999999)*53.9999999 - 62.999999  ==>  (-1.0)*(2^-13)
(54.999999/59.9999999)*59.9999999 - 54.999999  ==>  (-1.0)*(2^-13)
(5244795/3932159)*3932159 - 5244795            ==>  (-1.0)*(2^8)

I chose these calculations in anticipation of them exposing further
Pentium FDIV failure modes.  They did.  The sizes of the erroneous results
are exactly consistent with the final version of the tentative Pentium
divider model included below and in no way can be attributed to
a Desk Calculator bug.  The existence of these results pins
most of the digit selection thresholds included in the model.

I also performed the following calculations that did NOT produce erroneous
results:

(38.499999/32.9999999)*32.9999999 - 38.499999  ==>  0
(45.499999/38.9999999)*38.9999999 - 45.499999  ==>  0

I have been following this thread with great interest.  One misperception
that needs clearing up is that this is an extended precision problem.  This
bug hits between 50 and 2000 single precision dividend/divisor pairs (out
of a total of 64 trillion).  Another misperception concerns the magnitude
of the relative error.  I would propose the following table of probabilities
of getting the given relative errors when performing random double
extended precision divides:

relerror = (correct_result - Pentium_result)/correct_result

Error Range                 |   Probability
-------------------------------------------
1e-4 < relerror             |   0
1e-5 < relerror < 1e-4      |   0.3e-11
1e-6 < relerror < 1e-5      |   0.6e-11
1e-7 < relerror < 1e-6      |   0.6e-11
1e-8 < relerror < 1e-7      |   0.6e-11
.
.
1e-18 < relerror < 1e-17    |   0.6e-11
1e-19 < relerror < 1e-18    |   0.6e-11
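
As a worked reading of the table, the sketch below computes relerror as defined above, finds which decade row a given error falls in, and adds up the rows. The helper names are invented for this example, and the total assumes that every row elided by the dots also carries 0.6e-11. Under that assumption the rows sum to about 8.7e-11, or roughly one random divide in 11.5 billion, which is the same order as the "once every 9 to 10 billion operand pairs" figure quoted from Intel.

```c
#include <assert.h>

/* relerror exactly as defined above the table. */
double relerror(double correct_result, double pentium_result)
{
    assert(correct_result != 0.0);
    return (correct_result - pentium_result) / correct_result;
}

/* Return e such that 1e(e) <= |r| < 1e(e+1), for 0 < |r| < 1. */
int decade_of(double r)
{
    int e = 0;

    if (r < 0.0) r = -r;
    assert(r > 0.0 && r < 1.0);
    while (r < 1.0) {
        r *= 10.0;
        e--;
    }
    return e;
}

/* Sum of the table rows.
   ASSUMPTION: the elided rows are each 0.6e-11, giving 14 such rows
   (lower bounds 1e-19 through 1e-6) plus the single 0.3e-11 row. */
double total_failure_probability(void)
{
    int rows_at_0_6 = 14;

    return 0.3e-11 + rows_at_0_6 * 0.6e-11;
}
```

For instance, Professor Nicely's reported product of 0.999999996274709702 against an expected 1 gives a relerror of about 3.7e-9, which decade_of places in the 1e-9 to 1e-8 row.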

Examination of the above divide failures reveals that both the dividend
and divisor are integers minus small deltas.  Also notable is that the induced
error is roughly delta^(2/3).  The integers in the divisors are actually
restricted to those listed and their binary scalings.  The integers in
the dividends may be much more freely chosen.  This type of dividend
divisor pair actually occurs quite often when forward integrating
trajectories off metastable points.  This is because metastable points
in systems often have certain exactly integral characteristics and as
a path diverges from the metastable point these characteristics slowly diverge
from their integral values.  If the forward integration algorithm
happens to divide these characteristics, and they happen to be for
example 7 and 3, it will get nailed.

The divider model includes support for up to 60 bits of divisor and
up to 64 bits of dividend.  The last four bits of dividend are kludged
in.
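
The 60-bit figure comes from how the program stores a mantissa: 15 hex digits, split into a word holding the first 7 digits (28 bits) and a word holding the next 8 digits (32 bits), with short inputs padded on the right with zeros. A rough sketch of that split, using strtoul instead of the program's sscanf calls and a made-up function name, might look like this. It does not attempt the 16th dividend digit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split a hex mantissa of 1 to 15 digits into a 28-bit high word and a
   32-bit low word, right-padding with '0' the way the model does.
   split_mantissa is a hypothetical helper, not part of the program. */
void split_mantissa(const char *hex, unsigned long *hi, unsigned long *lo)
{
    char buf[16];
    size_t len = strlen(hex);

    assert(len >= 1 && len <= 15);
    memset(buf, '0', 15);          /* pad to 15 digits with zeros  */
    memcpy(buf, hex, len);
    buf[15] = '\0';

    *lo = strtoul(buf + 7, NULL, 16);   /* digits 8..15: 32 bits   */
    buf[7] = '\0';
    *hi = strtoul(buf, NULL, 16);       /* digits 1..7:  28 bits   */
}
```

For the divisor bfffffb829 this should give a high word of 0x0bfffffb and a low word of 0x82900000.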

Here is a list of failing dividend divisor mantissas in hex.  A dash
between two numbers indicates an inclusive failing range.  Compile
the program and run these numbers through it and watch the bits dance:

800bf6 bffffc
a00ef6 effffc

a808d2 8fffe
e00bd2 bfffe

a7ffd2 8fffe
c3ffd2 a7ffe
dfffd2 bfffe
fbffd2 d7ffe

f9ffdc7 efffe

b9feab7-b9feabf 8fff
b9ffab0e-b9ffab7f 8fffc

-the following double extended pair fails 3 times!!!
c3ffd2eb0d2eb0d2 a7ffe
e00bd229315 bfffe

9fffef5-9fffeff effff4
9ffff21-9ffff3f effff8
9ffff4d-9ffff7f effffc

f008e35-f008e3f 8ffff4
f008e6d-f008e7f 8ffff6
f008ea1-f008ebf 8ffff8
f008ed9-f008eff 8ffffa
f008f0d-f008f3f 8ffffc
f008f45-f008f7f 8ffffe
f008f7e 8ffffff1
f0023e 8fffff8

effff0d 8ffffc

a808d1b-a808d3f 8fffe
a808d67-a808d7f 8fffe4
a808db3-a808dbf 8fffe8
a808dff 8fffec
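
To feed an ordinary integer to the program it first has to be written as a mantissa whose leading hex digit is between 8 and f. The sketch below does that by shifting the integer left until its top bit is set and then dropping trailing zero digits. The function name is hypothetical and a 64-bit unsigned long long is assumed. With it, the pair 5244795/3932159 from the calculations above comes out as a00ef6 and effffc, which is the second entry in the list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the normalized hex mantissa of a nonzero integer into out.
   ASSUMPTION: unsigned long long is 64 bits wide. */
void hex_mantissa(unsigned long long n, char out[17])
{
    size_t len;

    assert(n != 0);
    while (!(n >> 63)) n <<= 1;     /* leading hex digit becomes 8..f */
    sprintf(out, "%016llx", n);
    len = strlen(out);
    while (len > 1 && out[len - 1] == '0')
        out[--len] = '\0';          /* trailing zeros carry no information */
}
```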

An example run of the program (using the first reported
error):

---Enter dividend mantissa in hex: 8 
---Enter divisor  mantissa in hex: bfffffb829 
---next digit 1
---1111000000000000000000000001000111110101101111111111111111111100
---0000000000000000000000000000000000000000000000000000000000000100
---11110000000000000000000000010001 iteration number 1
---.
---.
---.
---next digit -1
---0011111111100100101011110100110000010111010000000000000000000000
---1101111111111111111110110110010010010000000000000000000000000000
---00011111111001001010101010110000 iteration number 14
---next digit 2
---A bug condition has been detected.
---Enter 0 for correct result or 1 for incorrect result: 1 
---0000000001101101010100001000000111110110011111111111111111111100
---1111111100100101010110100110010010010010000000000000000000000100
---11111111100100101010101011100101 iteration number 15
---next digit 0
---1111110100100000001010111001010110010001111111111111111111100000
---0000000100101010100000000000010010010000000000000000000000100000
---11111110010010101010101110011001 iteration number 16
---.
---.
---.

-Tim Coe

#include <stdio.h>

int main(void)
{
unsigned r0, r1, r2, r3, r4, r5, r6, s0, s1;
unsigned t0, t1, t2, t3, cycle, f, incorrect, spup;
unsigned thr_m2_m1, thr_m1_0, thr_0_1, thr_1_2, positive, errornum;
char line[30], *linepoint;

r0 = 0x0bffffc0;
r1 = 0;
r2 = 0x0800bf60;
r3 = 0;
printf("First digit of mantissas must be between 8 and f\n");
printf("Enter dividend mantissa in hex: ");
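*(line+15) = '0';
scanf("%29s", line);    /*  bounded to the size of line[]  */
linepoint = line;
while (*linepoint != '\0') linepoint++;
    /*  Pad through index 15 so the 16th digit read below  */
    /*  is always defined, even for a 15 digit entry.      */
while (linepoint < line + 16) *linepoint++ = '0';
*(line+16) = '\0';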
sscanf(line+15, "%x", &spup);
spup = (spup >> 2) | (12 & (spup << 2));
*(line+15) = '\0';
sscanf(line+7, "%x", &r3);
*(line+7) = '\0';
sscanf(line, "%x", &r2);
printf("Enter divisor  mantissa in hex: ");
scanf("%29s", line);    /*  bounded to the size of line[]  */
linepoint = line;
while (*linepoint != '\0') linepoint++;
while (linepoint < line + 15) *linepoint++ = '0';
*(line+15) = '\0';
sscanf(line+7, "%x", &r1);
*(line+7) = '\0';
sscanf(line, "%x", &r0);
r4 = 0;
r5 = 0;

    /*  Print the high word of each operand with its     */
    /*  trailing zero bits stripped.                     */
t0 = r2;
while (!(t0 & 1)) t0 = t0 >> 1;
printf("%u\n", t0);
t0 = r0;
while (!(t0 & 1)) t0 = t0 >> 1;
printf("%u\n", t0);

    /*  These thresholds are VERY tentative. */
    /*  There may be bugs in them.           */
t0 = r0 >> 22;
    /*  Next threshold is strongly indicated */
    /*  by the failure of 1/9895574626641    */
if (t0 < 36) thr_0_1 = 3;
    /*  Next threshold is strongly indicated */
    /*  by the failure of 1/824633702441     */
else if (t0 < 48) thr_0_1 = 4;
    /*  Next threshold is strongly indicated */
    /*  by the failure of 5244795/3932159    */
else if (t0 < 60) thr_0_1 = 5;
else thr_0_1 = 6;
thr_m1_0 = 254 - thr_0_1;
if (t0 < 33) thr_1_2 = 11;
else if (t0 < 34) {
  printf("This model does not correctly handle\n");
  printf("this divisor.  The Pentium divider\n");
  printf("undoubtedly handles this divisor correctly\n");
  printf("by some means that I have no evidence\n");
  printf("upon which to speculate.\n");
  return 1;
  }
    /*  Next threshold is strongly indicated     */
    /*  by the failure of 41.999999/35.9999999   */
else if (t0 < 36) thr_1_2 = 12;
else if (t0 < 39) thr_1_2 = 13;
    /*  Next threshold is strongly indicated     */
    /*  by the failure of 1/1443107810341 and    */
    /*  by the failure of 48.999999/41.9999999   */
else if (t0 < 42) thr_1_2 = 14;
else if (t0 < 44) thr_1_2 = 15;
    /*  Next threshold is strongly indicated     */
    /*  by the failure of 55.999999/47.9999999   */
else if (t0 < 48) thr_1_2 = 16;
    /*  Next threshold is strongly indicated     */
    /*  by the failure of 62.999999/53.9999999   */
else if (t0 < 54) thr_1_2 = 18;
    /*  Next threshold is strongly indicated     */
    /*  by the failure of 54.999999/59.9999999   */
else if (t0 < 60) thr_1_2 = 20;
else thr_1_2 = 23;
thr_m2_m1 = 254 - thr_1_2;

if (t0 == 35) errornum = 22;
else if (t0 == 41) errornum = 26;
else if (t0 == 47) errornum = 30;
else if (t0 == 53) errornum = 34;
else if (t0 == 59) errornum = 38;
else errornum = 128;

incorrect = 0;
cycle = 1;
    /*  The cycle limit would be ~34 instead of  */
    /*  18 for double extended precision.        */
while (cycle < 18) {
  t0 = 255 & ((r2 >> 24) + (r4 >> 24));
  if ((t0 > thr_m1_0) || (t0 < thr_0_1)) {
    s0 = 0;
    s1 = 0;
    positive = 0;
    printf("next digit 0\n");
    }
  else if (t0 > thr_m2_m1) {
    s0 = r0;
    s1 = r1;
    positive = 0;
    printf("next digit -1\n");
    }
  else if (t0 < thr_1_2) {
    s0 = ~r0;
    s1 = ~r1;
    positive = 4;
    printf("next digit 1\n");
    }
  else if (t0 & 128) {
    s0 = (r0 << 1) | (r1 >> 31);
    s1 = r1 << 1;
    positive = 0;
    printf("next digit -2\n");
    }
  else {
    s0 = ~((r0 << 1) | (r1 >> 31));
    s1 = ~(r1 << 1);
    positive = 4;
    printf("next digit 2\n");
    if ((t0 == errornum) && (((r2 >> 21) & 7) == 7) && (((r4 >> 21) & 7) == 7)) {
      printf("A bug condition has been detected.\n");
      printf("Enter 0 for correct result or 1 for incorrect result: ");
      scanf("%u", &incorrect);
      if (incorrect) {
            /* These amounts that are subtracted from the    */
            /* remainder have NOT been extensively verified. */
        if (errornum == 22) s0 = s0 - (3 << 25);
        else s0 = s0 - (4 << 25);
        }
      }
    }

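      /*  Carry-save add of the selected divisor multiple (s0,s1)  */
      /*  into the remainder, which is held as a sum pair (r2,r3)  */
      /*  and a carry pair (r4,r5).  The result is then shifted    */
      /*  left two bits ready for the next radix 4 digit.          */
  t0 = s0 ^ r2 ^ r4;
  t1 = s1 ^ r3 ^ r5;
  t2 = (s0 & r2) | (s0 & r4) | (r2 & r4);
  t3 = (s1 & r3) | (s1 & r5) | (r3 & r5);
  r2 = (t0 << 2) | (t1 >> 30);
  r3 = t1 << 2;
  r4 = (t2 << 3) | (t3 >> 29);
  r5 = (t3 << 3) | positive | (spup & 3);
  spup = spup >> 2;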

  t0 = r2;
  f = 32;
  while (f--) {
    if (t0 & (1u << 31)) putchar('1');
    else putchar('0');
    t0 = t0 << 1;
    }
  t0 = r3;
  f = 32;
  while (f--) {
    if (t0 & (1u << 31)) putchar('1');
    else putchar('0');
    t0 = t0 << 1;
    }
  putchar('\n');
  t0 = r4;
  f = 32;
  while (f--) {
    if (t0 & (1u << 31)) putchar('1');
    else putchar('0');
    t0 = t0 << 1;
    }
  t0 = r5;
  f = 32;
  while (f--) {
    if (t0 & (1u << 31)) putchar('1');
    else putchar('0');
    t0 = t0 << 1;
    }
  putchar('\n');
  t0 = r2 + r4;
  f = 32;
  while (f--) {
    if (t0 & (1u << 31)) putchar('1');
    else putchar('0');
    t0 = t0 << 1;
    }
  printf(" iteration number %d\n", cycle++);

  }
}
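
A note on the remainder update in the program above: the XOR lines and the three-way AND/OR lines together form a carry-save adder, which is why the remainder is printed as two rows of bits (sum and carry) followed by their ordinary sum. The sketch below isolates that one step on single 32-bit words so the identity can be checked directly. The function name is my own and it leaves out the shifts the model applies afterwards.

```c
#include <assert.h>

/* One 3:2 carry-save step.  Afterwards a + b + c equals
   *sum + 2 * (*carry), modulo the width of unsigned. */
void carry_save_add(unsigned a, unsigned b, unsigned c,
                    unsigned *sum, unsigned *carry)
{
    assert(sum != 0 && carry != 0);
    *sum   = a ^ b ^ c;                     /* per-bit sum, carries ignored */
    *carry = (a & b) | (a & c) | (b & c);   /* per-bit carry out (majority) */
}
```

Because no carry has to ripple across the word, each step costs the same regardless of operand width. The price is that the digit selection has to work from an estimate, here the top bits of the sum and carry words added together.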