Hot, cold, and broken: Thermal-design techniques
Both heat and cold can have an adverse effect on your circuits. At extremely high temperatures, chips may burn up (Figure 1). More commonly, if your design is subject to temperatures you did not expect, many of the parts may fall out of specified limits. When this scenario happens, your circuits may not perform as you would expect. Equally concerning is a scenario in which circuits' temperatures go from hot to cold and then back again. Such situations can cause thermal shock and can even destroy components. Many engineers do not worry about the performance of their circuits at low temperatures, but this lack of concern is a mistake. The performance of semiconductor devices can change dramatically at low temperatures. The base-emitter junction voltage of bipolar transistors rises significantly in low temperatures (Figure 2 and Reference 1). "To design an amplifier that can operate on 1.8V at negative temperatures, you need to consider that VBE [base-emitter voltage] will increase by 130 mV from room temperature to –40°C," says Francisco Santos, product-development-engineering manager at Analog Devices. "This situation will force the designer into a different set of amplifier architectures."
Many amplifiers, such as the Analog Devices AD8045, speed up when they get cold (Figure 3), whereas others, such as the AD8099, slow down when they get cold. "Most of the trouble with cold in bipolar is low-voltage operation," says Bill Gross, now retired, former vice president and general manager of signal-conditioning products at Linear Technology. He says that higher base-emitter voltage and lower current gain make it more difficult to meet specifications. "Lower input impedance and mismatches in beta [current gain] cause bigger problems in the cold," he says, "especially if they are trimmed for room temperature. The higher gm [transconductance] is easy to compensate for by changing the operating current, but then the slew rate varies."
Low temperatures can cause oscillations, instability, overshoot, and poor filter performance. Temperature coefficients, which data sheets specify in parts per million per degree, change your component values at both high and low temperatures. If you expect the IC die to work from –55 to +85°C, there are only 60°C from a 25°C ambient to the hottest temperature, but there are 80°C from ambient to –55°C. So, make sure that your error budgets examine both the hot and the cold regimes. James McLaughlin, professor of electrical engineering at Kettering University (Flint, MI), says that, as you heat silicon past several hundred degrees, it "goes intrinsic." In other words, the temperatures would get high enough that the dopants would migrate through the lattice, and there would be no more PN junctions, just a block of conducting, impure silicon. Would the bond wires explode, or would the silicon continue to melt until it vaporized?
The damage to ICs running at higher temperatures can be subtler. Martin DeLateur, consultant and former product engineer at National Semiconductor, points out that at temperatures higher than 165°C, the molding compound starts to carbonize. At this point, the molding compound turns into a hard, gray material. Outgassing, the slow release of a gas that some material trapped, froze, absorbed, or adsorbed, causes the release of polymer additives, such as fire retardant. At low levels, this outgassing can impact an IC's short- and long-term operation by adding ions or surface effects to the chip. The bond wires, which may be conducting excessive current, also carbonize the mold compound. This excessive current can cause the hardening of carbon tubes, which might melt the bond wire yet keep it conductive inside the tube. Eventually, the higher thermal expansion cracks the passivation, die, or carbonized-molding compound and causes massive failure. (Military specifications define excessive current as that exceeding 1.2×10⁵ A/cm²; thus, the military insists on hermetically sealed packaging for ICs.) No charring or degradation occurs when there is no plastic on the die. Oil-well-instrumentation companies often test and characterize silicon ICs at 200°C for use in their products. These products have limited lifetime but work far longer than if they were in plastic packages. ICs have shorter lifetimes even when die temperatures are less than 150°C.
In 1884, Dutch chemist Jacobus H van't Hoff first proposed the Arrhenius equation, and Swedish chemist Svante Arrhenius physically justified and interpreted it five years later. In the equation, $k = Ae^{-E_a/(RT)}$, $k$ is the rate coefficient, $A$ is a constant, $E_a$ is the activation energy, $R$ is the universal gas constant, and $T$ is the absolute temperature in kelvins. Arrhenius initially applied the equation to chemical reactions to describe the speedup of reactions with temperature (References 2 and 3). Engineers now also use it to describe the shorter life of electronics when they run at high temperatures. As a rule of thumb, engineers take the equation to imply that every 10°C rise in temperature halves the lifetime of the part. Thus, it is essential to reduce silicon temperatures in your designs. If you can reduce IC temperatures from 85 to 65°C, you roughly quadruple the life of those components.
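As a rough illustration of the lifetime arithmetic above, the following Python sketch compares the 10°C rule of thumb with the Arrhenius ratio itself. It uses the per-particle form of the equation (the Boltzmann constant in place of the gas constant), and the 0.7-eV activation energy is an assumed, illustrative value that does not come from the text; real failure mechanisms have their own activation energies.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_life_factor(t_hot_c, t_cool_c, ea_ev):
    """Ratio of expected life at t_cool_c to life at t_hot_c (Arrhenius model)."""
    t_hot_k = t_hot_c + 273.15
    t_cool_k = t_cool_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


def rule_of_thumb_life_factor(t_hot_c, t_cool_c):
    """Life doubles for every 10 degC of cooling (the rule of thumb in the text)."""
    return 2.0 ** ((t_hot_c - t_cool_c) / 10.0)


ASSUMED_EA_EV = 0.7  # hypothetical activation energy, for illustration only

print("rule of thumb:", rule_of_thumb_life_factor(85.0, 65.0))
print("Arrhenius    :", round(arrhenius_life_factor(85.0, 65.0, ASSUMED_EA_EV), 2))
```

With this assumed activation energy, the Arrhenius ratio for the 85-to-65°C case comes out a little under 4, so the rule of thumb is a reasonable shorthand only near that activation energy and temperature range.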
The cause of problems can be not only the static presence of heat or cold but also the change from one temperature to another. In extreme cases, thermal shock can rip boards and parts into pieces. Temperature gradients can also cause problems, such as the small voltage errors that the thermocouple effect of the solder and pin materials creates (Reference 4). Moreover, the temperature gradients themselves can be dynamic. The late Bob Widlar, a pioneering electronics engineer who worked at National Semiconductor, Fairchild, Maxim, and Linear Technology, once received prototype silicon that stopped working at 1 kHz. Widlar discerned that waves of heat were radiating outward from the output transistors. These waves propagated symmetrically through the silicon die. The problem was that the IC had two reference nodes that were unequally spaced from the output transistors. At 1 kHz, one of the reference nodes was in a thermal trough while the other was on a thermal crest. This situation so unbalanced the bias circuits that the part stopped working properly. Because of these thermal gradients, some power-supply designers prefer to use controllers rather than ICs with built-in power FETs. With controllers, the heat from the FETs does not wash across the same die and over the amplifiers and reference circuits.
Analyzing heat in your circuit is a three-step process. You estimate the heat produced inside the IC. Then, you estimate the heat that the board or heat sink removes. Finally, you estimate the ambient temperature in which the part will be operating (Figure 4). DC analysis is often trivial when you are estimating the heat that the component produces: A resistor with 1V across it and 1A going through it produces 1W of heat. Estimating the heat that ac or undefined signals produce is more problematic, however. For one thing, the quiescent current that runs from the power pin to the ground pin always dissipates a dc-power term. A part with 10V power rails and 5 mA of quiescent current produces 50 mW of heat. However, under operation, that quiescent current may change somewhat. Bias currents and base-drive currents usually increase when the part handles ac signals. The biggest challenge is figuring out how much heat the output current of the part is creating. This estimation may not be obvious. A part can deliver sizable power to a load, but if the output transistors are either all the way on or all the way off, the power the part dissipates internally will be relatively small. With conventional totem-pole output stages, like those that most amplifiers use, outputting a rail-to-rail square wave is not the most thermally demanding task. The worst-case heat production inside the IC occurs when the part outputs a square wave with an amplitude one-half of the power-supply span. If the part is working on ±12V rails, a ±6V (12V p-p) square wave creates the most heat in the output stage. A sine-wave output has lower internal heating. If the signals are complex or indeterminate, it may be difficult to estimate the true worst-case heat production of the IC. Reactive loads with large capacitive or inductive components further complicate the power-dissipation estimation. The voltage and current are not in phase, so the simple assumption about a half-swing square wave becomes false.
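The half-swing worst case is easy to check numerically. The Python sketch below assumes an idealized output stage on symmetric rails driving a grounded resistive load, so the conducting output transistor drops whatever voltage the load does not; it ignores quiescent power. The 100Ω load is a hypothetical value chosen only for illustration.

```python
def output_stage_power_w(v_rail, v_out, r_load):
    """Heat in the conducting output transistor while the output sits at v_out.

    The transistor carries the load current |v_out| / r_load and drops the
    voltage between the rail and the output, v_rail - |v_out|.
    """
    v = abs(v_out)
    return (v_rail - v) * v / r_load


V_RAIL = 12.0   # +/-12V rails, as in the text
R_LOAD = 100.0  # hypothetical resistive load, ohms

# Sweep square-wave amplitudes from 0V to the rail in 0.5V steps.
levels = [0.5 * i for i in range(25)]
worst_v = max(levels, key=lambda v: output_stage_power_w(V_RAIL, v, R_LOAD))
worst_p = output_stage_power_w(V_RAIL, worst_v, R_LOAD)

print("worst-case amplitude (V):", worst_v)
print("output-stage heat at that amplitude (W):", worst_p)
print("output-stage heat at rail-to-rail swing (W):",
      output_stage_power_w(V_RAIL, V_RAIL, R_LOAD))
```

Under these assumptions, the sweep peaks at a ±6V output, and an ideal rail-to-rail swing dissipates nothing in the output stage. A real stage cannot reach the rails exactly, and a reactive load invalidates the in-phase assumption built into the function.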
You can use Spice to estimate power dissipation if you can characterize the signals that the ICs will be passing. You must ensure that the Spice models are proper and that they give reasonable results on a few test signals for which the power-dissipation calculation is trivial. Figure 5 shows a Spice schematic. The power that the chip dissipates differs from the power that arrives at the load. Figure 6 is the Spice plot of the schematic in Figure 5. It shows oscillation in the red trace at start-up. Whether this oscillation will occur in the circuit is anyone's guess, but it should cause you to look for this behavior after you build the prototype. Bear in mind that clicking the W button in Orcad Capture displays only the quiescent power consumption of the chips. To get the operating power dissipation, use the power markers on the schematic and then apply the plot program's averaging math function to the resulting instantaneous-power trace. The mean of that trace, not its rms value, is the average power dissipation in the part.
The board or heat sink removes heat from your IC through convection, conduction, or radiation. Conduction removes heat primarily through the metal lead frames and board copper. Once board copper or a discrete heat sink spreads the heat, then convection transfers the heat by providing enough surface area for the heat to dissipate into the air. Radiation is rarely a viable method of heat removal. Satellite designers use radiation because no other way exists to remove heat from the system. Because looking out into space presents a radiant temperature close to absolute zero, the temperature differential is large enough to allow a sufficient amount of heat to transfer to space, so the satellite electronics do not burn up.
Convection involves some complications. For example, airflow has an effect on commercial heat sinks (Figure 7). Note the five-times improvement in thermal resistance with high airflows. Heat sinks that use forced-air cooling have thinner and more closely spaced fins, as examination of a fan-type CPU cooler will prove. If your product has no fan, the heat from your IC will conduct and spread out and then transfer to the air inside the unit. Then, as the whole unit heats up, the heat transfers convectively to the ambient air, along with some conductive transfer if the unit is sitting on your legs. The thermal resistance of the case material then becomes important. A plastic case more slowly transfers heat from the inside to the outside ambient than does a metal case.
Engineers who work on noncabin electronics for fighter jets understand that a jet operates at altitudes as high as 70,000 feet. At that elevation, the air is so thin that convective cooling becomes ineffective. These systems have a cold plate with ethylene-glycol cooling passages that guarantee that the plate will get no hotter than 80°C. Every part physically contacts a metal heat spreader that can take the heat from the components to the edge of the board. At the edge of the board, a thermally effective clamping system presses this heat spreader to the sides of the case. The side of the case takes the heat down to the cold plate on which the case resides. Thermal grease ensures the maximum heat transfer to the cold plate and ensures the maximum transfer from ICs to heat sinks.
Most electrical engineers are comfortable with using thermal resistance as a thermal-analysis technique. You express thermal resistances in units of degrees Celsius per watt. You simply multiply the thermal resistance by the number of watts you estimated in the first step to get the temperature increase, in degrees Celsius, that the part will experience. Several cautions are in order here. Look for the subscripts on the thermal-resistance specification on the part's data sheet. The thermal resistance from die to case, ΦJC, is not a useful measurement by itself. IC or package designers at the semiconductor manufacturer may care about the IC's temperature rise as heat flows from the die to the case, but you need far more information. The next spec you frequently encounter on the data sheet is the thermal resistance from the junction to ambient, ΦJA. This value measures the temperature rise when the part is not connected to a heat sink or soldered into a PCB (printed-circuit board). Darvin Edwards, a Texas Instruments fellow, points out that ΦJA is a useless measurement for most engineers when predicting junction temperature. "What matters is the thermal resistance from the die to the board [ΦJB] and the thermal resistance from the die to the package surface [ΦJC]," he says. "We use two JEDEC [Joint Electron Device Engineering Council]-standard boards to measure ΦJA to show the engineer it is not a package constant. One is single-sided, and one is multilayer. If you have ΦJB and ΦJC specifications, you have a far better chance of estimating a realistic temperature rise of the IC." He also points out that engineers must remember that the ΦJA measurement takes place with no other chips on the board. When power-supply and other heat-dissipating chips are around the IC and when the board is in a restrictive plastic enclosure with no fan, the actual temperature rise is higher than the ΦJA measurement suggests (Figure 8). Also bear in mind that little heat transfers from the plastic top of most ICs.
Epoxy plastic has a thermal conductivity of 0.6 to 1W/m·K (watts per meter-kelvin), and copper has a thermal conductivity of 400W/m·K. Thus, copper is roughly 400 to 670 times more thermally conductive than plastic, and the design of the PCB to maximize thermal conduction is critical.
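The thermal-resistance arithmetic described above can be sketched in a few lines of Python. All the numbers here are hypothetical and do not come from any data sheet, and the case-to-sink and sink-to-ambient resistances in the second example are illustrative names for a series heat path rather than specifications the text discusses.

```python
def junction_temp_c(ambient_c, power_w, thetas_c_per_w):
    """Junction temperature for heat flowing through thermal resistances in series.

    Temperature rise (degC) = power (W) x total thermal resistance (degC/W).
    """
    return ambient_c + power_w * sum(thetas_c_per_w)


# Hypothetical part dissipating 0.5W with an 80 degC/W junction-to-ambient figure.
POWER_W = 0.5
THETA_JA = [80.0]
bench = junction_temp_c(25.0, POWER_W, THETA_JA)
hot_box = junction_temp_c(50.0, POWER_W, THETA_JA)

# Hypothetical 2W part on a heat sink: junction-to-case, case-to-sink,
# and sink-to-ambient resistances add along the heat path.
HEATSINK_PATH = [5.0, 0.5, 10.0]
sinked = junction_temp_c(25.0, 2.0, HEATSINK_PATH)

print("bench:", bench, "degC; 50 degC ambient:", hot_box, "degC")
print("heat-sinked part:", sinked, "degC")
```

The result is only as good as the resistance figure that goes into it. As Edwards cautions, a junction-to-ambient number measured on a test board with no neighboring heat sources tends to understate the rise in a crowded, enclosed product.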
More sophisticated methods exist for estimating heat removal from the board. National Semiconductor's Webench on-line-design tool uses Flomerics' Flotherm thermal-analysis software to calculate part temperatures in still air. All the usual simulation caveats apply. If your circuit has a fan and some airflow, its temperature will increase less. If it has an enclosure and other parts inside, its temperature will increase more. Flomerics uses finite-element-solution techniques (Reference 5). Figure 9 shows the result of analyzing a computer case for heat generation and airflow. Many other finite-element solvers can analyze this problem, as well. For example, a solver from Comsol can perform multiphysics, so it can solve partial differential equations for more than one problem, such as the thermal response of a part that has a changing thermal conductivity based on its temperature. TI's Edwards points out that his company provides two levels of thermal-modeling abstraction: ΦJB resistance and the Delphi-compact-model standard. Flotherm, Icepak, and many other thermal-analysis programs use these models.
The final step in heat analysis, estimating the ambient temperature, is fundamental. A motorcycle with an air-cooled engine undergoes a certain temperature rise over ambient as you ride it. If the ambient air gets 10°C hotter, so does the cylinder head. Your electronic system is the same. For example, your chips may operate at 50°C on your lab bench where the air is 25°C. When you place those chips in a 50°C ambient temperature, the chips' temperature will reach 75°C. In this step of analyzing heat, engineers sometimes fail to account for the ambient environment in which their parts may have to work. Aside from simply working, those parts must also survive. For example, the rework-paint oven in an auto plant exposes all the electronics to higher temperatures than they would ever see in a car's remaining lifetime. Mercifully, the parts can survive this treatment because the automaker does not power them up during this process. Many engineers do not appreciate how extreme the environment can get. We all know satellites in outer space can have temperature swings from a few degrees above absolute zero to hundreds of degrees Celsius as they go from the shade to the sun.
Challenging environments abound here on earth, as well. Bruce Robinson, test-development engineer for Nissan America, works at the automaker's desert proving grounds in Arizona. He reports that Nissan generally estimates maximum temperatures as: 46°C ambient day temperature, 81°C interior-air temperature, 111°C maximum instrument-panel-surface temperature, and 82°C interior-component temperature. In other words, you can boil water on the top of the instrument panel. Think about this fact if you design vehicle electronics.
Without question, most engineers trip up when they fail to understand the nested levels of ambient temperature. For example, imagine designing a part that goes on the optical-pickup unit of a CD player (Figure 10). You might assume that, because the part is for a consumer product, it could operate at 0 to 70°C. Think twice. The part on your lab bench may be operating in a 25°C environment. However, the optical-pickup unit mounts inside the CD drive. Other components inside the drive heat the air. The unit may not have a fan. Even worse, the drive resides in a computer. The drive must work in that ambient temperature. The inside of the computer has its own heat sources and fans. The outside-world ambient temperature adds on to all this heat. So, the 25°C ambient temperature you measured on the bench becomes 40°C in the computer and 50°C inside the CD drive. Now, what if you put the computer in a hot upstairs room in Ecuador? The part might have to operate at an ambient temperature far above 70°C. It is your job to make sure that it can still meet specifications and that the high temperatures do not radically shorten the product's life.
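The nesting of ambient temperatures amounts to a running sum, which the following Python sketch makes explicit. It takes the 15 and 10°C rises from the bench example above and assumes, as a simplification, that each enclosure adds the same rise regardless of the outside temperature. The 45°C room is a hypothetical value.

```python
def stacked_ambient_c(outside_c, rises_c):
    """Ambient temperature at each nesting level, from the outside world inward."""
    temps = [outside_c]
    for rise in rises_c:
        temps.append(temps[-1] + rise)
    return temps


# Rises taken from the example: the computer adds 15 degC, the drive another 10.
RISES_C = [15.0, 10.0]

print("bench case:", stacked_ambient_c(25.0, RISES_C))

HOT_ROOM_C = 45.0  # hypothetical hot upstairs room
print("hot room  :", stacked_ambient_c(HOT_ROOM_C, RISES_C))
```

In the hot-room case, the air around the part already sits at the 70°C consumer limit before the part's own self-heating is added, which is the trap the paragraph describes.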
A dose of reality
Performing design estimates and Spice simulations is fine, but, at some time in the development process, you must face the reality of what you have designed. Reality involves prototyping the circuit in the correct form, fit, and finish. Then, you can use various measurement techniques to verify all the nice theory you've done so far. It is imperative to re-create the expected operating environment as closely as possible. Your priorities are to determine first whether the circuit will break, then whether it will last, and finally whether it will work as expected in all conditions.
You may recall that a poorly designed and installed in-flight-entertainment system caused the crash of Swissair Flight 111 on Sept 2, 1998 (Reference 6). Arcing from wiring of the in-flight-entertainment network ignited flammable covering on insulation blankets, and the fire quickly spread across other flammable materials. If the designers at the small company that produced the system had insisted on testing in the 8000-foot-altitude atmosphere of a passenger plane, they would have understood that the disk-drive heads flew closer to the platters and that the heat of the entire system was difficult to remove. TI's Edwards points out that 10,000 feet of altitude reduces the convective cooling of systems by 20%. Verifying that all the engineering assumptions correlate with reality is the only way you can ensure that the design will perform electrically as well as thermally. The 229 people aboard Swissair 111 lost their lives because the in-flight system's designers bypassed this reality check.
Two essential measuring devices for all engineers are their sense of touch and their sense of smell. Most of you are all too familiar with the pungent odor of melted electronics. Those with good olfactory senses can even smell the subtle odor from a chip that is approaching 70°C. You can also put touch to good use on circuits that contain no lethal voltages. If you can hold your finger on the part for more than five seconds, the part's temperature is lower than 70°C. Most people overestimate the heat they sense with their fingers. Often, they estimate a temperature of 70°C when it is only 50°C. If you wet your finger, wipe the part with it, and the part sizzles, you are in trouble, because having any part at a temperature higher than 100°C is bad news. Again, remember that the ambient temperature around your lab bench is the most benign environment your circuit is likely to see.
Once you have made rough estimates, you must do some real measurement. Most DVMs (digital voltmeters) have accessories that allow you to connect thermocouples. Fluke and other vendors make handheld instruments that accept two thermocouples for measuring the chip along with the ambient temperature around it. You should measure the IC's temperature increase over the ambient temperature. National Instruments, IO Tech, and many other data-acquisition-equipment manufacturers can help you set up measurement systems with hundreds of thermocouples, thermistors, and platinum RTD (resistance-temperature-detector) sensors. Be careful regarding the size of the sensor and the gauge of the wires. When you measure a small IC, the thermocouple wire can conduct heat away, just as a heat sink can, and this conductance lowers the measured temperature. Many manufacturers also offer noncontact-IR (infrared) detectors. When using them, however, note the emissivity of the surface you are measuring. "Emissivity" is a measure of the thermal emittance of a surface: the fraction of energy an object emits relative to that of a "black body," or thermally black surface. A black body is a perfect emitter of heat energy in that it emits all energy it absorbs and has an emissivity value of 1. In contrast, a material with an emissivity value of 0 would be a perfect thermal mirror (Reference 7). A shiny metal package has low emissivity and would thus yield a lower-than-actual-temperature reading. Flat-finish black paint has an emissivity close to 1, which is the value that IR detectors measure against. To bring the emissivity of your electronics close to 1, you can spray them with flat-finish black paint or simply put a piece of clear tape on the metal package.
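To get a feel for how badly a shiny package can fool an IR detector, the Python sketch below uses a simplified gray-body model: The detector sees the surface's own emission, scaled by emissivity, plus the reflected radiation of room-temperature surroundings, and it converts the total to a temperature as if the emissivity were 1. The fourth-power law applies to total radiation, whereas real instruments respond to a limited infrared band, so treat the numbers as illustrative. The emissivity of 0.1 is an assumed value for bare metal.

```python
def indicated_temp_c(true_c, emissivity, ambient_c):
    """Reading of an IR detector set for emissivity 1, under a gray-body model."""
    true_k = true_c + 273.15
    ambient_k = ambient_c + 273.15
    # The surface emits a fraction `emissivity` of black-body radiation and
    # reflects the fraction (1 - emissivity) of its surroundings.
    radiance_k4 = emissivity * true_k ** 4 + (1.0 - emissivity) * ambient_k ** 4
    return radiance_k4 ** 0.25 - 273.15


TRUE_C = 100.0
AMBIENT_C = 25.0

painted = indicated_temp_c(TRUE_C, 1.0, AMBIENT_C)  # ideal flat-black surface
shiny = indicated_temp_c(TRUE_C, 0.1, AMBIENT_C)    # assumed bare-metal emissivity

print("flat-black surface reads:", round(painted, 1), "degC")
print("shiny metal surface reads:", round(shiny, 1), "degC")
```

In this model, the 100°C shiny package reads only a few degrees above room temperature, which is why the paint or tape matters.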
Many savvy semiconductor manufacturers measure the temperature of the die itself, even when the part is operating in a circuit, using one of the ESD (electrostatic-discharge) diodes that are available on every input, output, and control pin of an IC (Figure 11). You can use this method for ICs having reset or CS (chip-select) lines. You can use many other pins for the measurement, as well. Because the forward drop of a diode at a fixed current falls almost linearly as temperature rises, by roughly 2 mV/°C for silicon, you can calibrate the diode as a thermometer: Put the chip into an oven and run a small current through the ESD diode. Most people in the industry feel that a current of 100 μA does not cause any significant self-heating of the diode. You need not power up the part to measure the diode voltage; you can pull any input or output pin above the power pin or below the ground pin. The internal ESD diodes on the pin clamp that pin to approximately 0.6V beyond the rail. If the pin is a reset that needs to remain high for the part to work, then pull the pin above the power pin. As the oven temperature increases, the ESD diode's forward voltage falls from about 0.7 to 0.53V. Similarly, if the extra pin is a chip select that must remain low for the IC to operate, you can pull that pin below the ground pin and take your data for that ESD diode. If the pin is an output, check with the manufacturer to ensure that no extraneous currents will prevent the full 100 μA from traversing the diode. You must take these data for each type of IC you are measuring; different processes have different voltage-versus-temperature relationships. When you are ready to measure the IC as it operates in your circuit, you inject 100 μA into the pin to raise it above VCC or pull 100 μA from the pin to drop it below ground. Then, you can measure the diode's forward voltage and infer the die temperature from your oven data.
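Turning the oven data into a die-temperature reading is a matter of fitting a line and inverting it. The Python sketch below does so with a least-squares fit. The calibration points are made up for illustration; they merely span the 0.7-to-0.53V range the text mentions, and you would replace them with your own measurements at the same 100-μA test current.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical oven calibration of one ESD diode at a fixed test current.
OVEN_TEMPS_C = [25.0, 50.0, 75.0, 100.0, 110.0]
DIODE_VF_V = [0.700, 0.650, 0.600, 0.550, 0.530]

SLOPE_V_PER_C, INTERCEPT_V = fit_line(OVEN_TEMPS_C, DIODE_VF_V)


def die_temp_c(vf_v):
    """Infer die temperature from a forward voltage taken at the same current."""
    return (vf_v - INTERCEPT_V) / SLOPE_V_PER_C


print("fitted slope (mV/degC):", round(SLOPE_V_PER_C * 1000.0, 3))
print("die temperature at 0.615V (degC):", round(die_temp_c(0.615), 1))
```

A straight line is an assumption that your own oven data should confirm. If the points curve noticeably over the range you care about, interpolate between neighboring calibration points instead.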
The ESD method is valuable but has limitations. If the IC delivers hundreds of milliamperes, voltage drops may occur internally on the VCC or ground-metallization and bond wires. These voltage drops may add to or subtract from your ESD-diode-voltage measurement. You should consult with the application group or even the IC designer if this situation occurs. To counteract the voltage drop, you can stop the power delivery as you are taking the measurement. Be aware that the thermal time constant of silicon chips is microseconds, so you must take the measurement with a fast scope or acquisition system to ensure that you have not measured the ESD diode's forward voltage after the diode has cooled substantially.
Another worry with the ESD-diode method is that IC chips are not isothermal—that is, they do not have equal or constant temperature with respect to either space or time. Measuring the ESD diode does not always ensure that you have measured the hottest point of the die. The concern here is that the ESD diode, which is always on the edge of the chip, is cooler than the output transistors. You can take an IR-thermal-camera image of the IC die as the IC operates (Figure 12). The bright, white spot in the figure is a full 25°C higher than the edge of the die where the ESD diode resides. The part may need derating when it operates at elevated temperatures (Reference 8). At 150°C, the part may not meet the circuit's needs.
A method as valuable as the ESD-diode technique lets you measure the temperature of FETs, even as they operate. This method takes advantage of the fact that the on-resistance of a FET increases with its temperature. The higher the temperature of the FET, the higher its on-resistance is. By noting the on-resistance at various temperatures, you can infer the temperature of the FET by measuring the voltage across it and the current through it while it operates in the on mode. This method works even for integrated FETs in power-supply chips. Remember that self-heating is always an insidious phenomenon in electronics, so, when you take your on-resistance data in an oven, you must apply a short current pulse with a fast rise time to the FET to ensure that the die is at the same temperature as the oven.
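A minimal Python sketch of the lookup follows. It computes the on-resistance from the measured voltage and current and then interpolates linearly between oven-calibration points. The calibration table is hypothetical; you would replace it with pulsed measurements of your own FET.

```python
def fet_temp_c(v_ds, i_d, cal_temps_c, cal_rds_ohm):
    """Infer FET temperature from on-state voltage and current.

    cal_temps_c and cal_rds_ohm are matching oven-calibration lists, with
    on-resistance strictly increasing as temperature increases.
    """
    r_on = v_ds / i_d
    if not cal_rds_ohm[0] <= r_on <= cal_rds_ohm[-1]:
        raise ValueError("on-resistance lies outside the calibrated range")
    for k in range(len(cal_rds_ohm) - 1):
        r_lo, r_hi = cal_rds_ohm[k], cal_rds_ohm[k + 1]
        if r_lo <= r_on <= r_hi:
            # Linear interpolation between the two neighboring points.
            frac = (r_on - r_lo) / (r_hi - r_lo)
            return cal_temps_c[k] + frac * (cal_temps_c[k + 1] - cal_temps_c[k])


# Hypothetical pulsed oven calibration of one FET.
CAL_TEMPS_C = [25.0, 50.0, 75.0, 100.0, 125.0]
CAL_RDS_OHM = [0.0100, 0.0110, 0.0122, 0.0136, 0.0152]

# In-circuit reading: 64.5 mV across the FET while it conducts 5A.
print("inferred FET temperature (degC):",
      round(fet_temp_c(0.0645, 5.0, CAL_TEMPS_C, CAL_RDS_OHM), 1))
```

The function refuses to extrapolate beyond the calibrated range, because a reading outside the table usually means either that the part is hotter than anything you characterized or that the FET is not fully on.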
Taking the measurement is just one component of checking reality to verify your assumptions and estimations. You also must create the ambient temperature if it is not readily available. Automob