# Radiation-Induced Soft Errors in Advanced Semiconductor Technologies

Robert C. Baumann, Fellow, IEEE

Invited Paper

*Abstract*—The once-ephemeral radiation-induced soft error has become a key threat to advanced commercial electronic components and systems. Left unchallenged, soft errors have the potential for inducing the highest failure rate of all other reliability mechanisms combined. This article briefly reviews the types of failure modes for soft errors, the three dominant radiation mechanisms responsible for creating soft errors in terrestrial applications, and how these soft errors are generated by the collection of radiation-induced charge. The soft error sensitivity as a function of technology scaling for various memory and logic components is then presented with a consideration of which applications are most likely to require soft error mitigation.

*Index Terms*—Radiation effects, reliability, single-event effects, soft errors.

## I. INTRODUCTION

As the dimensions and operating voltages of computer electronics are reduced to satisfy the consumer's insatiable demand for higher density, functionality, and lower power, their sensitivity to radiation increases dramatically. There is a plethora of radiation effects in semiconductor devices that vary in magnitude from data disruptions to permanent damage ranging from parametric shifts to complete device failure [1], [2]. Of primary concern for commercial terrestrial applications are the "soft" single-event effects (SEEs) as opposed to the "hard" SEEs and dose/dose-rate related radiation effects that are predominant in space and military environments. As the name implies, SEEs are device failures induced by a single radiation event. The author uses the term soft error throughout the text to encompass the key SEE that affects commercial semiconductor technologies, but it is useful to be aware of the different failure modes.

A soft error occurs when a radiation event causes enough of a charge disturbance to reverse or flip the data state of a memory cell, register, latch, or flip-flop. The error is "soft" because the circuit/device itself is not permanently damaged by the radiation; if new data are written to the bit, the device will store it correctly. The soft error is also often referred to as a single event upset (SEU). If the radiation event is of a very high energy, more than a single bit may be affected, creating a multibit upset (MBU) as opposed to the more likely single

Digital Object Identifier 10.1109/TDMR.2005.853449

bit upset (SBU). While MBUs are usually a small fraction of the total observed SEU rate, their occurrence has implications for memory architecture in systems utilizing error correction [3], [4]. Another type of soft error occurs when the bit that is flipped is in a critical system control register such as that found in field-programmable gate arrays (FPGAs) or dynamic random access memory (DRAM) control circuitry, so that the error causes the product to malfunction [5]. This type of soft error, called a single event functional interrupt (SEFI), obviously impacts the product reliability since each SEFI leads to a direct product malfunction as opposed to typical memory soft errors that may or may not affect the final product operation depending on the algorithm, data sensitivity, etc. Radiation events occurring in combinational logic result in the generation of single event transients (SETs) that, if propagated and latched into a memory element, will lead to a soft error [6]. The last mode in which an SEE can cause disruption of electrical systems is by turning on the complementary metal-oxide-semiconductor (CMOS) parasitic bipolar transistors between well and substrate, inducing a latch-up [7], [8]. The only difference between single event latch-up (SEL) and electrical latch-up is that the current injection that turns on the parasitic bipolar elements is provided by the radiation instead of an electrical overvoltage. SEL can also be debilitating since its occurrence will necessitate a full chip power down to remove the condition, and in some cases can cause permanent damage.

## II. THE TERRESTRIAL RADIATION ENVIRONMENT

## A. What Radiation Does in Silicon

The terrestrial environment is dominated by three different mechanisms (described in the next section) that generate (either directly or as secondary reaction products) energetic ions that are responsible for inducing soft errors. The magnitude of the disturbance an ion causes depends on the linear energy transfer (LET) of that ion (typically reported in units of megaelectron volt square centimeter per milligram). In a silicon substrate, one electron hole pair is produced for every 3.6 eV of energy lost by the ion. With a simple conversion, the LET can be plotted in more convenient units of charge loss per distance as illustrated in Fig. 1. The LET is dependent on the mass and energy of the particle and the material in which it is traveling. Typically, more massive and energetic particles in denser materials have the highest LET. Note that the LET of a magnesium ion (one

Manuscript received February 14, 2005; revised April 6, 2005.

The author is with Texas Instruments, Dallas, TX 75243 USA (e-mail: rbaum@ti.com).



Fig. 1. LET converted into charge generation per linear distance for some ions in silicon (generated with IBM's SRIM 2000) [9].

of the ions commonly produced when a high-energy neutron reacts with a silicon nucleus) is significantly higher than that of alpha particles (helium ions emitted from radioactive impurities in materials) or lithium ions (emitted from the reaction of low-energy neutrons and <sup>10</sup>B). Charge collection generally occurs within a few microns of the junction; thus, the collected charge ( $Q_{coll}$ ) for these events ranges from 1 to several hundred fC depending on the type of ion, its trajectory, and its energy over the path through or near the junction.
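
The unit conversion described above, from LET in mass-stopping-power units to charge generated per unit path length, can be sketched as follows. This is a minimal illustration and not the procedure used to generate Fig. 1: the silicon density of 2.33 g/cm³ and the function names are assumptions, while the 3.6 eV per electron-hole pair comes from the text. The second helper assumes a constant LET along the collection path, which is only a rough approximation for real ions.

```python
# Sketch: convert LET (MeV*cm^2/mg) to generated charge per micron in silicon.
E_PAIR_EV = 3.6              # energy lost per electron-hole pair in Si (from the text)
Q_E_FC = 1.602176634e-4      # elementary charge expressed in femtocoulombs
SI_DENSITY_MG_CM3 = 2330.0   # ASSUMPTION: silicon density of 2.33 g/cm^3


def let_to_fc_per_um(let_mev_cm2_mg, density_mg_cm3=SI_DENSITY_MG_CM3):
    """Return charge generation in fC/um for a given LET in MeV*cm^2/mg."""
    mev_per_cm = let_mev_cm2_mg * density_mg_cm3   # energy loss per cm of track
    ev_per_um = mev_per_cm * 1e6 / 1e4             # MeV -> eV, cm -> um
    pairs_per_um = ev_per_um / E_PAIR_EV           # electron-hole pairs per um
    return pairs_per_um * Q_E_FC


def generated_charge_fc(let_fc_per_um, path_um):
    """Charge generated over a path, ASSUMING constant LET along that path."""
    return let_fc_per_um * path_um


if __name__ == "__main__":
    per_um = let_to_fc_per_um(1.0)
    print(f"1 MeV*cm^2/mg is about {per_um:.2f} fC/um in Si")
    print(f"over a 2 um path: about {generated_charge_fc(per_um, 2.0):.1f} fC")
```

Under these assumptions, 1 MeV·cm²/mg corresponds to roughly 10 fC/µm, which is why an ion collected over a few microns lands in the femtocoulomb range quoted above.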

The reverse-biased junction is the most charge-sensitive part of circuits, particularly if the junction is floating or weakly driven (with only a small drive transistor or high resistance load sourcing the current required to keep the node in its state). As shown in Fig. 2, at the onset of an ionizing radiation event, a cylindrical track of electron-hole pairs with a submicron radius and a very high carrier concentration is formed in the wake of the energetic ion's passage (a). When the resultant ionization track traverses or comes close to the depletion region, carriers are rapidly collected by the electric field, creating a large current/voltage transient at that node. A notable feature of the event is the concurrent distortion of the potential into a funnel shape [10]. This funnel greatly enhances the efficiency of the drift collection by extending the high field depletion region deeper into the substrate (b). The size of the funnel is a function of substrate doping, with the funnel distortion increasing for decreased substrate doping. This "prompt" collection phase is completed within a nanosecond and followed by a phase where diffusion begins to dominate the collection process (c). Additional charge is collected as electrons diffuse into the depletion region on a longer time scale (hundreds of nanoseconds) until all excess carriers have been collected, recombined, or diffused away from the junction area. The corresponding current pulse resulting from these three phases is also shown in Fig. 2. In general, the farther away from the junction that the event occurs, the smaller the amount of charge that will be collected and the less likely it is that the event will cause a soft error.

In actual circuits, a node is never isolated but is actually part of a complex "sea of nodes" in close proximity to one another; thus, charge sharing among nodes and parasitic bipolar action (the formation of an unintentional bipolar transistor between junctions and wells) can greatly influence the amount of charge collected and the size and location of voltage/current glitches in the circuit.

The magnitude of $Q_{coll}$ depends on a complex combination of factors including the size of the device, biasing of the various circuit nodes, substrate structure, device doping, the type of ion, its energy, its trajectory, the initial position of the event within the device, and the state of the device. However, $Q_{coll}$ is only half the story, as the device's sensitivity to this excess charge needs to be taken into account. This sensitivity is defined primarily by the node capacitance, operating voltage, and the strength of feedback transistors, all defining the amount of charge or critical charge $(Q_{crit})$ required to trigger a change in the data state. The response of the device to the charge injection is dynamic and dependent on the magnitude and the temporal characteristics of the pulse, and thus $Q_{crit}$ is not constant but depends on the radiation pulse characteristics and the dynamic response of the circuit itself, making the effect extremely difficult to model [11]. For simple isolated junctions (like DRAM cells in storage mode), a soft error will be induced when a radiation event occurs close enough to a sensitive node such that $Q_{coll} > Q_{crit}$. Conversely, if the event results in $Q_{coll} < Q_{crit}$, then the circuit will survive the event and no soft error will occur. In SRAM or other logic circuits where there is active feedback, there is an additional term to comprehend the speed with which the circuit can react: slower speeds allow more time for the feedback circuit to restore the corrupted node value and thereby reduce the probability of a soft error. This additional term tends to increase the effective $Q_{crit}$.
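
The upset criterion above can be illustrated with a first-order sketch. The expression used here, a static term equal to node capacitance times operating voltage plus a feedback term equal to restoring current times flip time, is a common textbook simplification and an assumption of this example; the text only states qualitatively that these factors define $Q_{crit}$, and it stresses that the real response is dynamic and hard to model. All names and numbers are hypothetical.

```python
# Sketch: first-order critical-charge estimate and upset test.
# ASSUMPTION: Q_crit ~ C_node * V_dd + I_restore * t_flip (simplified static model).


def critical_charge_fc(c_node_ff, v_dd, i_restore_ua=0.0, t_flip_ns=0.0):
    """Return an approximate Q_crit in fC.

    c_node_ff    -- node capacitance in fF (fF * V = fC)
    v_dd         -- operating voltage in V
    i_restore_ua -- feedback (restoring) current in uA; 0 for an isolated node
    t_flip_ns    -- time the circuit needs to flip, in ns (uA * ns = fC)
    """
    return c_node_ff * v_dd + i_restore_ua * t_flip_ns


def is_upset(q_coll_fc, q_crit_fc):
    """A soft error is assumed to occur only when Q_coll exceeds Q_crit."""
    return q_coll_fc > q_crit_fc


if __name__ == "__main__":
    # Hypothetical values chosen only to show the effect of the feedback term.
    q_isolated = critical_charge_fc(2.0, 1.2)
    q_feedback = critical_charge_fc(2.0, 1.2, i_restore_ua=50.0, t_flip_ns=0.02)
    for label, q_crit in (("isolated", q_isolated), ("with feedback", q_feedback)):
        print(f"{label}: Q_crit = {q_crit:.2f} fC, upset by 3 fC: {is_upset(3.0, q_crit)}")
```

With these illustrative numbers the same 3 fC event upsets the isolated node but not the node with active feedback, which mirrors the statement that the feedback term tends to increase the effective $Q_{crit}$.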

The rate at which soft errors occur is called the soft error rate (SER). The unit of measure commonly used with SER and other hard reliability mechanisms is the failure in time (FIT). One FIT is equivalent to one failure in  $10^9$  device hours. Soft errors have become a huge concern in advanced computer chips because, uncorrected, they produce a failure rate that is higher than all the other reliability mechanisms combined! For example, a typical failure rate for a "hard" reliability mechanism (such as gate oxide breakdown, metal electromigration, etc.) is about 1–50 FIT. There are a half-dozen critical reliability mechanisms degrading integrated circuit performance, but in general the aggregate failure rate is typically in the 50–200 FIT range. In stark contrast, without mitigation, the SER can easily exceed 50 000 FIT/chip!
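
To put the FIT figures above in perspective, the following sketch converts a FIT rate into a mean time between failures and an expected failure count. The definition of one FIT as one failure per $10^9$ device hours is taken from the text; the function names, the 8760-hour year, and the assumption of a constant failure rate are choices made for this example.

```python
# Sketch: interpreting FIT rates (1 FIT = 1 failure per 1e9 device-hours).
HOURS_PER_YEAR = 8760.0  # ASSUMPTION: 365-day year


def fit_to_mtbf_hours(fit):
    """Mean time between failures in hours, assuming a constant failure rate."""
    return 1e9 / fit


def expected_failures(fit, n_devices, hours):
    """Expected number of failures for a population over a given time."""
    return fit * 1e-9 * n_devices * hours


if __name__ == "__main__":
    for label, fit in (("aggregate hard mechanisms", 200.0),
                       ("unmitigated soft errors", 50000.0)):
        years = fit_to_mtbf_hours(fit) / HOURS_PER_YEAR
        per_year = expected_failures(fit, 1, HOURS_PER_YEAR)
        print(f"{label}: {fit:.0f} FIT -> MTBF of {years:.1f} years, "
              f"{per_year:.3f} failures per chip-year")
```

At 50 000 FIT a single chip would be expected to fail roughly every two years, compared with centuries at 200 FIT, which is why the unmitigated SER dominates the reliability budget.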

## B. Alpha Particles

In the late 1970s, alpha particles emitted by trace uranium and thorium impurities in packaging materials were shown to be the dominant cause of soft errors in DRAM devices [12]. The alpha particle is composed of two neutrons and two protons, that is, a doubly ionized helium atom emitted from the nuclear decay of unstable isotopes. The most common sources of alpha particles are the naturally occurring <sup>238</sup>U, <sup>235</sup>U, and <sup>232</sup>Th. These impurities emit alpha particles at specific discrete energies over a range from 4 to 9 MeV. When an alpha particle travels through a material, it loses its kinetic energy predominantly through interactions with the electrons of that material and thus leaves a trail of ionization in its wake. The higher the energy of the alpha particle, the farther it travels



Fig. 2. Charge generation and collection phases in a reverse-biased junction and the resultant current pulse caused by the passage of a high-energy ion.



Fig. 3. Alpha energy spectrum obtained from a thick foil of solid Th-232. Note that the discrete alpha particle emission energies are broadened due to energy lost in travelling different (random) distances before reaching the surface and being detected.

before being "stopped" by the material. The distance required to stop an alpha particle (its range) is both a function of its energy and the properties of the material (primarily the material's density) in which it is traveling. In silicon, the range for a 10-MeV alpha particle is  $< 100 \ \mu\text{m}$ . Thus, alpha particles from outside the packaged device are clearly not a concern—only alpha particles emitted by the device materials and packaging materials need be considered. The energy spectrum of alpha particles emitted from the surface of a thick sample of  $^{232}$ Th (the spectrum from  $^{238}$ U is similar in that the bulk of the emission is in the 4–6 MeV range) is shown in Fig. 3. This broad energy spectrum is characteristic of the alpha particle flux in packaged ICs as the discrete emission energies are "smeared-out" since alpha emitters are generally uniformly distributed in the different materials.

Since virtually all semiconductor materials are highly purified, the alpha emitting impurities will generally not be in equilibrium. Alpha counting must be used to determine the alpha emission since the exact nature of parent/daughter distributions is seldom known. In other words, a low concentration of uranium and thorium impurities is a necessary requirement for low alpha emission but not sufficient. In nonequilibrium situations, higher activity daughters may be present that greatly increase the alpha emission rate. This situation was highlighted during investigations into eutectic lead solders (for flip-chip bumps) in which all radioactive impurities had been eliminated except the radioactive  $^{210}$ Pb that was chemically inseparable from the  $^{206\,208}$ Pb. Since  $^{210}$ Pb does not emit an alpha particle when it decays, initial alpha counting measurements revealed the solder to be emitting alpha particles at extremely low levels. With the relatively short half-life of  $^{210}$ Pb, a regrowth of the alpha emitter  $^{210}$ Po (from the decay of  $^{210}$ Pb  $\Rightarrow ^{210}$ Bi  $\Rightarrow ^{210}$ Po) occurred, and within a few months the solder alpha emission was  $10 \times$  higher than initial measurements indicated.

There are two fundamental approaches to reducing the SER from alpha particles in ICs: purification of all production materials in close proximity to the IC, and methods that reduce the probability that alpha particles emitted from materials will reach the sensitive devices. Material and IC vendors are always scrutinizing their processes and raw materials to eliminate the major causes of contamination. As a result, most of the IC and packaging materials went from emitting alpha particles at rates as high as 100 $\alpha/h \ cm^2$ down to levels below 0.001 $\alpha/h \ cm^2$. For a material to be ultra low alpha (ULA, which implies emission at or below 0.002 $\alpha/h \ cm^2$), the <sup>238</sup>U and <sup>232</sup>Th impurity content must be below about one part per 10 billion. Again, this is not a guarantee that the material will meet the ULA emission specification since higher activity daughters may regrow. To ensure that the alpha emission rate is low enough, direct alpha counting techniques must be employed. This is especially true in lead-based solders, where chemical separation will leave known radioactive daughter products; samples should be measured several times over several months to ensure that there is no significant ingrowth of alpha-emitting daughter products that would increase the material's alpha particle emission. One of the challenges of advanced technologies is verifying that all materials meet or exceed the ULA specification. In the majority of CMOS devices, if semiconductor manufacturing and packaging materials could be purified such that together



Fig. 4. Cosmic ray differential neutron flux as a function of neutron energy at sea level. Adapted from [13].

they contributed  $< 0.001 \ \alpha/h \ cm^2$ , the fraction of soft errors from alpha particles would fall to less than 20% in most cases (based on accelerated testing and simulation results). At this point, further emission reduction becomes prohibitively expensive while providing diminishing returns since the SER is dominated by cosmic background radiation.

## C. High-Energy Cosmic Rays

The second significant source of SER is related to cosmic ray events. Primary cosmic rays are thought to be of galactic origin. They react with the Earth's atmosphere via the strong interaction and produce complex cascades of secondary particles. These in turn continue on deeper into the atmosphere, creating tertiary particle cascades, and so on. At terrestrial altitudes (as opposed to flight or satellite altitudes), less than 1% of the primary flux reaches sea level, where the flux is isotropic and composed of muons, protons, neutrons, and pions [13]. Neutrons are one of the higher flux components, and since neutron reactions have higher LETs, they are the most likely cosmic radiation to cause upsets in devices at terrestrial altitudes (assuming <sup>10</sup>B and alpha emitting impurities have been minimized). The "accepted" cosmic differential neutron flux at sea level is shown in Fig. 4. This curve defines how many neutrons over the given energy range are incident on a device at sea level. Recent work has been published improving the accuracy of this data [14], [15]. The neutron flux is strongly dependent on altitude, with the intensity of the cosmic ray neutron flux increasing with increasing altitude. For example, in going from sea level to 10 000 ft, the cosmic ray flux increases  $10 \times$  (this trend starts to saturate at about 50 000 ft). Hence, altitude can have a significant impact on a customer's perceived SER. Due to proton shielding effects induced by interactions with the Earth's magnetic field, the neutron flux is also dependent on magnetic rigidity, that is, on geographical location (this effect is less pronounced than the variation due to altitude). A clear and comprehensive assessment of terrestrial cosmic radiation as a function of altitude and location has been published [16], [17].

TABLE I Reaction Products and Threshold Energies for n + <sup>28</sup>Si Reactions

| Reaction Product         | Threshold Energy (MeV) |
|--------------------------|------------------------|
| $^{25}Mg + \alpha$       | 2.75                   |
| $^{28}Al + p$            | 4.00                   |
| $^{27}Al + d$            | 9.70                   |
| $^{24}Mg + n + \alpha$   | 10.34                  |
| $^{27}Al + n + p$        | 12.00                  |
| $^{26}Mg + {}^{3}He$     | 12.58                  |
| $^{21}Ne + 2\alpha$      | 12.99                  |



Fig. 5. Burst generation rate (per cubic micrometer hour) versus neutron energy for various burst energies. Note that the probability of a burst event drops as the burst energy increases—larger higher-energy bursts are rarer than smaller low-energy bursts. Adapted from [22].

Since neutrons themselves do not directly generate ionization in silicon, the neutron flux alone does not define the cosmic component of SER. Neutrons interact with chip materials elastically and inelastically. Inelastic reactions typically end with the excited nucleus breaking into lighter fragments. The reaction cross sections for both elastic and inelastic reactions decrease rapidly with increasing neutron energy (generally following a 1/E dependence). Nuclear physics simulations have been used to calculate the distributions in energy of reaction products generated as a function of incident neutron energy [18], [19]. Table I summarizes some of the reactions that occur when a neutron interacts with a silicon nucleus.

When the silicon nucleus fragments in these inelastic reactions, the resultant products are a lighter ion with additional particles (neutrons, protons, and/or alpha particles). Kinetic energy is shared among the particles and momentum is conserved so the particles tend to be emitted in opposing directions, such that only one reaction product will cause the soft error [20]. Note that as the energy of the incident neutron gets higher, the number of reaction pathways increases. Similar reactions also occur with neutrons and oxygen, and since  $SiO_2$  is in close proximity to the active junction areas, these reactions can also contribute to the overall SER [21]. Some simulation results are shown in Fig. 5, illustrating the charge burst generation rate in silicon as a function of different neutron energies and burst energies [22]. The probability of higher energy bursts increases with increasing neutron energy. More importantly, however, the burst generation rate drops rapidly as the recoil energy increases. In fact, a 1-MeV burst is 100–3000 times more likely than a 15-MeV burst. The LET for silicon reaction products is significantly higher than that of alpha particles, so when they occur, cosmic events have a significantly higher potential to upset semiconductor devices as compared to alpha particle events. Additionally, certain soft error effects (described earlier) like MBU and SEL cannot generally be induced by alpha particles because the LET threshold for these types of events is above 16 fC/ $\mu$ m. Thus, MBU and SEL are typically due to high-energy neutron effects.
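
The remark that the number of reaction pathways grows with neutron energy can be made concrete with the thresholds of Table I. The sketch below only compares an incident energy against those tabulated thresholds; it says nothing about reaction cross sections or branching ratios, and the data structure and function name are hypothetical.

```python
# Sketch: which n + 28Si channels of Table I are energetically open.
# Threshold energies in MeV, copied from Table I.
SI28_THRESHOLDS_MEV = [
    ("25Mg + alpha", 2.75),
    ("28Al + p", 4.00),
    ("27Al + d", 9.70),
    ("24Mg + n + alpha", 10.34),
    ("27Al + n + p", 12.00),
    ("26Mg + 3He", 12.58),
    ("21Ne + 2 alpha", 12.99),
]


def open_channels(neutron_energy_mev):
    """Return the reaction products whose threshold is at or below the given energy."""
    return [products for products, threshold in SI28_THRESHOLDS_MEV
            if neutron_energy_mev >= threshold]


if __name__ == "__main__":
    for energy in (1.0, 5.0, 11.0, 14.0):
        channels = open_channels(energy)
        print(f"{energy:5.1f} MeV: {len(channels)} open channel(s) {channels}")
```

This is a threshold check only; the actual contribution of each channel to the SER depends on the cross sections and burst energies discussed above.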

Unlike alpha particles, the cosmic neutron flux cannot be reduced significantly at the chip level with shielding, keep-out zones, or high purity materials. Concrete has been shown to shield the cosmic radiation at a rate of approximately  $1.4 \times$  per foot of concrete thickness [23]. Thus, while the SER due to cosmic neutrons of a system operating in a basement surrounded by many feet of concrete could be significantly reduced (a viable option for mainframes, base stations, etc.), for personal desktop applications or portable electronics, little can be done to reduce the cosmic ray portion of the SER. Cosmic ray SER must therefore be dealt with by reducing device sensitivity, either by design or process modifications.
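
The concrete shielding figure above can be turned into a quick estimate of how much material a given reduction requires. The sketch assumes that the quoted factor of approximately 1.4 per foot compounds uniformly with thickness (simple exponential attenuation); that assumption and the function names are not from the text.

```python
# Sketch: cosmic-neutron attenuation by concrete.
# ASSUMPTION: the ~1.4x per foot figure compounds uniformly with thickness.
import math

ATTENUATION_PER_FOOT = 1.4  # approximate value quoted in the text


def shielding_factor(thickness_ft):
    """Factor by which the flux is reduced behind the given concrete thickness."""
    return ATTENUATION_PER_FOOT ** thickness_ft


def thickness_for_reduction(factor):
    """Concrete thickness in feet needed to reduce the flux by the given factor."""
    return math.log(factor) / math.log(ATTENUATION_PER_FOOT)


if __name__ == "__main__":
    for feet in (1, 3, 6, 10):
        print(f"{feet:2d} ft of concrete: flux reduced about {shielding_factor(feet):.1f}x")
    print(f"a 10x reduction needs about {thickness_for_reduction(10.0):.1f} ft")
```

Under this assumption a tenfold reduction needs close to seven feet of concrete, which illustrates why shielding is practical for a basement installation but not for desktop or portable products.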

## D. Low-Energy Cosmic Rays

The third significant source of ionizing particles in electronic devices is the secondary radiation induced from the interaction of low-energy cosmic ray neutrons and boron. The author has already discussed the source of neutrons from cosmic rays. While the previous discussion focused on high-energy neutron reactions, here the author is concerned with very low-energy neutrons ( $\ll 1 \text{ MeV}$ ). Boron is used extensively as a p-type dopant and implant species in silicon and is also used in the formation of boron-doped (2-8% by weight) phosphosilicate glass (BPSG) dielectric layers. Boron is composed of two isotopes, <sup>11</sup>B (80.1% abundance) and <sup>10</sup>B (19.9% abundance). The <sup>10</sup>B is unstable when exposed to neutrons (the <sup>11</sup>B also reacts with neutrons; however, its reaction cross section is nearly a million times smaller, and its reaction products, gamma rays, are much less damaging). At 3838 b (1 b = $10^{-24} \text{ cm}^2/\text{nucleus}$ ), the thermal neutron capture cross section of <sup>10</sup>B is extremely high in comparison to most other isotopes present in semiconductor materials, by three to seven orders of magnitude. Unlike most isotopes that emit gamma photons after absorbing a neutron, the <sup>10</sup>B nucleus breaks apart with an accompanying release of energy in the form of an excited <sup>7</sup>Li recoil nucleus and an alpha particle (a prompt gamma photon is also emitted from the lithium recoil soon after fission occurs). In the  ${}^{10}B(n, \alpha)^{7}Li$  reaction, the alpha particle and the lithium nucleus are emitted in opposite directions to conserve momentum. The lithium nucleus is emitted with a kinetic energy of 0.840 MeV 94% of the time and 1.014 MeV 6% of the time. The alpha particle is emitted with an energy of 1.47 MeV. The lithium recoil has



Fig. 6. Cumulative probability function based on  ${}^{10}\text{B}$  cross section and cosmic neutron background flux versus neutron energy. As illustrated by the dotted line, 90% of the  ${}^{10}\text{B}$  reactions are caused by neutron energies below 15 eV; thus, this process is dominated by low-energy neutrons.

a peak LET of 25 fC/ $\mu$ m while that of the alpha particle is 16 fC/ $\mu$ m. The alpha and the lithium recoil are both capable of inducing soft errors in electronic devices, particularly in advanced low-voltage technologies. Since the <sup>10</sup>B capture cross section decreases rapidly as neutron energy is increased, only neutrons in the epithermal energy range need to be considered. A calculation based on convolving the cross-sectional curve with the cosmic background neutron flux has shown that 90% of the reactions are caused by neutrons with energies below 15 eV, as illustrated in Fig. 6. Assuming maximum doping and implant levels encountered in standard processes and a BPSG layer containing 5% boron, the <sup>10</sup>B concentration in diffusions and implants (which are predominantly <sup>11</sup>B) is thousands of times lower than that of the BPSG layer. The range of the alpha particle and lithium recoil is  $< 3 \ \mu m$ , and calculations have shown that in most cases beyond  $\sim 0.5 \ \mu m$ they have insufficient energy to induce soft errors. Thus, generally only <sup>10</sup>B in close proximity to the silicon substrate should be considered as a threat. For conventional BPSG-based semiconductor processes, the BPSG is the dominant source of boron reactions and in some cases can be the primary cause of soft errors [24], [25].
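
The statement that the alpha particle and lithium recoil are emitted back to back to conserve momentum can be checked against the quoted energies. The sketch assumes non-relativistic kinematics, a negligible momentum for the captured low-energy neutron, and masses approximated by the mass numbers 4 and 7; under those assumptions equal and opposite momenta imply $m_\alpha E_\alpha = m_{Li} E_{Li}$. The function name is hypothetical.

```python
# Sketch: two-body energy sharing in the 10B(n, alpha)7Li reaction.
# ASSUMPTIONS: non-relativistic, captured neutron momentum neglected,
# masses approximated by mass numbers.
M_ALPHA = 4.0
M_LI7 = 7.0


def partner_energy_mev(e_known_mev, m_known, m_partner):
    """Kinetic energy of the partner fragment for equal and opposite momenta.

    p^2 = 2*m*E is the same for both fragments, so m_known*E_known = m_partner*E_partner.
    """
    return e_known_mev * m_known / m_partner


if __name__ == "__main__":
    e_alpha = partner_energy_mev(0.840, M_LI7, M_ALPHA)
    print(f"Li recoil of 0.840 MeV pairs with an alpha of about {e_alpha:.2f} MeV")
```

The dominant 0.840-MeV lithium branch reproduces the 1.47-MeV alpha energy quoted in the text, which supports the back-to-back emission picture.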

The SER due to the activation of  ${}^{10}\mathrm{B}$  in BPSG can be mitigated in several ways [26]. The first and most direct is simply to eliminate BPSG from the process flow. Due to the limited range of the alpha and lithium recoil emitted during the  ${}^{10}B(n, \alpha)^{7}Li$  reaction, only the first level of BPSG needs to be replaced with a dielectric free of <sup>10</sup>B. In cases where the unique reflow and gettering properties of boron are needed, the regular BPSG process can be replaced by an enriched <sup>11</sup>BPSG process without changing the physical or chemical properties of the film and without the requirement for new equipment or processing steps. The package-level radiation environment of devices is the sum of three mechanisms: alpha particles emitted from the radioactive impurities in the device materials, terrestrial cosmic radiation in the form of high-energy neutrons, and <sup>10</sup>B reactions induced by the low-energy neutrons from the cosmic background. To accurately determine the SER of any product, the SER for each of the three components must be accounted for.



Fig. 7. (a) DRAM scaling parameters, normalized cell capacitance, normalized junction volume, and cell voltage as a function of technology node. (b) DRAM single bit SER and system SER as a function of technology node.

## III. RESULTS—TECHNOLOGY SCALING TRENDS

## A. Memory SER Sensitivity

To create the functionality provided by today's electronic systems and appliances, several distinct components must be integrated together. At the core of each system is a microprocessor or digital signal processor with large embedded memories (usually SRAM) interconnected with a slew of peripheral logic. In larger systems, discrete main memory (usually DRAM) is also used. Finally, all systems have some analog or digital input/output components to allow the device to respond and interact with the outside world. The SER of these various components behaves differently as the technologies are scaled.

It is somewhat ironic that soft errors were first discovered to be a problem in DRAM, because after many generations, it is currently one of the more robust electronic devices. DRAM bit SER was high when manufacturers used planar capacitor cells that stored the signal charge in two-dimensional (2-D) large-area junctions because these were very efficient at collecting radiation-induced charge. To address pause refresh and soft error problems while increasing packing density, DRAM manufacturers have developed three-dimensional (3-D) capacitor designs that significantly increase  $Q_{\rm crit}$  while greatly reducing junction collection efficiency by eliminating the large storage junction in silicon [27]. Collection efficiency decreases with the decreasing volume of the junction (junction/well doping also plays a role) while cell capacitance remains relatively constant with scaling since it is dominated by the external 3-D capacitor cell. These DRAM device scaling trends are illustrated in Fig. 7(a) along with DRAM cell voltage scaling. Voltage reduction has reduced  $Q_{\rm crit}$ , but with concurrent aggressive junction volume scaling, a much more significant reduction in collected charge is observed. The net result to DRAM SER performance is shown in Fig. 7(b), with the SER of a DRAM single bit shrinking about  $4-5 \times$  per generation. While DRAM bit SER has been reduced by more than 1000 times over seven generations, the DRAM system SER has remained essentially unchanged. System requirements have increased the memory density (bits/system) almost as fast as the SER reduction provided by technology scaling. Thus, DRAM system reliability has remained roughly constant over many generations. So contrary to the popular misconception that DRAM SER is problematic, undoubtedly left over from the days when DRAM designs utilized planar cells, DRAM is one of the more robust devices in terms of soft error immunity.
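
The interplay between falling bit SER and rising memory density can be sketched numerically. The per-generation bit SER reduction of 4.5 is taken from the 4-5 times range quoted above; the density growth of 4 times per generation, the starting values, and all names are hypothetical and chosen only to show the trend, not to reproduce Fig. 7(b).

```python
# Sketch: DRAM bit SER versus system SER across technology generations.
# ASSUMPTIONS: 4.5x bit-SER reduction per generation (text quotes 4-5x),
# 4x density growth per generation, and arbitrary starting values.


def scaling_table(bit_ser0, bits0, ser_reduction, density_growth, generations):
    """Return (generation, bit SER, bits per system, system SER) for each generation."""
    rows = []
    for gen in range(generations + 1):
        bit_ser = bit_ser0 / ser_reduction ** gen
        bits = bits0 * density_growth ** gen
        rows.append((gen, bit_ser, bits, bit_ser * bits))
    return rows


if __name__ == "__main__":
    rows = scaling_table(bit_ser0=1e-3, bits0=1e6,
                         ser_reduction=4.5, density_growth=4.0, generations=7)
    for gen, bit_ser, bits, system_ser in rows:
        print(f"gen {gen}: bit SER {bit_ser:.2e} FIT/bit, "
              f"{bits:.1e} bits, system SER {system_ser:.0f} FIT")
```

With these illustrative inputs the bit SER falls by more than four orders of magnitude over seven generations while the system SER changes by only about a factor of two, consistent with the observation that DRAM system reliability has stayed roughly constant.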

In contrast, early SRAM was more robust against SER because of high operating voltages and the fact that data in an SRAM are stored as an active state of a bistable circuit made up of two cross-coupled inverters, each strongly driving the other to keep the SRAM bit in its programmed state. The  $Q_{\rm crit}$  for the SRAM cell is largely defined by the charge on the node capacitance as with DRAM but with a dynamic second term related to the drive capability of the transistor keeping the node voltage at the proper value: the stronger the transistor, the more charge that must be collected for the node voltage to reach the switching threshold. With technology scaling, the SRAM junction area has been deliberately minimized to reduce capacitance, leakage, and cell area, while, simultaneously, the SRAM operating voltage has been aggressively scaled down to minimize power. These device scaling trends are shown in Fig. 8(a). With each successive SRAM generation, reductions in cell collection efficiency due to shrinking cell depletion volume have been cancelled out by big reductions in operating voltage and reductions in node capacitance. It can be seen that SRAM single bit SER was initially increasing with each successive generation, particularly in products using BPSG as illustrated in Fig. 8(b). Most recently, as feature sizes have been reduced into the deep submicron regime ( $< 0.25 \ \mu m$ ), the SRAM bit SER has saturated and may even be decreasing. This saturation is primarily due to the saturation in voltage scaling, reductions in junction collection efficiency, and increased charge sharing due to short-channel effects with neighboring nodes. It should be noted that the SRAM curve has been presented at nominal use voltage for each technology node, not constant voltage. Under constant voltage, the SRAM bit sensitivity will actually be seen to decrease with each technology node beyond 0.25  $\mu$ m.

Ultimately, scaling also implies increased memory density, so the saturation in SRAM bit SER does not translate into a saturation in the SRAM system SER. The exponential growth in the amount of SRAM in microprocessors and digital signal processors has led the SER to increase with each generation with no end in sight. This trend is of great concern to chip manufacturers since SRAM constitutes a large part of all advanced integrated circuits today.



Fig. 8. (a) SRAM parameters, normalized storage node capacitance, normalized junction volume, and voltage as a function of technology node. (b) SRAM single bit and system SER as a function of technology node. Note the reduction in SER following the 0.25- $\mu$ m node due to BPSG elimination (dotted lines show simulated SRAM SER with BPSG present).

### B. Sequential/Combinational Logic SER Sensitivity

The computer's discrete and embedded SRAM and DRAM memories would be useless without the peripheral logic that interconnects them. While less sensitive than SRAM, logic devices can also experience soft errors [28]-[31]. Sequential logic elements include latches and flip-flops used to hold system event signals and to buffer data before it goes in or out of the microprocessor and to interface to combinational elements that perform logical operations based on multiple inputs. The SER of these devices and its impact on the system are much harder to quantify since their period of vulnerability (when they are actually doing something critical in the system versus simply waiting) varies widely depending on the circuit design, frequency of operation, and the actual algorithm being executed. Flip-flops and latches are fundamentally similar to the SRAM cell in that they use cross-coupled inverters to store the data state. However, they tend to be more robust because they are usually designed with more transistors for each node, and these devices are often larger and capable of sourcing larger currents that can more easily compensate for spurious charge collected during radiation events. Ultimately, the reliability concern with sequential and combinational logic circuits is that, like SRAM, their SER sensitivity is also increasing with scaling as illustrated in Fig. 9. Soft errors in logic are of particular concern in high-reliability systems whose memory has been protected by error correction where the peripheral logic failure rate may be the dominant reliability failure mechanism. Similar findings (based on simulation) were reported by others [32].

Fig. 9. Comparison of SRAM bit SER with logic bit (flip-flop/latch) SER obtained from test structures, product characterizations, and/or simulations. The gray region at the bottom of the plot represents the effective bit failure rate of SRAM with error correction employed (the actual failure rate is dependent on how often the memory is accessed). It should be noted that the large variation in logic SER is due to the dozens of types of logic tested, not experimental error. The 90-nm data are based only on a single test chip; thus, the average and range of SER have been adjusted based on 130-nm studies. These data have not been derated for dynamic or logic timing/masking effects.

In a combinational circuit where the output is based on a logical relation to the inputs (with no capability for retention), if enough radiation-induced charge is collected, a short-lived transient in the output will be generated (a single-event transient or SET) [33]. If this radiation-induced "glitch" is actually propagated to the input of a latch or flip-flop during a latching clock signal, the erroneous input will be "latched" and stored. For older technologies, the SET could not propagate since it usually could not produce a full output swing and/or was quickly attenuated due to large load capacitances and large propagation delays. In advanced technologies where the propagation delay is reduced and the clock frequency is high, the SET can more easily traverse many logic gates, and the probability that it is latched increases. SET-induced soft errors are not expected to become an issue until the 65-nm technology node or beyond. It should be noted that once an SET can propagate freely, synchronous and especially asynchronous (self-clocked) circuits would be extremely sensitive to such events. In technology nodes beyond 90 nm and at high product operating frequencies, there is an increased risk that a large fraction of observed soft failures will be related to latched SET events.

## IV. DISCUSSION: RELIABILITY IMPLICATIONS

### A. Product Reliability Impact

The author has already explained that one FIT is one error in a billion device hours and that advanced processors with large multimegabit-embedded SRAM can easily have soft failure rates in excess of 50 000 FIT per chip. An SER of 50 000 FIT is equivalent to about one soft fail every 2 years (assuming the component is used 24 h/day).

Fig. 10. Monthly system SER as a function of the number of chips in the system and the amount of embedded SRAM per chip. SER in single chip systems will rarely be a reliability concern while in large multichip systems the uncorrected SER is clearly a liability.

If a digital signal processor is used in a cell phone application, will the failure rate of 50 000 FIT affect the customer's perception of cell phone reliability? Probably not. The phone is not operated all the time, and a soft failure can occur anywhere in the chip, so it is perceived only if it hits one of the few bits critical to the phone's operation. The cell phone will therefore probably not fail even once in its lifetime due to soft errors. Thus, for single-user applications, it does not make sense to implement costly error correction or redundancy even when the SER is very high.

That same chip, however, if used in a telecom base station, as a component in a mainframe computer server, or in a life-support system, is in a different situation. In such systems, reliability requirements are much higher and many chips are used in parallel, so the single-chip SER of one soft fail every 2 years must be multiplied by the number of chips in the system: one fail every 2 years for a single chip becomes a failure rate of about once a week for a system with 100 chips. For such applications, error correction is mandatory. Fig. 10 shows the monthly number of soft errors as a function of the number of chips in the system and the amount of SRAM integrated in each chip. Logic SER is not included, and the failure rates are based on an uncorrected 1.6 kFIT/Mb SRAM SER (which is typical for 6T CMOS SRAM operating near 1 V if tested according to the JEDEC JESD89 test standard [34], [35] for neutron SER and alpha particle SER). In order to avoid overestimating the product failure rate, issues such as data sensitivity, timing sensitivity, and logical masking must be taken into account, since in many scenarios errors that do not affect machine states ultimately do not cause a product failure [36], [37]. Data sensitivity is application dependent and implies that not all memory bits are equal. For example, a soft error occurring in a part of memory that has already been accessed and which will not be used again before it is rewritten with new data does not affect the product's reliability since the corrupted data will be overwritten. Ignoring data sensitivity will lead to overestimating the impact of the SER on the product's reliability. The same holds for timing sensitivity, particularly in logic circuits: some soft errors that cannot propagate fast enough to be latched in the next memory element before the clock edge can be ignored, as can radiation events during actual switching, when the nodes are driven by external circuits that totally swamp the transient charge caused by the soft error. In logical masking, a soft error occurring in a part of a complex circuit that is not currently being used (such as an input to an AND gate when any other input is low, rendering the AND gate's output low regardless of the other inputs) will also have no effect on final product reliability and should not be counted. The key point is that the level of mitigation required to meet the customer's reliability expectations is far more dependent on the end application reliability requirements than on the component's specific SER.
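The FIT arithmetic used in this subsection can be checked with a short sketch. It assumes only the definitions given in the text (one FIT is one error per $10^9$ device hours, and an uncorrected SRAM SER of 1.6 kFIT/Mb). The function names, the 32-Mb example size, and the `derating` factor (a stand-in for data, timing, and logical masking) are hypothetical and chosen for illustration.

```python
# Minimal sketch of the FIT bookkeeping described above (illustrative only).
HOURS_PER_YEAR = 24 * 365.25
HOURS_PER_MONTH = HOURS_PER_YEAR / 12


def mean_hours_between_fails(fit: float) -> float:
    """One FIT is one failure per 1e9 device hours."""
    return 1e9 / fit


def system_fit(chips: int, sram_mbit_per_chip: float,
               fit_per_mbit: float = 1600.0, derating: float = 1.0) -> float:
    """Uncorrected SRAM SER of a multichip system.

    `derating` is a hypothetical factor (0..1) standing in for data,
    timing, and logical masking; 1.0 means no derating is applied.
    """
    return chips * sram_mbit_per_chip * fit_per_mbit * derating


def fails_per_month(fit: float) -> float:
    """Expected number of soft fails per month of continuous operation."""
    return fit * HOURS_PER_MONTH / 1e9


if __name__ == "__main__":
    single_chip = 50_000.0  # FIT, the per-chip value quoted in the text
    years = mean_hours_between_fails(single_chip) / HOURS_PER_YEAR
    days_100 = mean_hours_between_fails(100 * single_chip) / 24
    print(f"1 chip:    one fail every {years:.1f} years")
    print(f"100 chips: one fail every {days_100:.1f} days")
    # 32 Mb per chip is an assumed size that gives roughly 50 kFIT per chip.
    print(f"100 chips x 32 Mb: {fails_per_month(system_fit(100, 32)):.1f} fails/month")
```

Under these assumptions a single chip fails about once every 2.3 years, while 100 such chips fail roughly every 8 days, consistent with the "once a week" estimate above.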

### B. Mitigation Options

Having made the decision that a particular product's SER is too high, mitigation strategies need to be considered. The most obvious way to eliminate soft errors is to get rid of the radiation sources that cause them. To mitigate the dominant SER threat posed by the reaction of low-energy neutrons and <sup>10</sup>B, BPSG has been removed from virtually all advanced technologies. To reduce alpha particle emissions, semiconductor manufacturers use extremely high purity materials and processes, production screening all materials with low background alpha emission measurements. Another method of reducing alpha particles is to design chips where the materials with the highest alpha emission are kept physically separated from sensitive circuit components. One last solution frequently employed to shield the high alpha emission from packaging materials is to coat the chip with a thick polyimide layer prior to packaging. While large reductions in SER are possible either by removing the sources of or shielding the <sup>10</sup>B reaction products and alpha particles, a large portion of the high-energy cosmic neutrons will always reach the devices and cause soft errors. Ultimately, SER is limited to a level defined by high-energy cosmic neutron radiation.

The remaining SER can be addressed, to some extent, by process and technology choices. Substrate structures or doping profiles that minimize the depth from which carriers can be collected can have a large impact on reducing  $Q_{\rm coll}$ , thus reducing SER. In DRAM, multiple-well isolation has been used to reduce charge collection. Well-based mitigation technologies have also been suggested for CMOS logic [38]. Guard ring structures around sensitive junctions have also been used in SRAM devices to provide SER robustness at the expense of SRAM density.

Substrates incorporating a very thin silicon layer on a thicker layer of buried oxide (silicon on insulator—SOI) have also been shown to reduce SER sensitivity [39]–[41] as compared with bulk silicon. Ultimately, though, the improvements garnered by substrate engineering provide a limited path for mitigating soft errors. The majority of process solutions seldom reduce SER by more than five times so their use does not justify the expense of additional process complexity, yield loss, and substrate cost.

An exception to this trend is a recently reported [42] process solution using additional capacitance provided by an embedded DRAM capacitor attached to all sensitive nodes to increase the $Q_{\rm crit}$ of SRAM and logic devices. This approach does not use extra area but does add the expense of several additional process steps required for defining the embedded DRAM structures. While increasing $Q_{\rm crit}$ and offering a $250\times$ reduction in SER, this is generally not enough of a reduction for many high-reliability applications.

Radiation sensitivity can be reduced significantly by design and layout changes. Any change that increases $Q_{\rm crit}$ while maintaining or reducing $Q_{\rm coll}$ will improve the SER performance of a device. For example, a typical high-density SRAM cell consists of six transistors: two that allow data to be read from and written to the cell and four that make up the two cross-coupled inverters responsible for maintaining the data state. $Q_{\rm crit}$ is a function of the storage node capacitance and voltage and of an additional term for the restoring charge supplied by the pull-up/pull-down transistor. This restoring term is proportional to the switching time of the cell and the current provided by the load transistor. Increasing the current drive of the load transistors and/or increasing the switching time of the SRAM cell will increase the robustness of the cell against corruption. Thus, $Q_{\rm crit}$ can be increased significantly if additional or larger drive transistors are added so that a larger restoring current can be provided during a radiation-induced transient. Resistance can also be added between the two inverters so that the time to flip the cell is increased [43], [44], thus effectively allowing the pull-up/pull-down transistor more time to restore the data state (this approach affects the write time of the cell and in high-speed technologies is not a realistic solution).
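As a first-order sketch of the relationship just described, the critical charge can be written as a static term plus a restoring term. The symbols $C_{\rm node}$, $V_{\rm node}$, $I_{\rm restore}$, and $t_{\rm flip}$ are names assumed here for illustration, and the expression is a simplified reading of the text rather than a calibrated device model:

$$Q_{\rm crit} \approx C_{\rm node}\, V_{\rm node} + I_{\rm restore}\, t_{\rm flip}$$

In this picture an upset requires $Q_{\rm coll} > Q_{\rm crit}$. Larger drive transistors raise $I_{\rm restore}$, and added feedback resistance raises $t_{\rm flip}$, so both design changes increase the second term without changing the stored charge $C_{\rm node} V_{\rm node}$.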

Another approach frequently used for mitigation is to use multiple transistors and storage nodes for each data bit stored within the device. While this would seem self-defeating, in that adding extra transistors also adds an additional sensitive area, the method actually works if the nodes have a physical separation larger than the range of the radiation events encountered in the terrestrial radiation environment. This range is usually limited to within a few microns of the struck node, so components with nodes physically separated to ensure that these "typical" events cannot affect both transistors driving the same node (data state) result in a device with robust data states [45] (this approach is based on the fact that the probability of having multiple events in the same device node at the same time is exceedingly small). This approach can be used effectively in sequential logic but is very expensive for embedded memories as there is a large area penalty and moderate power and speed penalties incurred.

By far, the most effective method of dealing with soft errors in memory components is by employing additional circuitry for error detection and/or correction. In its simplest form, error detection consists of adding a single bit to store the parity (odd or even) of each data word (regardless of word length). Whenever data are retrieved, a check is run comparing the parity of the stored data to its parity bit. If a single error has occurred, the check will reveal that the parity of the data does not match the parity bit. Thus, the parity system allows for the detection of a soft error for a minimal cost in terms of circuit complexity and memory width (only a single bit is added to each word). The two disadvantages of this system are that the detected error cannot be corrected and if a double error has occurred then the check will not reveal that anything is wrong since the parity will match. This is true for any even number of errors. For example, if the data were stored with odd parity, the first error changes the odd parity to even parity (detectable error), but the second error changes the parity back to odd (nondetectable error).
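The parity behavior described above can be illustrated with a minimal sketch. It uses even parity on an arbitrary 8-bit word; the helper names and the example word are hypothetical and are not taken from any particular memory design.

```python
# Minimal sketch of single-bit parity protection (illustrative only).

def parity_bit(word: int) -> int:
    """Even parity: 1 if the word holds an odd number of 1s, else 0."""
    return bin(word).count("1") % 2


def write_word(word: int) -> tuple:
    """Store the word together with its parity bit."""
    return word, parity_bit(word)


def parity_ok(word: int, stored_parity: int) -> bool:
    """Check run on every read: does the data still match its parity bit?"""
    return parity_bit(word) == stored_parity


def flip_bit(word: int, position: int) -> int:
    """Model a soft error as a single flipped bit."""
    return word ^ (1 << position)


data, p = write_word(0b10110010)
one_upset = flip_bit(data, 3)        # single-bit error
two_upsets = flip_bit(one_upset, 6)  # second error in the same word

print(parity_ok(data, p))        # True: clean word passes
print(parity_ok(one_upset, p))   # False: single error is detected
print(parity_ok(two_upsets, p))  # True: double error goes unnoticed
```

The check reports only that something is wrong, not which bit flipped, so a detected error cannot be corrected from the parity bit alone.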

In order to address these shortcomings, error detection and correction (EDAC) or error correction codes (ECC) are employed. Typically, error correction is achieved by adding extra bits to each data vector, encoding the data so that the "information distance" between any two possible data vectors is at least three. Larger information distances can be achieved with more parity bits and additional circuitry, but in general, the single error correction double error detection (SECDED) schemes are favored. In these systems, if a single error occurs (a change of plus or minus one in information space), there is no chance that the corrupted vector will be mistaken for its nearest neighbors (since the information distance is three). In fact, if two errors occur in the same "correction word," a valid error vector will still be produced. The only limitation is that with two errors the error vector will not be unique to a single data value, thus only detection of double-bit errors is supported.
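A small worked example may make the SECDED idea concrete. The sketch below uses an extended Hamming (8,4) code, chosen here only because it is small enough to check exhaustively; it is not claimed to be the code used in any product discussed in this paper. Adding one overall parity bit to a distance-three Hamming code raises the minimum distance to four, which is what allows a double error to be flagged rather than miscorrected.

```python
# Minimal SECDED sketch using an extended Hamming (8,4) code (illustrative).
# Index 0 holds the overall parity bit; indices 1..7 form a Hamming(7,4)
# codeword with check bits at the power-of-two positions 1, 2, and 4.

def encode(nibble: int) -> list:
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2          # overall parity over bits 1..7
    return code


def decode(code: list) -> tuple:
    """Return (status, nibble); nibble is None for an uncorrectable word."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of the positions holding a 1
    overall = sum(code) % 2              # 0 when overall parity still holds
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single error at `syndrome`
        # (syndrome 0 means the overall parity bit itself was hit).
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with intact overall parity: two bits flipped.
        return "double_error", None
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, nibble


word = encode(0b1011)
single = list(word)
single[5] ^= 1
double = list(single)
double[2] ^= 1
print(decode(word))    # ('ok', 11)
print(decode(single))  # ('corrected', 11)
print(decode(double))  # ('double_error', None)
```

A triple error in one word would have odd overall parity and would be silently miscorrected by this decoder, which is one reason the physical layout of bits within a correction word matters.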

A simplified representation of parity and SECDED coded systems and the effect of single and multiple errors on the data states are illustrated in Fig. 11(a) and (b), respectively. For a 64-bit wide memory, eight correction bits are required to allow two errors to be detected and a single error to be corrected. Since most soft error events are single-bit errors, EDAC/ECC protection will provide a significant reduction in soft failure rates<sup>1</sup> (typically $> 10\,000\times$ reduction in effective error rates) but at a higher cost in terms of design complexity, the additional memory required, and the inherent latency introduced during access, parity check, and correction. To ensure low failure rates, the memory design must also account for multiple bit errors (MBEs) that can span two or more physical bits. In error-corrected memories, it is recommended that the minimum row or column spacing between bits in the same logical "correction word" be at least 4 or 8 bits. The worst case design would be a high-density memory with adjacent bits in the same correction word. In this type of layout, the efficacy of ECC would be limited by the MBE rate, which although a fraction of the total SER would be orders of magnitude higher than a properly designed memory with ECC.
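Two points in this paragraph lend themselves to a quick numerical check: the number of check bits needed for SECDED, and why spacing the bits of a correction word apart defeats multibit upsets. The sketch below assumes a Hamming-style SECDED code and a simple one-dimensional column interleave in which physical column `c` belongs to word `c % interleave`; both the layout model and the function names are illustrative assumptions.

```python
# Minimal sketch: SECDED overhead and bit interleaving (illustrative only).
from collections import Counter


def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming-style SECDED code over `data_bits` bits."""
    r = 1
    while 2 ** r < data_bits + r + 1:    # single-error-correcting bound
        r += 1
    return r + 1                         # plus one overall parity bit


def errors_per_word(start: int, width: int, interleave: int) -> Counter:
    """Map `width` physically adjacent upset cells to correction words."""
    return Counter(col % interleave for col in range(start, start + width))


def correctable(start: int, width: int, interleave: int) -> bool:
    """SECDED can repair the event only if no word holds two flipped bits."""
    return max(errors_per_word(start, width, interleave).values()) <= 1


print(secded_check_bits(64))   # 8
print(correctable(10, 2, 1))   # False: adjacent bits share a word
print(correctable(10, 4, 4))   # True: a 4-cell event hits 4 different words
print(correctable(10, 5, 4))   # False: the event is wider than the spacing
```

In this simplified model an event is fully correctable whenever its physical width does not exceed the interleave factor, which is the reasoning behind the recommended spacing of 4 or 8 bits.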

<sup>1</sup>ECC will not correct SEL since it typically affects a large number of bits at once. Voltage scaling can actually improve SEL; indeed, measurements confirm a sharp drop in SEL probability when the operating voltage approached 1 V. This is related to the minimum required voltage drop across the bipolar transistors—at or below 1 V the typical CMOS technology cannot sustain a steady-state latch-up condition. Mitigating SEL involves using adequate well taps to ensure that parasitic resistances are minimized.



Fig. 11. (a) Parity encoding in which an extra bit is added to the data. Note that a single bit error (gray arrows) leads to a nonunique error vector since a single bit error in either data state could have caused either of the possible error vectors. Also note that double-bit errors (black arrow) result in no error detected as the initial data state is erroneously mapped into another valid data state. (b) In contrast, ECC uses additional parity bits so that unique error vectors are generated for single bit errors, rendering these errors correctable since their initial state is known. Double-bit errors are detectable but not unique and not correctable.

Sequential and combinational logic can be hardened by design and layout tricks analogous to the SRAM hardening discussed previously. Since fewer logic gates are used as compared to high-density SRAM cells in most chips, the logic design solutions can be more comprehensive since bit density is not as crucial as in large memory arrays. As previously mentioned, most design approaches rely on multiple storage nodes being used for the data state. The storage nodes are typically laid out so that the probability of an event having enough energy to disrupt two or more nodes in the system is minimized [46]. Since the charge transients from radiation then only affect a single node, the additional storage node(s) restores the data state so that no error ensues.

The analog of error correction in sequential logic involves the use of multiple identical logic paths feeding into a majority voting (two out of three) circuit. Basically, this architecture allows a soft error in a single logic path to be ignored since the other two are the majority and, thus, the correct data "win" the vote. This method uses three times the chip area and requires specialized simulation tools to identify the critical logic paths (because of the high cost one wants to protect only the most sensitive paths). Time-multiplexed designs can also offer robustness against SEU and SET since the input is sampled several times before a decision on the output is made. At an increased cost, even more robustness can be built-in if time and spatial multiplexed designs are used [47], [48].
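The two-out-of-three voting scheme described above can be sketched in a few lines. The `logic_path` function is a hypothetical stand-in for a replicated block of combinational logic, and `temporal_vote` is a simplified software model of time-multiplexed sampling; neither is taken from the cited designs.

```python
# Minimal sketch of spatial (TMR) and temporal majority voting (illustrative).
from collections import Counter


def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three vote."""
    return (a & b) | (a & c) | (b & c)


def logic_path(x: int) -> int:
    """Hypothetical 8-bit combinational function, replicated three times."""
    return (x ^ (x >> 1)) & 0xFF


def tmr(x: int, upset_path=None, upset_mask: int = 0) -> int:
    """Run three copies of the path, optionally corrupting one, then vote."""
    outputs = [logic_path(x) for _ in range(3)]
    if upset_path is not None:
        outputs[upset_path] ^= upset_mask   # model a soft error in one copy
    return majority(*outputs)


def temporal_vote(samples) -> int:
    """Time redundancy: return the most common value among the samples.

    Use an odd number of samples of a two-valued signal to avoid ties.
    """
    value, _count = Counter(samples).most_common(1)[0]
    return value


golden = logic_path(0x5A)
print(tmr(0x5A) == golden)                                 # True
print(tmr(0x5A, upset_path=1, upset_mask=0x10) == golden)  # True: outvoted
print(temporal_vote([1, 0, 1]))                            # 1: glitch ignored
```

The vote fails only when two copies are corrupted in the same bit position, which is why the replicated paths must not share a common struck node.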

The final and most ambitious form of redundancy is the use of duplicate or redundant systems, where multiple identical components are run in lock-step (executing the same code at the same time). In a dual-component system, a restart is issued when a mismatch between devices is detected, while in systems with more than two units, a majority voting strategy can be used so that restarting is not necessary. This is the most expensive redundancy scheme, but it does reduce soft failure rates to near-zero levels, providing the necessary reliability for some long-term remote or mission-critical applications [49].

## V. CONCLUSION

Ionization collected from terrestrial radiation events can cause data errors leading to failures in electronic devices. At terrestrial altitudes, three mechanisms are responsible for soft errors: the reaction of high-energy cosmic neutrons with silicon and other device materials, the reaction of low-energy cosmic neutrons with high concentrations of <sup>10</sup>B in the device, and alpha particles emitted from trace radioactive impurities in the device materials. The soft error sensitivity of various memory and logic devices used to create advanced commercial electronic systems as a function of technology scaling has been presented. The author has shown that while the SER of DRAM in a system is relatively unchanged by scaling, SRAM and peripheral logic system SER are increasing rapidly with each new technology node. The cost and efficacy of various methods to mitigate soft errors have also been reviewed, with the conclusion that the most effective way to improve memory system soft error reliability is to employ EDAC techniques, while sequential logic robustness can best be improved by design hardening and spatial and time redundancy. Finally, the impact of soft errors on terrestrial electronic systems has been shown to be extremely application dependent: for single-user commercial applications soft errors are typically not a concern, while for larger (multichip) or high-reliability applications error correction and/or redundancy techniques are mandatory.

## ACKNOWLEDGMENT

The author wishes to thank X. Deng for simulation of the 180-nm logic and designing the logic test structures used to measure 130-nm and 90-nm logic SER data shown in Fig. 9. The author also appreciates the work of X. Zhu who generated better 130-nm logic SER data using actual product digital signal processing (DSP) scan chains.

## REFERENCES

- [1] P. E. Dodd and L. W. Massengill, "Basic mechanisms and modeling of single-event upset in digital microelectronics," *IEEE Trans. Nucl. Sci.*, vol. 50, no. 3, pp. 583–602, Jun. 2003.
- [2] F. W. Sexton, "Destructive single-event effects in semiconductor devices and ICs," *IEEE Trans. Nucl. Sci.*, vol. 50, no. 3, pp. 603–621, Jun. 2003.
- [3] S. Satoh, Y. Tosaka, and S. A. Wender, "Geometric effect of multiple-bit soft errors induced by cosmic ray neutrons on DRAM's," *IEEE Electron Device Lett.*, vol. 21, no. 6, pp. 310–312, Jun. 2000.
- [4] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, "Characterization of multi-bit soft error events in advanced SRAMs," in *Int. Electron Devices Meeting (IEDM) Tech. Dig.*, Washington, DC, Dec. 2003, pp. 21.4.1–21.4.4.
- [5] R. Koga, S. H. Penzin, K. B. Crawford, and W. R. Crain, "Single Event Functional Interrupt (SEFI) sensitivity in microcircuits," in *Proc. 4th Radiation and Effects Components and Systems (RADECS)*, Cannes, France, Sep. 1997, pp. 311–318.
- [6] J. Benedetto, P. Eaton, K. Avery, D. Mavis, M. Gadlage, T. Turflinger, P. Dodd, and G. Vizkelethyd, "Heavy ion-induced digital single-event transients in deep submicron processes," *IEEE Trans. Nucl. Sci.*, vol. 51, no. 6, pp. 3480–3485, Dec. 2004.
- [7] G. Bruguier and J.-M. Palau, "Single particle-induced latchup," *IEEE Trans. Nucl. Sci.*, vol. 43, no. 2, pp. 522–532, Apr. 1996.
- [8] P. E. Dodd, M. R. Shaneyfelt, J. R. Schwank, and G. L. Hash, "Neutron-induced latchup in SRAMs at ground level," in *Proc. 41st Int. Reliability Physics Symp. (IRPS), IEEE EDS*, Dallas, TX, Apr. 2003, pp. 51–55.
- [9] J. F. Ziegler and J. P. Biersack, "Stopping and Range of Ions in Matter (SRIM)," software (version 2000).
- [10] C. M. Hsieh, P. C. Murley, and R. O'Brien, "A field-funneling effect on the collection of alpha-particle-generated carriers in silicon devices," *IEEE Trans. Electron Device Lett.*, vol. 2, no. 4, pp. 686–693, Dec. 1981.
- [11] P. E. Dodd and F. W. Sexton, "Critical charge concepts for CMOS SRAMs," *IEEE Trans. Nucl. Sci.*, vol. 42, no. 6, pp. 1764–1771, Dec. 1996.
- [12] T. C. May and M. H. Woods, "A new physical mechanism for soft error in dynamic memories," in *Proc. 16th Int. Reliability Physics Symp. (IRPS), IEEE EDS*, San Diego, CA, 1978, pp. 33–40.
- [13] J. F. Ziegler and W. A. Lanford, "The effect of sea level cosmic rays on electronic devices," J. Appl. Phys., vol. 52, no. 6, pp. 4305–4318, 1981.
- [14] P. Goldhagen, "Cosmic-ray neutrons on the ground and in the atmosphere," *Mater. Res. Soc. Bull.*, vol. 28, no. 2, pp. 131–135, Feb. 2003.
- [15] M. S. Gordon, P. Goldhagen, K. P. Rodbell, T. H. Zabel, H. H. K. Tang, J. M. Clem, and P. Bailey, "Measurement of the flux and energy spectrum of cosmic-ray neutrons," *IEEE Trans. Nucl. Sci.*, vol. 51, no. 6, pp. 3427–3434, Dec. 2004.
- [16] E. Normand, "Single event effects in avionics," *IEEE Trans. Nucl. Sci.*, vol. 43, no. 2, pp. 461–474, Apr. 1996.
- [17] J. F. Ziegler, "Terrestrial cosmic ray intensities," IBM J. Res. Develop., vol. 42, no. 1, pp. 117–139, 1998.
- [18] Y. Tosaka, H. Kanata, T. Itakura, and S. Satoh, "Simulation technologies for cosmic ray neutron-induced soft errors: Models and simulation systems," *IEEE Trans. Nucl. Sci.*, vol. 46, no. 3, pp. 774–779, Jun. 1999.
- [19] H. K. Tang and K. P. Rodbell, "Single-event upsets in microelectronics: Fundamental physics and issues," *Mater. Res. Soc. Bull.*, vol. 28, no. 2, pp. 111–116, Feb. 2003.
- [20] F. Wrobel, J. M. Palau, M. C. Calvet, O. Bersillon, and H. Duarte, "Incidence of multi-particle events on soft error rates caused by n-Si nuclear reactions," *IEEE Trans. Nucl. Sci.*, vol. 47, no. 6, pp. 2580–2585, Dec. 2000.
- [21] F. Wrobel, J. M. Palau, M. C. Calvet, and P. Iacconi, "Contribution of SiO$_2$ in neutron-induced SEU in SRAMs," *IEEE Trans. Nucl. Sci.*, vol. 50, no. 6, pp. 2055–2059, Dec. 2003.
- [22] J. R. Letaw and E. Normand, "Guidelines for predicting single-event upsets in neutron environments (RAM devices)," *IEEE Trans. Nucl. Sci.*, vol. 38, no. 6, pp. 1500–1506, Dec. 1991.
- [23] J. D. Dirk, M. E. Nelson, J. F. Ziegler, A. Thompson, and T. H. Zabel, "Terrestrial thermal neutrons," *IEEE Trans. Nucl. Sci.*, vol. 50, no. 6, pp. 2060–2064, Dec. 2003.
- [24] R. C. Baumann, T. Z. Hossain, S. Murata, and H. Kitagawa, "Boron compounds as a dominant source of alpha particles in semiconductor devices," in *Proc. 33rd Int. Reliability Physics Symp. (IRPS), IEEE EDS*, Las Vegas, NV, 1995, pp. 297–302.
- [25] R. C. Baumann and E. B. Smith, "Neutron-induced <sup>10</sup>B fission as a major source of soft errors in high density SRAMs," *Elsevier Microelectron. Reliab.*, vol. 41, no. 2, pp. 211–218, 2001.
- [26] R. C. Baumann and T. Z. Hossain, "Electronic device and process achieving a reduction in alpha particle emissions from Boron-based compounds essentially free of Boron-10," U.S. Patent 5 395 783, Mar. 7, 1995.

- [27] L. W. Massengill, "Cosmic and terrestrial single-event radiation effects in dynamic random access memories," *IEEE Trans. Nucl. Sci.*, vol. 43, no. 2, pp. 576–593, Apr. 1996.
- [28] S. Hareland, J. Maiz, M. Alavi, K. Mistry, S. Walstra, and C. Dai, "Impact of CMOS process and scaling and SOI on soft error rates of logic processors," in *Proc. Symp. VLSI Technology*, Kyoto, Japan, 2001, pp. 73–74.
- [29] R. Baumann, "The impact of technology scaling on soft error rate performance and limits to the efficacy of error correction," in *Int. Electron Devices Meeting (IEDM) Tech. Dig.*, San Francisco, CA, Dec. 2002, pp. 329–332.
- [30] S. Buchner, M. Baze, D. Brown, D. McMorrow, and J. Melinger, "Comparison of error rates in combinational and sequential logic," *IEEE Trans. Nucl. Sci.*, vol. 44, no. 6, pp. 2209–2216, Dec. 1997.
- [31] X. Zhu, R. Baumann, C. Pilch, J. Zhou, J. Jones, and C. Cirba, "Comparison of product failure rate to component soft error rate in a multicore digital signal processor," in *Proc. 43rd Int. Reliability Physics Symp. (IRPS), IEEE EDS*, San Jose, CA, 2005, pp. 209–214.
- [32] P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, "Modeling the effect of technology trends on the soft error rate of combinational logic," in *Proc. IEEE Dependable Systems and Networks Conf.*, Washington, DC, Jun. 2002, pp. 389–398.
- [33] M. J. Gadlage, R. D. Schrimpf, J. M. Benedetto, P. H. Eaton, and T. L. Turflinger, "Modeling and verification of single event transients in deep submicron technologies," in *Proc. 42nd Int. Reliability Physics Symp. (IRPS), IEEE EDS*, Phoenix, AZ, Apr. 2004, pp. 673–674.
- [34] P. E. Dodd, M. R. Shaneyfelt, J. R. Schwank, and G. L. Hash, "Neutron-induced soft errors, latchup, and comparison of SER test methods for SRAM technologies," in *Int. Electron Devices Meeting (IEDM) Tech. Dig.*, San Francisco, CA, Dec. 2002, pp. 333–336.
- [35] Measurement and Reporting of Alpha Particle and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices, JEDEC Test Standard No. 89, 2001.
- [36] H. T. Nguyen and Y. Yagil, "A systematic approach to SER estimation and solutions," in *Proc. 41st Int. Reliability Physics Symp. (IRPS), IEEE EDS*, Dallas, TX, Apr. 2003, pp. 60–70.
- [37] N. Seifert and N. Tam, "Timing vulnerability factors of sequentials," *IEEE Trans. Device Mater. Reliab.*, vol. 4, no. 3, pp. 516–522, Sep. 2004.
- [38] M. P. Baze, S. P. Buchner, and D. McMorrow, "A digital CMOS design technique for SEU hardening," *IEEE Trans. Nucl. Sci.*, vol. 47, no. 6, pp. 2603–2608, Dec. 2000.
- [39] O. Musseau, "Single-event effects in SOI technologies and devices," *IEEE Trans. Nucl. Sci.*, vol. 43, no. 2, pp. 603–613, Apr. 1996.
- [40] P. Roche, G. Gasiot, K. Forbes, V. O'Sullivan, and V. Ferlet, "Comparisons of soft error rate for SRAMs in commercial SOI and bulk below the 130-nm technology node," *IEEE Trans. Nucl. Sci.*, vol. 50, no. 6, pp. 2046–2054, Dec. 2003.
- [41] E. H. Cannon, D. D. Reinhardt, M. S. Gordon, and P. S. Makowenskyj, "SRAM SER in 90, 130, and 180 nm bulk and SOI technologies," in *Proc. 42nd Int. Reliability Physics Symp. (IRPS), IEEE EDS*, Phoenix, AZ, 2004, pp. 300–304.
- [42] R. Wilson. (2000, Jan. 13). ST Tames soft errors in SRAM by adding capacitors. *Electron. Eng. Times* [Online]. Available: http://www.eetimes.com/semi/news/showArticle.jhtml?articleID=18310682
- [43] L. W. Massengill, "SEU-hardened resistive-load static RAMs," IEEE Trans. Nucl. Sci., vol. 38, no. 6, pp. 1478–1485, Dec. 1991.
- [44] R. Rockett, "Simulated SEU hardened scaled CMOS SRAM cell design using gated resistors," *IEEE Trans. Nucl. Sci.*, vol. 39, no. 5, pp. 1532–1541, Oct. 1992.
- [45] D. Bessot and R. Velazco, "Design of SEU-hardened CMOS memory cells: The HIT cell," in *Proc. 2nd Radiation and Effects Components and Systems (RADECS)*, St. Malo, France, Sep. 1993, pp. 563–570.
- [46] T. Calin et al., "Topology-related upset mechanisms in design hardened storage cells," in Proc. 4th Radiation and Effects Components and Systems (RADECS), Cannes, France, Sep. 1997, pp. 484–488.
- [47] L. Anghel and M. Nicolaidis, "Cost reduction and evaluation of a temporary faults detecting technique," in *Proc. Design, Automation and Test European Conf.*, Paris, France, Mar. 2000, pp. 591–598.
- [48] D. G. Mavis and P. H. Eaton, "Soft error rate mitigation techniques for modern microcircuits," in *Proc. 40th Int. Reliability Physics Symp. (IRPS), IEEE EDS*, Dallas, TX, Apr. 2002, pp. 216–225.
- [49] J. F. Ziegler and H. Puchner, SER-History, Trends, and Challenges: A Guide for Designing With Memory ICs. San Jose, CA: Cypress Semiconductor, 2004.



**Robert C. Baumann** (S'84–M'89–SM'01–F'05) was born in New York City, NY, in 1962. He received the B.A. degree in physics (*cum laude*), focused on developing microcalorimetry for studying phase changes in liquid crystals, from Bowdoin College, Brunswick, ME, in 1984, and the Ph.D. degree in electrical engineering, focused on process development and integration of ferroelectric thin films in CMOS technologies, from Rice University, Houston, TX, in 1990.

In 1989, he joined Texas Instruments, Dallas, TX, as a Reliability Engineer focused on characterizing the reliability of novel dielectrics and radiation effects in advanced dynamic random access memories (DRAMs). From 1993 to 1998, he was involved in transistor and radiation effects reliability and advanced physical and electrical failure analysis in Texas Instruments Mihomura Fab and Tsukuba R&D Center, Japan. He is currently a Distinguished Member of the Technical Staff at Texas Instruments, where he is responsible for the radiation effects program and interacting with customers worldwide on radiation issues.

Dr. Baumann is the Cochair of an SIA Expert's Panel on the impact of International Traffic in Arms Regulations (ITAR) and was the Sematech Chairman of the Radiation Effects Group responsible for drafting the now de facto commercial radiation testing standard JEDEC JESD89. He was chosen to chair a revision committee for the JESD89. He has served as Committee Member and Technical Session Chair for IEEE International Reliability Physics Symposium (IRPS) and Nuclear and Space Radiation Effects Conference (NSREC), and is a frequent Reviewer for the IEEE TRANSACTIONS ON DEVICE AND MATERIALS RELIABILITY (TDMR).