Computer Testing

1 Introduction

The goal of this work is to focus on assurance in computer testing (both hardware and software). There is frequently uncertainty about the principal cause of hardware and software faults, especially when intermittent, occasional, and random faults are encountered. Such faults are generally very difficult to reproduce, and thus very difficult to debug. The burn-in method, usually applied to hardware testing, can also be applied to software in order to debug these kinds of software faults. Many tools can be used to run burn-in sequences; the great majority are software tools, and a few hardware cards are available. In this essay we provide a list of tools.

In the second section of this essay we depict the basics of the DCL Model. *** DCL has been moved out to a separate paper ***

The rest of this essay aims to explain the background of the testing area and the details of the many test methods that we have explored before and during this work. This essay is directly related to both hardware and software testing, and thus should be considered directly related to operating system testing.

2 Assurance

We introduce the concept of minimal consistency. We also present the concepts of Rotation Test, Diversity Test, Stress Testing, Stability Test, and Confidence Test.

2.1 Separate component tests

Each support component must pass a separate burn-in test, until every component is well tested and convergence of test results is achieved for all support components. We want to emphasize the importance of the traceability of components and of cross-checks between burn-in tests to make the assurance effective.

2.2 Minimal set of test support components

Each test needs a number of support components to be executed. In order to be effective, each test must be executed with the minimal set of support components mounted on the system board.
In general, the minimal set of support components is composed of the system board itself, the CPU(s), the RAM modules, the video card, and the boot device. Usually at least one floppy disk is required, and at least one hard disk may be required for certain tests, but not always.

2.3 Minimal Consistency

In general form, the concept of consistency requires that the same conditions are established in each test. Establishing the same conditions when testing components on various system boards requires the identification characteristics of those boards to be comparable. The basic set of identification characteristics of a motherboard is composed of the brand, the type, the model, and the chipset type and version. Depending on which components are being tested, which type of tests are being executed, and which level of test is attained, the manufacture date might also be important. This is because a certain type and model of motherboard might be built with the same chipset type and version while the manufacturer uses different electronic components. Manufacturers change the electronic components quite often, and this is a source of random errors, especially for hardware drivers written for motherboards of earlier manufacture. We have seen this often.

2.3.1 Cross-check for Rotation Tests

For a component rotation test to be minimally consistent, the same burn-in sequence must be executed to completion at least once on two identical system boards (same brand, type, and model; same chipset type and version), with the same BIOS configuration and with the same CPU. A component rotation test executed with two distinct but identical CPUs must be cross-checked by rotating the CPUs between the two system boards; this ensures that the rotation test is also effective for the CPUs.
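The rotation cross-check can be written down as a small check over a log of burn-in runs. The following Python sketch is only one possible reading of the rule in section 2.3.1: the record layout (`BurnInRun`), the labels for physical boards and CPUs, and the use of a plain string as a BIOS configuration fingerprint are all assumptions made for illustration, not part of the method described here.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable


@dataclass(frozen=True)
class BoardIdentity:
    """Identification characteristics listed in section 2.3."""
    brand: str
    board_type: str
    model: str
    chipset_type: str
    chipset_version: str


@dataclass(frozen=True)
class BurnInRun:
    """One execution of a burn-in sequence (hypothetical record layout)."""
    board_id: str      # label of the physical board, e.g. "A"
    cpu_id: str        # label of the physical CPU, e.g. "cpu1"
    identity: BoardIdentity
    cpu_model: str
    bios_config: str   # any stable fingerprint of the BIOS settings
    sequence: str      # name of the burn-in sequence
    completed: bool


def same_conditions(a: BurnInRun, b: BurnInRun) -> bool:
    """Same board identity, CPU model, BIOS configuration and sequence."""
    return (a.identity, a.cpu_model, a.bios_config, a.sequence) == (
        b.identity, b.cpu_model, b.bios_config, b.sequence)


def rotation_is_minimally_consistent(runs: Iterable[BurnInRun]) -> bool:
    """True if some pair of completed runs satisfies the cross-check."""
    done = [r for r in runs if r.completed]
    for a, b in combinations(done, 2):
        # The two runs must be on distinct boards under the same conditions.
        if a.board_id == b.board_id or not same_conditions(a, b):
            continue
        if a.cpu_id == b.cpu_id:
            return True  # the same physical CPU was used on both boards
        # Two distinct CPUs: each CPU must have completed on each board.
        covered = {(r.board_id, r.cpu_id)
                   for r in done if same_conditions(a, r)}
        needed = {(brd, cpu)
                  for brd in (a.board_id, b.board_id)
                  for cpu in (a.cpu_id, b.cpu_id)}
        if needed <= covered:
            return True
    return False


# Example with made-up names: two identical boards, two identical CPUs.
ident = BoardIdentity("ExampleBrand", "ATX", "X100", "ExampleChipset", "rev2")


def run(board_id: str, cpu_id: str) -> BurnInRun:
    return BurnInRun(board_id, cpu_id, ident, "ExampleCPU",
                     "bios-default", "seq-1", True)


straight = [run("A", "cpu1"), run("B", "cpu2")]
rotated = straight + [run("A", "cpu2"), run("B", "cpu1")]
print(rotation_is_minimally_consistent(straight))  # False: CPUs not rotated
print(rotation_is_minimally_consistent(rotated))   # True
```

The manufacture date is deliberately left out of `BoardIdentity`; as noted above, whether it matters depends on the components and the level of test, so it would be added as a further field only where needed.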
2.3.2 Cross-check for Operating System Test Cases

For an operating system test case to be minimally consistent, the same test case must be executed to completion at least once on two identical machines, with the same BIOS configuration and the same operating system configuration.

NOTE: While we are temporarily relaxing the constraints on test support components for operating system tests, we expect to introduce later a DCL set for operating system test cases.

3 POST

This section provides information on the procedures to follow when a computer gives no sign of life or refuses to boot. The POST is a set of mechanisms, built into the BIOS by the manufacturers, that help determine the causes of the most severe hardware faults.

3.1 What is the POST?

The POST (Power On Self Test) is a set of software routines, part of the BIOS, that is started when the system is powered on, or after a reset, to run a set of tests on the principal components. At the end of those tests, if the sequence completes without evident errors, control is passed to the part of the BIOS that executes the bootstrap from the disk or the floppy. [FDC] *** FIX THAT ***

3.2 How the POST sequence works

From the moment when voltage is applied to the system, the following operations are executed:
The sequence of tests is approximately the following:
At the end of the sequence, only briefly summarized here, the BIOS passes control to the INT 19H vector for the operating system boot. Each step of the POST sequence is identified by a binary number that the processor sends to a location readable with the POST-card. The structure of the POST tests is quite complex, but reading them is fundamental to understanding the principal reasons why a system fails to boot. Unfortunately, the majority of tests are executed without evident signaling, while other tests give video prompts or audible prompts via the system speaker.

If the sequence of POST tests is executed without errors, the user rarely notices it, because it is executed very quickly; the most evident part is usually the memory count on screen and the prompt containing the system configuration. Only when the system has passed the earliest phases and can control some resources are the results of the POST tests made audible with a series of sounds from the system speaker (Beep Codes) and, if the video peripheral is operating, with displayed messages as well.

The correct sequence of tests can be made visible by using a POST-card with a display, because the POST routines send the code associated with each executed test to an I/O location (usually 80H), from where the codes can be captured for display. These codes are different for each type of BIOS; the related documentation can be requested from the manufacturer or is found in the appendices of motherboard manuals. So a screen that shows nothing does not necessarily mean that the system is non-operating: the system could be blocked on a failed test, due not to the motherboard but to some other defective component.
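The way a captured code sequence is read can be sketched in Python. This is a simulation over an already captured list of codes, not a driver for a real POST-card; the function names are hypothetical, codes are assumed to be single bytes, and the table holds only the AWARD examples quoted in section 3.3 of this essay (07, 0B, and FF as the typical final code). A real table must come from the BIOS vendor's documentation.

```python
from typing import Iterable, Mapping, Optional

# Illustrative table only: the AWARD examples quoted in section 3.3.
AWARD_EXAMPLE_CODES = {
    0x07: "RAM test",
    0x0B: "battery or CMOS RAM",
}
END_OF_POST = 0xFF  # "typically" the last code, per section 3.3


def last_code(trace: Iterable[int]) -> Optional[int]:
    """Return the last code in the trace, i.e. what the display shows."""
    last = None
    for code in trace:
        if not 0 <= code <= 0xFF:  # assumption: codes are one byte wide
            raise ValueError(f"not a one-byte POST code: {code!r}")
        last = code
    return last


def diagnose(trace: Iterable[int], table: Mapping[int, str],
             end_code: int = END_OF_POST) -> str:
    """Describe where the POST sequence stopped, using a vendor table."""
    code = last_code(trace)
    if code is None:
        return "no POST code captured"
    if code == end_code:
        return f"POST reached {code:02X}: sequence completed"
    meaning = table.get(code, "unknown code, check the BIOS vendor's table")
    return f"POST stopped at {code:02X}: {meaning}"


# Hypothetical trace: the values before 0x07 are made up for the example.
print(diagnose([0x01, 0x03, 0x07], AWARD_EXAMPLE_CODES))
# prints "POST stopped at 07: RAM test"
```

Keeping the table as a parameter reflects the point made above: the codes differ for each type of BIOS, so the lookup has to be swapped per vendor while the reading rule (the last code shown marks the test that blocked) stays the same.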
It is therefore erroneous to consider "dead" the motherboard of a PC that shows no video activity, because the motherboard could be operating correctly and the problem may lie in a component that has blocked the POST sequence (for example a SIMM, a wrongly inserted IDE connector, or a card on the bus) at a point not yet visible from the outside. It must be said, however, that the POST tests can detect the principal faults that prevent the system from operating correctly, but cannot reveal problems in parts whose fault shows up only randomly, after a certain period of heating, or in the presence of particular software or other hardware. To detect such problems there are specific diagnostic programs, and it is necessary to run a so-called burn-in, a test cycle that lasts many hours.

3.3 What is the POST-card?

The POST-card is a small board with a display controlled by appropriate circuitry. During the POST phase, before the execution of each test, the BIOS sends a code to an I/O location; the POST-card electronics capture it and show it on the display as a hexadecimal number. Using the POST-card is extremely simple and efficient, and it is the fundamental and most practical way to determine the cause of a failed boot.

Everything must be removed from the motherboard except the RAM and the video card. With the system turned off, the POST-card must be inserted in a free slot, in a position where the display is easy to read. There is no priority among the ISA slots, so if the POST-card is an ISA card, any slot will do. When the system is turned on, the display immediately starts to show a more or less rapid series of numbers. If the sequence stops on a number, that number indicates the probable cause of the fault. For example, before starting the RAM test, the AWARD BIOS sends the code 07 to the I/O location at hexadecimal address 80H.
If the display stops while displaying the code 07, the fault is in the RAM. If the display stops while displaying 0B, this indicates a fault in the battery or in the CMOS RAM. The sequence typically terminates with FF, after which control is passed from the BIOS to INT 19H for the boot phase. In this way it is possible to identify the causes of a system fault with notable precision and in a very short time.

A POST-card is the only way to reliably reveal faults in the pre-boot phase. Without the POST-card, one can only proceed by experience, or in a limited manner. For example, the BIOS manufacturers provide a fallback for those who do not have a POST-card, using the system speaker and/or the video display of the graphics card. These two mechanisms are practical and efficient, but obviously limited: before the system can take control of the speaker, a considerable part of the system hardware must be operating properly; for the video, almost the entire system must be operating correctly. If the hardware fault occurs before the system can take control of the speaker or the VGA, they cannot be used to signal the fault to the user. In any case, the acoustic error signals emitted by the speaker (BEEP CODES) and the BIOS Error Messages remain a valid way to quickly determine the hardware causes of a failed boot.

BEEP CODES and BIOS Error Messages

The Beep Codes of the AMI BIOS are typically the following:
All beep sequences identify fatal errors that prevent the boot operation from continuing, except number 8, since the system can always be booted even without a display (in the setup it is possible to exclude the display and keyboard from the tests; such options are provided for typical industrial systems).

3.4 Suggestions for intervention on Beep Codes
AMI BIOS Error Messages
Faults are frequently due to causes that do not depend directly on the component identified as faulty. For example, message 19 can simply be due to a hard disk that is itself operating correctly and contains the operating system, but has no active boot partition. It is therefore important, before pointing the finger at a specific component, to carefully check all other possible alternatives.

3.5 Where can I obtain further information?

Usually, the information in the motherboard's manual is sufficient for common use. Wim's BIOS site has useful information.

3.5.1 Open-source BIOS

3.5.2 Commercial BIOS

4 Terminology

[FDC] Burn-in period - A factory test designed to catch systems with marginal components before they get out the door; the theory is that burn-in will protect customers by outwaiting the steepest part of the bathtub curve (see infant mortality).

A lot of free burn-in software is available on the Internet, and it is really difficult to compile an exhaustive listing, although the list we provide here should be sufficient to find what you need to get started if you want to do burn-in testing on your machines. We have also listed some advanced software and other useful sites.

5 Linux Diagnostics Software
6 Memory Test Hardware

7 Related Work

There is a great deal of previous and related work on computer testing, specifically on burn-in and BIOS. While our goal is to focus on consistency and assurance of tests, there are other sites that have better or more extensive information on technical details. We list here the sites known to us.

8 References