WordNet

a worker whose job is to locate and fix sources of trouble (especially in mechanical devices) (synonym: trouble shooter)

PrepTutorEJDIC

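to find and repair faults in machinery / to settle disputes / to find and repair faults in (something) / to settle (mediate) disputes in (something)
dispute settler, mediation committee member / fault inspector (for machinery, etc.)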

Wikipedia preview

Source (authority): Wikipedia, the free encyclopedia, 2014/07/07 17:01:33 (JST)

wiki en

Troubleshooting is a form of problem solving, often applied to repair failed products or processes. It is a logical, systematic search for the source of a problem so that the problem can be solved and the product or process made operational again. Troubleshooting is needed to develop and maintain complex systems in which the symptoms of a problem can have many possible causes. It is used in many fields, such as engineering, system administration, electronics, automotive repair, and diagnostic medicine. Troubleshooting first requires identifying the malfunctions or symptoms within a system. Experience is then commonly used to generate possible causes of those symptoms. Determining the most likely cause is a process of elimination, in which potential causes are ruled out one at a time. Finally, troubleshooting requires confirmation that the solution restores the product or process to its working state.

In general, troubleshooting is the identification or diagnosis of "trouble" in the management flow of a corporation or a system, caused by a failure of some kind. The problem is initially described as symptoms of malfunction, and troubleshooting is the process of determining and remedying the causes of these symptoms.

A system can be described in terms of its expected, desired or intended behavior (usually, for artificial systems, its purpose). Events or inputs to the system are expected to generate specific results or outputs. For example, selecting the "print" option from various computer applications is intended to result in a hard copy emerging from some specific device. Any unexpected or undesirable behavior is a symptom. Troubleshooting is the process of isolating the specific cause or causes of the symptom. Frequently the symptom is a failure of the product or process to produce any results (for example, nothing was printed).

The methods of forensic engineering are especially useful in tracing problems in products or processes, and a wide range of analytical techniques is available to determine the cause or causes of specific failures. Corrective action can then be taken to prevent further failures of a similar kind. Preventive action is possible using failure mode and effects analysis (FMEA) and fault tree analysis (FTA) before full-scale production, and these methods can also be used for failure analysis.

Aspects

Most discussion of troubleshooting, and especially training in formal troubleshooting procedures, tends to be domain-specific, even though the basic principles are universally applicable.

Usually troubleshooting is applied to something that has suddenly stopped working, since its previously working state forms the expectations about its continued behavior. The initial focus is therefore often on recent changes to the system or to the environment in which it exists (for example, a printer that "was working when it was plugged in over there"). However, there is a well-known principle that correlation does not imply causality. For example, the failure of a device shortly after it has been plugged into a different outlet does not necessarily mean that the two events were related; the failure could have been a coincidence. Troubleshooting therefore demands critical thinking rather than magical thinking.

It is useful to consider the common experience we have with light bulbs. Light bulbs "burn out" more or less at random; eventually the repeated heating and cooling of a bulb's filament, and fluctuations in the power supplied to it, cause the filament to crack or vaporize. The same principle applies to most other electronic devices, and similar principles apply to mechanical devices. Some failures are part of the normal wear and tear of components in a system.

A basic principle in troubleshooting is to start with the simplest and most probable problems. This is illustrated by the old saying "When you see hoof prints, look for horses, not zebras", or, to use another maxim, by the KISS principle. This principle leads to the common complaint that help desks and manuals sometimes begin by asking "Is it plugged in and does that receptacle have power?" The question should not be taken as an affront; rather, it should serve as a reminder to always check the simple things before calling for help.

A troubleshooter could check each component in a system one by one, substituting known good components for each potentially suspect one. However, this process of "serial substitution" can be considered degenerate when components are substituted without regard to a hypothesis concerning how their failure could result in the symptoms being diagnosed.

Simple and intermediate systems are characterized by lists or trees of dependencies among their components or subsystems. More complex systems contain cyclical dependencies or interactions (feedback loops). Such systems are less amenable to "bisection" troubleshooting techniques.

It also helps to start from a known good state, the best example being a computer reboot. A cognitive walkthrough is also a good thing to try. Comprehensive documentation produced by proficient technical writers is very helpful, especially if it provides a theory of operation for the subject device or system.

A common cause of problems is bad design, for example bad human factors design, where a device could be inserted backward or upside down due to the lack of an appropriate forcing function (behavior-shaping constraint), or a lack of error-tolerant design. This is especially bad if accompanied by habituation, where the user just doesn't notice the incorrect usage, for instance if two parts have different functions but share a common case so that it isn't apparent on a casual inspection which part is being used.

Troubleshooting can also take the form of a systematic checklist, troubleshooting procedure, flowchart or table that is made before a problem occurs. Developing troubleshooting procedures in advance allows sufficient thought to be given to the steps involved and to arranging them into the most efficient sequence. Troubleshooting tables can be computerized to make them more efficient for users.

Some computerized troubleshooting services (such as Primefax, later renamed Maxserve) immediately show the top 10 solutions with the highest probability of fixing the underlying problem. The technician can either answer additional questions to advance through the troubleshooting procedure, with each step narrowing the list of solutions, or immediately implement the solution that seems most likely to fix the problem. These services give a rebate if the technician takes an additional step after the problem is solved and reports back the solution that actually fixed it. The computer uses these reports to update its estimates of which solutions have the highest probability of fixing that particular set of symptoms.^[1]
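
The feedback loop described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not a description of how Primefax or Maxserve worked: the class name SolutionRanker, its methods, and the example symptoms and fixes are all hypothetical, and the probability of a fix is estimated naively as its share of the confirmed reports for an identical set of symptoms.

```python
from collections import Counter, defaultdict


class SolutionRanker:
    """Ranks candidate fixes for a symptom set by how often each was reported to work.

    Hypothetical illustration: probabilities are plain relative frequencies
    of confirmed fixes for exactly the same set of symptoms.
    """

    def __init__(self):
        # Maps a frozenset of symptoms to a tally of confirmed fixes.
        self._reports = defaultdict(Counter)

    def report_fix(self, symptoms, solution):
        """Record that `solution` actually fixed a problem showing `symptoms`."""
        self._reports[frozenset(symptoms)][solution] += 1

    def top_solutions(self, symptoms, limit=10):
        """Return (solution, estimated probability) pairs, most likely first."""
        counts = self._reports.get(frozenset(symptoms), Counter())
        total = sum(counts.values())
        if total == 0:
            return []  # no reports yet for this symptom set
        return [(fix, n / total) for fix, n in counts.most_common(limit)]


ranker = SolutionRanker()
for fix in ["replace fuse", "replace fuse", "reseat cable", "replace fuse"]:
    ranker.report_fix({"no power", "no display"}, fix)

ranked = ranker.top_solutions({"no power", "no display"})
print(ranked)
```

Each report a technician sends back shifts the ranking shown to the next technician with the same symptoms, which is the reason such services reward reporting.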

Half-splitting

Efficient methodical troubleshooting starts with a clear understanding of the expected behavior of the system and the symptoms being observed. From there the troubleshooter forms hypotheses on potential causes, and devises (or perhaps references a standardized checklist of) tests to eliminate these prospective causes. This approach is often called "Divide and Conquer".

Two strategies are common among troubleshooters. The first is to check for frequently encountered or easily tested conditions (for example, checking that a printer's light is on and that its cable is firmly seated at both ends). This is often referred to as "milking the front panel."^[2]

The second is to "bisect" the system (for example, in a network printing system, checking whether the job reached the server in order to determine whether the problem lies in the subsystems "towards" the user's end or "towards" the device).

The latter technique can be particularly efficient in systems with long chains of serialized dependencies or interactions among their components. It is simply the application of a binary search across the range of dependencies and is often referred to as "half-splitting".^[3]
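
Half-splitting over a serial chain can be sketched as an ordinary binary search. The sketch below assumes exactly one faulty stage and that the signal is bad everywhere downstream of it; the function name half_split, the probe callback signal_ok_after, and the stage names in the printing chain are hypothetical.

```python
def half_split(stages, signal_ok_after):
    """Find the first faulty stage in a serial chain by binary search.

    stages: ordered list of stage names, from input to output.
    signal_ok_after(i): hypothetical probe that returns True if the signal
    is still good at the output of stages[i].
    Assumes a single fault, with everything downstream of it reading bad.
    Returns (faulty stage name, number of probes used).
    """
    lo, hi = 0, len(stages) - 1  # the fault lies somewhere in stages[lo..hi]
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if signal_ok_after(mid):
            lo = mid + 1  # everything up to mid is fine; look downstream
        else:
            hi = mid      # the fault is at mid or upstream of it
    return stages[lo], probes


chain = ["application", "print spooler", "network",
         "print server", "printer driver", "printer"]
faulty_index = chain.index("print server")

# Simulated probe: the job looks fine at every stage before the faulty one.
found, probes = half_split(chain, lambda i: i < faulty_index)
print(found, probes)
```

Each probe halves the remaining range, so a chain of n stages needs roughly log2(n) probes rather than up to n checks made one stage at a time.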

Reproducing symptoms

One of the core principles of troubleshooting is that reproducible problems can be reliably isolated and resolved. Considerable effort and emphasis in troubleshooting is therefore often placed on reproducibility, that is, on finding a procedure that reliably induces the symptom.

Once this is done, systematic strategies can be employed to isolate the cause or causes of the problem. The resolution generally involves repairing or replacing the components that are at fault.

Intermittent symptoms

Some of the most difficult troubleshooting issues relate to symptoms that occur intermittently. In electronics, this is often the result of components that are thermally sensitive (since the resistance of a circuit varies with the temperature of the conductors in it). Compressed air can be used to cool specific spots on a circuit board, and a heat gun can be used to raise their temperature; troubleshooting of electronic systems thus frequently entails applying these tools in order to reproduce a problem.

In computer programming, race conditions often lead to intermittent symptoms that are extremely difficult to reproduce. Various techniques can be used to force the particular function or module to be called more rapidly than it would be in normal operation (analogous to "heating up" a component in a hardware circuit), while other techniques can be used to introduce greater delays in, or force synchronization among, other modules or interacting processes.
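
As a minimal sketch of forcing synchronization to reproduce a race, the Python example below uses a threading.Barrier to hold two threads between their read and their write of a shared counter, so that the lost update occurs on every run rather than intermittently. The function name lost_update_demo and the counter layout are hypothetical; the same function can also run a lock-protected variant for comparison.

```python
import threading


def lost_update_demo(use_lock):
    """Two threads each increment a shared counter once; return the final value."""
    counter = {"value": 0}
    lock = threading.Lock()
    # The barrier holds each thread between its read and its write until
    # both threads have read, forcing the worst-case interleaving every run.
    barrier = threading.Barrier(2)

    def unsafe_increment():
        seen = counter["value"]      # read
        barrier.wait()               # forced synchronization point
        counter["value"] = seen + 1  # write based on a stale read

    def safe_increment():
        with lock:                   # the read-modify-write cannot interleave
            counter["value"] += 1

    target = safe_increment if use_lock else unsafe_increment
    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


print(lost_update_demo(use_lock=False))  # one increment is lost
print(lost_update_demo(use_lock=True))   # both increments are kept
```

Once the symptom can be triggered on demand like this, a candidate fix (here, the lock) can be checked directly instead of waiting for the failure to recur by chance.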

Intermittent issues can be thus defined:

"An intermittent is a problem for which there is no known procedure to consistently reproduce its symptom." (Steven Litt)

In particular, he asserts that there is a distinction between frequency of occurrence and a "known procedure to consistently reproduce" an issue. For example, knowing that an intermittent problem occurs "within" an hour of a particular stimulus or event, but that sometimes it happens in five minutes and other times it takes almost an hour, does not constitute a "known procedure", even if the stimulus does increase the frequency of observable exhibitions of the symptom.

Nevertheless, troubleshooters must sometimes resort to statistical methods, and can only find procedures that increase the symptom's occurrence to a point at which serial substitution or some other technique is feasible. In such cases, even when the symptom seems to disappear for significantly longer periods, there is low confidence that the root cause has been found and that the problem is truly solved.
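
One way to put a number on that confidence is sketched below. It is an illustration that is not drawn from the cited sources, and it assumes that each test run is independent and that, before the fix, the symptom appeared with a fixed, known probability per run. Under those assumptions, the chance of seeing n clean runs in a row while the fault is still present is (1 - p)^n, so the number of clean runs needed to push that chance below a chosen threshold follows directly. The function name clean_trials_needed and its parameters are hypothetical.

```python
import math


def clean_trials_needed(failure_rate, max_false_confidence=0.05):
    """Number of consecutive symptom-free runs needed before trusting a fix.

    failure_rate: assumed per-run probability of the symptom before the fix.
    max_false_confidence: acceptable chance that the clean streak is luck.
    Assumes independent runs with a constant failure rate (a simplification).
    """
    if not 0 < failure_rate < 1:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0 < max_false_confidence < 1:
        raise ValueError("max_false_confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate) ** n <= max_false_confidence for the smallest n.
    return math.ceil(math.log(max_false_confidence) / math.log(1 - failure_rate))


print(clean_trials_needed(0.10))  # symptom used to appear in 1 run out of 10
print(clean_trials_needed(0.50))  # symptom used to appear in 1 run out of 2
```

The rarer the symptom was before the fix, the longer the clean streak must be, which is why raising the symptom's frequency first makes verification practical.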

Tests may also be run to stress certain components in order to determine whether those components have failed.^[4]

Multiple problems

Isolating single component failures which cause reproducible symptoms is relatively straightforward.

However, many problems only occur as a result of multiple failures or errors. This is particularly true of fault tolerant systems, or those with built-in redundancy. Features which add redundancy, fault detection and failover to a system may also be subject to failure, and enough different component failures in any system will "take it down."

Even in simple systems, the troubleshooter must always consider the possibility that there is more than one fault. (Replacing each component using serial substitution, and then swapping each new component back out for the old one when the symptom is found to persist, can fail to resolve such cases. More importantly, replacing any component with a defective one can actually increase the number of problems rather than eliminate them.)
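
The failure of serial substitution with swap-back in the presence of two faults can be shown with a small simulation. The model is deliberately simple and hypothetical: the system works only if every installed part is good, the spares are assumed to be known good, and the function and part names are invented for the example.

```python
def system_works(installed_parts, bad_parts):
    """In this toy model the system works only if no installed part is faulty."""
    return not any(part in bad_parts for part in installed_parts)


def serial_substitution_with_swap_back(slots, bad_parts):
    """Swap in one known-good spare at a time, restoring the old part if the
    symptom persists. Returns the slot whose replacement cured the symptom,
    or None if no single replacement did.
    """
    installed = dict(slots)  # slot name -> part currently installed
    for slot, old_part in slots.items():
        installed[slot] = f"spare-{slot}"  # substitute a known-good part
        if system_works(installed.values(), bad_parts):
            return slot
        installed[slot] = old_part  # symptom persists: swap the old part back
    return None


slots = {"power supply": "psu-1", "cable": "cable-1", "board": "board-1"}

single_fault = serial_substitution_with_swap_back(slots, {"cable-1"})
double_fault = serial_substitution_with_swap_back(slots, {"psu-1", "board-1"})
print(single_fault)  # the one faulty slot is identified
print(double_fault)  # no single swap cures the symptom
```

With two faulty parts, every single swap leaves at least one bad part installed, so the symptom never clears and each good spare is removed again. Leaving spares in place as the procedure advances avoids this, at the cost of not knowing which replaced parts were actually bad.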

Note that, while we talk about "replacing components", the resolution of many problems involves adjustment or tuning rather than replacement. For example, intermittent breaks in conductors, or "dirty or loose contacts", might simply need to be cleaned or tightened. All discussion of "replacement" should be taken to mean "replacement, adjustment, or other maintenance".

References

[1] Persson, Nils Conrad (June 1982). "Troubleshooting at your fingertips". Electronics Servicing and Technology.
[2] "Hewlett Packard Bench Briefs". Hewlett Packard. Retrieved 14 October 2011.
[3] Sullivan, Mike (15 November 2000). "Secrets of a super geek: Use half splitting to solve difficult problems". TechRepublic. Archived from the original on 8 July 2012. Retrieved 22 October 2010.
[4] http://www.ocf.berkeley.edu/~joyoung/trouble/page1.shtml

English Journal

The effects of timing of exposure to principles and procedural instruction specificity on learning an electrical troubleshooting skill.

Eiriksdottir E, Catrambone R.
Journal of experimental psychology. Applied. J Exp Psychol Appl. 2015 Dec;21(4):383-94. doi: 10.1037/xap0000065. Epub 2015 Oct 26.
Domain principles provided in task instructions are assumed to help performance as learners can later apply this knowledge when faced with new tasks. The goal of the research was to investigate whether the timing of the exposure to principles-studying the principles before or while completing traini
PMID 26501503

Yttrium-90 Infusion: Incidence and Outcome of Delivery System Occlusions during 885 Deliveries.

Savin MA, Chehab M, Campbell JM, Savin JH, Cash C, Wong CY, Schultz CC.
Journal of vascular and interventional radiology: JVIR. J Vasc Interv Radiol. 2015 Dec;26(12):1769-76. doi: 10.1016/j.jvir.2015.08.003. Epub 2015 Oct 9.
PURPOSE: To evaluate the incidence, cause, and management of delivery system occlusions during yttrium-90 ((90)Y) microsphere infusions and to identify techniques to prevent occlusions. MATERIALS AND METHODS: A retrospective review was conducted of 885 consecutive radioembolization deliveries during
PMID 26481823

Decision Aid Use in Primary Care: An Overview and Theory-Based Framework.

Shultz CG, Jimbo M.
Family medicine. Fam Med. 2015 Oct;47(9):679-92.
BACKGROUND AND OBJECTIVES: Increasing patients' participation in health care is a commonly cited goal. While patient decision aids can promote participation, they remain underutilized. Theory-based models that assess barriers and facilitators to sustained decision aid use are needed. The ready, will
PMID 26473560

Japanese Journal

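A troubleshooting support method based on bidirectional browsing of causes and symptoms (Design and communication of products and services, and general topics)

山下遼,高山千尋,大野健彦
IEICE Technical Report. HCS, Human Communication Science 113(462), 107-111, 2014-02-25
To solve problems efficiently, it is effective to draw on one another's experience through knowledge sharing. In knowledge sharing, it is particularly important to present knowledge so that it can be used while the work is being done. In this paper, we carried out field observations and laboratory experiments on on-site network troubleshooting. The results show that, as a way of presenting knowledge, the keyword search systems widely used to date are insufficient, and (1) diverse and fragmentary knowledge at once …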
NAID 110009862485

Development of a device name identification system for user terminals

美原義行,山本隆二,佐久間聡,山崎毅文,岡本学,佐藤敦
IPSJ Transactions on Consumer Devices & Systems (CDS) 3(1), 64-76, 2013-03-13
NAID 170000076165

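A study of analysis methods for the control signaling protocols used in IP telephony networks (SIP servers)

永瀬高志,小林丈朗,三輪直人
IEICE Technical Report. NS, Network Systems 112(463), 645-650, 2013-02-28
In the maintenance and operation of large-scale IP telephony networks provided by telecommunications carriers, many network fault conditions can be isolated by analyzing SIP signals, the main signaling used for session control. A system that captures SIP packets in the network and supports searching the captured data and displaying message sequences is therefore highly effective for first-level fault isolation when a failure occurs. However, session control in IP telephony also involves signals used for bandwidth control (MEGACO) and PSTN (public switched …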
NAID 110009712180
