Failure cause categories
What are failure cause categories
Failure cause categories are overall reasons for having failure modes. Recommended failure cause categories are:
- Fundamentally wrong design
- Manufacturing failures
- Operational condition failures
- Human errors during operation
- Interface failures
An introduction to the above failure cause categories follows.
Fundamentally wrong component design
Failure modes may be caused by fundamentally wrong component design. These failures exist in the component from the first day of operation and can only be eliminated by modifying the design or, sometimes, the operational procedures. They may be detected during functional testing of the prototype before the product is released on the market, but not always. If prototype testing is inadequate, they may instead be detected during the product's first period on the market.
General examples of fundamentally wrong component design are:
Response not achieved at all
- The product function is missing
- The product components are badly fitted
- The product is badly fitted to required product accessories
- The product is badly fitted to its interfaces
- The product receives no input signal
Response outside performance limits
- The product receives limited power
- The product components are badly fitted
Response not intended
- The product function is executed in wrong sequence
- The product function is executed with wrong timing
- The product collides with its interface
- The product is made for something completely different
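The three response groups used in the list above can be read as a simple classification of an observed response against the required one. The sketch below is one possible reading, not a definition taken from this page: the function name, the string commands and the response time limit are all hypothetical, and it only covers a wrong action and a slow response, not wrong sequence or timing.

```python
from enum import Enum
from typing import Optional


class ResponseGroup(Enum):
    NOT_ACHIEVED = "Response not achieved at all"
    OUTSIDE_LIMITS = "Response outside performance limits"
    NOT_INTENDED = "Response not intended"


def classify_response(
    commanded: str,
    observed: Optional[str],
    response_time_s: float,
    max_response_time_s: float,
) -> Optional[ResponseGroup]:
    """Return the response group, or None when the response is acceptable.

    Assumption: 'observed' is None when the product did not respond at all.
    """
    if observed is None:
        return ResponseGroup.NOT_ACHIEVED
    if observed != commanded:
        # The product did something, but not what was commanded.
        return ResponseGroup.NOT_INTENDED
    if response_time_s > max_response_time_s:
        # Correct action, but slower than the (hypothetical) limit.
        return ResponseGroup.OUTSIDE_LIMITS
    return None


# Hypothetical valve that is commanded to close within 5 seconds.
print(classify_response("close", None, 0.0, 5.0))
print(classify_response("close", "open", 1.0, 5.0))
print(classify_response("close", "close", 12.0, 5.0))
```

In this reading, a micro-controller that opens when it should close falls under 'Response not intended', while a gearbox that is strong but slow falls under 'Response outside performance limits'.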
Specific examples of fundamentally wrong component design are:
Response not achieved at all
- Wrong functional structure worked out during the conceptual design phase
- Bad geometric tolerances between valve stem and guidance (HH friction)
- Wrong thread types and thread tolerances to required product accessories
- Wrong thread types or thread tolerances to fit with interfaces
- Wrong seal bore diameter or tolerances to fit with interfaces
- Pressure sensors for detecting start signal placed in an area without pressure
Response outside performance limits
- Small diameter and long hydraulic actuator power line (pressure drop)
- Small diameter and long electric actuator power cables (voltage drop)
- The gearbox has too high gear ratio (strong but slow)
- Bad geometric tolerances between valve stem and guidance (H friction)
Response not intended
- The packer is set prior to the anchor
- The liner packer is set after the cement has cured
- Premature actuation caused by dynamic pressure build-up in hydraulic return line
- 90 mm actuator piston stroke instead of 100 mm as required
- The micro-controller is programmed to open when it should close
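The voltage drop example in the list above can be made concrete with Ohm's law, where the loop resistance of a cable is the resistivity times the total conductor length divided by the cross-sectional area. The sketch below assumes a two-conductor copper cable and uses hypothetical numbers (3000 m cable, 2 A actuator current); none of these values come from this page.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # approximate value for copper at 20 degrees C


def cable_voltage_drop(current_a, length_m, area_m2,
                       resistivity_ohm_m=COPPER_RESISTIVITY_OHM_M):
    """Voltage drop over a two-conductor cable: V = I * rho * 2L / A.

    The factor 2 accounts for the supply and return conductors.
    """
    if area_m2 <= 0 or length_m < 0:
        raise ValueError("area must be positive and length non-negative")
    loop_resistance_ohm = resistivity_ohm_m * 2.0 * length_m / area_m2
    return current_a * loop_resistance_ohm


# Hypothetical comparison of a thin and a thicker conductor (areas in m^2).
thin = cable_voltage_drop(current_a=2.0, length_m=3000.0, area_m2=1.5e-6)
thick = cable_voltage_drop(current_a=2.0, length_m=3000.0, area_m2=4.0e-6)
print(f"1.5 mm2 conductor: {thin:.1f} V drop")
print(f"4.0 mm2 conductor: {thick:.1f} V drop")
```

A check of this kind during the design phase shows whether the actuator still receives enough voltage at the end of a long cable. The drop will be larger at elevated temperatures, since the resistivity of copper increases with temperature.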
Failure modes caused by fundamentally wrong design, where the root causes are human errors within the design team, are typical warranty cases. Failure modes caused by fundamentally wrong design, where the operator has specified the design in the requirement specification (e.g. a 90 mm piston stroke when 100 mm is required), are normally not warranty cases. Warranty can be discussed in cases where the operator has not informed the vendor about issues critical to the design (e.g. interfaces) and the vendor has neither asked for such information nor given it in the product specifications.
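The warranty reasoning above can be sketched as a small decision function. The function and parameter names are hypothetical, and the order of the checks and the fallback for cases the text does not cover are assumptions, not rules stated on this page.

```python
from enum import Enum


class WarrantyStatus(Enum):
    WARRANTY = "typical warranty case"
    NOT_WARRANTY = "normally not a warranty case"
    DISCUSS = "warranty can be discussed"


def design_warranty_status(
    design_team_error: bool,
    operator_specified_design: bool,
    critical_info_exchanged: bool,
) -> WarrantyStatus:
    """Suggest a warranty status for a failure caused by wrong design.

    critical_info_exchanged is False when the operator has not informed the
    vendor about design-critical issues AND the vendor has neither asked for
    nor specified them.
    """
    if operator_specified_design:
        # E.g. the operator required a 90 mm stroke when 100 mm was needed.
        return WarrantyStatus.NOT_WARRANTY
    if not critical_info_exchanged:
        return WarrantyStatus.DISCUSS
    if design_team_error:
        return WarrantyStatus.WARRANTY
    # Assumption: anything not covered by the text is left open for discussion.
    return WarrantyStatus.DISCUSS


print(design_warranty_status(True, False, True).value)
print(design_warranty_status(False, True, True).value)
print(design_warranty_status(False, False, False).value)
```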
Manufacturing failures
Manufacturing failures include:
- Manufacturing not in accordance with the manufacturing drawings
- Assembly not in accordance with the assembly procedure, or an inadequate assembly procedure
- Materials not in accordance with material specifications
- Parts exposed to aggressive chemicals during machining and assembly
These failures exist in the component from the first day of operation. They can only be eliminated by keeping the machining complexity within the capability of the applied machining equipment, using qualified operators of the machining equipment, writing assembly procedures that are correct and easy to understand, and preparing the material certificates accurately. Manufacturing failures are normally, but not always, detected during quality control (QC) or the factory acceptance test (FAT). They may also be detected during installation or early operational use.
Operational condition failures
Failure modes may be caused by operational conditions (environmental conditions and physical loads) during any phase of the product life cycle. Failures during the manufacturing phase are not included here, since they are already covered by the manufacturing failure cause category above.
Failure modes may be caused by operational conditions during:
- Factory acceptance test (FAT)
- Storing
- Transportation
- Platform handling
Failure modes caused by operational conditions during the above life cycle steps typically occur early in the product's operational lifetime. Examples are water used during the FAT that causes corrosion during subsequent storage, shock loads during transportation or platform handling, and electromagnetic noise during transportation. Spurious activation of electronic equipment caused by electromagnetic noise is not always detected by ordinary qualification testing in the laboratory, since the noise source may be absent during these tests. Example: A micro-controller operated actuator may be spuriously activated during storage, transportation or installation due to intense electromagnetic noise in these places combined with limited protection.
Failure modes may finally be caused by operational conditions during:
- Run in hole
- Installation
- Operation
- Maintenance
Failure modes during these operational phases may be caused by time-related failure mechanisms that typically occur after some time in operation. They may also occur rapidly if the materials are poorly compatible with the environment. One example is a gas lift valve (GLV) with the failure mode 'Leakage between the well and the annulus' during the installation test. The reason in this case is swelling of the seal stack elastomers during run in hole, resulting in cut seals when the valve is jarred into the side pocket. This failure mode will typically be detected during the installation itself or during the installation test.
Failure modes during the above operational phases may also be caused by accidental exposure to e.g. aggressive chemicals or unexpected shock loads during operation. These failure modes will typically be randomly distributed over the product's operational lifetime. Keep in mind that all known worst case scenarios during and after run in hole shall be included in the requirement specification, and the equipment shall be designed to resist these loads. Example: Slam open and slam shut are normal loads for a downhole safety valve (DHSV) and should always be included in the requirement specification, with at least one cycle and sometimes multiple cycles.
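The three timing patterns described in this section (early failures, time-related failures and randomly distributed failures) are often illustrated with the Weibull hazard rate. This page does not name any distribution, so the choice of Weibull, the shape values and the scale value below are assumptions made for illustration only.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


# Hypothetical shape values: below 1 gives a falling hazard rate (early
# failures), exactly 1 a constant rate (random failures), and above 1 a
# rising rate (time-related mechanisms such as wear or corrosion).
patterns = {
    "early failures (shape 0.5)": 0.5,
    "random failures (shape 1.0)": 1.0,
    "time-related failures (shape 3.0)": 3.0,
}

for label, shape in patterns.items():
    rates = [weibull_hazard(t, shape, scale=10.0) for t in (1.0, 5.0, 20.0)]
    print(label, [f"{rate:.4f}" for rate in rates])
```

Comparing the hazard rate at an early and a late point in time shows which pattern a given shape value represents.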
Failures related to the above operational conditions are typically found as damage to the component surface (such as cracking, wear, corrosion and other degradation mechanisms) or as obstructions in the gap between components that do not damage the parts themselves (such as particle wedging and deposits).
Specific examples of operational condition failures are:
Response not achieved at all
- Valve stem broken due to stress corrosion cracking
- Valve flow tube buckling during slam shut (flapper)
- Particle wedging between valve stem and guidance
- Deposits like scale and corrosion products on stem or flow tube
Response outside performance limits
- Deposits like wax resulting in increased friction
- Valve stem galling due to peak loads during assembling
- Valve stem galling due to peak loads during operation
Response not intended
- Premature actuation due to inadequate protection against electromagnetic noise
- Wrong sensor signal due to deposits like scale on the transducer
- Wrong sensor signal due to induced current in electric wires
- Leak current due to dielectric breakdown in CMOS transistor
- Short circuit due to electromigration (hillock) in integrated circuit
- Open circuit due to electromigration (voids) in wire connection
Failure modes caused by operational conditions can only be eliminated by modifications of the design (gaps, tolerances, etc.), material selection, surface treatments or by controlling the operational conditions. They can to some extent be identified during reliability testing before the product is released on the market, but only if the operational conditions are known and the tests are realistic.
Failure modes caused by operational conditions are typical warranty cases when these conditions are specified as acceptable in the vendor's product specification. They are also warranty cases if the operational conditions are specified by the operator in the requirement specification, or if they are caused by operational conditions prior to run in hole and the vendor is responsible for these operations.
Human errors during operation
The equipment may be exposed to human errors during any phase of the product life cycle. Failures during the manufacturing phase are not included, since they are covered by a dedicated failure cause category. The remaining life cycle phases are:
- Factory acceptance test (FAT)
- Storing
- Transportation
- Platform handling
- Run in hole (RIH)
- Installation
- Operation
- Maintenance
Human errors during these life cycle phases can be a direct source of operational conditions. Examples are shock loads when the operator opens the product packing with inadequate tools, corrosive fluids used during product preparation, and a scratch in the seal surface made when the operator changes an O-ring with a sharp screwdriver. This failure cause category shall be used instead of 'Operational condition failures' only in cases where human errors are the direct source of the operational conditions.
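The rule for choosing between this category and 'Operational condition failures' can be sketched as follows. The sketch assumes that the failure is already known to stem from operational conditions (not from design or interfaces); the enum and function names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    MANUFACTURING = "Manufacturing"
    FAT = "Factory acceptance test (FAT)"
    STORING = "Storing"
    TRANSPORTATION = "Transportation"
    PLATFORM_HANDLING = "Platform handling"
    RUN_IN_HOLE = "Run in hole (RIH)"
    INSTALLATION = "Installation"
    OPERATION = "Operation"
    MAINTENANCE = "Maintenance"


def select_cause_category(phase: Phase, human_error_is_direct_source: bool) -> str:
    """Pick the failure cause category for a failure caused by operational conditions."""
    if phase is Phase.MANUFACTURING:
        # The manufacturing phase has its own dedicated category.
        return "Manufacturing failures"
    if human_error_is_direct_source:
        return "Human errors during operation"
    return "Operational condition failures"


# Seal surface scratched with a sharp screwdriver during maintenance.
print(select_cause_category(Phase.MAINTENANCE, human_error_is_direct_source=True))
# Shock load during transportation with no human error identified.
print(select_cause_category(Phase.TRANSPORTATION, human_error_is_direct_source=False))
```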
Specific examples of failures related to human errors during operation are:
Response not achieved at all
- No movement of valve due to wrong calibration of actuator transducer
- No movement of valve due to physical coverage of sensors (plug, etc)
- API flange leakage due to low preload of flange bolts
- API flange leakage due to incorrectly fitted flange gasket
- API flange leakage due to misalignment of flange surfaces
- API flange leakage due to scratches in seal surface made by the installation tool
Response not intended
- Compressible hydraulic oil due to rapid filling resulting in air bubbles
- Compressible hydraulic oil because the chamber was not evacuated prior to oil filling
- The valve is left in the wrong position during maintenance, resulting in leakage during start-up
- The valve is accidentally activated by an unintended push on the activation button
The operator who is directly involved in the situation is normally blamed for failures caused by human errors, without anyone going into root causes such as training, procedures and management. This is not a recommended approach. Always study the root causes. The links below provide additional knowledge about human errors and potential causes.
Interface failures
Component failures may also be caused by failures in the component interfaces. The interface failures may in turn be caused by fundamentally wrong interface design, manufacturing failures, operational condition failures or human errors during operation. Typical interface failures are:
- Wrong interface thread type
- Wrong interface tolerances
- Corrosion in seal bores
- The component receives wrong signal from its interfaces
Example 1: A gas lift valve (GLV) from vendor A is to be installed in a side pocket mandrel (SPM) from vendor B. The diameter of the side pocket is smaller than specified in the standard. The GLV will register the failure mode 'Fail to drift GLV into SPM', but the failure cause category for this GLV failure mode will be 'interface failure' caused by fundamentally wrong interface design.
Example 2: A gas lift valve (GLV) from vendor A is to be installed in a side pocket mandrel (SPM) from vendor B. The seal bore in the side pocket is degraded by e.g. pitting corrosion. The GLV will register the failure mode 'Leak between the well and the annulus', but the failure cause category for this GLV failure mode will be 'interface failure' caused by operational conditions.
Example 3: An actuator from vendor A receives a false electric signal from the transducer made by vendor B, either during normal operation or during a situation where the transducer has a failure mode (wrong interface failure mode response).
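Examples 1 and 2 above show that an interface failure is recorded on the component that experiences the failure mode, together with the underlying cause in the interface. A minimal sketch of such a record is given below. The class and field names are hypothetical and are not taken from any ExproSoft tool.

```python
from dataclasses import dataclass
from typing import Optional

INTERFACE_FAILURE = "Interface failures"


@dataclass(frozen=True)
class FailureRecord:
    component: str                              # component that registers the failure mode
    failure_mode: str
    cause_category: str                         # one of the five failure cause categories
    interface_component: Optional[str] = None   # only used for interface failures
    interface_cause: Optional[str] = None       # underlying cause in the interface

    def __post_init__(self):
        has_interface_details = (
            self.interface_component is not None or self.interface_cause is not None
        )
        if has_interface_details and self.cause_category != INTERFACE_FAILURE:
            raise ValueError("interface details require the interface failure category")


example_1 = FailureRecord(
    component="GLV (vendor A)",
    failure_mode="Fail to drift GLV into SPM",
    cause_category=INTERFACE_FAILURE,
    interface_component="SPM (vendor B)",
    interface_cause="Fundamentally wrong design",
)

example_2 = FailureRecord(
    component="GLV (vendor A)",
    failure_mode="Leak between the well and the annulus",
    cause_category=INTERFACE_FAILURE,
    interface_component="SPM (vendor B)",
    interface_cause="Operational condition failures",
)

print(example_1)
print(example_2)
```

Keeping the interface component and its underlying cause in separate fields makes it possible to count failures per component while still tracing them back to the equipment that actually needs to be corrected.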
Updated: 07.05.2010
Copyright © 2009 ExproSoft AS - Phone: +47 73 200 400.
Any use of information on this web site is subject to the terms of a Creative Commons license. The information on the site is under no circumstances intended to substitute for individual counseling.
ExproSoft will accept no liability for any type of use of this information as a result of information being inaccurate or incorrect.