Recently on a popular Six Sigma site the following question appeared:
"I have a question if I have variable data ( that is not normally
distributed) I then transferred it to Attribute data and worked out the
DPMO from the opps/defects. If the DPMO is normally distributed can I
carry on using stats such at t – tests etc. Or because it is originally
attribute data I should use chi squared etc? Any advise appreciated."
You could run a three-day workshop from this question. My short attempt at an answer included:
"As others have said, stay with the continuous data. Before doing
anything else put the data on an appropriate control chart and learn
from the special causes. As Shewhart noted: things in nature are stable,
man-made processes are inherently unstable. I have taken this from
Shewhart’s postulates. The t-test and other tests all rest on the assumption
of IID: independent and identically distributed. If there are special
causes present these assumptions are violated and the tests are useless.
Even though, for many novices, the “control chart” shows up in DMAIC only
under C, it should be used early. Getting the process that produced the
data stable is an achievement. It is also where the learning should
start. Calculating DPMO, and other outcome measures can come later;
after learning and some work. Best, Cliff"
Why the fixation on outcomes, calculating capability, DPMO and the like? Without knowing whether the process that produced the data is stable, such calculations are very misleading. In 1989, I sat in a workshop where Dr. W. Edwards Deming made the following comment, "It will take another 60 years before Shewhart's ideas are appreciated." At the time, I thought he was nuts. Control charts were everywhere. Then they disappeared. Now I see Deming as a prophet.
Historically, we are going through a period in improvement science that is not unlike the Dark Ages. We have people grasping for an easy path and quick answers generated by a computer that might as well be "unmanned." Getting the process stable is an achievement! Our first move with statistical software should not be a normality check, but a check of whether the data are stable and predictable. If we have such a state, then our quality, costs and productivity are predictable. Without this evidence, we are flying blind.
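For readers who want to see what such a first check might look like, here is a minimal sketch in Python of an individuals (XmR) chart calculation. It assumes the measurements are in time order and applies only the simplest detection rule, a point outside the natural process limits. The function names and the sample numbers are hypothetical, made up for illustration; the 2.66 scaling factor is the standard constant for individuals charts (3 divided by d2 = 1.128).

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an individuals chart."""
    if len(values) < 2:
        raise ValueError("need at least two time-ordered observations")
    # Moving ranges: absolute differences between successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of two points.
    return center, center - 2.66 * mr_bar, center + 2.66 * mr_bar


def special_cause_points(values):
    """Return the indices of points that fall outside the natural process limits."""
    _, lower, upper = xmr_limits(values)
    return [i for i, x in enumerate(values) if x < lower or x > upper]


# Hypothetical measurements in time order (illustrative only).
data = [10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 9.7, 10.1, 14.5, 10.0, 9.9, 10.2]

center, lower, upper = xmr_limits(data)
print(f"center={center:.2f}, limits=({lower:.2f}, {upper:.2f})")
print("points to investigate:", special_cause_points(data))
```

A flagged point is a prompt to go and learn what happened in the process at that time, not a value to delete. A fuller analysis would also chart the moving ranges and look for runs and trends, which this sketch leaves out.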