On a mid-term recently I was asked if nutrition labeling was required by law and I answered No. This is because I have been reading labels since my mother taught me to back in the 1970s. It became a law in 1990. Who knew?
Okay, so I read in the textbook that the FDA required labeling. Since I knew from research in the 80s that labeling was voluntary, I just figured my textbook was wrong. Should have checked my facts. Once again, amazed by ignorance.
So now that I know there is a law I decide, "hey, let's go check it out at FDA.gov". I'm publishing some screen grabs from FDA.gov based on the fair use copyright law. Get over it.
Interesting experience. Being a manufacturing engineer I crunch numbers using basic statistics all day long, or I used to. I can look at a range of numbers and tell you if it makes sense, or, if there is a problem.
So I checked out:
and reviewed how they calculate the info on labels. There was a line on this page that made no sense to me:
So here I am, thinking, "well, if it is the minimum value or the maximum value how can it also be the mean?" A mean, or average, (by definition) is always going to be greater than the minimum and less than the maximum. So I scratched my head. "The mean is the maximum or the minimum, depending". I must have missed that day in grade school.
The "cv" or "coefficient of variation" is just the standard deviation expressed as a percentage of the mean, used when negative numbers make no sense. No problem there, how they use it though....
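For anyone who wants to see the CV as arithmetic, here is a minimal sketch. The function name and the three readings are made up for illustration; they are not the FDA sample values, and this is not claimed to be the FDA's own procedure.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


# Made-up readings for illustration only, NOT the FDA sample.
demo = [9.0, 10.0, 11.0]
print(round(coefficient_of_variation(demo), 1))  # prints 10.0
```

A CV of 10 just says the standard deviation is one tenth the size of the mean.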
Being me, I checked their math, and all the math on the page between the sample values and the ridiculous "mean is the minimum in some cases and maximum in other cases" statement is correct. But my experience is not in math, and compared to a lot of people I know, my math sucks. My expertise is recognizing problems and developing solutions.
Let's take a look at the values in their 12 item sample:
Looks good? No, looks very bad! (so I have been studying Spanish lately, sue me)
Why? This is not a normal distribution so typical rules don't apply. This is a skewed distribution which means I need a larger sample size. All distributions are normal distributions IF the sample size is large enough. When I have a skewed distribution it tells me there is something influencing the sample distribution (like a manufacturing process issue) OR the sample size is too small. A skewed distribution means the sample IS NOT representative. Maybe the protein level is higher because some of the sample units were pissed on and there was a lot of protein in the piss OR maybe the sample size isn't big enough. I don't know. In any case, this sample is fucked.
I charted the information because I am lame and I have no fricking life OR maybe just because I don't like stuff that does not add up and I like to check it out.
So what does this confusing bunch of numbers mean? I used LibreOffice, which has no Analysis ToolPak or built-in histogram chart procedure.
Look at the numbers on the left. Those are the values from the FDA web page I gave the URL to at the beginning of this blog. I included a screen grab and you can check me if you want.
Underneath that set of numbers is a bunch of pretty self-explanatory numbers. Average is =average(A2:A13). STDEV is =stdev(A2:A13). Max is =max(A2:A13). Min is =min(A2:A13). Range is =(max-min). The number without a title, 3.3, is the minimum plus half of the range: =A16+(A18/2). This gives us a Median or middle (strictly speaking it's the midrange; a true median would be =median(A2:A13)). The mean does not equal the median so the distribution is skewed.
The last two numbers, calculated max and calculated min, are based on a normal distribution. I take the mean and add three standard deviations and subtract three standard deviations. =(A14+(3*A15)) and =A14-(3*A15).
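The spreadsheet cells above can be sketched as one small function. This is only an illustration of the same arithmetic: the function name and dictionary keys are my own, the demo list is made up (it is NOT the FDA sample), and the true median is included only for comparison since the spreadsheet does not compute it.

```python
import statistics


def summarize(values):
    """Mirror the spreadsheet cells: average, stdev, max, min, range,
    the min-plus-half-range midpoint, and the mean +/- 3 sigma limits."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample std dev, like =stdev() in a spreadsheet
    lo, hi = min(values), max(values)
    spread = hi - lo
    return {
        "average": mean,
        "stdev": sd,
        "max": hi,
        "min": lo,
        "range": spread,
        "midpoint": lo + spread / 2,             # minimum plus half of the range
        "median": statistics.median(values),     # extra, for comparison only
        "calc_max": mean + 3 * sd,
        "calc_min": mean - 3 * sd,
    }


# Made-up values for illustration only, NOT the FDA sample.
for name, value in summarize([1.0, 2.0, 3.0, 4.0, 10.0]).items():
    print(f"{name}: {value:.2f}")
```

With one big value in the list, the average, the midpoint and the median all land in different places, which is the same symptom described above.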
With a normal or Gaussian distribution (Bell Curve) 997 out of 1000 samples will fit within plus or minus 3 standard deviations or a range of 6 sigma. Huh? If this distribution was normal, about 68% of sample items would fit within plus or minus One standard deviation of the mean, about 95% of items would fit within plus or minus Two standard deviations and 99.7% of sample items would fit within plus or minus Three standard deviations.
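The coverage of a bell curve can be computed rather than remembered. A minimal sketch using the standard identity that the fraction within plus or minus k standard deviations is erf(k / sqrt(2)); the function name is my own.

```python
import math


def coverage(k):
    """Fraction of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k} sigma: {coverage(k) * 100:.1f}%")
# +/- 1 sigma: 68.3%
# +/- 2 sigma: 95.4%
# +/- 3 sigma: 99.7%
```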
Let's look at my homemade histogram chart. A histogram is just figuring out how many times a particular value occurs. If we look at the chart the value 2.8 occurs 3 times and the value 3.1 occurs 3 times. 4.1 occurs 1 time and that value skews the chart. I figure that 4.1 is the sample some meat eating rodent pissed on. The mode, or most common value, in this chart is about 2.95 (splitting the difference between 2.8 and 3.1, which tie). So my mean is 3.1, my median is 3.3 and my mode is 2.95. This analysis is a FAIL.
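Counting how often each value occurs is easy to do without a chart tool. A rough sketch, assuming the readings are typed in as plain numbers; the function names and the demo list are made up for illustration and are NOT the FDA sample.

```python
from collections import Counter


def histogram(values):
    """Map each distinct value to the number of times it occurs, sorted by value."""
    return dict(sorted(Counter(values).items()))


def modes(values):
    """Return every value that ties for the highest count."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, n in counts.items() if n == top)


# Made-up values for illustration only, NOT the FDA sample.
demo = [1.0, 1.0, 2.0, 2.0, 5.0]
print(histogram(demo))  # {1.0: 2, 2.0: 2, 5.0: 1}
print(modes(demo))      # [1.0, 2.0]
```

When two values tie for the top count the data has two modes, and picking a number halfway between them is a judgment call rather than something the counts give you.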
If this were a normal distribution we could fit a bell shaped curve over those columns. It isn't normal so either my sample is fucked or it isn't big enough. I need to sample more items in this case. I should probably clean everything up and re-do the experiment and then compare the results.
Sampling is expensive so typically a manager will just say "pitch the out-liar(sic)" since it makes more "bean-counter" sense to throw data away than spend money increasing the sample size. Also, sometimes samples are contaminated in the lab so the 4.1 out-liar might be contaminated. I don't know, so my best option is to spend money like it is going out of style and sample a few hundred items. I just need to get my boss to approve increasing my budget tenfold so I can test the protein level in broccoli. Heck, we are in a deficit and spending money like crazy anyway.....
I'm leaning toward the "some rodent came and pissed on the broccoli and contaminated my sample" theory. I could be wrong though so I should wash the next sample, which won't help because the piss soaked in. Never thought of that? Too bad, now that image is in your head and now every time you eat organic foods you know that the shit they spread to fertilize the crop got onto and into the crop. So sorry. Get over it.
Enough frivolity (got to love Uncharted).
Let's look at the minimum and maximum for a moment. If the FDA was really interested in publishing either the minimum or the maximum they would publish a number based on TWO or THREE standard deviations from the Mean of a NORMAL distribution. In this case the minimum amount of protein should be 1.86 grams.
In other words they would be publishing the "calc min" or the "calc max" and not some "mean is actually the minimum or maximum depending on how you look at it" kind of number.
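Here is the calc min / calc max arithmetic as a sketch, using only the two figures quoted in this post: the mean of 3.1 and the calc min of 1.86. The standard deviation is back-derived from those two rounded numbers, so treat it as an approximation (about 0.413), not the exact spreadsheet value. The function name is my own.

```python
def sigma_limits(mean, sd, k):
    """Return (low, high) limits at k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd


mean = 3.1                    # from the post
calc_min = 1.86               # from the post: mean minus three standard deviations
sd = (mean - calc_min) / 3    # back-derived assumption, roughly 0.413

for k in (2, 3):
    low, high = sigma_limits(mean, sd, k)
    print(f"{k} sigma: min {low:.2f} g, max {high:.2f} g")
# 2 sigma: min 2.27 g, max 3.93 g
# 3 sigma: min 1.86 g, max 4.34 g
```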
But wait, is publishing the minimum number a good idea? Norflok and Wayman! Some people, like those who have had a kidney transplant, are protein intolerant and should know the Maximum protein that they could be eating. IMO, the calc max and calc min should be published on the label, after sampling enough to develop a normal curve. Sample size is important in developing accurate statistical predictions.
None of this is happening so I figure labeling is screwed. Better than nothing I guess. Oh well, I figured that labeling was screwed anyway so no news here, just proving my theory that labeling is screwed and the FDA has some interesting ways of justifying their methods like calling the "mean" the minimum or the maximum, depending.