Talk:Design of experiments
The Design of Experiments topic is a can of worms. For my work on nonlinear empirical modelling, none of the design approaches in the literature suit me. I too need to maximise the information content in the experiment results, but my idea of information seems quite different from the simplistic statistical measures.
In industrial settings, you cannot always rely on experimental results. Repeatability is often unknown. Instead of repeating several experiments, I prefer a different approach which is based on the measurability of second partial derivatives.
Any comments welcome.
Taguchi Methodology is a powerful tool in Experimental Design. It takes advantage of Orthogonal Arrays to predict the best combination of factors in industrial manufacturing specifications. The real power comes from the fact that it will predict the best combination of factors even if that combination did not occur as one of the tests. This is because the balanced arrays take into account interactions between factors and to a large extent cancel them out. Using only eight tests, one can achieve a result similar to that of a full factorial test. The predicted factor settings can then be verified alongside the current specification to confirm the findings. G4sxe 00:37, 6 January 2006 (UTC)
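For readers who have not seen an eight-run orthogonal array, here is a minimal sketch of one standard construction: seven two-level columns built from three basis bits. The function names are made up for illustration, and the code only checks the balance property (every pair of columns shows each combination of levels equally often). It does not by itself support or refute the claim about interactions; note that the third column is the XOR of the first two, so an interaction between factors placed on columns 1 and 2 would look the same as the main effect of a factor placed on column 3.

```python
from itertools import combinations, product


def l8_array():
    """Build an 8-run, 7-column two-level array from three basis bits a, b, c.

    The columns are a, b, a^b, c, a^c, b^c, a^b^c.
    """
    rows = []
    for a, b, c in product((0, 1), repeat=3):
        rows.append((a, b, a ^ b, c, a ^ c, b ^ c, a ^ b ^ c))
    return rows


def is_orthogonal(rows):
    """True if every pair of columns shows each of the 4 level pairs equally often."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        counts = {}
        for row in rows:
            key = (row[i], row[j])
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != 4 or len(set(counts.values())) != 1:
            return False
    return True


array = l8_array()
for run in array:
    print(run)
print("runs:", len(array), "vs full factorial:", 2 ** 7)
print("pairwise balanced:", is_orthogonal(array))
```

The array uses 8 runs where a full factorial in seven two-level factors would need 128.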
Dispute with AbsolutDan
I have just added again a link to http://www.6sigma101.com/glossary/design_of_experiments.htm with definitions related to Design of Experiments. If AbsolutDan wants to remove this link again, can he justify the removal as an editor knowledgeable on the topic of Design of Experiments? Can anyone see anything wrong with the contents of http://www.6sigma101.com/glossary/design_of_experiments.htm? TIA
6sigma101.com has been spammed across multiple articles by several usernames and IPs which appear to be working in concert. Please see the following:
- Special:Contributions/203.214.69.7
- Special:Contributions/Goskan
- Special:Contributions/Glen netherwood
There was a heated discussion about the link here: Talk:Six Sigma where it was determined the link was not extremely helpful and is to a site that is intended to promote their services (6sigma training). If you do a WHOIS on 203.214.69.7 ([1]) and 203.214.51.192 ([2]), you can see they both come from the same ISP. It seems apparent that this is simply the new IP of the above contributor(s), back to try to include links to their site, in blatant violation of guidelines. --AbsolutDan (talk) 17:31, 4 August 2006 (UTC)
As always, AbsolutDan's argument totally ignores the merits of the link and the content, and instead focuses on continuing his war on perceived spam. Can anyone help me here with a professional review of the proposed link? TIA
Professional Review Requested
I would like to focus on the suitability of the link to Design of Experiments Terms in Six Sigma Glossary - http://www.6sigma101.com/glossary/design_of_experiments.htm. Is it suitable content for inclusion? TIA
- A link is not content. After your aggressive campaigning to include links to your company dies down, editors might even find consensus to cite it, though it doesn't look like a primary source. Femto 12:53, 5 August 2006 (UTC)
Opening sentence
At this time, the opening sentence says this:
- Experimental design is a research design in which the researcher has control over the selection of participants in the study, and these participants are randomly assigned to treatment and control groups.
This appears to presuppose that the statistical units are "participants", a term usually used only when they are humans, as opposed to stalks of wheat, machines, apples, pizza recipes, mice, etc. That assumes too much. "Research design" is presumed to be understood by the reader who is not acquainted with this subject. That also assumes too much. That all experimental designs involve treatment and control groups also assumes too much (see in particular the concrete example I recently added). I'd re-write it if I were confident of what it ought to say instead, and maybe at some point I will, unless someone else does it first. Michael Hardy 02:12, 16 December 2006 (UTC)
comment moved from article
An unsigned comment by user:192.88.212.44 was added to the article, in parentheses:
- Weigh each object in one pan, with the other pan empty. Call the measured weight of the i-th object X_i for i = 1, ..., 8. (This is a confusing statement. How can you measure any object in one pan, with no standard mass in other pan?)
You just have to read what it says above about the nature of this device: it's a scale that reports the difference between the two weights. Michael Hardy 23:01, 5 January 2007 (UTC)
- Yikes, I had exactly the same confusion when reading the article more than a year later. I expand on my confusion in discussion further below. Briefly, you can say there is "a scale that reports the difference between the two weights", but if I cannot envision how such a scale could possibly exist, I end up going through gyrations to re-interpret what you must mean by "measuring" and so on. Or I am frustrated that you are telling me to hypothetically believe that something which can't exist does exist, and then I wonder how this example could possibly have any practical application. See further below for a detailed explanation of why I, and apparently others, don't get what is being explained. doncram (talk) 05:32, 28 March 2008 (UTC)
note about the example experiment
The article states that according to modern experimental design standards, the only important element missing from the experiment with the sailors is randomization of the treatments. However, another pair of sailors who received no treatment is needed for this to be an "experiment".
Distinction between replication and repetition
A distinction needs to be made between replicates and repetitions. For example, three readings from the same experimental unit are repetitions while the readings from three separate experimental units are replicates. The error variance from the former is less than that from the latter because repeated readings only measure the variation due to errors in reading while the latter also measure unit-to-unit variation. -- Jeff Wu, Michael Hamada, Experiments: Planning, Analysis, and Parameter Design Optimization. In this article, the subsection titled replication actually talks about repetition. As far as I remember, the terms are sometimes used interchangeably in the literature, but I think a definition like this should be used. Sergivs-en 03:07, 1 May 2007 (UTC)
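The distinction quoted above can be illustrated with a small simulation. This is only a sketch under assumed numbers: the unit-to-unit and reading-error standard deviations, the true mean, and the sample size are all made up, and both error sources are assumed to be independent and normally distributed.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is reproducible
UNIT_SD = 2.0            # assumed unit-to-unit standard deviation
READING_SD = 0.5         # assumed reading-error standard deviation
TRUE_MEAN = 10.0         # made-up population mean
N = 5000

# Repetitions: many readings taken from one and the same experimental unit.
one_unit = TRUE_MEAN + rng.gauss(0.0, UNIT_SD)
repetitions = [one_unit + rng.gauss(0.0, READING_SD) for _ in range(N)]

# Replicates: one reading from each of many separate experimental units.
replicates = [
    TRUE_MEAN + rng.gauss(0.0, UNIT_SD) + rng.gauss(0.0, READING_SD)
    for _ in range(N)
]

var_repetitions = statistics.variance(repetitions)
var_replicates = statistics.variance(replicates)
print(f"repetitions: {var_repetitions:.3f} (theory {READING_SD ** 2:.3f})")
print(f"replicates:  {var_replicates:.3f} (theory {UNIT_SD ** 2 + READING_SD ** 2:.3f})")
```

Under these assumptions the repetitions estimate only the reading-error variance, while the replicates estimate the sum of the reading-error and unit-to-unit variances.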
The scale example is a little confusing
Part of the advantage of the "designed" experiment is that the scale gives the difference between two measurements for "free." A better example, I think, is to use a single pan scale. In this case you have to take the difference between two measurements and thus you have 4 random variables with variance 2*v rather than 8 random variables with variance v. The designed experiment is still better than the single factor but only by a factor of 2, not the factor of 8 the example shows. —Preceding unsigned comment added by JustinShriver (talk • contribs) 21:18, 22 March 2008 (UTC)
- You're mistaken. In every case, what is reported is the difference between the two pans. That's what it says. You need to read carefully. The variance of the difference is σ². In what you call the "designed experiment" (although really, they're both designed experiments), you have a sum of eight differences, each with variance σ². The variance of the sum is 8σ². Then you divide the sum by 8, and hence its variance by 8² = 64. So the result in the article is correct. Michael Hardy (talk) 00:07, 23 March 2008 (UTC)
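The arithmetic in the reply above can be checked exactly. The sketch below assumes the eight readings have independent errors with a common variance σ² (set to 1 here); the helper name is made up for illustration.

```python
from fractions import Fraction


def variance_of_combination(coefficients, sigma2=Fraction(1)):
    """Variance of sum(c_i * Y_i) for independent Y_i, each with variance sigma2."""
    return sigma2 * sum(Fraction(c) ** 2 for c in coefficients)


# Signed sum of eight readings: four plus signs and four minus signs.
signs = (1, 1, 1, 1, -1, -1, -1, -1)
var_sum = variance_of_combination(signs)

# Dividing the sum by 8 divides its variance by 8**2 = 64.
var_estimate = variance_of_combination([Fraction(s, 8) for s in signs])

print(var_sum)       # 8
print(var_estimate)  # 1/8
```

So the estimate has variance σ²/8, compared with σ² for a single direct weighing.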
- I don't find the example "simple" at all, and because it is described as being simple, and then I don't get it, then I don't like something (the article? myself? the writer of the article? statistics in general?). It rubs me the wrong way. I do appreciate that there are inefficient and surprisingly efficient approaches to designing experiments, having to do with orthogonal arrays and Latin Square designs and so on. I think another example should be given, or this example should be explained better, or it should be labelled something other than "simple". I suspect it is a clever example, one that is appreciated by some elite who are proud that they understand it thoroughly. And it is indeed impressive to get a factor of eight improvement in some efficiency measure. But I would prefer a more understandable example, and be happy to settle for much less than an 8-fold efficiency difference between good and bad design. doncram (talk) 02:49, 28 March 2008 (UTC)
- Well, OK, you've succeeded in taking me by surprise. I'll see if I can rephrase it. Michael Hardy (talk) 02:52, 28 March 2008 (UTC)
- Oh my, what a quick response! I think I should apologize for my tone a bit, I got a bit carried away there. It's just that I was irritated that I did not understand the example, while I think I should have. Please don't take offense. I expect that if I tried to pick a simple example, and then explained it, it wouldn't be so simple after all. :) (Hmm, which would I pick?) I will try to read the example again now. doncram (talk) 03:05, 28 March 2008 (UTC)
- Okay, I could now explain a couple specific ways in which I think the example could be explained better. I find it takes me some time to get this out, so I'll just explain one problem that I have:
- First, it is hard for me to understand the basic setup, what the weighing system actually is. I think I am like a lot of readers, without a lot of familiarity with pan balances. It is actually stated that we'll use "a pan balance that measures the difference between the weight of the objects in the two pans." I can't picture how it could actually measure the difference. I can see that it can detect which side is heavier, that is all, but not to measure the difference in weight. So I tend to think you didn't mean that it measures the difference, and you will just compare which is heavier. Maybe by "measure a difference" you mean detect that there is a difference, like take a 0-1 measure of which is heavier. Then you go off into multiple weighoff comparisons, and I think you are doing some kind of clever binary search. I just can't get what you mean, because I don't see how you could measure the difference with just a two-pan scale and the objects you are trying to compare. And I am right, you can't with just a simple mechanical scale, and just the objects to be weighed. When I think about how you actually could measure a difference, I realize you could do it if you had a set of little measured gram weight thingies, and the way you would take a measure is to put a bunch of those thingies on the lighter side of the scale, until it balanced evenly, and that would take some time to get the balance right, so like there is some cost to taking a measurement. But you didn't say you had the set of little gram weight thingies. Now I realize if you had those, you could use those to take a measure of each of the individual objects, one by one, and use up eight weighing opportunities. And that is in fact measuring the difference between the weight of one object, vs. no weight on the empty pan (except for the little weighing thingies you put on). 
Or for the same number of weighing opportunities, you could make the eight other comparisons, or no I mean you could take the eight other difference measurements. So anyhow, you need to explain how the weighing system works, and that there are the little gram weight thingies available. P.S. I know you gave a wikilink for pan balance, but I didn't go off there, and I still have not. I thought I understood naturally, that what a 2-pan balance system does is compare which side is heavier, not that it measured a difference in weight. (And that is what I still think, too. But I suspect that there may exist some kind of electronic pan-balance tool which gives out a measure of differences, and maybe that's what you think we should know what that is like. But if you want to tell me about that, I think I'll get stuck on the electronic scales at the deli counter, where it gives a readout of weight, and maybe there's an issue about tare or not, but it is not measuring a difference of anything, it is just measuring the weight of something directly as far as I know.) Does this help? doncram (talk) 04:59, 28 March 2008 (UTC)
- Continuing just on that, now I go see pan balance, and it turns out to have redirected me to Weighing scale with a picture of the kitchen/deli counter type scale, which does not help me at all!
Okay, yes later in the article it talks about other scales, too. But this pic is very explicit about showing you have the whole set of little gram weight thingies, maybe they are called "standard weights", not just the 2-pan scale. Or is this not the two-pan scale? There is a different kind of two pan scale pictured further down the article. Hmm, maybe that one has dials and things you can use to take a measure of difference. Well if that is the case, then, why, I never would have believed it could exist. Can you in fact take a measure of difference by moving dials on a mechanical 2-pan scale?! To keep it "simple", why not use the "historical" scale with the weight thingies, as in the picture here, whatever that is properly called? :) doncram (talk) 05:14, 28 March 2008 (UTC)
- Second, I was going to inform you that the stated example had a display problem that showed 1 2 3 4 5 6 7 8 all together in the first weighoff, while you mean to show 1 2 3 4 was balanced against 5 6 7 8. I was going to tell you what internet browser I was using, and so on, to explain to you that what I was seeing must be different than what you are seeing or intended. However, now I see that in the edit history, User:Mdevelle was already fixing it, as in this recent version where it is balanced properly and you are expressing frustration that Mdevelle was not understanding that it was intended to be imbalanced that way. I think that Mdevelle and my perception that this must be a mistake in your presentation (or whoever authored it first), is because we really really do not believe you are talking about a scale that takes a continuous measure of difference, rather than a binary (lighter vs. heavier) measure of difference. It is pointless to compare all the weight vs. none of the weight, you get no information from that, of course all of the weight is heavier. So your first comparison must be wrong, it must not be what you intended to convey. The problem is with the browser, or with the formatting. I think it is great that Mdevelle figured out how to fix the format! Thanks Mdevelle! However, if you do succeed in getting us to believe you have a scale that measures a difference, as in a "historical" scale balance with an auxiliary set of standard weights, then we could understand that in the first trial you are taking a measure of the total of all eight of the items. Although it still may help for you to narrate that in the first trial you are doing that, you are taking a measure of the total weight of all eight items combined. Because it certainly does look different than all the other trials. Visually, it looks like you are making a mistake, so you need to counter that by explicit narration of what you are doing. Hope this helps. 
doncram (talk) 06:05, 28 March 2008 (UTC)
It's definitely NOT a binary measure. It does not merely report which is heavier. If it were, then what would be the point of talking about the variance? Mdevelle was clearly wrong. Please look up the reference by Hermann Chernoff. Michael Hardy (talk) 16:02, 28 March 2008 (UTC)
- You misunderstood me. I do now understand that the scale is to be believed to measure, on an interval measurement scale, the difference between the weights of objects in two pans. I was trying to explain to you how a reader, myself included the first time through, could easily misunderstand what was meant to be conveyed in the example. I do think the exposition of the example needs to be improved. If you read my discussion above, I think I make it clear why it would be helpful to mention having an auxiliary set of standard weights as part of the available equipment in the example. Given that no set of standard weights is provided, it is highly reasonable for a reader to cast about for an alternative explanation of something, as something is clearly wrong in the setup. It is highly reasonable, I think, for a reader to hypothesize that the example is talking about a binary measure, which measures on an ordinal measurement scale, which merely compares which is heavier. True, this hypothetical understanding of the situation does not hold up either, for example because it would not be consistent with talking about the variance, as you point out. But the fact that you want the reader to understand differently than the reader does understand is a problem with the exposition of the example, not a problem with the reader. I may try making a few edits myself to clarify the example in the article, perhaps, later. I am sorry if I upset you by first actually misunderstanding and then seeming to continue to misunderstand. doncram (talk) 17:22, 28 March 2008 (UTC)
"Vandalism"?
Now I'm getting accused of "vandalism". I have a Ph.D. in statistics and I care about the subject. Very few people have more experience editing Wikipedia math articles than I do. Very few people have more experience editing Wikipedia generally than I do.
- Please actually check the math.
- Please tell us why it could make sense to talk about "variance" if it's only a binary measure reporting which is heavier.
- Please go to the library and check the cited reference by Chernoff. It will back me up.
- Michael Hardy (talk) 16:17, 28 March 2008 (UTC)
- I am sorry if my comments on Talk:Design of experiments contributed to Hot200245 believing that Hardy's edits constituted vandalism, and then justified Hot200245 rolling back several edits by Hardy to Mdevelle's last contribution. Hardy was certainly not vandalizing, and Mdevelle's edit that Hardy reverted was clearly wrong. I am perhaps disagreeing with Hardy about the exposition of the example, in terms of what is the best way to explain it, but Mdevelle's attempt to "correct" the example is wrong. I'm sorry if my stated appreciation of Mdevelle's edit was misleading. I found it amusing in that Mdevelle showed that he/she had the same difficulty that I did in understanding the exposition, and I appreciated Mdevelle's effort. But what Mdevelle actually changed the exposition to was wrong.
- I think it was understandable that Hot200245 could have believed it was helpful to roll back the edits, though, so let's not take offense, let's just talk here about the example and then improve it in the article, okay? doncram (talk) 17:10, 28 March 2008 (UTC)
- It is "obvious" that Michael Hardy is correct here. Just consider the contribution of object 8 in Michael's weighing scheme. It is always in the left pan, so makes the same contribution to each of the Y's. It follows from the formula for the estimate, which has 4 plus signs and 4 negative signs, that the contribution of object 8 to the estimate for object 1 entirely cancels out, as it should. If object 8 were in the right pan on the first weighing, this cancellation would not happen.
- However perhaps the description of the experiment should be clarified a little by:
- saying that additional weights are added to make the pans balance;
- saying explicitly that the result of the first step is an estimate of the total weight of the objects so as to emphasize that what appears is definitely what is meant.
- Melcombe (talk) 17:25, 28 March 2008 (UTC)
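The cancellation argument above can be checked mechanically. The sketch below assumes one Hadamard-type weighing schedule that is consistent with what is described in this thread (all eight objects in the left pan on the first weighing, object 8 always in the left pan, and an estimate for object 1 with four plus signs and four minus signs). The schedule and the helper name are assumptions for illustration, not a quotation of the article.

```python
# Assumed schedule: LEFT[i] is the set of objects in the left pan on weighing i + 1;
# the remaining objects go in the right pan.
LEFT = [
    {1, 2, 3, 4, 5, 6, 7, 8},
    {1, 2, 3, 8},
    {1, 4, 5, 8},
    {1, 6, 7, 8},
    {2, 4, 6, 8},
    {2, 5, 7, 8},
    {3, 4, 7, 8},
    {3, 5, 6, 8},
]
# Signs applied to Y1..Y8 in the estimate for object 1: four plus, four minus.
SIGNS = [1, 1, 1, 1, -1, -1, -1, -1]


def net_coefficient(obj, left=LEFT, signs=SIGNS):
    """Net multiple of object `obj`'s true weight in sum(sign_i * Y_i)."""
    return sum(s * (1 if obj in pan else -1) for pan, s in zip(left, signs))


coefficients = {obj: net_coefficient(obj) for obj in range(1, 9)}
print(coefficients)  # {1: 8, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0}

# Variant: move object 8 to the right pan on the first weighing only.
left_variant = [LEFT[0] - {8}] + LEFT[1:]
print(net_coefficient(8, left=left_variant))  # -2
```

With the assumed schedule, only object 1 survives in the signed sum (with coefficient 8, hence the division by 8). In the variant, object 8 no longer cancels, which matches the point made above.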
some answers
A rather large amount of material appears above. I'll answer them one by one but not all of them right now. I'll start with the one about how the mechanical balance can measure the difference. The point is that it doesn't matter. It's like simple arithmetic problems that say "If Johnny has 14 apples in each of his three pockets, then how many apples does he have in his pockets?". The question of whether anyone could really fit that many apples into a pocket is beside the point. More later...... Michael Hardy (talk) 03:34, 29 March 2008 (UTC)
the straightforward approach is 8 times as good as the indirect approach
Please take this lightly. As explained in the example, the straightforward approach to using the scales is to measure each of the eight objects once, directly yielding 8 measures. The indirect approach involves comparing 8 sets of objects according to a complicated schedule that yields nothing of direct use. But that schedule is cleverly arranged so that with additional calculations one can retrieve an estimate of the weight of the first object, after all. It is asserted that it provides a relatively good estimate of that one object's weight. To believe this assertion, you have to believe a goodly host of somewhat unlikely matters:
- that the schedule is set up cleverly and correctly to accomplish what it intends
- that the person implementing the weighings is able to adhere to the schedule, perfectly, in the series of weighings
- that the person implementing the weighings WANTS to adhere properly to the schedule (but consider the motivations: if one of eight fishes is being weighed to be sold, wouldn't the weigher have an incentive to switch a fish or two here or there, to reach a more favorable result in the end?)
- that there is no recording error of what the 8 weighings measured (and note, there could possibly be very odd numbers, including negative numbers, which do not meet a basic check test like "oh, here's a big one, this is what is measured, that sounds about right, yes it hefts just about that heavy" which would apply for weighing a single object)
- that there is a computer or abacus or other adding machine available
- that there is a computer or other dividing machine available
- that the person operating the computer or other calculation machines can perform the calculations with no errors
- that the person operating the calculation machines WANTS to reach the accurate answer, and is not in cahoots with the seller or the buyer
- That the scales and the calculator have adequate firewall protection from random hackers
- Etc.
Okay, but suppose you believe ALL of the above conditions are met, what do you get? From the first, straightforward approach, you get measurements of 8 objects. From the indirect approach, as explained, you get one (better) measurement of one object. Therefore, I submit the straightforward approach is EIGHT TIMES BETTER!
Again, please take this lightly. :) doncram (talk) 15:39, 29 March 2008 (UTC)
- Yes, and are we to believe that Johnny really has 14 apples in each of his three pockets? Michael Hardy (talk) 18:26, 29 March 2008 (UTC)
- Well, if I don't believe that the apples can fit in a pocket, I can reinterpret what you meant to say as, the apples are in a satchel. Or I can suppose that maybe in the writer's part of the world, a "pocket" means a satchel. And these interpretations don't interfere too much with my understanding the example. :) doncram (talk) 19:11, 29 March 2008 (UTC)
- By the way, I meant to say Thank you, for editing the intro to the example so that it no longer asserts it is simple. That actually does help me with the readability of it, and I think it will help other readers to get into trying to understand it, too. Thanks. doncram (talk) 19:11, 29 March 2008 (UTC)
(unindent) I guess I expected the example would be edited by now. My point that eight items are measured in the first approach, and only one item is measured in the second approach, still is apparent in the example presented. I am supposing that perhaps there are other calculations you can do, to extract clever measures of the other seven items, given the eight weighings. But I am not sure; a formula to compute the weight of item 2 is not obvious to me. So I am tending to think that the example is unfair again: it is saying only that the complicated approach gives a good measure of item 1's weight, and compares that to a single measure of item 1's weight, when it should compare it to the info value of eight weighings of item 1. doncram (talk) 01:47, 18 April 2008 (UTC)
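On the question of a formula for item 2: under a Hadamard-type schedule, the same eight readings give an estimate for every object, not only the first. The sketch below assumes such a schedule (consistent with the description in this thread, but not quoted from the article), uses made-up true weights, and leaves out measurement noise so that the recovery can be checked exactly. For object j, the estimate is one eighth of the sum of the readings, each taken with a plus sign if object j was in the left pan on that weighing and a minus sign otherwise. For object 2 with this schedule, that is (Y1 + Y2 - Y3 - Y4 + Y5 + Y6 - Y7 - Y8)/8.

```python
# Assumed schedule: each string lists the objects in the left pan for one weighing;
# all other objects are in the right pan.
LEFT = ["12345678", "1238", "1458", "1678", "2468", "2578", "3478", "3568"]


def sign(obj, weighing):
    """+1 if object `obj` is in the left pan on the given weighing (0-based), else -1."""
    return 1 if str(obj) in LEFT[weighing] else -1


def weigh(true_weights):
    """Noise-free readings: left-pan total minus right-pan total for each weighing."""
    return [
        sum(sign(j + 1, i) * w for j, w in enumerate(true_weights))
        for i in range(8)
    ]


def estimate_all(readings):
    """Estimate for object j: (1/8) * sum over weighings of sign(j, i) * Y_i."""
    return [
        sum(sign(j, i) * y for i, y in enumerate(readings)) / 8
        for j in range(1, 9)
    ]


true_weights = [3.0, 1.5, 4.0, 2.0, 5.5, 0.5, 6.0, 2.5]  # made-up values
readings = weigh(true_weights)
print(readings)
print(estimate_all(readings))
```

With independent reading errors of equal variance σ², each of the eight estimates is a signed sum of eight readings divided by 8, so each has variance σ²/8.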
Trying again
The example explicitly compares one weighing of one object (with variance sigma-squared) vs. an approach involving eight weighings of eight objects (with variance for the one object at sigma-squared/eight). If you weighed one object eight times, what variance do you get? Is it sigma-squared/sqrt(8)? If so, the relevant improvement is not 8 times but sqrt(8) instead. Anyhow, the improvement is not eightfold. You have to compare eight weighings of one object vs. the suggested schedule of eight weighings of multiple objects, or the example is ridiculous and makes no valid point at all. doncram (talk) 16:22, 6 May 2008 (UTC)
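On the question of what variance eight repeated weighings of one object give: for independent readings with common variance σ², the mean of eight readings has variance σ²/8, and it is the standard deviation that shrinks by sqrt(8). A small simulation illustrates this; the true weight, σ, seed, and trial count are made-up values, and normally distributed errors are assumed.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed for reproducibility
SIGMA = 1.0              # assumed standard deviation of a single weighing
TRUE_WEIGHT = 5.0        # made-up true weight
TRIALS = 20000

# One weighing of the object per trial.
single = [TRUE_WEIGHT + rng.gauss(0.0, SIGMA) for _ in range(TRIALS)]

# Mean of eight independent weighings of the same object per trial.
mean_of_eight = [
    sum(TRUE_WEIGHT + rng.gauss(0.0, SIGMA) for _ in range(8)) / 8
    for _ in range(TRIALS)
]

ratio = statistics.variance(single) / statistics.variance(mean_of_eight)
print(f"variance ratio: {ratio:.2f}")                  # close to 8
print(f"standard deviation ratio: {ratio ** 0.5:.2f}")  # close to sqrt(8)
```

So eight repeated weighings of a single object reach variance σ²/8 for that one object only. Under the independence assumptions used in the thread, a Hadamard-type schedule of eight weighings reaches σ²/8 for each of the eight objects, which is the fairer like-for-like comparison requested here.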

