Discussion:
Real-world bridge dealing results
Douglas
2017-02-24 20:09:28 UTC
https://www.nbbclubsites.nl/sites/default/files/groepen/5073/bestanden/kaartverdeling%20na%208544%20handen%20per%2013-2-2017.pdf

This Internet link should bring you a PDF page of more than 2000 hand-dealt bridge deal records from a club in the Netherlands.

Here is my statistical summary of the 8544 bridge hand-types listed: SDem is short for "standard normal deviation from the expected mean."

4432 & 5332: 3,270 @ 2.29 SDem.

Least common 34: 2,240 @ <5.48> SDem.

Conclusion: These are purported to be a faithful record of hand-dealt club deals spanning several years. Category 1 fully exceeds 1.64. Category 2 is way, way beyond <1.64> in the negative direction. It has me reconsidering whether the Abingdon study data contained a transcription error after all.

Clearly hand-dealt deals. Check out the overage in 4333 hands. I wonder whether those club members have been muttering all this time about how many of them there are!

Douglas
Douglas
2017-02-26 04:56:40 UTC
Post by Douglas
https://www.nbbclubsites.nl/sites/default/files/groepen/5073/bestanden/kaartverdeling%20na%208544%20handen%20per%2013-2-2017.pdf
This Internet link should bring you a PDF page of more than 2000 hand-dealt bridge deal records from a club in the Netherlands.
Here is my statistical summary of the 8544 bridge hand-types listed: SDem is short for "standard normal deviation from the expected mean."
Conclusion: These are purported to be a faithful record of hand-dealt club deals spanning several years. Category 1 fully exceeds 1.64. Category 2 is way, way beyond <1.64> in the negative direction. It has me reconsidering whether the Abingdon study data contained a transcription error after all.
Clearly hand-dealt deals. Check out the overage in 4333 hands. I wonder whether those club members have been muttering all this time about how many of them there are!
Douglas
Here are exactly comparable analyses of 8544 bridge hand-types each, from various bridge dealing programs currently in general use in North America.

Cat 1 is the 4432 & 5332 hand-types combined; expected probability is 37.068%. Cat 2 is the 34 least common hand-types combined; expected probability is 28.8855%. SDem is "standard normal deviation from the expected mean."
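
For anyone who wants to check those two expected probabilities rather than take them on faith, they can be reproduced from first principles. Below is a minimal Python sketch (my own, not taken from any of the programs under discussion); it treats Cat 2 as the complement of the five most common hand-types, which is what the quoted 28.8855% works out to.

from math import comb
from itertools import permutations

def pattern_probability(pattern):
    """Probability that a random 13-card hand has the given suit-length
    pattern, e.g. (4, 4, 3, 2), counting every assignment to the four suits."""
    total = comb(52, 13)
    assignments = set(permutations(pattern))   # distinct suit assignments
    return sum(
        comb(13, a) * comb(13, b) * comb(13, c) * comb(13, d)
        for (a, b, c, d) in assignments
    ) / total

cat1 = pattern_probability((4, 4, 3, 2)) + pattern_probability((5, 3, 3, 2))

# The five most common patterns; one minus their total leaves the other 34.
common5 = [(4, 4, 3, 2), (5, 3, 3, 2), (5, 4, 3, 1), (5, 4, 2, 2), (4, 3, 3, 3)]
cat2 = 1 - sum(pattern_probability(p) for p in common5)

print(f"Cat 1 expected probability: {cat1:.5%}")   # about 37.068%
print(f"Cat 2 expected probability: {cat2:.5%}")   # about 28.8855%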

Cat 1: 3,156 @ <0.24> SDem.
Cat 2: 2,454 @ <0.32> SDem.

This is the one for those of you who believe conforming to expected probabilities is the way to go. It is as close as I have seen to date. It would be near perfect in classic math terms if Category 1 were a positive 0.24 to 0.32, or so. It is very restricted in variability, though.

Cat 1: 3,203 @ 0.82 SDem.
Cat 2: 2,822 @ 1.23 SDem.

Cat 2 is supposed to approximate a negative <1.65>. This is nearly 3 SDems from that outcome. But it does mean an abundance of rarer hands. I suppose that will make some players happy.

Cat 1: 3,142 @ <0.55> SDem.
Cat 2: 2,577 @ 2.60 SDem.

This is grossly opposite from ordinary bridge card dealing.

Cat 1: 3,082 @ <1.90> SDem.
Cat 2: 2,903 @ 2.18 SDem.

This is even more grossly opposite ordinary bridge card dealing.

Cat 1: 3,187 @ 0.46 SDem.
Cat 2: 2,410 @ <1.37> SDem.

At least the SDems are oriented correctly, even if it comes up somewhat shy in variability. If I were limited to these five choices, this is the one I would begin with. Of course, I would run at least nine more repetitions of this particular experiment as a minimum check on result reliability.

Rather obviously, all five of these bridge dealing programs are using pseudo-random number generators as their input sources. They do not even come close to the variability found in properly measured hand-dealt bridge deals.

If you want to dispute these facts and assertions, as two of you already have, I suggest a cogent counter-example, or two, is the classic way to demonstrate my error(s).

Douglas
l***@gmail.com
2017-02-27 05:30:30 UTC
OK, let me see if I understand. Taking the Category 2 data for the third set, out of 8544 hands, you would expect 28.89% to fall in this category, but you observed 2577. Divided by 8544 yields 30.16%. Computing z = (.3016-.2889)/sqrt(.2889*.7111/8544) gives 2.60, which you report as sdem.
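
In code form, that normal-approximation calculation is just a few lines. A minimal Python sketch (the variable names are mine, purely illustrative):

from math import sqrt

n = 8544          # hands examined
observed = 2577   # Category 2 hands counted for the third program
p0 = 0.288855     # expected Category 2 probability

p_hat = observed / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
print(round(p_hat, 4), round(z, 2))   # roughly 0.3016 and 2.60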
Douglas
2017-02-27 06:55:40 UTC
Post by l***@gmail.com
OK, let me see if I understand. Taking the Category 2 data for the third set, out of 8544 hands, you would expect 28.89% to fall in this category, but you observed 2577. Divided by 8544 yields 30.16%. Computing z = (.3016-.2889)/sqrt(.2889*.7111/8544) gives 2.60, which you report as sdem.
Thank you for the cross check; I used your exact digits and formulation, and returned 2.58997... I reviewed my calculation, and discovered I forgot to correct for the glitch that occurs whenever the returned cumulative p is 0.5, or greater.

Here is the formulation I use: =binom.dist(2577-1,8544,0.288855,true) in Excel 2016 which returns 0.995076. I then transform that p value using =norm.s.inv(0.995076) into 2.58109. Without the glitch correction, a rounded 2.60 is returned.
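
For anyone without Excel, the same two steps can be written in Python with scipy, where binom.cdf corresponds to BINOM.DIST(..., TRUE) and norm.ppf to NORM.S.INV. A sketch only; the trailing digits may differ slightly between implementations:

from scipy.stats import binom, norm

n, observed, p0 = 8544, 2577, 0.288855

# Excel: =BINOM.DIST(2577-1, 8544, 0.288855, TRUE)
cumulative_p = binom.cdf(observed - 1, n, p0)   # about 0.995

# Excel: =NORM.S.INV(cumulative_p)
z = norm.ppf(cumulative_p)                      # about 2.58
print(cumulative_p, z)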

I would say you understand very well.

Douglas
p***@infi.net
2017-02-27 15:58:21 UTC
Post by Douglas
Thank you for the cross check; I used your exact digits and formulation, and returned 2.58997... I reviewed my calculation, and discovered I forgot to correct for the glitch that occurs whenever the returned cumulative p is 0.5, or greater.
Here is the formulation I use: =binom.dist(2577-1,8544,0.288855,true) in Excel 2016 which returns 0.995076. I then transform that p value using =norm.s.inv(0.995076) into 2.58109. Without the glitch correction, a rounded 2.60 is returned.
I would say you understand very well.
Douglas
OK, you are actually using the binomial cdf, which makes sense to me, though I can't find any online explanation of the proper method. Re-reading your answer here, I see you recommend subtracting one from the observed number of successes if the binomial cdf computes to .5 or greater (took me awhile to grasp what you meant by "p" here, that letter gets used so many different ways in statistics.) Can you point to an online reference for this correction? Minor point, as the normal approximation is fairly good for these large samples, but I see no reason to use an approximation when we can compute a probability directly with modern technology. For the 22.5 billion sample from your other post, I think using the normal approximation is reasonable.
Douglas
2017-02-27 18:52:52 UTC
Post by p***@infi.net
OK, you are actually using the binomial cdf, which makes sense to me, though I can't find any online explanation of the proper method. Re-reading your answer here, I see you recommend subtracting one from the observed number of successes if the binomial cdf computes to .5 or greater (took me awhile to grasp what you meant by "p" here, that letter gets used so many different ways in statistics.) Can you point to an online reference for this correction? Minor point, as the normal approximation is fairly good for these large samples, but I see no reason to use an approximation when we can compute a probability directly with modern technology. For the 22.5 billion sample from your other post, I think using the normal approximation is reasonable.
I discovered the "glitch" when double checking a New York Times article several years ago. It posited 527 heads in 1,000 coin tosses. I came to find there were at least four generally accepted different answers as to what is the cumulative probability (p) of at least 527 heads occurring every 1,000 coin tosses. In this instant example, the differences are small, but they are different. It came to pass that I entered =1-binom.dist(527,1000,0.5,true) into Excel. It returns 0.04097 (rounded).

Make any other different calculation that seems best to you. See if your answer is different, or the same. If you know differing calculation formulations, try them for comparative results.

Lastly, if we have 527 heads, we must also have exactly 473 tails. So I entered =binom.dist(473,1000,0.5,true) into Excel. It returns 0.04684 (rounded).
Then after some thought, I entered =1-binom.dist(527-1,1000,0.5,true), and it returned an identical 0.04684 (rounded). Since then I have applied that correction when applicable.
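
Those three Excel results are easy to reproduce in other software as a cross check. Here is the same set of calculations as a Python/scipy sketch:

from scipy.stats import binom

n, p = 1000, 0.5

# Excel: =1-BINOM.DIST(527, 1000, 0.5, TRUE)  ->  P(X >= 528)
print(1 - binom.cdf(527, n, p))        # about 0.04097

# Excel: =BINOM.DIST(473, 1000, 0.5, TRUE)    ->  P(X <= 473) = P(X >= 527)
print(binom.cdf(473, n, p))            # about 0.04684

# Excel: =1-BINOM.DIST(526, 1000, 0.5, TRUE)  ->  P(X >= 527), the same value
print(1 - binom.cdf(527 - 1, n, p))    # about 0.04684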

To see where the "glitch" begins in the cumulative p distribution curve, first enter =binom.dist(5,10,0.5,true) into Excel. See if this returns 0.5 p as it should. Now enter =binom.dist(4,9,0.5,true). See if this returns some value less than 0.5 p as it should. These are simple examples of even and odd trial numbers respectively.

I think you are right about normal curve approximation for use here. I now realize the internal math weakness in using the cumulative binomial formulation. I will probably go back and reevaluate my results to see if there are substantial changes to be had.

Douglas
p***@infi.net
2017-02-27 20:58:51 UTC
Post by Douglas
I discovered the "glitch" when double checking a New York Times article several years ago. It posited 527 heads in 1,000 coin tosses. I came to find there were at least four generally accepted different answers as to what is the cumulative probability (p) of at least 527 heads occurring every 1,000 coin tosses. In this instant example, the differences are small, but they are different. It came to pass that I entered =1-binom.dist(527,1000,0.5,true) into Excel. It returns 0.04097 (rounded).
Make any other different calculation that seems best to you. See if your answer is different, or the same. If you know differing calculation formulations, try them for comparative results.
Lastly, if we have 527 heads, we must also have exactly 473 tails. So I entered =binom.dist(473,1000,0.5,true) into Excel. It returns 0.04684 (rounded).
Then after some thought, I entered =1-binom.dist(527-1,1000,0.5,true), and it returned an identical 0.04684 (rounded). Since then I have applied that correction when applicable.
To see where the "glitch" begins in the cumulative p distribution curve, first enter =binom.dist(5,10,0.5,true) into Excel. See if this returns 0.5 p as it should. Now enter =binom.dist(4,9,0.5,true). See if this returns some value less than 0.5 p as it should. These are simple examples of even and odd trial numbers respectively.
I think you are right about normal curve approximation for use here. I now realize the internal math weakness in using the cumulative binomial formulation. I will probably go back and reevaluate my results to see if there are substantial changes to be had.
Douglas
Side note: thanks to Google's mania for "One Account, All of Google" and trying to use my tablet, I see that I mistakenly posted under a live email account, as Lex Logan. Hope I don't get buried with spam.

Now that I'm clear about your calculations, I have two objections:
(1) It appears from your original comments about "more than 2000 [deals]" and "8544 [hands]" that you are using all four hands from each deal. Obviously, the four hands are not independent, so you do not appear to have 8544 data points for any of the six groups. More on this later.
(2) Your two categories are also interdependent: a sample that has an unusually high number of long suits will tend to have fewer than expected balanced hands.

On the first point, I would guess you have the equivalent of about half the sample size, or 4272. I'll explain my reasoning later, back to work for now.
p***@infi.net
2017-02-27 21:36:48 UTC
Post by p***@infi.net
Side note: thanks to Google's mania for "One Account, All of Google" and trying to use my tablet, I see that I mistakenly posted under a live email account, as Lex Logan. Hope I don't get buried with spam.
(1) It appears from your original comments about "more than 2000 [deals]" and "8544 [hands]" that you are using all four hands from each deal. Obviously, the four hands are not independent, so you do not appear to have 8544 data points for any of the six groups. More on this later.
(2) Your two categories are also interdependent: a sample that has an unusually high number of long suits will tend to have fewer than expected balanced hands.
On the first point, I would guess you have the equivalent of about half the sample size, or 4272. I'll explain my reasoning later, back to work for now.
Consider a deal with four hands. One hand, say North, can be treated as independent. A second hand from the same deal, say East, will have its probabilities of distributions limited by the 39 cards remaining after removing North's cards. South's hand, likewise, will be drawn from 26 cards, and West's 13 will be completely determined by the other three hands. So we can view North as 1 full data point, East as, say, 2/3rds of a point, South as 1/3rd, and West's hand as providing no additional data. So I might guess that using all four hands from 2000 deals would be about the same as using a single hand from 4000 deals. But this is just a surmise and I don't know that it is mathematically valid.

If this adjustment were valid, we could recompute all your sdems by simply dividing by the square root of 2, or multiplying by .707107. That 2.60 would become 1.84, for example.

I haven't come up with any comparable adjustment for reporting your two categories from the same data.

Oh, and I remembered the third point: you mention values such as 1.64 (positive or negative) and 1.96, typical critical values of the normal distribution corresponding to 5% significance for one- or two-tailed tests. But in reporting multiple results, I believe it is necessary to adjust the significance level. If you were to run 100 studies using a 5% significance level, you would expect 5 to show "statistical significance" just by chance. The usual adjustment is the Bonferroni correction: with five studies, we should use a 5%/5 = 1% significance level, or a critical value of 2.576. None of the dealing program results cross that threshold.
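
To make both adjustments concrete, here is a small Python sketch that rescales the Category 2 sdem values reported earlier in the thread by 1/sqrt(2), assuming for the sake of argument that the effective-sample-size guess above is right, and compares them with the Bonferroni-adjusted critical value:

from math import sqrt
from scipy.stats import norm

# Category 2 sdem values reported earlier for the five programs
reported_sdems = [-0.32, 1.23, 2.60, 2.18, -1.37]

# Assumed effective-sample-size correction: divide by sqrt(2)
adjusted = [z / sqrt(2) for z in reported_sdems]

# Bonferroni: 5% spread over five tests -> 1% per test (two-tailed)
critical = norm.ppf(1 - 0.01 / 2)     # about 2.576

for z_raw, z_adj in zip(reported_sdems, adjusted):
    print(f"{z_raw:+.2f} -> {z_adj:+.2f}  significant: {abs(z_adj) > critical}")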

I assume the 8544 sample size was selected simply to match the hand-dealt results. If possible, I suggest getting a fresh sample, say Category 2 for the third program (the 2.60 sdem case). It is not necessary to match the 8544 sample size; just report whatever is practical -- do you have to hand-count these things?
Douglas
2017-02-28 02:12:39 UTC
Post by p***@infi.net
Post by p***@infi.net
Side note: thanks to Google's mania for "One Account, All of Google" and trying to use my tablet, I see that I mistakenly posted under a live email account, as Lex Logan. Hope I don't get buried with spam.
(1) It appears from your original comments about "more than 2000 [deals]" and "8544 [hands]" that you are using all four hands from each deal. Obviously, the four hands are not independent, so you do not appear to have 8544 data points for any of the six groups. More on this later.
(2) Your two categories are also interdependent: a sample that has an unusually high number of long suits will tend to have fewer than expected balanced hands.
On the first point, I would guess you have the equivalent of about half the sample size, or 4272. I'll explain my reasoning later, back to work for now.
Consider a deal with four hands. One hand, say North, can be treated as independent. A second hand from the same deal, say East, will have its probabilities of distributions limited by the 39 cards remaining after removing North's cards. South's hand, likewise, will be drawn from 26 cards, and West's 13 will be completely determined by the other three hands. So we can view North as 1 full data point, East as, say, 2/3rds of a point, South as 1/3rd, and West's hand as providing no additional data. So I might guess that using all four hands from 2000 deals would be about the same as using a single hand from 4000 deals. But this is just a surmise and I don't know that it is mathematically valid.
If this adjustment were valid, we could recompute all your sdems by simply dividing by the square root of 2, or multiplying by .707107. That 2.60 would become 1.84, for example.
I haven't come up with any comparable adjustment for reporting your two categories from the same data.
Oh, and I remembered the third point: you mention values such as 1.64 (positive or negative) and 1.96, typical critical values of the normal distribution corresponding to 5% significance for one- or two-tailed tests. But in reporting multiple results, I believe it is necessary to adjust the significance level. If you were to run 100 studies using a 5% significance level, you would expect 5 to show "statistical significance" just by chance. The usual adjustment is the Bonferroni correction: with five studies, we should use a 5%/5 = 1% significance level, or a critical value of 2.576. None of the dealing program results cross that threshold.
I assume the 8544 sample size was selected simply to match the hand-dealt results. If possible, I suggest getting a fresh sample, say Category 2 for the third program (the 2.60 sdem case). It is not necessary to match the 8544 sample size; just report whatever is practical -- do you have to hand-count these things?
(1) The independence measured here is whether each deal has zero, one, two, three, or four countable hand-types. That is true for the first deal, and for each and every subsequent deal, including the last. That is complete variation independence. It is all that is required here.

(2) You ignore the missing middle 34%. Also that this is an extremely skewed one-sided expected probability distribution. So an abundance in Cat 2 can have little, or no, effect on Cat 1.

What you are supposed to note is that NONE of the five computer dealing programs comes even close to a +1.65 Z value for Cat 1, or a <1.65> Z value for Cat 2, whereas my illustrative hand-dealt deals easily surpass those critical values. The same is true for the Abingdon deals and my 494 thoroughly mixed experimental deals. That is the basic discrimination. Or you could think of it as a line of demarcation.

I do have one large set of hand-dealt deals which misses one of the critical values by only a little. I expect it to be updated in the near future with more deals. I intend to discuss it separately for several reasons.

I think your "Bonferroni correction" idea is misapplied in some manner, or is simply nutty.

Fortunately, I no longer have to hand count most of this kind of data.

Douglas
p***@infi.net
2017-02-28 03:04:12 UTC
Post by Douglas
(1) The independence measured here is whether each deal has zero, one, two, three, or four countable hand-types. That is true for the first deal, and for each and every subsequent deal, including the last. That is complete variation independence. It is all that is required here.
(2) You ignore the missing middle 34%. Also that this is an extremely skewed one-sided expected probability distribution. So an abundance in Cat 2 can have little, or no, effect on Cat 1.
What you are supposed to note is that NONE of the five computer dealing programs comes even close to a +1.65 Z value for Cat 1, or a <1.65> Z value for Cat 2, whereas my illustrative hand-dealt deals easily surpass those critical values. The same is true for the Abingdon deals and my 494 thoroughly mixed experimental deals. That is the basic discrimination. Or you could think of it as a line of demarcation.
I do have one large set of hand-dealt deals which misses one of the critical values by only a little. I expect it to be updated in the near future with more deals. I intend to discuss it separately for several reasons.
I think your "Bonferroni correction" idea is misapplied in some manner, or is simply nutty.
Fortunately, I no longer have to hand count most of this kind of data.
Douglas
Sorry, Douglas. Do you dispute the claim that counting a single hand (North, say, or dealer) from a set of 8544 deals would give a proper sample of size 8544? If so, your claim that you can get just as many data points from 2136 deals is simply untenable. I challenge you to implement my proposed test -- sorry, I don't have the inclination to pore through hundreds of printouts to generate data which you would refuse to accept anyway. Your methodology is simply flawed, and I have explained how you can demonstrate that to yourself. If I'm wrong and you are right, you could expect to see results similar to what you reported by using a single hand from each deal rather than four. (Note: you must use different hands than those already included in your reported data. Can't use the same data more than once.) As for your claim that the two categories have no effect on each other, I suggest you examine a series of, say, 100 hands and select the first one you encounter with 5% extra Category 2, then test how many Category 1 are in that set.
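
For what it is worth, the proposed single-seat test is easy to automate once the deals are in machine-readable form. A minimal Python sketch (random.shuffle here is only a stand-in for deals exported from whatever program is being checked; the function and variable names are illustrative):

import random
from math import sqrt

CAT1 = {(4, 4, 3, 2), (5, 3, 3, 2)}
P_CAT1 = 0.37068

def random_deal():
    """Stand-in for deals taken from a dealing program: shuffle 52 cards
    (suit = card index % 4) and split them into four 13-card hands."""
    cards = list(range(52))
    random.shuffle(cards)
    return [cards[i:i + 13] for i in range(0, 52, 13)]

def pattern(hand):
    """Suit-length pattern of a hand, longest suit first, e.g. (4, 4, 3, 2)."""
    counts = [0, 0, 0, 0]
    for card in hand:
        counts[card % 4] += 1
    return tuple(sorted(counts, reverse=True))

def single_seat_z(deals, seat=0, category=CAT1, p0=P_CAT1):
    """The proposed test: classify one seat per deal only."""
    hits = sum(1 for deal in deals if pattern(deal[seat]) in category)
    n = len(deals)
    return (hits / n - p0) / sqrt(p0 * (1 - p0) / n)

deals = [random_deal() for _ in range(2136)]   # replace with real program output
print(single_seat_z(deals, seat=0))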
p***@infi.net
2017-02-28 03:46:47 UTC
Post by Douglas
(1) The independence measured here is whether each deal has zero, one, two, three, or four countable hand-types. That is true for the first deal, and for each and every subsequent deal, including the last. That is complete variation independence. It is all that is required here.
...
You give the expected frequency of Category 1 hands (4432 and 5332) as 37.068%, which matches the frequency of those hand patterns as given in the Bridge Encyclopedia. You then count the number of such hands in 2136 deals from each of several dealing programs and test whether the deviation from expected frequency is sufficient to reject the null hypothesis that these programs generate hand patterns with the expected frequency. It is certainly true that the expected proportion of such hands in a sample of any size would be 37.068%. Where we vehemently disagree is regarding the standard deviation of the sample proportion using your method of counting all four hands from each deal. You think it is the same as if you counted one hand from 8544 deals. It isn't.
Douglas
2017-02-28 05:14:04 UTC
Post by p***@infi.net
You give the expected frequency of Category 1 hands (4432 and 5332) as 37.068%, which matches the frequency of those hand patterns as given in the Bridge Encyclopedia. You then count the number of such hands in 2136 deals from each of several dealing programs and test whether the deviation from expected frequency is sufficient to reject the null hypothesis that these programs generate hand patterns with the expected frequency. It is certainly true that the expected proportion of such hands in a sample of any size would be 37.068%. Where we vehemently disagree is regarding the standard deviation of the sample proportion using your method of counting all four hands from each deal. You think it is the same as if you counted one hand from 8544 deals. It isn't.
I did what you propose extensively several years ago. I even went so far as doing it for all four hands separately so I ended with five comparable results. All it does is "thin" the sampling, and make it seem less reliable. I never once had a noticeable stat difference between single hand analyses and total deal analyses. There were variations in all five, but that is what a standardized Z value does for us. It makes all five results comparable, not just the four separate hand analyses.

One of the few times I got agreement with the online dominator at sci.stat.math is when I presented him with the same sort of explanation about sampling independence I gave you. One can lead another to water, but one can be hard pressed to get that other to drink.

But my biggest objection has to do with integrity, particularly related to my experiences in this group. Say I select south hands. I do my analysis, and I get a result someone, or many ones, object to. Like right now. I have no defense against the charge that I picked the one of the four hands which suited my purpose(s). No thank you.

Douglas
p***@infi.net
2017-02-28 16:49:47 UTC
Post by Douglas
I did what you propose extensively several years ago. I even went so far as doing it for all four hands separately so I ended with five comparable results. All it does is "thin" the sampling, and make it seem less reliable. I never once had a noticeable stat difference between single hand analyses and total deal analyses. There were variations in all five, but that is what a standardized Z value does for us. It makes all five results comparable, not just the four separate hand analyses.
One of the few times I got agreement with the online dominator at sci.stat.math is when I presented him with the same sort of explanation about sampling independence I gave you. One can lead another to water, but one can be hard pressed to get that other to drink.
But my biggest objection has to do with integrity, particularly related to my experiences in this group. Say I select south hands. I do my analysis, and I get a result someone, or many ones, object to. Like right now. I have no defense against the charge that I picked the one of the four hands which suited my purpose(s). No thank you.
Douglas
I'll continue trying to lead you to water. You claim independence on the 2136 deals, which have 0 to 4 hands falling in the selected category. This is a multinomial distribution with 2136 data points; the population mean will correspond to the expected frequency of such hands among all bridge hands, so (roughly) 28% of 8544 hands, but you then proceed to assume that the variation in this multinomial can be described by the variance in the binomial applied to 8544 hands. Mathematically, this is clearly false. For example, the frequency of 4432 hands a priori is 21.55%; if I fix North to be 7222, the proportion of 4432 hands for East falls to 17.7%. So if per ordinary sampling variability the 2136 deals contain more than the usual number of 7222 hands, they can be expected to contain fewer than the usual number of 4432's.
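
One way to settle the disputed standard deviation empirically, without arguing over formulas, would be to simulate it: generate many batches of 2136 random deals, count Category 1 hands across all four seats in each batch, and compare the spread of those batch counts with the binomial value sqrt(n*p*(1-p)) for n = 8544. A Python sketch of such a check follows (it asserts nothing about what the numbers will turn out to be, and it will take a little while to run):

import random
import statistics
from math import sqrt

CAT1 = {(4, 4, 3, 2), (5, 3, 3, 2)}
P_CAT1 = 0.37068
DEALS_PER_BATCH = 2136          # 2136 deals = 8544 hands
BATCHES = 500                   # more batches -> steadier estimate

def pattern(hand):
    counts = [0, 0, 0, 0]
    for card in hand:
        counts[card % 4] += 1
    return tuple(sorted(counts, reverse=True))

def batch_count():
    """Count Category 1 hands over all four seats of 2136 random deals."""
    hits = 0
    cards = list(range(52))
    for _ in range(DEALS_PER_BATCH):
        random.shuffle(cards)
        for seat in range(4):
            if pattern(cards[seat * 13:(seat + 1) * 13]) in CAT1:
                hits += 1
    return hits

counts = [batch_count() for _ in range(BATCHES)]
n_hands = 4 * DEALS_PER_BATCH
print("empirical sd of the count:", statistics.stdev(counts))
print("binomial sd for 8544 hands:", sqrt(n_hands * P_CAT1 * (1 - P_CAT1)))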

For this sort of analysis, we'd like to use all 8544 hands -- we don't care about just North or just the dealer or just one hand selected at random (which should avoid any potential bias). But we do not know the proper formula for the variance when using 8544 partially dependent data points. And we don't know anything about the variance of the 2136 deals. The only obvious way to analyze the multinomial would be to compute the mean and standard deviation of the average number of Category X hands and apply the usual t-statistic test. Can you do that? By the way, am I clear about what your hypotheses are? I've been assuming:
H0: the frequency of Category X hands equals the theoretical proportion
Ha: the frequency of Category X hands differs from the theoretical proportion

As for integrity, I personally have no reason to assume you are posting cooked data -- why would you? I think you sincerely believe that dealing programs are flawed, and that your data supports that. I have a strong prior that other people who have investigated this know more than you do and that your methodology is flawed, but I'm willing to be convinced otherwise.
Douglas
2017-02-28 18:55:58 UTC
Post by p***@infi.net
I'll continue trying to lead you to water. You claim independence on the 2136 deals, which have 0 to 4 hands falling in the selected category. This is a multinomial distribution with 2136 data points; the population mean will correspond to the expected frequency of such hands among all bridge hands, so (roughly) 28% of 8544 hands, but you then proceed to assume that the variation in this multinomial can be described by the variance in the binomial applied to 8544 hands. Mathematically, this is clearly false. For example, the frequency of 4432 hands a priori is 21.55%; if I fix North to be 7222, the proportion of 4432 hands for East falls to 17.7%. So if per ordinary sampling variability the 2136 deals contain more than the usual number of 7222 hands, they can be expected to contain fewer than the usual number of 4432's.
H0: the frequency of Category X hands equals the theoretical proportion
Ha: the frequency of Category X hands differs from the theoretical proportion
As for integrity, I personally have no reason to assume you are posting cooked data -- why would you? I think you sincerely believe that dealing programs are flawed, and that your data supports that. I have a strong prior that other people who have investigated this know more than you do and that your methodology is flawed, but I'm willing to be convinced otherwise.
Paul:

I do not claim the independence of the 2136 deals. I know full well there are hypergeometric probabilities (1/52, 1/51, 1/50, ...) in play as the cards are dealt for each deal. The deals themselves are wholly dependent.

What is being measured in a uniform and consistent way is usually called a characteristic in the stat world. And this particular characteristic exhibits all the usual hallmarks of independence.

Consider: I look at the west hand (which is where I start when manually counting), and it is 4432. Now I look at the north hand. What is the continuing expected probability that it will also be 4432? You are arguing it has changed, however slightly, because of what was in west's hand. I do not think so. Because if that were the case, the south hand in the final deal would be fully determined. I hope you can satisfy yourself that is not the case.

Yes, I know what H0 (Null hypothesis) and HA (Alternative hypothesis) are. I also know they are frequently the refuge of intellectual stat ogres.

Name at least one of these other persons who have investigated "this."

Douglas

jogs
2017-02-28 00:01:05 UTC
Post by Douglas
Lastly, if we have 527 heads, we must also have exactly 473 tails. So I entered =binom.dist(473,1000,0.5,true) into Excel. It returns 0.04684 (rounded).
Taking STAT 100 doesn't make you an expert on statistics.
Assuming a fair unbiased coin after 1,000 trials, there should be 500 +/-32 heads 68% of the time.
One should expect 532+ heads 16% of the time. That's from the normal approximation of the binomial.
p***@infi.net
2017-02-28 03:06:43 UTC
Post by jogs
Taking STAT 100 doesn't make you an expert on statistics.
Assuming a fair unbiased coin after 1,000 trials, there should be 500 +/-32 heads 68% of the time.
One should expect 532+ heads 16% of the time. That's from the normal approximation of the binomial.
I think you meant 95%, not 68%, but why use the normal approximation when modern software such as Excel can compute the binomial cdf directly?