1. Title: Poker Hand Dataset

2. Source Information

a) Creators:

    Robert Cattral (cattral@gmail.com)

    Franz Oppacher (oppacher@scs.carleton.ca)
    Carleton University, Department of Computer Science
    Intelligent Systems Research Unit
    1125 Colonel By Drive, Ottawa, Ontario, Canada, K1S5B6

c) Date of release: Jan 2007

3. Past Usage:
    1. R. Cattral, F. Oppacher, D. Deugo. Evolutionary Data Mining
       with Automatic Rule Generalization. Recent Advances in Computers,
       Computing and Communications, pp.296-300, WSEAS Press, 2002.
       - Note: This was a slightly different dataset that had more
         classes, and was considerably more difficult.

    - Predicted attribute: Poker Hand (labeled class)
    - Found to be a challenging dataset for classification algorithms
    - Relational learners have an advantage for some classes
    - The ability to learn high-level constructs is an advantage

4. Relevant Information:
    Each record is an example of a hand consisting of five playing
    cards drawn from a standard deck of 52. Each card is described
    using two attributes (suit and rank), for a total of 10 predictive
    attributes. There is one Class attribute that describes the
    Poker Hand. The order of cards is important, which is why there
    are 480 possible Royal Flush hands rather than 4 (one for each
    suit; explained in more detail below).

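The effect of card order on these counts can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are not part of the dataset.

```python
import math

CARDS_PER_HAND = 5
DECK_SIZE = 52

# Each unordered hand can be dealt in 5! different orders.
orderings_per_hand = math.factorial(CARDS_PER_HAND)   # 120

# One unordered Royal Flush per suit, each counted once per ordering.
royal_flush_ordered = 4 * orderings_per_hand          # 480

# Ordered five-card draws without replacement: 52 * 51 * 50 * 49 * 48.
domain_size = math.perm(DECK_SIZE, CARDS_PER_HAND)    # 311875200

print(orderings_per_hand, royal_flush_ordered, domain_size)
```

The last value matches the domain size of 311,875,200 quoted in the Class Distribution section.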
5. Number of Instances: 25,010 training, 1,000,000 testing

6. Number of Attributes: 10 predictive attributes, 1 goal attribute

7. Attribute Information:
    1) S1 "Suit of card #1"
       Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}

    2) C1 "Rank of card #1"
       Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)

    3) S2 "Suit of card #2"
       Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}

    4) C2 "Rank of card #2"
       Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)

    5) S3 "Suit of card #3"
       Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}

    6) C3 "Rank of card #3"
       Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)

    7) S4 "Suit of card #4"
       Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}

    8) C4 "Rank of card #4"
       Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)

    9) S5 "Suit of card #5"
       Ordinal (1-4) representing {Hearts, Spades, Diamonds, Clubs}

    10) C5 "Rank of card #5"
       Numerical (1-13) representing (Ace, 2, 3, ... , Queen, King)

    11) CLASS "Poker Hand"
       Ordinal (0-9)

       0: Nothing in hand; not a recognized poker hand
       1: One pair; one pair of equal ranks within five cards
       2: Two pairs; two pairs of equal ranks within five cards
       3: Three of a kind; three equal ranks within five cards
       4: Straight; five cards, sequentially ranked with no gaps
       5: Flush; five cards with the same suit
       6: Full house; pair + different rank three of a kind
       7: Four of a kind; four equal ranks within five cards
       8: Straight flush; straight + flush
       9: Royal flush; {Ace, King, Queen, Jack, Ten} + flush


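The class definitions above can be expressed as a small rule-based labeler. The following is a minimal sketch, not the creators' code: the function name `classify_hand` is hypothetical, and it assumes the Ace (rank 1) may end a straight either low (A-2-3-4-5) or high (10-J-Q-K-A), which is consistent with the 10200 straights listed in the Statistics section.

```python
from collections import Counter

def classify_hand(row):
    """Map [S1, C1, S2, C2, S3, C3, S4, C4, S5, C5] to a CLASS value 0-9."""
    suits = row[0::2]
    ranks = row[1::2]

    # Multiplicities of each rank, largest first, e.g. [3, 2] for a full house.
    counts = sorted(Counter(ranks).values(), reverse=True)
    is_flush = len(set(suits)) == 1

    rank_set = set(ranks)
    royal_ranks = rank_set == {1, 10, 11, 12, 13}
    # Assumption: Ace is low by encoding, but also closes 10-J-Q-K-A.
    is_straight = len(rank_set) == 5 and (
        max(ranks) - min(ranks) == 4 or royal_ranks
    )

    if is_flush and royal_ranks:
        return 9  # Royal flush
    if is_flush and is_straight:
        return 8  # Straight flush
    if counts[0] == 4:
        return 7  # Four of a kind
    if counts == [3, 2]:
        return 6  # Full house
    if is_flush:
        return 5  # Flush
    if is_straight:
        return 4  # Straight
    if counts[0] == 3:
        return 3  # Three of a kind
    if counts == [2, 2, 1]:
        return 2  # Two pairs
    if counts[0] == 2:
        return 1  # One pair
    return 0      # Nothing in hand

# Usage: Ten, Jack, King, Queen, Ace, all of suit 1.
print(classify_hand([1, 10, 1, 11, 1, 13, 1, 12, 1, 1]))
```

Because the label depends only on the multiset of ranks and on whether all suits match, the result is the same for every ordering of the five cards.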
8. Missing Attribute Values: None

9. Class Distribution:

    The first percentage in parentheses is the representation
    within the training set. The second is the probability in the full domain.

    Training set:

      0: Nothing in hand, 12493 instances, (49.95202% / 50.117739%)
      1: One pair, 10599 instances, (42.37905% / 42.256903%)
      2: Two pairs, 1206 instances, (4.82207% / 4.753902%)
      3: Three of a kind, 513 instances, (2.05118% / 2.112845%)
      4: Straight, 93 instances, (0.37185% / 0.392465%)
      5: Flush, 54 instances, (0.21591% / 0.19654%)
      6: Full house, 36 instances, (0.14394% / 0.144058%)
      7: Four of a kind, 6 instances, (0.02399% / 0.02401%)
      8: Straight flush, 5 instances, (0.01999% / 0.001385%)
      9: Royal flush, 5 instances, (0.01999% / 0.000154%)

    The Straight flush and Royal flush hands are not as representative of
    the true domain because they have been over-sampled. The Straight flush
    is 14.43 times more likely to occur in the training set, while the
    Royal flush is 129.82 times more likely.

    Total of 25,010 instances in a domain of 311,875,200.

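The over-sampling factors quoted above follow from dividing the training-set frequency by the domain probability. A short sketch of that arithmetic, with a hypothetical helper name and the counts and probabilities taken from this file:

```python
TRAIN_TOTAL = 25010

# class name -> (training instances, probability in the full domain)
rare_classes = {
    "Straight flush": (5, 0.00001385),
    "Royal flush": (5, 0.00000154),
}

def oversampling_factor(count, total, domain_probability):
    """How many times more frequent a class is in the sample than in the domain."""
    return (count / total) / domain_probability

factors = {
    name: round(oversampling_factor(count, TRAIN_TOTAL, prob), 2)
    for name, (count, prob) in rare_classes.items()
}
print(factors)  # Straight flush about 14.43, Royal flush about 129.82
```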
    Testing set:

    The value inside parentheses indicates the representation within the test
    set as compared to the entire domain. 1.0 would be perfect representation,
    while <1.0 are under-represented and >1.0 are over-represented.

      0: Nothing in hand, 501209 instances, (1.000063)
      1: One pair, 422498 instances, (0.999832)
      2: Two pairs, 47622 instances, (1.001746)
      3: Three of a kind, 21121 instances, (0.999647)
      4: Straight, 3885 instances, (0.989897)
      5: Flush, 1996 instances, (1.015569)
      6: Full house, 1424 instances, (0.988491)
      7: Four of a kind, 230 instances, (0.957934)
      8: Straight flush, 12 instances, (0.866426)
      9: Royal flush, 3 instances, (1.948052)

    Total of one million instances in a domain of 311,875,200.


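One way to read the representation values is as observed count divided by the count expected under the domain probability. The sketch below assumes that interpretation; the function name is hypothetical and the probabilities are those listed in the Statistics section.

```python
TEST_TOTAL = 1_000_000

def representation(observed, domain_probability, total=TEST_TOTAL):
    """Return (expected count, observed / expected) for one class."""
    expected = domain_probability * total
    return expected, observed / expected

# Royal flush: about 1.54 hands expected in a million, 3 observed.
royal_expected, royal_ratio = representation(3, 0.00000154)

# Straight flush: about 13.85 hands expected, 12 observed.
sf_expected, sf_ratio = representation(12, 0.00001385)

print(round(royal_ratio, 6), round(sf_ratio, 6))
```

With so few expected Royal flush hands, a single extra instance moves the ratio substantially, which explains why that class shows the largest deviation from 1.0.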
10. Statistics

    Poker Hand        # of hands    Probability    # of combinations
    Royal Flush                4     0.00000154                  480
    Straight Flush            36     0.00001385                 4320
    Four of a kind           624     0.0002401                 74880
    Full house              3744     0.00144058               449280
    Flush                   5108     0.0019654                612960
    Straight               10200     0.00392464              1224000
    Three of a kind        54912     0.02112845              6589440
    Two pairs             123552     0.04753902             14826240
    One pair             1098240     0.42256903            131788800
    Nothing              1302540     0.50117739            156304800

    Total                2598960     1.0                   311875200

    The number of combinations represents the number of instances in the entire domain.

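The "# of hands" column can be reproduced with standard counting arguments, and the "# of combinations" column by multiplying each count by the 5! = 120 possible card orders. The following sketch uses textbook poker combinatorics, not a derivation given by the dataset creators; the dictionary names are illustrative.

```python
import math

C = math.comb
STRAIGHT_FLUSHES = 10 * 4   # 10 rank sequences x 4 suits, royal included

hands = {
    "Royal Flush": 4,
    "Straight Flush": STRAIGHT_FLUSHES - 4,
    # Quad rank, then any of the 48 remaining cards.
    "Four of a kind": 13 * 48,
    # Triple rank and its suits, then pair rank and its suits.
    "Full house": 13 * C(4, 3) * 12 * C(4, 2),
    # Any 5 ranks in one suit, minus straight flushes.
    "Flush": C(13, 5) * 4 - STRAIGHT_FLUSHES,
    # 10 rank sequences with free suits, minus straight flushes.
    "Straight": 10 * 4**5 - STRAIGHT_FLUSHES,
    # Triple, then two distinct other ranks with free suits.
    "Three of a kind": 13 * C(4, 3) * C(12, 2) * 4**2,
    # Two pair ranks, their suits, then one of 44 remaining cards.
    "Two pairs": C(13, 2) * C(4, 2) ** 2 * 44,
    # Pair, then three distinct other ranks with free suits.
    "One pair": 13 * C(4, 2) * C(12, 3) * 4**3,
}
# Everything else in the C(52, 5) unordered hands.
hands["Nothing"] = C(52, 5) - sum(hands.values())

ORDERINGS = math.factorial(5)  # 120 orderings per unordered hand
combinations = {name: n * ORDERINGS for name, n in hands.items()}
probabilities = {name: n / C(52, 5) for name, n in hands.items()}

for name in hands:
    print(f"{name:16s}{hands[name]:9d}  {probabilities[name]:.8f}  {combinations[name]:11d}")
```

Printed probabilities are rounded to eight decimals, so the last digit may differ slightly from the table, where some values appear to be truncated.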