source: liacs/dbdm/dbdm_4/unfair_casino_example.py@ 235

Last change on this file since 235 was 60, checked in by Rick van der Zwet, 15 years ago

Playing around with GHMM and seems to work pretty well

  • Property svn:executable set to *
File size: 4.2 KB
RevLine 
[60]1#!/usr/bin/env python
2#
3# Tutorial on using GHMM with Python
4# http://ghmm.sourceforge.net/ghmm-python-tutorial.html
5# Ported to python code for lazy programmers ;-)
6#
7# In the following, we assume that you have installed GHMM including the Python
8# bindings. Download the UnfairCasino.py-file.
9#
10# You might have seen the unfair casino example (Chair Biological Sequence
11# Analysis, Durbin et. al, 1998), where a dealer in a casino occasionally
12# exchanges a fair dice with a loaded one. We do not know if and when this
13# exchange happens, we only can observe the throws. This stochastic process we
14# will model with a HMM.
15
16import ghmm
17print "For aditional help use: help('ghmm')"
18
19#Creating the first model:
20
21# There are two states in our example. Either the dice is fair (state 0; Python
22# indexes arrays like C and C++ from 0) or it is loaded (state 1). We will
23# define the transition and emission matrices explicitly.
24
25# To save us some typing (namely ghmm. before any class or function from GHMM),
26# we will import all symbols from ghmm directly.
27
28from ghmm import *
29
30# Our alphabet are the numbers on the dice {1,2,3,4,5,6}. To be consistent with
31# Python's range function the upper limit is not part of a range.
32
33sigma = IntegerRange(1,7)
34
35# The transition matrix A is chosen such that we will stay longer in the fair
36# than the loaded state. The emission probabilities are set accordingly in B ---
37# guessing equal probabilities for the fair dice and higher probabilities for
38# ones and twos etc. for the loaded dice --- and the initial state distribution
39# pi has no preference for either state.
40
41A = [[0.9, 0.1], [0.3, 0.7]]
42efair = [1.0 / 6] * 6
43print "Fair probabilities:", efair
44
45eloaded = [3.0 / 13, 3.0 / 13, 2.0 / 13, 2.0 / 13, 2.0 / 13, 1.0 / 13]
46B = [efair, eloaded]
47pi = [0.5] * 2
48m = HMMFromMatrices(sigma, DiscreteDistribution(sigma), A, B, pi)
49print "Inital HHM", m
50
51# Generating sequences
52
53# At this time GHMM still exposes some internals. For example, symbols from
54# discrete alphabets are internally represented as 0,...,|alphabet| -1 and the
55# alphabet is responsible for translating between internal and external
56# representations.
57
58# First we sample from the model, specifying the length of the sequence, and
59# then transfer the observations to the external representation.
60
61obs_seq = m.sampleSingle(20)
62print "Observation sequence : ", obs_seq
63obs = map(sigma.external, obs_seq)
64print "Observations : ", obs
65
66# Learning from data
67
68# In a real application we would want to train the model. That is, we would
69# like to estimate the parameters of the model from data with the goal to
70# maximize the likelihood of that data given the model. This is done for HMMs
71# with the Baum-Welch algorithm which is actually an instance of the well-known
72# Expectation-Maximization procedure for missing data problems in statistics.
73# The process is iterative and hence sometime called re-estimation.
74
75# We can use m as a starting point and use some 'real' data to train it. The
76# variable train_seq is an EmissionSequence or SequenceSet object.
77
78
79from UnfairCasino import train_seq
80m.baumWelch(train_seq)
81print "Trained using baumWelch and 'train_seq'"
82print m
83
84# If you want more fine-grained control over the learning procedure, you can do
85# single steps and monitor the relevant diagnostics yourself, or employ
86# meta-heuristics such as noise-injection to avoid getting stuck in local
87# maxima. [Currently for continous emissions only] Computing a Viterbi-path
88
89# A Viterbi-path is a sequence of states Q maximizing the probability
90# P[Q|observations, model]. If we compute a Viterbi-path for a unfair casino
91# sequence of throws, the state sequence tells us when the casino was fair
92# (state 0) and when it was unfair (state 1).
93
94
95from UnfairCasino import test_seq
96v = m.viterbi(test_seq)
97print "fairness of test_seq: ", v
98
99# You can investigate how successful your model is at detecting cheating. Just
100# create artificial observations and see if you can detect them.
101
102my_seq = EmissionSequence(sigma, [1] * 20 + [6] * 10 + [1] * 40)
103print "artificial sequence : ", my_seq
104print "artificial observation : ", m.viterbi(my_seq)
105
106print '''
107#
108# XXX: Some error on closing data, save to ignore as we are finished anyways
109#
110'''
111
112# EOF
Note: See TracBrowser for help on using the repository browser.