Varying POS sequence length

Table 3: Results varying the number of words in the POS sequence window and how many are before and after the juncture

Experiment	Phrase break model	Breaks-correct	Junctures-correct	Juncture-insertions
L = 2, M = 1	1-gram	61.040	91.424	1.589
L = 3, M = 2	1-gram	68.376	91.464	3.227
L = 4, M = 3	1-gram	61.895	90.145	3.358
L = 4, M = 2	1-gram	62.037	90.478	2.981
L = 2, M = 1	6-gram	78.134	91.104	5.913
L = 3, M = 2	6-gram	79.274	91.597	5.569
L = 4, M = 3	6-gram	73.148	90.025	6.093
L = 4, M = 2	6-gram	71.937	89.786	6.110

Equation 2 gives the general POS sequence formula, expressed in terms of a window of L tags, with M of these tags before the juncture and L - M tags after it. We can expect longer sequences to be potentially more discriminative, but also more prone to sparse data problems. Table 3 shows results from experiments that varied L and M. These were performed on the 23-tag POS tagset, using smoothing, with both a 1-gram and a 6-gram phrase break model. Under both phrase break models, the L = 3, M = 2 condition gives the highest breaks-correct and junctures-correct scores. With the 6-gram model it also has the lowest juncture-insertions score, whereas with the 1-gram model it does not (3.227, against 1.589 for L = 2, M = 1).
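To make the roles of L and M concrete, the following is a minimal sketch of how a window of L tags, with M before the juncture and L - M after it, might be extracted from a tagged sentence. The function name `pos_window`, the juncture indexing convention, the example tags, and the padding symbol used at sentence edges are all assumptions made for illustration; the text does not specify how windows that run past the sentence boundary are handled.

```python
from typing import List, Tuple

# Hypothetical padding symbol for positions outside the sentence (an assumption).
PAD = "<pad>"


def pos_window(tags: List[str], juncture: int, L: int, M: int) -> Tuple[str, ...]:
    """Return the L-tag window around a juncture.

    Assumed convention: juncture j lies between tags[j] and tags[j + 1].
    The window holds M tags before the juncture and L - M tags after it.
    """
    if not 0 <= M <= L:
        raise ValueError("M must satisfy 0 <= M <= L")
    start = juncture - M + 1          # index of the first tag before the juncture
    end = juncture + 1 + (L - M)      # one past the last tag after the juncture
    return tuple(
        tags[i] if 0 <= i < len(tags) else PAD
        for i in range(start, end)
    )


# Illustrative tags only; these are not the 23-tag set used in the experiments.
sentence_tags = ["DT", "NN", "VB", "IN", "DT", "NN"]

# L = 3, M = 2: two tags before the juncture after "NN", one tag after it.
window = pos_window(sentence_tags, juncture=1, L=3, M=2)
```

With L = 3 and M = 2, the juncture after the second word yields the window `("DT", "NN", "VB")`. Increasing L widens the context, which is where the sparse data trade-off described above arises: the number of distinct windows grows quickly with L.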