Finite state transducers

An FSA of the kind we have discussed, used to analyze some existing input, is a recognizer, not a parser or a transducer, so all it can do is decide on the well-formedness of a string. If it can reach the end of the string in a final state, then the string is well formed; if it cannot reach the end of the string, or if cannot reach the end of the string and simultaneously be in a final state, then the string is not well formed. That is all the information it provides. To get more information, we need a parser or a transducer, not a recognizer.

Although we can interpret an FSTN as a machine that produces a simple yes or no output for any input, with some small extensions to the notation, we can interpret one as a finite-state transducer (FST). An FST is a more interesting kind of FSA that allows a string of output symbols to be produced as an input string is recognized.

One way to think of an FST is as a special kind of FSA that inspects two (or more) symbols at a time and proceeds accordingly. To put some intuitive flesh on this rather barren statement, we describe a family of hypothetical children's games. Imagine a long, straight pavement made up of coloured, square paving stones set two abreast. At any given time, apart from when a player actually makes a move forward, the player must have their left foot on a left-side paving stone and their right foot on the right-side paving stone that is immediately adjacent to it. And, at any given time, the player must be in one of a (finite) number of states. For simplicity, we will assume just two states: that of having your hand in your pocket, and that of not having it there. Each variant of the game consists of a collection of rules, of which the following are examples:

If your left foot is on a red paving stone and your right foot is on a green paving stone and your hand is in your pocket, then you can advance to the next pair of stones and keep your hand in your pocket.

If your left foot is on a blue paving stone and your right foot is on a blue paving stone and your hand is not in your pocket, then you can advance to the next pair of stones and put your hand in your pocket.

Given such a collection of rules, the player's goal is to move from the beginning of the pavement to the end, without making any move that is not expressly permitted by the rules. A child playing this game will be implementing an FST.

As we have just presented it, an FST is a kind of inspector that runs along checking if two strings of symbols stand in an appropriate correspondence. As such, it seems very little different, and hardly more useful, than an FSA running as a recognizer. But just as an FSTN can be interpreted both as a string recognizer and a string generator, so a network representing an FST can be interpreted, for instance, as both a string correspondence checker and as a machine that reads one string while writing another. We will use the term FST to refer to any such finite-state machine that makes use of multiple tapes, even though it is the latter interpretation that is of most interest and utility in the natural language domain. How, then, are we to conceive of an FST applying to a linguistic domain? Here is an example of the kind of rule that we might use in a linguistic application:

If your current English word is 'fish' and you are in state 13, then you may print 'poisson' and advance to the next English word and shift into state 7.

But before we get overambitious and attempt English-French machine translation with an FST, let us step back and equip ourselves with a notation for talking about FSTs.

An FST can be specified by an FSTN whose arcs are labelled with pairs of symbols, as opposed to single symbols, and as before we will blur the distinction between the abstract notation, which describes sets of pairs of strings, and the possible machines that it might specify. To specify an FST, then, the simplest thing to do is just to augment the NATR language we already have for defining FSTNs. Let us imagine that the symbols we are dealing with are being read from, or printed on to, tapes. There is no need to make any changes in our formalism for expressing initial and final states. But we do need to allow our arc description statements to specify pairs of symbols as labels. We will use expressions like A_a, where 'A' is a tape 1 symbol and 'a' is a tape 2 symbol, for such symbol pairs.

In fact, there is no need to restrict ourselves to two tapes. Both our syntax for transducer symbol declarations and our notation for symbol pairs are designed to generalize to the n-tape situation, for n greater than 2. Hence, A_a_alpha could be used for a symbol triple in a three-tape situation, for example. However, in what follows, we will restrict ourselves to two-tape machines.

An arc description statement for an FST will appear thus:




From 1 to 2 by A_a

What about cases where we wish to read, or write, a symbol on one tape, but do nothing on the other? Here, we can use our jumping symbol; so, #_a and A_# will be well-formed pairs. (Thus, the hash remains a special symbol, part of our language for talking about FSTs. If you want to have the FST manipulate the real hash symbol, then it will need to be set off in quotes.)

This minor generalization from single symbols to pairs - or, more generally, n-tuples - of symbols is all we really need to define FSTs.

Here is an example of an abbreviation for an FST:




DIGIT abbreviates: one_un, two_deux, three_trois, four_quatre, five_cinq, six_six, seven_sept, eight_huit, nine_neuf, ten_dix.

Having introduced these additions to our existing FSTN language, we can consider ENG_FRE-1, which is an example of a complete FST specified in this augmented NATR notation.


BOX:




Name ENG_FRE-1: Initial 1 Final 5 From 1 to 2 by WHERE From 2 to 3 by BV From 3 to 4 by DET From 4 to 5 by NOUN. WHERE abbreviates: where_ou. BV abbreviates: is_est. DET abbreviates: the_#. NOUN abbreviates: exit_la sortie, policeman_le gendarme, shop_la boutique, toilet_la toilette.



The little FST described in ENG_FRE-1 would be completely trivial were it not for one wrinkle: in French the form of the determiner or definite article varies with the gender of the noun it accompanies, whereas English shows no such variation. This FST gets round the problem by letting the English 'the' map into nothing, and then requiring English nouns to map into the relevant French noun preceded by a determiner marked for the appropriate gender. Thus la sortie and le gendarme, for example, are single symbols as far as this FST is concerned. This is not a very elegant solution to the problem since it obfuscates the correspondence that holds between the English 'the' and French 'la/le'.

An alternative FST for this English-French translation task, one that is arguably more perspicuous, despite the introduction of an additional state and two additional arcs, is given in ENG_FRE-2.


BOX:




Name ENG_FRE-2: Initial 1 Final 5 From 1 to 2 by WHERE From 2 to 3 by BV From 3 to 4 by DET-FEMN From 4 to 5 by N-FEMN From 3 to 6 by DET-MASC From 6 to 5 by N-MASC. WHERE abbreviates: where_ou. BV abbreviates: is_est. DET-FEMN abbreviates: the_la. DET-MASC abbreviates: the_le. N-FEMN abbreviates: exit_sortie, shop_boutique, toilet_toilette. N-MASC abbreviates: policeman_gendarme.



Notice in ENG_FRE-2 how the gender distinction has been, in effect, encoded into the states: if we traverse the net via state 4, then we must have a feminine determiner and a feminine noun, whereas if we go via state 6, then we must have a masculine determiner and a masculine noun. There are no other possibilities.

Notice also that if we use this transducer to translate from French to English, then it will operate deterministically - it will never need to make a choice. But if we use it to translate from English to French, its operation will not be deterministic: in translating 'Where is the policeman?' it will be faced with a choice when it reaches the determiner. It can either traverse the DET-MASC arc or the DET-FEMN arc and it has no basis for deciding which. If it goes the DET-FEMN route, then it will fail when it reaches the N-FEMN arc, since policeman has no French counterpart in N-FEMN. So, any algorithm that we devise to employ FSTs that exhibit such non-determinism will need to incorporate either an ability to backtrack or an ability to explore multiple arcs in parallel. It is worth noting that our earlier and uglier English-French FST was deterministic in both directions.

The examples of FSTs that we have just been playing with are misleading in that nobody nowadays would dream of attempting to do serious English-French machine translation with an FST, for reasons that will begin to emerge in the final section of this chapter and which should be self-evident by the time you have reached the end of this book. But FSTs are potentially well suited to providing efficient solutions to certain small self-contained areas of linguistic analysis. Examples that spring to mind are the translation or interpretation of number names and time of day expressions (although not in all languages), text to speech transduction in languages with fairly well-behaved orthographies and the inflectional analysis of word forms. We will look briefly at the latter here, returning to our earlier Swahili example, reconstructed as an FST mapping between the Swahili morphemes and reasonably perspicuous representations of their syntactic and semantic content (SWAHILI-2).


BOX:




Name SWAHILI-2: Initial 10 Final 90 From 10 to 21 by Subj_ni From 10 to 22 by Subj_u From 10 to 23 by Subj_a From 10 to 24 by Subj_tu From 10 to 25 by Subj_wa From 21 to 31 by 1ST From 22 to 31 by 2ND From 23 to 31 by 3RD From 24 to 32 by 1ST From 25 to 32 by 3RD From 31 to 40 by SING From 32 to 40 by PLUR From 40 to 50 by TENSE From 50 to 61 by Obj_ni From 50 to 62 by Obj_ku From 50 to 63 by Obj_m From 50 to 64 by Obj_tu From 50 to 65 by Obj_wa From 61 to 71 by 1ST From 62 to 71 by 2ND From 63 to 71 by 3RD From 64 to 72 by 1ST From 65 to 72 by 3RD From 71 to 80 by SING From 72 to 80 by PLUR From 80 to 90 by STEM.

1ST abbreviates: 1st_#. 2ND abbreviates: 2nd_#. 3RD abbreviates: 3rd_#. SING abbreviates: Sing_#. PLUR abbreviates: Plur_#. TENSE abbreviates: Future_ta, Present_na, Perfect_me, Past_li. STEM abbreviates: LIKE_penda, BEAT_piga, ANNOY_sumbua, PAY_lipa.



This network will map a Swahili expression like 'wa-me-ni-sumbua' (on the second tape) into 'Subj-3rd-Plur-Perfect-Obj-1st-Sing-ANNOY' (on the first tape) or conversely, and it can be seen why this might well be a useful thing to do in a system designed to analyze or synthesize inflected Swahili words. Notice that the FST given is deterministic when we use it to map from Swahili into our analysis expressions, but not when we go in the opposite direction. Notice also that we have been cheating in our discussion so far: real Swahili words come to us as 'wamenisumbua' not as 'wa-me-ni-sumbua' with all the morpheme breaks conveniently marked. But we can solve this problem with the FST machinery that we already have: we simply need to transduce 'w a m e n i s u m b u a' into 'wa me ni sumbua'.

Design an equivalent FST that will run deterministically from the analysis expressions to the Swahili words.

Elaborate the ENG_FRE-2 FST to cover a small but useful subset of the phrases that you might need in a foreign airport.

Design an FST that will transduce well-formed English number names, such as 'twenty two', into the corresponding Arabic numerals - that is, '22'.

Design an FST that will transduce well-formed English time-of-day expressions into their French counterparts, or those of some other language. Thus, for example, it should map 'seven minutes past noon' into 'midi sept'.

Design an FST that will transduce well-formed English number names into well-formed French number names, or those of some other language. Thus, for example, it should map 'ninety four' into 'quatre vingt quatorze'.

Design an FST to transduce 'w a m e n i s u m b u a' into 'wa me ni sumbua', etc., for all the Swahili data.

Imagine two parallel sequences of phonemes and phones, set side by side so that each phoneme is paired with a phone (ignoring null elements, for the sake of simplicity).

An FST can crawl down this sequence of phoneme-phone pairs obeying rules such as the following:

If your current phoneme is /d/ and your current phone is [t] and you are in state 17, then you may advance to the next pair and shift into state 11.

In this example, state 17 might well be the state that the automaton gets into as it emerges from an unvoiced consonant, say. If the FST is able to reach the end of the sequence of pairs in an approved final state, then the phone sequence is a well-formed realization of the phoneme sequence according to the rules that the FST embodies (and conversely). We can reformulate the inspection rule as one of transduction:

If your current phoneme is /d/ and you are in state 17, then you may print [t] and advance to the next phoneme and shift into state 11.

Devise a small FST incorporating such phoneme-phone matching rules and show how it can handle some standard phonological problems.

Design an FST that will translate three-character strings consisting of any letter of the alphabet, a mid-string marker character, say '^', and any letter of the alphabet into its mirror image. So, for instance, the string 'a^h' would be mapped into the string 'h^a'. Roughly how many states would you need for a similar FST that reversed strings like 'ak^yp' consisting of four alphabetic characters with a mid-string marker in the middle?

Deterministic FSTs, like deterministic FSTNs, can be (partially) represented by state-transition tables. The difference is that whereas the state-transition table for an FSTN is a two-dimensional object, that for a two-tape transducer is a three-dimensional object, with one dimension for the states and one dimension for each of the symbol sets associated with the tapes. This is illustrated here with a two-dimensional slice of the three-dimensional matrix for the FST that we used as an example, above:




est | 1 2 3 4 5 ____________________ where 0 0 0 0 0 is 0 3 0 0 0 the 0 0 0 0 0 exit 0 0 0 0 0 policeman 0 0 0 0 0 shop 0 0 0 0 0 toilet 0 0 0 0 0

This can be interpreted as saying that if you are in state 2 and the next word on your input tape is 'est', then you can print 'is' on your output tape and change into state 3. Otherwise, you are stuck. The full matrix would require five more of these two-dimensional slices, one for each word from the symbols vocabulary of the second tape. [An alternative to this three-dimensional approach would be to use a two-dimensional table with one dimension for the states and the other dimension for symbol-pairs]. Write a program that will compile deterministic two-tape FST descriptions given in our formal language into three-dimensional state-transition matrices. Now write a program that will use these matrices, and information about initial and final states, to translate from the language used on one tape to that used on the other. What happens if your compiler is given a non-deterministic FST to compile?

~!~

Send us a comment.



[Contents] [Previous] [Next]
This document was translated by troff2html v0.21 on October 22, 1996.