Table of Contents

Automata - Deterministic finite-state automata (DFA)

About

A Deterministic finite-state automata (DFA) is a finite automaton that cannot be in more than one state at any one time.

The term deterministic refers to the fact that on each input there is one and only one state to which the automaton can transition from its current state.

The inverse of Deterministic finite-state automata (DFA) is Nondeterministic finite-state automata (NFA) that may be in several states at once.

They are fast, without expensive backtracking.

Parsing Example

A Finite State Machine (FSM) will:

A FSM versus a regular expression approach can come handy if you need to deal with matched nested parentheses or with deeper parsing level (different types of parentheses, etc.)

Definition

A DFA is denoted in the Five tuple notation as being a set of the following element.

<MATH> A = (Q, \sigma, \delta, q0, F) </MATH> where:

Finite

A finite automaton has the following function composition property

For the deterministic finite automata q, if the input string <math>s = y+z</math> , then processing a string s composed of two strings y and z is the same as processing first y then processing z.

<MATH>\delta(q,s) = \delta(\delta(q,y),z)</MATH>

where <math>\delta</math> is the state of an automaton q after processing a string (here s, y or z)

Regular Expression

Most regexp library are based on nondeterministic automata (NFA)

Lexer

A lexer 1) defines all token via several regular expression. To efficiently match the input against the possible tokens, all regexes are compiled together into a Deterministic Finite State Machine. The DFA is then walked character by character over the input string and fires a “match” event whenever one of the expressions is matched.

Library

Documentation / Reference