Automata — What Are They?
The term “automata” comes from the Greek word “αὐτόματα”, which means “self-acting”. An automaton (plural: automata) is an abstract, self-propelled machine that follows a predetermined sequence of operations automatically.
An automaton with a finite number of states is called a Finite Automaton (FA) or Finite State Machine (FSM).
Formal definition of a Finite Automaton
An automaton is represented by a 5-tuple (Q, ∑, δ, q0, F), where −
Q is a finite set of states.
∑ is a finite set of symbols, called the alphabet of the automaton.
δ is the transition function.
q0 is the initial state from which any input is processed (q0 ∈ Q).
F is the set of final states of Q (F ⊆ Q).
Finite automata are classified into two types −
- Deterministic Finite Automaton (DFA)
- Non-deterministic Finite Automaton (NDFA / NFA)
Deterministic Finite Automaton (DFA)
In a DFA, for every input symbol, one can determine the state to which the machine will move. Hence, it is called a Deterministic Automaton. Because it has a finite number of states, the machine is called a Deterministic Finite Machine or Deterministic Finite Automaton.
Non-Deterministic Finite Automaton (NDFA)
In an NDFA, for a specific input symbol, the machine can move to any combination of states within the machine. In other words, the exact state to which the machine moves cannot be determined. Hence, it is called a Non-deterministic Automaton. Because it has a finite number of states, the machine is called a Non-deterministic Finite Machine or Non-deterministic Finite Automaton.
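To make the 5-tuple definition concrete, here is a minimal sketch of a DFA in Python. The particular language (binary strings with an even number of 0s) and the state names are invented purely for this illustration.

# A DFA as a 5-tuple (Q, Sigma, delta, q0, F).
# Example language (chosen for illustration): binary strings
# containing an even number of 0s.
Q = {"even", "odd"}                      # states
Sigma = {"0", "1"}                       # alphabet
delta = {                                # transition function
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}
q0 = "even"                              # initial state
F = {"even"}                             # final (accepting) states

def accepts(w: str) -> bool:
    """Run the DFA on input string w and report acceptance."""
    state = q0
    for symbol in w:
        if symbol not in Sigma:
            return False                 # reject symbols outside the alphabet
        state = delta[(state, symbol)]
    return state in F

print(accepts("1001"))   # True: two 0s
print(accepts("1000"))   # False: three 0s

Note that the machine only ever stores its current state, which is exactly why a DFA needs constant memory regardless of input length.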
APPLICATIONS — REAL WORLD
DFAs as a model of computation
[Figure: a DFA requires O(1) memory, irrespective of the length of the input.]
You might have heard of Turing machines, which abstract the idea of a “computer”. In a similar vein, regular languages describe what is possible to do with a computer that has very little memory. No matter how long the input is, a DFA only keeps track of the state it is currently in, so it requires only a constant amount of memory.
By studying the properties of regular languages, we gain a better understanding of what is and is not possible with computers that have very little memory.
This explains why theorists care about regular languages. But what are some real-world applications?
DFAs and regular expressions
Regular expressions are a great tool that every programmer should know. If you wanted to test whether a string is a valid email address, you might write something like:
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
Behind the scenes, this regular expression gets converted into an NFA, which can be evaluated quickly to produce an answer.
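As a quick sketch of how the pattern might be used in practice, here it is compiled with Python's re module; the sample inputs are made up for illustration.

import re

# The (simplified) email pattern from above, compiled once.
EMAIL = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")

def is_valid_email(s: str) -> bool:
    """Return True if s matches the simplified email pattern."""
    return EMAIL.match(s) is not None

print(is_valid_email("alice@example.com"))   # True
print(is_valid_email("not an email"))        # False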
You do not have to understand the internals in order to use regular expressions, but it is useful to know some theory so you understand their limitations. Some programmers may try to use regular expressions to parse HTML, but if you have seen the Pumping Lemma, you will understand why this is fundamentally impossible.
REGULAR EXPRESSIONS IN LEXICAL ANALYSIS
When you write a program in an editor on your computer, have you ever wondered how that program gets compiled? How does the computer decipher the high-level language? In this blog we will go through the first stage of compiler design and how regular expressions are used in it.
A Regular expression, often called a pattern, is an expression used to specify a set of strings required for a particular purpose. Regular expressions have the capability to express finite languages by defining a pattern for finite strings of symbols. These languages are known as Regular languages.
For example: digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9, or [0-9]. This accepts single digits only.
Regular expressions are widely used and play a very significant role in compiler design. One of the major applications of regular expressions is lexical analysis. Let’s have a look at what lexical analysis is.
Lexical analysis is the very first phase of compiler design. It is the process of taking an input string of characters, such as the source code of a high-level program, and producing a sequence of symbols called tokens. Programs that perform lexical analysis are called lexical analyzers or lexers. A lexical analyzer reads the source text and identifies each token one by one.
Terms one needs to know:
➢ What is a Token? A token is a sequence of characters that represents a unit of information in the source program. Examples: keywords, identifiers, operators, and separators.
➢ What is a Lexeme? A lexeme is a sequence of characters in the source program that matches the pattern of a token; in other words, it is an instance of a token. Examples: “int”, “a”, “+”, “,”
➢ What is a Pattern? A pattern is a rule that describes the set of strings associated with a token. It is specified using regular expressions that describe how a token can be formed. Example: [a-z]*
The most basic and important part is recognizing tokens, as this constitutes the first stage of lexical analysis. Regular expressions are used to represent the language for the lexical analyzer: they help determine the type of token that accounts for a particular lexeme. Regular expressions give the lexical analyzer the power to accept a set of strings and identify them as tokens.
Lexical Analyzer Architecture:
The lexical analyzer reads the entire source code and generates tokens; this is also called scanning. Scanners are typically implemented to produce tokens only when requested by the parser or syntax analyzer (the second phase of compiler design).
The lexical analyzer works as follows:
- The parser requests a token from the lexical analyzer.
- On receiving this request, the lexical analyzer scans the input until it finds the next token.
- It returns the token to the parser.
The lexical analyzer skips whitespace and comments while creating these tokens. If an error is present, the lexical analyzer correlates that error with the source file and line number.
Programming language tokens can be described by regular expressions. As the lexer scans the source code, it recognizes tokens according to the definitions provided by the regular expressions. It checks whether each lexeme lies within the set of strings described by the regular language of the programming language. Once it identifies a lexeme, it checks the symbol table to find what type of token it is and then returns it to the parser.
How to use Regular Expressions in specifying the Lexical Analysis Structure:
1. Select a set of tokens:
Keyword, identifier, integer, openPar, ….
2. Define a regular expression for the lexemes of these tokens for a programming language:
Keyword = ‘if’ , ‘else’, ….
Identifier/ variable = letter(letter + digit)*
Integer = digit + (positive closure)
openPar = ‘(‘
3. Construct R matching all lexemes for all tokens
R = Keyword + Integer + Identifier + ……
R = R1 + R2 + R3 + …….
4. Let the input be x1…xn. Check whether x1…xi ∈ L(R).
5. If x1…xi ∈ L(R) then the lexeme is matched in the symbol table and sent to the parser.
6. If the token is whitespace, we can define a regular expression for it as Whitespace = (“ “ + “\n” + ”\t”)+ (positive closure).
If x1…xi ∈ L(Whitespace), then the lexical analyzer ignores it.
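Here is a minimal sketch, in Python, of a lexer built along the lines of the steps above. The token names and patterns are simplified, hypothetical choices, not those of any real compiler:

import re

# Steps 1-2: token names and their regular expressions (simplified).
TOKEN_SPEC = [
    ("WHITESPACE", r"[ \t\n]+"),              # ignored, per step 6
    ("KEYWORD",    r"\b(?:if|else)\b"),
    ("INTEGER",    r"\d+"),                   # digit+ (positive closure)
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),  # letter(letter + digit)*
    ("OPENPAR",    r"\("),
    ("CLOSEPAR",   r"\)"),
]

# Step 3: combine into a single expression R = R1 + R2 + R3 + ...
R = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source: str):
    """Steps 4-6: scan the input, yield (token, lexeme) pairs, skip whitespace."""
    for match in R.finditer(source):
        kind = match.lastgroup
        if kind == "WHITESPACE":
            continue                          # the lexer ignores whitespace
        yield kind, match.group()

print(list(tokenize("if (count1)")))
# [('KEYWORD', 'if'), ('OPENPAR', '('), ('IDENTIFIER', 'count1'), ('CLOSEPAR', ')')]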
Summary:
➢ In the above steps, by using the algebraic rules obeyed by regular expressions, we can manipulate the patterns of strings according to the programming language.
➢ Regular expressions contribute greatly to compiler design. They ease the process of lexical analysis and syntax analysis by eliminating unwanted tokens.
➢ Lexical analyzers are also used by web browsers to format and display web pages, with the help of parsed JavaScript, HTML, and CSS.
DFAs in compilers
For every programming language, the first step in the compiler or interpreter is the lexer. The lexer reads in a file of your favorite programming language and produces a sequence of tokens. For instance, if you have this line in C++:
cout << "Hello World" << endl;
The lexer generates something like this:
IDENTIFIER cout
LSHIFT <<
STRING "Hello World"
LSHIFT <<
IDENTIFIER endl
SEMICOLON ;
The lexer uses a DFA to go through the source file, one character at a time, and emit tokens. If you ever design your own programming language, this will be one of the first things you write.
[Figure: lexer description for JSON numbers, such as -3.05]
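The figure itself is not reproduced here, but a recognizer for such numbers can be sketched as a small DFA. The states and transitions below are an assumption of what that description might look like (exponents and the no-leading-zero rule are ignored to keep the sketch short):

# A simplified DFA for numbers such as -3.05.
DELTA = {
    ("start", "-"): "sign",
    ("start", "digit"): "int",
    ("sign", "digit"): "int",
    ("int", "digit"): "int",
    ("int", "."): "dot",
    ("dot", "digit"): "frac",
    ("frac", "digit"): "frac",
}
ACCEPTING = {"int", "frac"}

def classify(ch: str) -> str:
    """Map each character to the symbol class used by the transition table."""
    return "digit" if ch.isdigit() else ch

def is_number(s: str) -> bool:
    """Run the DFA over s one character at a time."""
    state = "start"
    for ch in s:
        state = DELTA.get((state, classify(ch)))
        if state is None:
            return False                 # no transition defined: reject
    return state in ACCEPTING

print(is_number("-3.05"))   # True
print(is_number("3."))      # False: a dot must be followed by a digit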
DFAs for Artificial Intelligence
Another application of finite automata is programming simple agents to respond to inputs and produce actions in some way. You could write a full program, but a DFA is often enough to do the job. DFAs are easier to reason about and easier to implement.
State machines are certainly not the most sophisticated means of implementing artificially intelligent agents in games, but many games include characters with simple, state-based behaviors that are easily and effectively modeled using state machines.
Here we consider the classic game, Pac-Man. For those unfamiliar with the gameplay, Pac-Man requires the player to navigate through a maze, eating pellets and avoiding the ghosts who chase him through the maze. Occasionally, Pac-Man can turn the tables on his pursuers by eating a power pellet, which temporarily grants him the power to eat the ghosts. When this occurs, the ghosts’ behavior changes, and instead of chasing Pac-Man they try to avoid him.
The ghosts in Pac-Man have four behaviors:
- Randomly wander the maze
- Chase Pac-Man, when he is within line of sight
- Flee Pac-Man, after Pac-Man has consumed a power pellet
- Return to the central base to regenerate
Transitions are dictated by the situation in the game. For instance, a ghost DFA in state 2 (Chase Pac-Man) will transition to state 3 (Flee) when Pac-Man consumes a power pellet.
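A ghost along these lines could be modeled with a small transition table, as in the following sketch. The state and event names are invented for illustration and are not taken from the actual game:

# Hypothetical ghost FSM: event-driven transitions between the four behaviors.
TRANSITIONS = {
    ("WANDER", "sees_pacman"): "CHASE",
    ("CHASE", "loses_sight"): "WANDER",
    ("WANDER", "power_pellet_eaten"): "FLEE",
    ("CHASE", "power_pellet_eaten"): "FLEE",
    ("FLEE", "pellet_wears_off"): "CHASE",
    ("FLEE", "eaten_by_pacman"): "RETURN_TO_BASE",
    ("RETURN_TO_BASE", "regenerated"): "WANDER",
}

class Ghost:
    def __init__(self):
        self.state = "WANDER"

    def handle(self, event: str) -> str:
        """Transition on an event; unknown events leave the state unchanged."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

ghost = Ghost()
print(ghost.handle("sees_pacman"))         # CHASE
print(ghost.handle("power_pellet_eaten"))  # FLEE
print(ghost.handle("eaten_by_pacman"))     # RETURN_TO_BASE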
Typically this sort of automaton is called a Finite State Machine (FSM) rather than a DFA. The difference is that in an FSM we perform an action depending on the state, whereas in a DFA we care about accepting or rejecting a string, but they are the same underlying concept.
DFAs in probability
What if we took a DFA, but instead of fixed transition rules, the transitions were probabilistic? This is called a Markov chain!
Markov chains are frequently used in probability and statistics, and have many applications in finance and computer science. Google’s PageRank algorithm uses an enormous Markov chain to determine the relative importance of web pages!
You can calculate things like the probability of being in a given state after a certain number of time steps, or the expected number of steps to reach a particular state.
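As a small sketch, here is a toy two-state Markov chain in Python; the states and probabilities are invented for illustration. Repeated matrix multiplication gives the distribution over states after n steps:

import numpy as np

# Toy weather chain: from "sunny" stay sunny with 0.9, go rainy with 0.1;
# from "rainy" go sunny with 0.5, stay rainy with 0.5.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

start = np.array([1.0, 0.0])     # start in the "sunny" state

# Probability distribution over states after 10 steps.
after_10 = start @ np.linalg.matrix_power(P, 10)
print(after_10)                  # approaches the stationary distribution

# Stationary distribution: left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary = stationary / stationary.sum()
print(stationary)                # about [0.833, 0.167]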
In summary, DFAs are powerful and versatile tools with myriad real-world applications. Research in formal language theory is valuable because it helps us better understand DFAs and what they can do.
Authors,
Sudesh Raina
Vishwas Meshram
Yash Soni
Ashutosh Taide