44 Cards in this Set

  • Front
  • Back
Syntax Analyzer
Checks the syntax of a program and creates a parse tree from it.
Advantages of using BNF
BNF descriptions are clear and concise, both for humans and software systems.
Syntax analyzers can be generated directly from BNF.
Implementations based on BNF are easy to maintain.
The two distinct parts of analyzing syntax
1) The lexical analyzer deals with small-scale language constructs, such as names and numeric literals.

2) The syntax analyzer deals with large-scale constructs, such as expressions, statements, and program units.
Reasons for the separation of the two parts of analyzing syntax
Simplicity: Removing the details of lexical analysis from the syntax analyzer makes it smaller and less complex.
Efficiency: It becomes easier to optimize the lexical analyzer.
Portability: Because the lexical analyzer reads source files, it may be platform-dependent; keeping it separate leaves the syntax analyzer platform-independent.
A lexical analyzer collects input characters into groups (Called what?) and assigns an
internal code (Called what?) to each group.
1) Lexemes
2) Token
How are Lexemes recognized?
By matching the input against patterns
What are the Tokens and lexemes of this statement?

result = oldsum - value / 100;
Token       Lexeme
IDENT       result
ASSIGN_OP   =
IDENT       oldsum
SUB_OP      -
IDENT       value
DIV_OP      /
INT_LIT     100
SEMICOLON   ;
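As an illustration of matching input against patterns, here is a minimal lexer sketch in Python that reproduces the token/lexeme pairs above; the token names and regular expressions are illustrative assumptions, not taken from any particular implementation:

    import re

    # Hypothetical token patterns; names chosen to match the table above.
    TOKEN_PATTERNS = [
        ("IDENT",     r"[A-Za-z_]\w*"),
        ("INT_LIT",   r"\d+"),
        ("ASSIGN_OP", r"="),
        ("SUB_OP",    r"-"),
        ("DIV_OP",    r"/"),
        ("SEMICOLON", r";"),
        ("SKIP",      r"\s+"),   # white space between lexemes is skipped
    ]
    TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))

    def lex(source):
        """Group input characters into lexemes and assign a token code to each."""
        for match in TOKEN_RE.finditer(source):
            if match.lastgroup != "SKIP":
                yield match.lastgroup, match.group()

    for token, lexeme in lex("result = oldsum - value / 100;"):
        print(f"{token:<10} {lexeme}")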
Other tasks performed by a lexical analyzer?
Skipping comments and white space between lexemes.
Inserting lexemes for user-defined names into the symbol table.
Detecting syntactic errors in tokens, such as ill-formed floating-point literals.
Approaches to building a lexical analyzer:
Write a formal description of the token patterns of the language and use a software tool such as lex to automatically generate a lexical analyzer.
Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.
Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the state diagram
A state transition diagram, or state diagram, is a directed graph
The nodes are labeled with state names.
The arcs are labeled with input characters.
An arc may also include actions to be done when the transition is taken.
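A minimal sketch of the table-driven approach, assuming a hypothetical state diagram that recognizes only identifiers and integer literals (the state names and character classes are illustrative):

    # Table-driven implementation of a small state diagram: identifiers
    # (a letter followed by letters/digits) and integer literals.
    def char_class(ch):
        if ch.isalpha():
            return "letter"
        if ch.isdigit():
            return "digit"
        return "other"

    # Transition table: (state, input character class) -> next state
    TRANSITIONS = {
        ("start", "letter"): "in_id",
        ("start", "digit"):  "in_int",
        ("in_id", "letter"): "in_id",
        ("in_id", "digit"):  "in_id",
        ("in_int", "digit"): "in_int",
    }
    ACCEPTING = {"in_id": "IDENT", "in_int": "INT_LIT"}

    def recognize(lexeme):
        state = "start"
        for ch in lexeme:
            state = TRANSITIONS.get((state, char_class(ch)))
            if state is None:
                return None          # no arc for this input: not a valid lexeme
        return ACCEPTING.get(state)  # token code, or None if not in an accepting state

    print(recognize("oldsum"))  # IDENT
    print(recognize("100"))     # INT_LIT
    print(recognize("2abc"))    # None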
Syntax analysis is often referred to as:
Parsing
Top-down parsers
Build the tree from the root downward to the leaves.
Bottom-up parsers
Build the tree from the leaves upward to the root.
Terminal symbols convention
Lowercase letters at the beginning of the alphabet (a, b, …)
Nonterminal symbols convention
Uppercase letters at the beginning of the alphabet (A, B, …)
Terminals or nonterminals convention
Uppercase letters at the end of the alphabet (W, X, Y, Z)
Strings of terminals convention
Lowercase letters at the end of the alphabet (w, x, y, z)
Mixed strings (terminals and/or nonterminals) convention
Lowercase Greek letters (α, β, γ, δ)
A top-down parser traces or builds the parse tree:
Traces or builds the parse tree in preorder: each node is visited before its branches are followed.
The actions taken by a top-down parser correspond to a
Actions correspond to a leftmost derivation
recursive-descent parser
A parser coded directly from the BNF description of the syntax of a language.
An alternative is to use a parsing table rather than code.
LL algorithm
The first L in LL specifies a left-to-right scan of the input; the second L specifies that a leftmost derivation is generated.
A bottom-up parser constructs a parse tree by:
Constructs a parse tree by beginning at the leaves and progressing toward the root.
The actions taken by a bottom-up parser correspond to a
Actions correspond to a rightmost derivation
Handle
The correct RHS to reduce
Recursive-descent parser
consists of a collection of subprograms, many of which are recursive; it produces a parse tree in top-down order.
A Recursive-descent parser has
has one subprogram for each nonterminal in the grammar.
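A minimal recursive-descent sketch, assuming a small hypothetical expression grammar and a token stream of (token, lexeme) pairs; it has one subprogram per nonterminal:

    # Assumed grammar (illustrative, not from any specific text):
    #   <expr>   -> <term> {(+ | -) <term>}
    #   <term>   -> <factor> {(* | /) <factor>}
    #   <factor> -> IDENT | INT_LIT | ( <expr> )
    class Parser:
        def __init__(self, tokens):
            self.tokens = list(tokens) + [("EOF", "")]
            self.pos = 0

        def next_token(self):
            return self.tokens[self.pos]

        def advance(self):
            self.pos += 1

        def expr(self):                      # <expr> -> <term> {(+ | -) <term>}
            self.term()
            while self.next_token()[1] in ("+", "-"):
                self.advance()
                self.term()

        def term(self):                      # <term> -> <factor> {(* | /) <factor>}
            self.factor()
            while self.next_token()[1] in ("*", "/"):
                self.advance()
                self.factor()

        def factor(self):                    # <factor> -> IDENT | INT_LIT | ( <expr> )
            kind, lexeme = self.next_token()
            if kind in ("IDENT", "INT_LIT"):
                self.advance()
            elif lexeme == "(":
                self.advance()
                self.expr()
                if self.next_token()[1] != ")":
                    raise SyntaxError("expected )")
                self.advance()
            else:
                raise SyntaxError(f"unexpected token {lexeme!r}")

    # Usage: Parser([("IDENT", "sum"), ("ADD_OP", "+"), ("INT_LIT", "47")]).expr()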
Pairwise disjointness test
is used to test a non-left-recursive grammar to determine whether it can be parsed in a top-down fashion. This test requires computing FIRST sets, where
FIRST(α) = {a | α =>* aβ}
The symbol =>* indicates a derivation of zero or more steps. If α =>* ε, then ε is also in FIRST(α), where ε is the empty string.
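A small sketch of the test, assuming a hypothetical grammar A -> aB | bAb | Bb whose FIRST sets have already been computed by hand:

    from itertools import combinations

    # FIRST set of each RHS of nonterminal A (assumed values for this example;
    # FIRST(Bb) = {c, d} presumes B -> cB | d).
    first_of_rhs = {
        "A -> a B":   {"a"},
        "A -> b A b": {"b"},
        "A -> B b":   {"c", "d"},
    }

    def pairwise_disjoint(first_sets):
        """True if no two RHSs of the same nonterminal share a FIRST symbol."""
        return all(not (s & t) for s, t in combinations(first_sets.values(), 2))

    print(pairwise_disjoint(first_of_rhs))  # True: one lookahead token selects the rule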
Shift-reduce algorithms
Bottom-up parsers are often called this, because shift and reduce are their two fundamental actions.
The shift action
moves the next input token onto the parser’s stack.
A reduce action
replaces an RHS (the handle) on top of the parser’s stack by its corresponding LHS.
Most bottom-up parsing algorithms belong to the
belong to the LR family. LR parsers use a relatively small amount of code and a parsing table.
The original LR algorithm was designed by
designed by Donald Knuth, who published it in 1965.
Canonical LR algorithm
was not widely used because producing the parsing table required large amounts of computer time and memory.
Advantages of LR parsers:
They can be built for all programming languages.
They can detect syntax errors as soon as possible in a left-to-right scan.
The LR class of grammars is a proper superset of the class parsable by LL parsers.
LR Parser
A type of bottom-up parser. The L means that the parser reads input text in one direction without backing up; that direction is typically Left to right within each line, and top to bottom across the lines of the full input file. (This is true for most parsers.) The R means that the parser produces a reversed Rightmost derivation; it does a bottom-up parse, not a top-down LL parse or ad-hoc parse.
The parser can avoid examining the entire stack if
it keeps a summary of the stack contents in a “state” symbol on top of the stack.
Knuth’s insight was that it was only necessary to look to the left of the suspected handle (by examining the stack) to determine whether it was the handle.
An LR parser configuration
a pair of strings representing the stack and the remaining input:
(S0 X1 S1 X2 S2 … Xm Sm, ai ai+1 … an $)
The dollar sign is an end-of-input marker.
The LR parsing process is based on the parsing table, which has two parts:
ACTION and GOTO.
The ACTION part has
has state symbols as its row labels and terminal symbols as its column labels.
The two primary actions are
shift (shift the next input symbol onto the stack) and reduce (replace the handle on top of the stack by the LHS of the matching rule).
Two other actions are possible:
accept (parsing is complete) and error (a syntax error has been detected).
Informal definition of parser actions:
Shift: The next input symbol is pushed onto the stack, along with the state symbol specified in the ACTION table.

Reduce: First, the handle is removed from the stack. For every grammar symbol on the stack there is a state symbol, so the number of symbols removed is twice the number of symbols in the handle. Next, the LHS of the rule is pushed onto the stack. Finally, the GOTO table is used to determine which state must be pushed onto the stack.

Accept: The parse is complete and no errors were found.

Error: The parser calls an error-handling routine.