Talk:Lexical analysis



This article is not clear enough! It needs more examples!

Robert A.


The link to ocaml-ulex in 'Links' is broken.

Frank S.


Types of tokens

Can someone explain to me what types of tokens there are? (Keywords, identifiers, literals? Or are there more?) And does each of these types go into a symbol table of its own, i.e. an identifier table, keyword table and literal table? Or are they just stored in one uniform symbol table? —Dudboi 02:17, 6 November 2006 (UTC)

It really depends. For instance, for the example language in the article (PL/0 - see that article for example source code), here are the available token types:

operators (single- and multi-character): '+', '-', '*', '/', '=', '(', ')', ':=', '<', '<=', '<>', '>', '>='
language-required punctuation: ',', ';', '.'
literal numbers - specifically, integers
identifiers: a-zA-Z {a-zA-Z0-9}
keywords: "begin", "call", "const", "do", "end", "if", "odd", "procedure", "then", "var", "while"

Other languages might have more token types. For instance, in the C language, you would have string literals, character literals, floating point numbers, hexadecimal numbers, directives, and so on.
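As a rough illustration of how such a classification looks in code, a hand-written scanner in C might tag each token with an enum along these lines (a sketch only - the names here are invented, not taken from the article or from any particular compiler):

    /* Illustrative token classification for the PL/0 token types
     * listed above; all names are invented for this sketch. */
    enum token_type {
        TOK_OPERATOR,    /* '+', '-', '*', '/', '=', '(', ')', ':=', '<', '<=', '<>', '>', '>=' */
        TOK_PUNCT,       /* ',', ';', '.' */
        TOK_NUMBER,      /* integer literals */
        TOK_IDENT,       /* a-zA-Z {a-zA-Z0-9} */
        TOK_KEYWORD      /* "begin", "call", ..., "while" */
    };

    struct token {
        enum token_type type;
        char text[64];   /* the matched lexeme */
    };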

I've seen them all stored in a single table, and I've also seen them stored in multiple tables. I don't know if there is a standard, but from the books I've read and the source code I've seen, multiple tables appear to be popular.
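To make the single-table approach concrete, here is a minimal sketch of a uniform entry, reusing the token_type enum from the sketch above (the field names are invented):

    /* One uniform symbol table: identifiers, keywords and literals
     * all share an entry type and are distinguished by the tag.
     * Illustrative only - not from any particular compiler. */
    struct symbol {
        enum token_type type;   /* identifier, keyword, literal, ... */
        char name[64];          /* the lexeme as written in the source */
        long value;             /* meaningful only for numeric literals */
        struct symbol *next;    /* bucket chaining in a hash table */
    };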

A second task performed during lexical analysis is to enter tokens into a symbol table, if one is kept. Some other tasks performed during lexical analysis are: 1. to remove comments, tabs, blank spaces and machine characters; 2. to produce error messages for errors that occur in a source program.
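To make the first of those housekeeping tasks concrete, a hand-written scanner typically begins each call by discarding material that never becomes a token. A sketch in C (assuming '//' line comments purely for illustration - real languages differ):

    #include <ctype.h>
    #include <stdio.h>

    /* Skip whitespace and comments, returning the first character
     * of the next token. Assumes '//' line comments purely for
     * illustration. */
    static int skip_blanks_and_comments(FILE *in)
    {
        int c = getc(in);
        for (;;) {
            if (isspace(c)) {
                c = getc(in);               /* blanks, tabs, newlines */
            } else if (c == '/') {
                int d = getc(in);
                if (d == '/') {             /* comment: discard to end of line */
                    while (c != '\n' && c != EOF)
                        c = getc(in);
                } else {
                    ungetc(d, in);          /* a lone '/' is a real token */
                    return c;
                }
            } else {
                return c;                   /* start of the next token (or EOF) */
            }
        }
    }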

See the following links for simple approachable compiler sources:

http://en.wikipedia.org/wiki/PL/0
http://www.246.dk/pascals1.html
http://www.246.dk/pascals5.html

See http://www.246.dk/pl0.html for more information on PL/0. 208.253.91.250 18:07, 13 November 2006 (UTC)

merger and clean up

I've merged token (parser) here. The page is a bit of a mess now though. The headings I've made should help sort that out. The examples should be improved so that they take up less space. The lex file should probably be moved to the flex page. --MarSch 15:37, 30 April 2007 (UTC)

done the move of the example. --MarSch 15:39, 30 April 2007 (UTC)

Next?

It would be nice to see what is involved in the next step, or at least to see a link to the page describing the whole process of turning high-level code into low-level code.

-Dusan B.

you mean compiling? --MarSch 10:53, 5 May 2007 (UTC)

Lexical Errors

There's a mention that scanning fails on an "invalid token". It doesn't seem particularly clear what constitutes a lexical error other than a string of garbage characters. Any ideas? --138.16.23.227 (talk) 04:54, 27 November 2007 (UTC)

If the lexer finds an invalid token, it will report an error.
The comment needs some context. Generally speaking, a lexical analyzer may report an error, but that is usually (for instance in the Lex programming tool) under the control of the person who designs the rules for the analysis. The analyzer itself may reject the rules because they're inconsistent. The parser, on the other hand, is more likely to have built-in behavior - yacc for instance fails on any mismatch and requires the developer to specify how to handle errors. (Updating the article to reflect these comments requires citing reliable sources, of course - talk pages aren't a source.) Tedickey (talk) 13:56, 27 November 2007 (UTC)
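To illustrate the point that the error policy belongs to whoever writes the rules: in a hand-written C scanner, the "invalid token" case is simply the branch where no rule matches, and the author decides whether to abort or to report and continue. A sketch (the token classes and the skip-and-continue policy are invented for illustration, not mandated by Lex or by any standard):

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    enum { T_EOF, T_IDENT, T_NUMBER, T_OP };

    /* Return the class of the next token, reporting and skipping
     * any character that starts no valid token. Simplified: the
     * lexeme text and multi-character operators are omitted. */
    int next_token(FILE *in)
    {
        int c;
        while ((c = getc(in)) != EOF && isspace(c))
            ;                               /* skip whitespace */
        if (c == EOF)
            return T_EOF;
        if (isalpha(c)) {                   /* identifier or keyword */
            while (isalnum(c = getc(in)))
                ;
            ungetc(c, in);
            return T_IDENT;
        }
        if (isdigit(c)) {                   /* integer literal */
            while (isdigit(c = getc(in)))
                ;
            ungetc(c, in);
            return T_NUMBER;
        }
        if (strchr("+-*/=()<>,;.:", c))
            return T_OP;
        /* Lexical error: no rule matches. Report, skip, continue. */
        fprintf(stderr, "lexical error: unexpected character '%c'\n", c);
        return next_token(in);
    }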

Lexer, Tokenizer, Scanner, Parser

There should be a clearer explanation of what exactly distinguishes those modules, what their particular jobs are, and how they are composed together (maybe with an illustration?) —Preceding unsigned comment added by Sasq777 (talk • contribs) 16:06, 11 June 2008 (UTC)
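Until someone works that into the article: "scanner" and "tokenizer" are, in most usage, just other names for the lexer, and the parser sits above it, pulling one token at a time - much as yacc-generated parsers call yylex(). A rough sketch in C (all names invented for illustration):

    #include <stdio.h>

    /*   characters --> [lexer / scanner / tokenizer] --> tokens
     *        --> [parser] --> syntax tree
     * The parser is the driver; it calls the lexer whenever it
     * needs the next token. */
    enum { T_EOF, T_IDENT, T_NUMBER, T_OP };

    struct token { int type; char text[64]; };

    /* The lexer: groups raw characters into a single token. */
    struct token lexer_next(FILE *in);

    /* The parser: consumes the token stream, matching grammar rules. */
    void parse(FILE *in)
    {
        struct token t = lexer_next(in);
        while (t.type != T_EOF) {
            /* ... grammar rules consume tokens and build structure ... */
            t = lexer_next(in);
        }
    }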