Lex Language Project

Rather than writing a complex state machine by hand to read characters one by one, a developer using the Lex Language Project writes a set of regular expression rules, each paired with an action. Lex then generates a C program (typically lex.yy.c) that implements a deterministic finite automaton (DFA) for those rules. The generated scanner is fast because it handles each input character with a table lookup; on modern hardware throughput can reach hundreds of megabytes per second, depending on the rule set and the input.
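To make the DFA idea concrete, here is a minimal hand-written sketch in C of a table-driven automaton for two patterns: identifiers ([a-zA-Z][a-zA-Z0-9]*) and integers ([0-9]+). It is not actual Lex output. The names token_kind, state, char_class, transition and classify are hypothetical and chosen for illustration. Unlike a real Lex scanner, which splits a stream into successive tokens, this sketch only classifies one complete string.

```c
#include <ctype.h>

/* Hypothetical token kinds for this sketch. */
enum token_kind { TOK_INVALID, TOK_IDENT, TOK_INT };

/* DFA states: start, inside an identifier, inside an integer, dead. */
enum state { S_START, S_IDENT, S_INT, S_DEAD };

/* Input characters are reduced to three classes before the lookup. */
enum char_class { C_LETTER, C_DIGIT, C_OTHER };

static enum char_class class_of(unsigned char c)
{
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* transition[state][char_class] gives the next state. */
static const enum state transition[4][3] = {
    /* S_START */ { S_IDENT, S_INT,   S_DEAD },
    /* S_IDENT */ { S_IDENT, S_IDENT, S_DEAD },
    /* S_INT   */ { S_DEAD,  S_INT,   S_DEAD },
    /* S_DEAD  */ { S_DEAD,  S_DEAD,  S_DEAD },
};

/* Run the DFA over the whole string: one table lookup per character. */
enum token_kind classify(const char *s)
{
    enum state st = S_START;
    for (; *s != '\0'; s++)
        st = transition[st][class_of((unsigned char)*s)];
    if (st == S_IDENT) return TOK_IDENT;
    if (st == S_INT)   return TOK_INT;
    return TOK_INVALID;
}
```

The loop body is a single array access, which is the reason generated scanners run in time proportional to the input length. A generated scanner would additionally remember the last accepting state so that it can emit the longest match and continue with the next token.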

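For example, a small Lex specification looks like this:

%{
#include <stdio.h>
%}

%%
"if"                    printf("Keyword: IF\n");
[a-zA-Z][a-zA-Z0-9]*    printf("Identifier: %s\n", yytext);
[0-9]+                  printf("Integer: %s\n", yytext);
\n                      ; /* ignore newlines */
.                       ; /* ignore other chars */
%%

/* Tell the scanner there is no further input after end of file. */
int yywrap(void) { return 1; }

int main(void)
{
    yylex();
    return 0;
}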

Tools like Snort (intrusion detection) and custom TCP/IP analyzers use Lex-style tokenization to parse HTTP headers, SMTP commands, and FTP responses.
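As an illustration of what tokenizing protocol text involves, the following C sketch splits a single header line of the form "Name: value" into its two parts. It is a hand-written simplification, not code taken from any of the tools mentioned. The function name split_header and its parameters are hypothetical, and the validation is deliberately minimal (it does not implement the full HTTP grammar).

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "Name: value\r\n" into name and value (hypothetical helper).
 * Returns 0 on success, -1 if the line has no colon, the name is empty
 * or contains whitespace, or either part does not fit its buffer.
 */
int split_header(const char *line, char *name, size_t name_cap,
                 char *value, size_t value_cap)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return -1;

    size_t name_len = (size_t)(colon - line);
    if (name_len >= name_cap)
        return -1;
    for (size_t i = 0; i < name_len; i++)
        if (line[i] == ' ' || line[i] == '\t')
            return -1;
    memcpy(name, line, name_len);
    name[name_len] = '\0';

    /* Skip whitespace after the colon, then stop at CR or LF. */
    const char *v = colon + 1;
    while (*v == ' ' || *v == '\t')
        v++;
    size_t value_len = strcspn(v, "\r\n");
    if (value_len >= value_cap)
        return -1;
    memcpy(value, v, value_len);
    value[value_len] = '\0';
    return 0;
}
```

In a Lex specification the same split would be written as two or three rules (a name pattern, the colon, and the rest of the line), with the generated scanner doing the character-level work.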