Module Higlo.Lang

Syntax highlighting

type token =
| Bcomment of string (* block comment *)
| Constant of string
| Directive of string
| Escape of string (* escape sequence like \123 *)
| Id of string
| Keyword of int * string
| Lcomment of string (* one-line comment *)
| Numeric of string
| String of string
| Symbol of int * string
| Text of string (* used for everything else *)

Tokens read from the given code. These names are inspired by the highlight tool. Keyword and Symbol are parameterized by an integer so that different families of keywords and symbols can be distinguished, like kwa, kwb, ... in highlight.
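To illustrate how the integer carried by Keyword and Symbol can select a family, here is a minimal sketch that maps tokens to CSS class names loosely modeled on highlight's (kwa, kwb, ...). The function css_class, the helper family_suffix, the class names themselves and the choice that family 0 maps to "a" are all hypothetical, not part of Higlo; the token type is copied from the signature so the snippet compiles on its own.

```ocaml
(* Copy of the token type from the signature, so this sketch is standalone. *)
type token =
  | Bcomment of string
  | Constant of string
  | Directive of string
  | Escape of string
  | Id of string
  | Keyword of int * string
  | Lcomment of string
  | Numeric of string
  | String of string
  | Symbol of int * string
  | Text of string

(* Hypothetical: family 0 -> "a", 1 -> "b", ... (assumes 0 <= n < 26). *)
let family_suffix n = String.make 1 (Char.chr (Char.code 'a' + n))

(* Hypothetical mapping from tokens to CSS classes; None = no styling. *)
let css_class = function
  | Bcomment _ -> Some "com"
  | Lcomment _ -> Some "slc"
  | Constant _ -> Some "cst"
  | Directive _ -> Some "dir"
  | Escape _ -> Some "esc"
  | Id _ -> Some "id"
  | Keyword (n, _) -> Some ("kw" ^ family_suffix n)
  | Numeric _ -> Some "num"
  | String _ -> Some "str"
  | Symbol (n, _) -> Some ("sym" ^ family_suffix n)
  | Text _ -> None
```

With this scheme, a language can put, say, control-flow keywords in family 0 and type names in family 1, and a stylesheet can color kwa and kwb differently.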

val string_of_token : token -> string

For debug printing.

type error =
| Unknown_lang of string (* when the requested language is not found *)
| Lex_error of Location.t * string
exception Error of error
val string_of_error : error -> string
val pp : Stdlib.Format.formatter -> error -> unit
type lexer = Sedlexing.lexbuf -> token list

Lexers are based on Sedlex. A lexer returns a list of tokens in the same order as they appear in the input string. Consecutive Text tokens are merged afterwards by the parse function.
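A real Higlo lexer is a Sedlex function over a Sedlexing.lexbuf, which needs the sedlex ppx to compile. The sketch below keeps only the contract described above (tokens emitted left to right) and, as a simplifying assumption, reads a plain string instead of a lexbuf. The names toy_lexer and keywords and the reduced token type are hypothetical. It emits one Text token per unrecognized character, which is exactly the kind of output that parse later merges.

```ocaml
(* Reduced token type for the sketch (subset of the real one). *)
type token =
  | Id of string
  | Keyword of int * string
  | Numeric of string
  | Text of string

(* Hypothetical keyword list, all in family 0. *)
let keywords = ["let"; "in"; "fun"]

let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c = '_'
let is_digit c = c >= '0' && c <= '9'

(* Toy lexer over a plain string (not a Sedlexing.lexbuf): tokens come out
   in source order; each unrecognized character becomes its own Text token. *)
let toy_lexer (s : string) : token list =
  let n = String.length s in
  (* First index >= i at which predicate p no longer holds. *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = s.[i] in
      if is_alpha c then begin
        let j = span (fun c -> is_alpha c || is_digit c) i in
        let w = String.sub s i (j - i) in
        let tok = if List.mem w keywords then Keyword (0, w) else Id w in
        go j (tok :: acc)
      end
      else if is_digit c then begin
        let j = span is_digit i in
        go j (Numeric (String.sub s i (j - i)) :: acc)
      end
      else go (i + 1) (Text (String.make 1 c) :: acc)
  in
  go 0 []
```

For example, toy_lexer "let x = 42" yields Keyword (0, "let"), Id "x", Numeric "42" interleaved with separate Text tokens for each space and for "=".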

val get_lexer : string -> lexer

get_lexer lang returns the lexer registered for the language lang, or raises Error (Unknown_lang lang) if no lexer was registered for that language.

val register_lang : string -> lexer -> unit

register_lang lang lexer registers lexer for the language lang. If a lexer was already registered for the same language, it is replaced and no longer available.
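The registration semantics of get_lexer and register_lang can be sketched with a hash table, where registering again simply overwrites the previous binding. This is a minimal illustration under stated assumptions, not Higlo's actual implementation: the table lexers is hypothetical, and the lexer type is simplified to string -> token list so the snippet runs without sedlex.

```ocaml
(* Reduced types mirroring the signature. *)
type token = Text of string | Id of string
type error = Unknown_lang of string
exception Error of error

(* Simplification: real lexers take a Sedlexing.lexbuf. *)
type lexer = string -> token list

(* Hypothetical registry: one lexer per language name. *)
let lexers : (string, lexer) Hashtbl.t = Hashtbl.create 17

(* Hashtbl.replace drops any previous binding for lang, so an earlier
   lexer for the same language is no longer reachable. *)
let register_lang lang (lexer : lexer) = Hashtbl.replace lexers lang lexer

(* Look the language up, raising Error (Unknown_lang lang) if absent. *)
let get_lexer lang : lexer =
  match Hashtbl.find_opt lexers lang with
  | Some lexer -> lexer
  | None -> raise (Error (Unknown_lang lang))
```

Using Hashtbl.replace rather than Hashtbl.add matters here: add would shadow the old binding but keep it in the table, whereas replace discards it, matching the documented "no longer available" behavior.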

val parse : ?raise_exn:bool -> lang:string -> string -> token list

parse ?raise_exn ~lang code gets the lexer associated with lang and uses it to build a list of tokens, merging consecutive Text tokens. If no lexer is associated with the given language, the function returns [Text code].

  • parameter raise_exn

    defaults to false. If true, exceptions are raised instead of returning [Text code].
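The documented behavior of parse (look up the lexer, merge adjacent Text tokens, and fall back to [Text code] unless raise_exn is true) can be sketched as follows. This is an illustrative reimplementation, not Higlo's code: the table lexers and the helper merge are hypothetical, lexers are simplified to string -> token list, and only the module's own Error exception is assumed to be caught by the fallback.

```ocaml
(* Reduced types mirroring the signature. *)
type token = Text of string | Id of string
type error = Unknown_lang of string
exception Error of error

(* Hypothetical registry; lexers are simplified to string -> token list. *)
let lexers : (string, string -> token list) Hashtbl.t = Hashtbl.create 17

(* Collapse each run of adjacent Text tokens into a single Text token. *)
let rec merge = function
  | Text a :: Text b :: rest -> merge (Text (a ^ b) :: rest)
  | tok :: rest -> tok :: merge rest
  | [] -> []

(* Unless raise_exn is true, an Error (unknown language or lexing error)
   is turned into the fallback [Text code]. *)
let parse ?(raise_exn = false) ~lang code =
  try
    match Hashtbl.find_opt lexers lang with
    | Some lexer -> merge (lexer code)
    | None -> raise (Error (Unknown_lang lang))
  with Error _ when not raise_exn -> [Text code]
```

With the default raise_exn = false, callers always get a token list they can render, at worst as unhighlighted text; passing ~raise_exn:true lets them report the problem instead.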