Programming language: Crystal
License: GNU Lesser General Public License v3.0 only
Tags: Misc    

acorn alternatives and similar shards

Based on the "Misc" category.
Alternatively, view acorn alternatives based on common mentions on social networks and blogs.

Do you think we are missing an alternative of acorn or a related project?

Add another 'Misc' Shard


Acorn Build Status

๐Ÿšง Under Construction ๐Ÿ‘ท

A state machine compiler with no runtime dependency. Define a grammar using a subset of regular expression notation, then compile it into a blazing-fast state machine. Acorn supports lexers or custom string-based state machines.


Add this to your application's shard.yml:

    github: "rmosolgo/acorn"


  • Define the grammar in a build file:
  # ./build/my_lexer.cr
  require "acorn"

  class MyLexer < Acorn::Lexer
    # optional, rename the generated class:
    # name("Namespace::MyLexer")
    token :letter "a-z"
    token :number "0-9"
    generate "./src/my_lexer.cr"
  • Generate the lexer:
  crystal run ./build/my_lexer.cr
  • Use the compiled lexer:
  require "./src/my_lexer.cr"
  MyLexer.scan(input) # => Array(Tuple(Symbol, String))

Regular Expressions

Tokens are defined with a small regular expression language:

Feature Example
Character a, 1, โค๏ธ
Sequence ab, 123
Alternation a|b
Grouping (ab)|c
Any character .
One of [abc]
Not one of [^abc]
Escape \[, \.
Unicode character range a-z, 0-9
Zero-or-more a*
One-or-more a+
Zero-or-one a?
Specific number a{3}
Between numbers a{3,4}
At least a{3,}

Build Step

An Acorn module is a Crystal program that generates code. To get a lexer, you have to run the Acorn module. Then, your main program should use the generated code.

For example, if you define a lexer:

# build/my_lexer.cr
class MyLexer < Acorn::Lexer
  # ...

You should run the file with Crystal to generate the specified file:

crystal run build/my_lexer.cr

Then, your main program should require the generated file:

# my_app.cr
require "app/my_lexer"
MyLexer.scan(input) # => Array(Tuple(Symbol, String))

The generated code has no dependency on Acorn, so you only need this library during development.


Acorn returns an of array tokens. Each token is a tuple with:

  • Symbol: the name of this token in the lexer definition
  • String: the segment of input which matched the pattern
  • {Int32, Int32}: line number and column number where the token began
  • {Int32, Int32}: line number and column number where the token ended

Line numbers and column numbers are 1-indexed, so the first character in the input is 1:1.

Custom Machines

Acorn lexers are actually a special case of state machine. You can specify a custom machine, too.

  • class MyMachine < Acorn::Machine to bring in the macros
  • alias Accumulator = ... to specify the data that will be modified during the process
    • It will be initialized with .new
    • It will be returned from .scan
  • use action :name, "pattern" { |acc, str, ts, te| ... } to define patterns
    • acc is an instance of your Accumulator
    • str is the original input
    • ts is the index in str where this token began
    • te is the index in str where this token ended (if the match is one character long, ts == te)
  • optionally, name("MyNamespace::MachineName") to rename the generated Crystal class


  • rebuild fixtures with crystal run spec/prepare.cr
  • crystal spec

Goals & Non-Goals


  • Great runtime performance
  • Linear time complexity
  • Reasonable code size (.cr file and memory usage)
  • No runtime dependency
  • Plain-Crystal API (source files are .cr, not a special format)
  • Easy to write a lexer, possible to write something else


  • Great compile-time performance (choose simplicity over performance)
  • Fancy regexp features


  • Finish regexp language ((...), ., [^...])
    • A move on a is also a move on :any, how is that handled?
  • Better line/col awareness:
    • Add line/col to tokens
    • Add line/col to errors



*Note that all licence references and agreements mentioned in the acorn README section above are relevant to that project's source code only.