
FastHTML

A C Node wrapping lexbor. Primarily used with FastSanitize.

  • Available as a hex package: {:fast_html, "~> 2.0"}
  • Documentation
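As a minimal sketch, the hex package requirement above would sit in a project's mix.exs roughly as follows. The surrounding deps/0 function is the usual Mix convention and is assumed here, not taken from this repository.

```elixir
# mix.exs (fragment): only the fast_html tuple comes from this README;
# the deps/0 wrapper is the standard Mix layout and is assumed.
defp deps do
  [
    {:fast_html, "~> 2.0"}
  ]
end
```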

Compiling

  • GNU Make
  • C Compiler
  • Erlang 22.0+ with development headers
  • (optional) lexbor 2.2.0+

If you want to use a system installation of lexbor, set WITH_SYSTEM_LEXBOR=1 at compile time. By default, the build uses the vendored version in c_src/lexbor.

Benchmarks

The following table lists the median time each HTML parser usable from Elixir takes to decode a string into a tree. Benchmarks were run on a machine with an AMD Ryzen 9 3950X (32) @ 3.500GHz CPU and 32GB of RAM. To run the benchmark yourself, use the mix fast_html.bench task.

File/Parser                   fast_html (Port)   mochiweb_html (Erlang)   html5ever (Rust NIF)   Myhtmlex (NIF)¹
document-large.html (6.9M)    125.12 ms          1778.34 ms               395.21 ms              327.17 ms
document-small.html (25K)     0.50 ms            2.76 ms                  1.72 ms                1.19 ms
fragment-large.html (33K)     0.93 ms            4.78 ms                  2.34 ms                2.15 ms
fragment-small.html² (757B)   44.60 μs           42.13 μs                 43.58 μs               289.71 μs

Full benchmark output can be seen in this snippet
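To make the table easier to compare at a glance, the sketch below divides each parser's median by the fast_html median for the same file. The numbers are copied from the table; the variable names are illustrative only, and the fragment-small.html row is converted from μs to ms so all rows share one unit.

```elixir
# Median decode times in milliseconds, copied from the benchmark table.
medians = %{
  "document-large.html" =>
    %{fast_html: 125.12, mochiweb_html: 1778.34, html5ever: 395.21, myhtmlex: 327.17},
  "document-small.html" =>
    %{fast_html: 0.50, mochiweb_html: 2.76, html5ever: 1.72, myhtmlex: 1.19},
  "fragment-large.html" =>
    %{fast_html: 0.93, mochiweb_html: 4.78, html5ever: 2.34, myhtmlex: 2.15},
  # Reported in μs in the table; converted to ms here.
  "fragment-small.html" =>
    %{fast_html: 0.04460, mochiweb_html: 0.04213, html5ever: 0.04358, myhtmlex: 0.28971}
}

# For every file, express each other parser's median as a multiple of fast_html's.
speedups =
  for {file, row} <- medians, into: %{} do
    {base, others} = Map.pop!(row, :fast_html)
    {file, Map.new(others, fn {parser, ms} -> {parser, Float.round(ms / base, 2)} end)}
  end

for {file, ratios} <- Enum.sort(speedups) do
  IO.puts("#{file}: #{inspect(Enum.sort(ratios))}")
end
```

A ratio above 1 means fast_html had the lower median for that file. On fragment-small.html the mochiweb_html and html5ever ratios fall slightly below 1, which is the case footnote 2 addresses.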

  1. Myhtmlex has a C Node mode, but it wasn't benchmarked here because it segfaults on document-large.html.
  2. The slowdown on fragment-small.html is due to Port overhead. Unlike html5ever and Myhtmlex in NIF mode, fast_html runs the parser in an isolated process and communicates with it over stdio, so even a fatal crash in the parser won't bring down the entire VM.
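The isolation described in footnote 2 can be illustrated with a plain Elixir Port. This is a sketch under stated assumptions, not fast_html's actual protocol: it assumes a Unix-like system with sh on the PATH, and sh stands in for the external parser process.

```elixir
# Assumption: a Unix-like system; `sh` is a stand-in for the external parser.
sh = System.find_executable("sh")

# 1. Normal round trip over stdio: the child reads one line and echoes it back.
port =
  Port.open({:spawn_executable, sh}, [
    :binary,
    :exit_status,
    args: ["-c", ~s(read line; echo "parsed: $line")]
  ])

Port.command(port, "<p>hi</p>\n")

reply =
  receive do
    {^port, {:data, data}} -> data
  after
    5_000 -> :timeout
  end

IO.inspect(reply, label: "reply from child")

# 2. Simulated fatal crash: the child kills itself with SIGSEGV.
crashing =
  Port.open({:spawn_executable, sh}, [:binary, :exit_status, args: ["-c", "kill -SEGV $$"]])

crash_status =
  receive do
    {^crashing, {:exit_status, status}} -> status
  after
    5_000 -> :timeout
  end

# The crash arrives as an ordinary message; the VM itself keeps running.
IO.puts("child died with status #{inspect(crash_status)}; the BEAM is still running")
```

Every request pays for a write to and a read from the child's stdio, which is the per-call overhead that becomes visible on very small inputs. A segfault inside a NIF, by contrast, happens inside the VM's own OS process and takes the VM down with it.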

Contribution / Bug Reports

  • Please run git submodule update after every checkout or pull
  • The project aims to be fully tested