Commit e62c3008 authored by rinpatch

fast_html: make decode a single function and allow specifying timeouts

parent c8551dc3
@@ -4,29 +4,6 @@ defmodule :fast_html do
Based on [Alexander Borisov's myhtml](https://github.com/lexborisov/myhtml),
this binding gains the properties of being html-spec compliant and very fast.
-## Example
-iex> :fast_html.decode("<h1>Hello world</h1>")
-{:ok, {"html", [], [{"head", [], []}, {"body", [], [{"h1", [], ["Hello world"]}]}]}}
-Benchmark results (removed Nif calling mode) on various file sizes on a 2,5Ghz Core i7:
-Settings:
-duration: 1.0 s
-## FileSizesBench
-[15:28:42] 1/3: github_trending_js.html 341k
-[15:28:46] 2/3: w3c_html5.html 131k
-[15:28:48] 3/3: wikipedia_hyperlink.html 97k
-Finished in 7.52 seconds
-## FileSizesBench
-benchmark name iterations average time
-wikipedia_hyperlink.html 97k 1000 1385.86 µs/op
-w3c_html5.html 131k 1000 2179.30 µs/op
-github_trending_js.html 341k 500 5686.21 µs/op
"""
@type tag() :: String.t() | atom()
@@ -44,11 +21,25 @@ defmodule :fast_html do
@doc """
Returns a tree representation from the given html string.
## Examples
+`opts` is a keyword list of options, the options available:
+* `timeout` - Call timeout
+* `format` - Format flags for the tree
+The following format flags are available:
+* `:html_atoms` uses atoms for known html tags (faster), binaries for everything else.
+* `:nil_self_closing` uses `nil` to designate self-closing tags and void elements.
+For example `<br>` is then being represented like `{"br", [], nil}`.
+See http://w3c.github.io/html-reference/syntax.html#void-elements for a full list of void elements.
+* `:comment_tuple3` uses 3-tuple elements for comments, instead of the default 2-tuple element.
## Examples
iex> :fast_html.decode("<h1>Hello world</h1>")
{:ok, {"html", [], [{"head", [], []}, {"body", [], [{"h1", [], ["Hello world"]}]}]}}
+iex> :fast_html.decode("Hello world", timeout: 0)
+{:error, :timeout}
iex> :fast_html.decode("<span class='hello'>Hi there</span>")
{:ok, {"html", [],
[{"head", [], []},
@@ -59,24 +50,6 @@ defmodule :fast_html do
iex> :fast_html.decode("<br>")
{:ok, {"html", [], [{"head", [], []}, {"body", [], [{"br", [], []}]}]}}
-"""
-@spec decode(String.t()) :: {:ok, tree()} | {:error, String.t() | atom()}
-def decode(bin) do
-decode(bin, format: [])
-end
-@doc """
-Returns a tree representation from the given html string.
-This variant allows you to pass in one or more of the following format flags:
-* `:html_atoms` uses atoms for known html tags (faster), binaries for everything else.
-* `:nil_self_closing` uses `nil` to designate self-closing tags and void elements.
-For example `<br>` is then being represented like `{"br", [], nil}`.
-See http://w3c.github.io/html-reference/syntax.html#void-elements for a full list of void elements.
-* `:comment_tuple3` uses 3-tuple elements for comments, instead of the default 2-tuple element.
-## Examples
iex> :fast_html.decode("<h1>Hello world</h1>", format: [:html_atoms])
{:ok, {:html, [], [{:head, [], []}, {:body, [], [{:h1, [], ["Hello world"]}]}]}}
@@ -96,7 +69,9 @@ defmodule :fast_html do
"""
@spec decode(String.t(), format: [format_flag()]) ::
{:ok, tree()} | {:error, String.t() | atom()}
-def decode(bin, format: flags) do
-FastHtml.Cnode.call({:decode, bin, flags})
+def decode(bin, opts \\ []) do
+flags = Keyword.get(opts, :format, [])
+timeout = Keyword.get(opts, :timeout, 10000)
+FastHtml.Cnode.call({:decode, bin, flags}, timeout)
end
end
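
As a reading aid for the new `decode/2` body above, here is a minimal, self-contained sketch of the option handling. `DecodeOptsSketch` and `fake_call/2` are hypothetical names introduced only for illustration. `fake_call/2` merely echoes its arguments and stands in for `FastHtml.Cnode.call/2`, whose real behavior is not shown in this diff.

```elixir
defmodule DecodeOptsSketch do
  # Hypothetical stand-in for FastHtml.Cnode.call/2 (an assumption, not the
  # real function): it returns its arguments so the defaults are visible.
  def fake_call({:decode, bin, flags}, timeout) do
    {:called, bin, flags, timeout}
  end

  # Mirrors the shape of the new decode/2: a single function head with an
  # optional keyword list instead of separate decode/1 and decode/2 clauses.
  def decode(bin, opts \\ []) do
    # Keyword.get/3 returns the third argument when the key is absent.
    flags = Keyword.get(opts, :format, [])
    timeout = Keyword.get(opts, :timeout, 10_000)
    fake_call({:decode, bin, flags}, timeout)
  end
end

# No options: both defaults apply.
IO.inspect(DecodeOptsSketch.decode("<br>"))

# Only :format given: the timeout default still applies.
IO.inspect(DecodeOptsSketch.decode("<br>", format: [:html_atoms]))

# Options may appear in any order because they are looked up by key.
IO.inspect(DecodeOptsSketch.decode("<br>", timeout: 500, format: [:nil_self_closing]))
```

One consequence of looking options up by key is that a call such as `decode(bin, timeout: 500)` no longer has to match a function head of the exact form `format: flags`, which the old clause required.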