Programming language: Crystal
License: MIT License
Tags:
Low Level Bindings
Latest version: v1.5.5
myhtml alternatives and similar shards
Based on the "Low level bindings" category.
- termbox-crystal: Bindings, wrapper, and utilities for termbox (terminal interface library) in Crystal
- wkhtmltopdf-crystal: Crystal C bindings and wrapper for the libwkhtmltox library
- serialport.cr: Crystal bindings for libserialport, a cross-platform library for accessing serial ports
- crystal-liblmdb: Crystal language bindings for the Symas LMDB database
README
MyHTML
Fast HTML5 parser (a Crystal binding for lexborisov's awesome myhtml and Modest libraries). This shard is used in production to parse millions of pages per day, and it is very stable and fast.
Installation
Add this to your application's shard.yml:

dependencies:
  myhtml:
    github: kostya/myhtml

Then run shards install.
Usage example
require "myhtml"

html = <<-HTML
  <html>
    <body>
      <div id="t1" class="red">
        <a href="/#">O_o</a>
      </div>
      <div id="t2"></div>
    </body>
  </html>
HTML

myhtml = Myhtml::Parser.new(html)

myhtml.nodes(:div).each do |node|
  id = node.attribute_by("id")

  if first_link = node.scope.nodes(:a).first?
    href = first_link.attribute_by("href")
    link_text = first_link.inner_text
    puts "div with id #{id} has link [#{link_text}](#{href})"
  else
    puts "div with id #{id} has no links"
  end
end

# Output:
#   div with id t1 has link [O_o](/#)
#   div with id t2 has no links
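The loop above prints as it goes. When the extracted data is needed elsewhere, the same calls can be wrapped in a method that returns values. The following is a minimal sketch. It uses only the calls shown in the usage example (`Myhtml::Parser.new`, `nodes`, `scope`, `first?`, `attribute_by`, `inner_text`). The `DivLink` record and the `div_links` method are hypothetical names for illustration and are not part of the shard.

```crystal
require "myhtml"

# Hypothetical value type for illustration; not part of the myhtml shard.
record DivLink, id : String?, href : String?, text : String?

# Collects the id of every <div> together with its first link, if any.
# attribute_by returns nil when the attribute is absent.
def div_links(html : String) : Array(DivLink)
  parser = Myhtml::Parser.new(html)
  parser.nodes(:div).map { |div|
    id = div.attribute_by("id")
    if link = div.scope.nodes(:a).first?
      DivLink.new(id, link.attribute_by("href"), link.inner_text)
    else
      DivLink.new(id, nil, nil)
    end
  }.to_a
end

div_links(%(<div id="t1"><a href="/#">O_o</a></div><div id="t2"></div>)).each do |d|
  puts "#{d.id}: #{d.href || "no link"}"
end
# t1: /#
# t2: no link
```

Returning plain records keeps the parser local to the method, so callers never hold on to parser nodes.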
CSS selectors example
require "myhtml"

html = <<-HTML
  <html>
    <body>
      <table id="t1">
        <tr><td>Hello</td></tr>
      </table>
      <table id="t2">
        <tr><td>123</td><td>other</td></tr>
        <tr><td>foo</td><td>columns</td></tr>
        <tr><td>bar</td><td>are</td></tr>
        <tr><td>xyz</td><td>ignored</td></tr>
      </table>
    </body>
  </html>
HTML

myhtml = Myhtml::Parser.new(html)

p myhtml.css("#t2 tr td:first-child").map(&.inner_text).to_a
# => ["123", "foo", "bar", "xyz"]

p myhtml.css("#t2 tr td:first-child").map(&.to_html).to_a
# => ["<td>123</td>", "<td>foo</td>", "<td>bar</td>", "<td>xyz</td>"]
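The `td:first-child` selector above keeps only the first column. The sketch below assumes you want every column of each row instead. It selects rows with `css` and then collects each row's cells with `scope.nodes(:td)`, both of which appear in the examples above. The `table_rows` helper is hypothetical and is not part of the shard.

```crystal
require "myhtml"

# Hypothetical helper (not part of the shard): returns each <tr> of the
# table matched by `table_selector` as an array of its <td> texts.
def table_rows(html : String, table_selector : String) : Array(Array(String))
  parser = Myhtml::Parser.new(html)
  parser.css("#{table_selector} tr").map { |tr|
    tr.scope.nodes(:td).map(&.inner_text).to_a
  }.to_a
end

html = %(<table id="t2"><tr><td>123</td><td>other</td></tr><tr><td>foo</td><td>columns</td></tr></table>)
p table_rows(html, "#t2")
# => [["123", "other"], ["foo", "columns"]]
```

The HTML5 parser inserts an implicit `<tbody>`. A descendant selector such as `#t2 tr` still matches the rows, whereas a child selector (`#t2 > tr`) would not.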
More Examples
Development setup
git clone https://github.com/kostya/myhtml.git
cd myhtml
make
crystal spec
Benchmark
Parsing a Google page 1000 times, then running a CSS selection 1000 times. Benchmark programs: myhtml-program, crystagiri-program, nokogiri-program.
Language | Shard | Library | Parse time, s | CSS time, s | Memory, MiB |
---|---|---|---|---|---|
Crystal | lexbor | lexbor | 2.39 | - | 7.7 |
Crystal | myhtml | myhtml(+modest) | 2.70 | 0.22 | 8.3 |
Crystal | Crystagiri | libxml2 | 8.02 | 8.59 | 75.4 |
Crystal | Gumbo | Gumbo | 18.18 | - | 2140.7 |
Ruby 2.7 | Nokogiri | libxml2 | 20.15 | 23.02 | 132.8 |
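The benchmark programs linked above are not reproduced here. As a rough way to time the two phases on your own pages, the following is a minimal sketch of a harness. It uses Crystal's `Time.measure` together with the `Myhtml::Parser.new` and `css` calls shown earlier. The `bench` method and the placeholder document are hypothetical and are not the author's benchmark code. Results will vary with hardware and input, and this sketch does not measure memory.

```crystal
require "myhtml"

# Hypothetical harness (not the author's benchmark programs): parses `html`
# `n` times, then runs `selector` `n` times against one parsed document.
# Returns both wall-clock durations and the total number of matches.
def bench(html : String, selector : String, n : Int32) : {Time::Span, Time::Span, Int32}
  parse_time = Time.measure do
    n.times { Myhtml::Parser.new(html) }
  end

  parser = Myhtml::Parser.new(html)
  matches = 0
  css_time = Time.measure do
    # to_a materializes every match; the running count serves as a sanity check.
    n.times { matches += parser.css(selector).to_a.size }
  end

  {parse_time, css_time, matches}
end

# Placeholder document; to mirror the table, load a saved page instead,
# e.g. File.read("google.html") (hypothetical file name).
page = "<div><a href='#'>x</a></div>" * 50
parse_time, css_time, matches = bench(page, "div a", 1000)
puts "parse: #{parse_time.total_seconds.round(2)} s, css: #{css_time.total_seconds.round(2)} s (#{matches} matches)"
```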