`core.html`

Module Contents

Functions

`sanitize_html`(→ str)	Strips all tags from the given HTML except those on a whitelist.
`sanitize_svg`(→ str)	Currently a no-op that tries to detect harmful SVG files; a proper SVG sanitiser is still missing.
`html_to_text`(→ str)	Extracts the text from the given HTML and returns it as markdown.

Attributes

`SANE_HTML_TAGS`
`SANE_HTML_ATTRS`
`VALID_PLAINTEXT_CHARACTERS`
`EMPTY_LINK`
`cleaner`

core.html.SANE_HTML_TAGS = ['a', 'abbr', 'b', 'br', 'blockquote', 'code', 'del', 'div', 'em', 'i', 'img', 'hr', 'li', 'ol',...

core.html.SANE_HTML_ATTRS

core.html.VALID_PLAINTEXT_CHARACTERS

core.html.EMPTY_LINK

core.html.cleaner

core.html.sanitize_html(html: str | None) → str

Takes the given HTML and strips all tags from it except those on a whitelist.
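The whitelist idea can be illustrated with the standard library alone. The following is a minimal sketch, not the module's implementation: `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `VOID_TAGS`, `WhitelistStripper` and `strip_unlisted_tags` are hypothetical names, the whitelist shown is a shortened stand-in for `SANE_HTML_TAGS` and `SANE_HTML_ATTRS`, and mapping `None` to an empty string is an assumption based on the signature. It performs no URL scheme checks and is not hardened for production use.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, shortened whitelist used only for this illustration.
ALLOWED_TAGS = {'a', 'b', 'br', 'em', 'i', 'p'}
ALLOWED_ATTRS = {'a': {'href'}}
VOID_TAGS = {'br'}  # tags that never get a closing tag


class WhitelistStripper(HTMLParser):
    """Re-emits whitelisted tags and drops all others, keeping their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself, text inside is still kept
        allowed = ALLOWED_ATTRS.get(tag, set())
        rendered = ''.join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in allowed
        )
        self.parts.append(f'<{tag}{rendered}>')

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        # Text is escaped again so it cannot reintroduce markup.
        self.parts.append(escape(data, quote=False))


def strip_unlisted_tags(html):
    """Returns the HTML with every non-whitelisted tag removed."""
    if not html:
        return ''  # assumption: None and '' both yield an empty string
    parser = WhitelistStripper()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)
```

Calling `strip_unlisted_tags('<a href="/x" onclick="evil()">link</a>')` keeps the `href` attribute and drops `onclick`, since only attributes listed for a tag are emitted again.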

core.html.sanitize_svg(svg: str) → str

I couldn’t find a good SVG sanitiser function yet, so for now this function is a no-op, though it tries to detect harmful SVG files.

I tried bleach/html5lib, but their lack of XML namespace support rules those options out.

In the future we want a proper SVG sanitiser here!
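A pass-through function that still flags suspicious input could look roughly like the sketch below. This is an illustration under assumptions, not the module's code: the patterns in `SUSPICIOUS_PATTERNS`, the helper names `looks_harmful` and `passthrough_svg`, and the choice to return an empty placeholder SVG on detection are all hypothetical. The documentation above does not say which markers are checked or what happens when one is found.

```python
import re

# Hypothetical heuristics; they can produce false positives and are
# no substitute for a real SVG sanitiser.
SUSPICIOUS_PATTERNS = (
    re.compile(r'<\s*script', re.IGNORECASE),     # embedded scripts
    re.compile(r'javascript\s*:', re.IGNORECASE),  # javascript: URLs
    re.compile(r'\son\w+\s*=', re.IGNORECASE),     # inline event handlers
)

# Assumed fallback returned instead of a suspicious file.
EMPTY_SVG = '<svg xmlns="http://www.w3.org/2000/svg"/>'


def looks_harmful(svg: str) -> bool:
    """Returns True if any suspicious pattern occurs in the SVG source."""
    return any(pattern.search(svg) for pattern in SUSPICIOUS_PATTERNS)


def passthrough_svg(svg: str) -> str:
    """Returns the SVG unchanged unless it looks harmful."""
    return EMPTY_SVG if looks_harmful(svg) else svg
```

String matching of this kind only catches the most obvious cases, which is consistent with the note above that a proper sanitiser is still wanted.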

core.html.html_to_text(html: str, *, unicode_snob: bool = True, body_width: int = 0, ignore_images: bool = True, single_line_break: bool = True, **config: Any) → str

Takes the given HTML text and extracts the text from it.

The result is markdown, generated by html2text. See https://github.com/Alir3z4/html2text/blob/master/html2text/__init__.py for all available options.
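One plausible way such keyword options reach html2text is to merge them with the defaults from the signature and set each one as an attribute on an `HTML2Text` instance. The sketch below assumes exactly that; `DEFAULT_OPTIONS`, `merge_options` and `to_markdown` are hypothetical names, and the real function may be wired differently. html2text is a third-party package and is imported only when `to_markdown` is called.

```python
from typing import Any

# Defaults taken from the documented signature of html_to_text.
DEFAULT_OPTIONS = {
    'unicode_snob': True,
    'body_width': 0,
    'ignore_images': True,
    'single_line_break': True,
}


def merge_options(**config: Any) -> dict[str, Any]:
    """Combines the defaults with caller overrides (overrides win)."""
    return {**DEFAULT_OPTIONS, **config}


def to_markdown(html: str, **config: Any) -> str:
    """Converts HTML to markdown text using html2text."""
    import html2text  # third-party: pip install html2text

    parser = html2text.HTML2Text()
    # html2text exposes its options as instance attributes.
    for name, value in merge_options(**config).items():
        setattr(parser, name, value)
    return parser.handle(html)
```

With this shape, a call such as `to_markdown(html, ignore_links=True)` would keep the four defaults and add `ignore_links` on top of them.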
