Extract Text from HTML using a CSS Selector

Published May 26, 2023

This script extracts text from a valid HTML string. Specify a CSS selector, such as ".content" or "#id", to choose the elements whose text is returned; if no selector is given, the input is returned unchanged. The script does not fetch pages itself, so obtain the HTML through another integration, e.g. https://hub.windmill.dev/scripts/http/442/send-get-request-http
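
To make the selector choices concrete, here is a minimal sketch of how an id selector and a class selector behave with BeautifulSoup's `select`, the call the script relies on. The sample HTML and the helper name `extract_text` are made up for illustration and are not part of the script.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in practice the HTML comes from a previous step.
SAMPLE_HTML = """
<html><body>
  <h1 id="title">Release notes</h1>
  <div class="content">First paragraph.</div>
  <div class="content">Second paragraph.</div>
</body></html>
"""


def extract_text(html: str, css_selector: str) -> list[str]:
    """Return the text of every element matching css_selector (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text() for el in soup.select(css_selector)]


if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML, "#title"))    # id selector: one element
    print(extract_text(SAMPLE_HTML, ".content"))  # class selector: every match
```

An id selector normally matches a single element, while a class selector can match several, which is why the script joins multiple results into one string.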

Script http Verified

The script

Submitted by marco lussetti774 Python3

Verified 1159 days ago

# import wmill
from bs4 import BeautifulSoup


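def main(html: str, css_selector: str = ""):
    # Without a selector there is nothing to filter, so return the input unchanged.
    if not css_selector:
        return {"text": html}

    # Parse with Python's built-in HTML parser.
    soup = BeautifulSoup(html, "html.parser")
    # Collect the text of every element that matches the selector.
    matches = [el.get_text() for el in soup.select(css_selector)]
    # Join the matches with a blank line and report how many were found.
    return {"text": "\n\n".join(matches), "matches": len(matches)}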
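
The shape of the return value depends on the input, which matters for any step that consumes it. The sketch below repeats the logic of the script above so that it runs on its own, then calls it on a made-up HTML snippet; the variable names and the sample markup are assumptions for illustration only.

```python
from bs4 import BeautifulSoup


def main(html: str, css_selector: str = ""):
    # Same logic as the script above, repeated so this sketch runs on its own.
    if not css_selector:
        return {"text": html}
    soup = BeautifulSoup(html, "html.parser")
    matches = [el.get_text() for el in soup.select(css_selector)]
    return {"text": "\n\n".join(matches), "matches": len(matches)}


# Hypothetical input; normally this string comes from an earlier HTTP step.
html = "<ul><li class='item'>One</li><li class='item'>Two</li></ul>"

no_selector = main(html)          # raw HTML back, and no "matches" key
two_hits = main(html, ".item")    # texts joined by a blank line
no_hits = main(html, ".missing")  # empty text, zero matches

if __name__ == "__main__":
    print(no_selector)
    print(two_hits)
    print(no_hits)
```

Note that the "matches" key is only present when a selector is supplied, so a downstream step should not assume it always exists.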

