Extract Text from HTML using a CSS Selector

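This script extracts text from an HTML string using a CSS selector, such as ".content" or "#id". The text of every matching element is returned, joined by blank lines, together with the number of matches. If no selector is given, the input HTML is returned unchanged. The script does not fetch pages itself, so you will need to obtain the HTML through another integration, e.g. https://hub.windmill.dev/scripts/http/442/send-get-request-http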

Script http Verified

by marco lussetti774 · 5/26/2023

The script

Submitted by marco lussetti774 Python3
Verified 1098 days ago
# import wmill
from bs4 import BeautifulSoup


def main(html: str, css_selector: str = ""):
    # Without a selector, return the input HTML unchanged.
    if not css_selector:
        return {"text": html}
    else:
        # Parse the HTML and collect the text of every matching element.
        soup = BeautifulSoup(html, "html.parser")
        matches = [el.get_text() for el in soup.select(css_selector)]
        return {"text": "\n\n".join(matches), "matches": len(matches)}
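
As a rough illustration of how the script above behaves, the sketch below calls main on a small sample document. The function body is repeated only so the example runs on its own. SAMPLE_HTML and the variable names are hypothetical and not part of the original script.

```python
from bs4 import BeautifulSoup


def main(html: str, css_selector: str = ""):
    # Same logic as the script above, repeated so this sketch is self-contained.
    if not css_selector:
        return {"text": html}
    soup = BeautifulSoup(html, "html.parser")
    matches = [el.get_text() for el in soup.select(css_selector)]
    return {"text": "\n\n".join(matches), "matches": len(matches)}


# Hypothetical sample document, for illustration only.
SAMPLE_HTML = (
    '<div id="intro">Welcome</div>'
    '<p class="content">First paragraph</p>'
    '<p class="content">Second paragraph</p>'
)

by_class = main(SAMPLE_HTML, ".content")    # class selector, two matches
by_id = main(SAMPLE_HTML, "#intro")         # id selector, one match
no_selector = main(SAMPLE_HTML)             # empty selector, HTML passed through
no_match = main(SAMPLE_HTML, ".missing")    # selector that matches nothing

print(by_class)
print(by_id)
print(no_match)
```

Note that the result has no "matches" key when the selector is empty, so a downstream step that reads it may want to use result.get("matches") instead of indexing directly.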