Extract Text from HTML using a CSS Selector

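This script extracts text from a valid HTML string. Specify a CSS selector, such as ".content" or "#id", to choose the elements whose text is extracted; the text of all matching elements is joined with blank lines and returned together with the number of matches. If no selector is given, the HTML is returned unchanged.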
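
To make the selector syntax concrete, here is a minimal sketch of how a class selector and an id selector behave with BeautifulSoup's `select`. The sample HTML and the variable names are made up for illustration and are not part of the script.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this illustration.
SAMPLE_HTML = """
<html><body>
  <h1 id="title">Release notes</h1>
  <p class="content">First paragraph.</p>
  <p class="content">Second paragraph.</p>
  <p class="footer">Contact us.</p>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A class selector can match several elements.
by_class = [el.get_text() for el in soup.select(".content")]

# An id selector matches the element carrying that id.
by_id = [el.get_text() for el in soup.select("#title")]

# A selector with no matches yields an empty result, not an error.
no_match = soup.select(".missing")

print(by_class)
print(by_id)
print(len(no_match))
```

Because a selector may match any number of elements, a caller should be prepared for zero, one, or many results.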

You will need to obtain the HTML text through another integration, e.g. https://hub.windmill.dev/scripts/http/442/send-get-request-http

Created by marco lussetti774 547 days ago Viewed 9719 times
Submitted by marco lussetti774 Python3
Verified 547 days ago
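# import wmill
from bs4 import BeautifulSoup


def main(html: str, css_selector: str = ""):
    # Without a selector there is nothing to extract: return the input as-is.
    if not css_selector:
        return {"text": html}
    else:
        # Parse with Python's built-in HTML parser and collect the text of every match.
        soup = BeautifulSoup(html, "html.parser")
        matches = [el.get_text() for el in soup.select(css_selector)]
        # Join the matched texts with blank lines and report how many elements matched.
        return {"text": "\n\n".join(matches), "matches": len(matches)}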
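
A short usage sketch may help show the shape of the result. The `main` function is repeated here only so the example runs on its own; the `PAGE` string and the result variable names are hypothetical.

```python
from bs4 import BeautifulSoup


def main(html: str, css_selector: str = ""):
    # Same logic as the script, repeated so this example is self-contained.
    if not css_selector:
        return {"text": html}
    soup = BeautifulSoup(html, "html.parser")
    matches = [el.get_text() for el in soup.select(css_selector)]
    return {"text": "\n\n".join(matches), "matches": len(matches)}


# Hypothetical input, invented for this illustration.
PAGE = (
    '<div><p class="content">Alpha</p>'
    '<p class="content">Beta</p>'
    '<p id="note">Gamma</p></div>'
)

with_selector = main(PAGE, ".content")   # two elements match
without_selector = main(PAGE)            # no selector: HTML passed through
no_match = main(PAGE, ".missing")        # selector matches nothing

print(with_selector)
print(without_selector)
print(no_match)
```

Note that the result has no "matches" key when the selector is empty, so downstream steps that read it should probably use something like `result.get("matches")`.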