Requests

Fetching remote data, then parsing or querying it

The web is all about fetching data from remote sources. Fortunately, SEL provides an event type to make these requests that support common data formats.

Request

A Request fetches data from a remote location (URL). If we want, for example, to pull in the contents of a published Google Sheet, we can make a GET or POST request like so:

=Request('get', 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSRSc0h4ZvFGaY8ZmCSRHjc5nGE80tNKKvPwyAgivd35eprIS/pub?gid=944422473&single=true&output=csv')

This event will fetch the contents of this URL and store them in the event. The value of the event in the results table and log will be set to the status code of the response. All responses less than 400 are considered successful.

We can also pass common arguments params, cookies, json, and headers to Request. Let's look at an example in context:

#!/usesummit/sel/0.1a

# Ping wayback machine to get page closest to now, scrape it, and clean it.
"url_now": =Object({"params": {"url": "glideapps.com"<customer_url>}})

"wayback_machine_today": =Request('get', "http://archive.org/wayback/available")

In this example, we create an Object to store the parameters that need to be passed to the Request.

❗️

API key security

Many endpoints and API's require the use of keys, usernames, passwords, and/or tokens, in order to authenticate. To use these in SEL while keeping them secure, you should place them in your account Vault. This allows you to refer to them in your SEL code using liquid syntax, e.g. {{ MY_VENDOR_API_KEY }} instead of inserting them as text. This keeps these keys encrypted, out of code, and out of any database.

Since a lot of endpoints (like the Wayback Machine itself) don't like being barraged by requests, we can set an additional configuration option sleep on the object we pass in to the request:

# Let's be nice and honor their desire to not get hit with a lot of requests.
"url_now": =Object({"params": {"url": "glideapps.com"<customer_url>}, "sleep": 2}})

"wayback_machine_today": =Request('get', "http://archive.org/wayback/available")

This will cause the model to pause (sleep) for 2 seconds before making the request. This helps to prevent response codes like 429 (Too Many Requests).

Example:

# Describe configuration.
"request_config": =Object({"sleep": 2, "timeout": 10, "cache_duration": 3600, "params": {...}})

# Define the request.
"my_request": =Request("get", "https://api.vendor.com/v1/docs")

# Pass the config to the request.
"request_config" -> "my_request"

Screenshots

If you'd like to take a screenshot of a webpage, use img as the method:

=Request('img', 'https://stripe.com')

This will return a base64-encoded image/png of the full contents of the web page specified by the second argument. This can be passed to a vision-aware AI like gpt-4o or POST'd to a service that allows you to upload images.

Web Scraping

The request event can be used as a web scraper by defining scrape rules (see below) in an Object that you pass to your request. The general pattern is a string defining where you'd like to store the content followed by a CSS selector or XPath to the element.

"request_options": =Object("scrape": {
    "title" : "h1",
    "subtitle" : "#subtitle",
})

"request": =Request("get", "https://stripe.com")

"request_options" -> "request"

This will extract the h1 text and return it under a "title" attribute, and the text contained inside a DOM element whose id attribute is subtitle and return it in an attribute called "subtitle."

Request configuration options

A full list of options that can be sent to Request using a preceding Object. The Option in the table is the object key.

OptionPurposeValueDefault
paramsQuerystring arguments.A dictionary of key-value pairs.None
headersRequest headers.A dictionary of key-value pairs.None
jsonJSON body arguments.A dictionary of key-value pairs.None
cookiesCookies to include.A dictionary of key-value pairs.None
sleepPlace a pause before making the request.Any number or decimal (seconds)0
timeoutTell the request how long to wait before considering the request a failure.Any number of seconds.10
cache_durationChoose how long you'd like Summit to cache the response. Useful for higher performance when the underlying data doesn't change very often, or when you want to limit your unique requests to the endpoint.Any number of seconds.60
scrapeA set of rules to extract HTML nodes (DOM entities) from a web page.A dictionary of key-value pairs.None

📘

Formats, File Size, and Timeout Limitations

Currently, JSON, CSV, and HTML are the only data formats (or content types) that can be retrieved by a Request. Other formats will be rejected and the output of the event will be set to empty.

Both JSON and CSV responses will be converted to a JSON format on receipt to enable parsing using a Parser event. CSV files are automatically turned into a list of dictionaries (objects with key-value pairs) where the column header is the key.

SEL also imposes a 5 MB limit on the size of the content that can be retrieved.

Parser

If you want to use the response of your request, you'll need to create a Parser event.

These events take a JSONPath argument that defines a search to retrieve the value you want from the response data, like so:

=Parser('$.pricing.express_overnight')

The parser event itself will extract and pass along the values it finds using your expression as a list of values, but the value will not be stored in the parser event (it is not a container).

To operate on the list output of Parser , you can use a Transform to, for example, sum the list, a Pool to store the list, or Matches to search the list.

📘

JSONPath resources

SEL uses the jsonpath-ng Python module to search your data using this expression. You can read more about this module here: https://pypi.org/project/jsonpath-ng/

Writing a complex JSONPath expression can be challenging. In addition to asking ChatGPT, this site is a handy way to test a JSONPath expression against a JSON object: https://jsonpath.com/.

Table

A lot of business processes involve the usage of data stored in a remotely-accessible CSV file. These files are often "published" spreadsheets (using a service like Google Sheets or Microsoft Excel) or even a read-only database view or "clip."

When you want to pull this data into SEL, you can use Table to fetch and structure the data, and Query to run ad hoc queries on it.

To pull a remote CSV file accessible by URL into a Summit, you can use:

=Table('https://docs.google.com/...'<csv_url>)

This will spin up a small database inside your model and store the table uniquely inside of it. In other words, you are welcome to use this event in multiple places, and the data will be stored in different tables:

=Table('https://data.heroku.com/clips/...'<data_url>)

In the SQL of downstream Query events, you will refer to this table as simple $table. For example: SELECT count(*) FROM $table.

You may also wish to pass Table a list of dictionaries, such as the output of a Parser event. You can do this by leaving the Table argument empty:

=Table()

In either case, you may provide an optional argument for the name of the table:

=Table("contacts")

Or

=Table("https://sheets.google.com/...", "contacts")

This will load the input data (or the data from the remote CSV) into a table in your model called contacts.

This can be useful for writing a query in an event that is not directly downstream of the table.

Query

Once you have a data file stored inside a Table, you'll most likely want to run a query on it. In SEL, we can do this by passing SQL to an event like so:

=Query("SELECT min(price) FROM $table WHERE status = 'ACTIVE'")

If we point a Table at this event, we will see a value in the results table and ledger that answers this query.

If a single value is returned by a query, the value of the event will be that return value. Otherwise, the value of the event will be the number of rows returned by the query. Therefore, the above query will return the minimum price stored for ACTIVE products in the table.

SELECT price FROM $table WHERE status = 'ACTIVE' will, on the other hand, return the number of rows in the table.

To do something with the price values returned by the above query, like sum them, find the average, or max, you should use the transform event which can take a list of values and reduce it to a single result.

If the Table you are querying was given a name when it was defined, you should refer to it using its name, for example:

=Query("SELECT * FROM members WHERE age >= 65")

An example using Table, Query and Response Data.

#!/usesummit/sel/0.1a

"csv_table": =Table("https://docs.google.com/spreadsheets/d/e/2PACX-1vRg7CfZbL3V_h4fNzgl8KqTO7JKSuJ1NqJ03C3i4P6eK12jSDbOGPvTBPhzJMoQmkVEBLYUl2r5v5Sf/pub?gid=0&single=true&output=csv")

"csv_rows": =Query("SELECT * FROM $table LIMIT 2")

"my_csv_response": =Response("my_csv_data")

"csv_table" -> "csv_rows" -> "my_csv_response"

Multiline SQL

You can also write multi-line SQL queries by wrapping your query in """:

#!/usesummit/sel/0.1a

"csv_table": =Table("https://docs.google.com/spreadsheets/d/e/2PACX-1vRg7CfZbL3V_h4fNzgl8KqTO7JKSuJ1NqJ03C3i4P6eK12jSDbOGPvTBPhzJMoQmkVEBLYUl2r5v5Sf/pub?gid=0&single=true&output=csv")

"csv_rows": =Query("""
	SELECT * FROM $table LIMIT 2
""")

"my_csv_response": =Response("my_csv_data")

"csv_table" -> "csv_rows" -> "my_csv_response"

📘

Summit SQL == SQLite

Summit uses in-memory SQLite databases for Table, so all of your Query syntax should observe what's included in the SQLite dialect.

Learn more about SQLite's syntax and function library.