Requests

Fetching remote data, then parsing or querying it

The web is all about fetching data from remote sources. Fortunately, SEL provides an event type to make these requests that support common data formats.

Request

A Request fetches data from a remote location (URL). If we want, for example, to pull in the contents of a published Google Sheet, we can make a GET or POST request like so:

=Request('get', 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSRSc0h4ZvFGaY8ZmCSRHjc5nGE80tNKKvPwyAgivd35eprIS/pub?gid=944422473&single=true&output=csv')

This event will fetch the contents of this URL and store them in the event. The value of the event in the results table and log will be set to the status code of the response. All responses less than 400 are considered successful.

Let's look at an example in context:

#!/usesummit/sel/0.1a

# Ping wayback machine to get page closest to now, scrape it, and clean it.
"url_now": =Object({"params": {"url": "glideapps.com"<customer_url>}})

"wayback_machine_today": =Request('get', "http://archive.org/wayback/available")

"json_extract": =Parser('$..closest')

In this example, we create an Object to store the parameters that need to be passed to the Request.

The Request points to a Parser (see below) to process the response data.

πŸ“˜

Formats, File Size, and Timeout Limitations

Currently, JSON, CSV, and HTML are the only data formats (or content types) that can be retrieved by a Request. Other formats will be rejected and the output of the event will be set to empty.

Both JSON and CSV responses will be converted to a JSON format on receipt to enable parsing using a Parser event. CSV files are automatically turned into a list of dictionaries (objects with key-value pairs) where the column header is the key.

SEL also imposes a 5 MB limit on the size of the content that can be retrieved.

A request will automatically timeout if it takes longer than 10 seconds!

Parser

If you want to use the response of your request, you'll need to create a Parser event.

These events take a JSONPath argument that defines a search to retrieve the value you want from the response data, like so:

=Parser('$.pricing.express_overnight')

The parser event itself will extract and pass along the values it finds using your expression as a list of values, but the value will not be stored in the parser event (it is not a container).

To operate on the list output of Parser , you can use a Transform to, for example, sum the list, a Pool to store the list, or Matches to search the list.

πŸ“˜

JSONPath resources

SEL uses the jsonpath-ng Python module to search your data using this expression. You can read more about this module here: https://pypi.org/project/jsonpath-ng/

Writing a complex JSONPath expression can be challenging. In addition to asking ChatGPT, this site is a handy way to test a JSONPath expression against a JSON object: https://jsonpath.com/.

Table

A lot of business processes involve the usage of data stored in a remotely-accessible CSV file. These files are often "published" spreadsheets (using a service like Google Sheets or Microsoft Excel) or even a read-only database view or "clip."

When you want to pull this data into SEL, you can use Table to fetch and structure the data, and Query to run ad hoc queries on it.

To pull a remote CSV file accessible by URL into a Summit, you can use:

=Table('https://docs.google.com/...'<csv_url>)

This will spin up a small database inside your model and store the table uniquely inside of it. In other words, you are welcome to use this event in multiple places, and the data will be stored in different tables:

=Table('https://data.heroku.com/clips/...'<data_url>)

In the SQL of downstream Query events, you will refer to this table as simple $table. For example: SELECT count(*) FROM $table.

You may also wish to pass Table a list of dictionaries, such as the output of a Parser event. You can do this by leaving the Table argument empty:

=Table()

In either case, you may provide an optional argument for the name of the table:

=Table("contacts")

Or

=Table("https://sheets.google.com/...", "contacts")

This will load the input data (or the data from the remote CSV) into a table in your model called contacts.

This can be useful for writing a query in an event that is not directly downstream of the table.

Query

Once you have a data file stored inside a Table, you'll most likely want to run a query on it. In SEL, we can do this by passing SQL to an event like so:

=Query("SELECT min(price) FROM $table WHERE status = 'ACTIVE'")

If we point a Table at this event, we will see a value in the results table and ledger that answers this query.

If a single value is returned by a query, the value of the event will be that return value. Otherwise, the value of the event will be the number of rows returned by the query. Therefore, the above query will return the minimum price stored for ACTIVE products in the table.

SELECT price FROM $table WHERE status = 'ACTIVE' will, on the other hand, return the number of rows in the table.

To do something with the price values returned by the above query, like sum them, find the average, or max, you should use the transform event which can take a list of values and reduce it to a single result.

🚧

Semi-colons need not apply.

It's very common to end SQL statements with ;. Don't do that in SEL or when using the Query event.

If the Table you are querying was given a name when it was defined, you should refer to it using its name, for example:

=Query("SELECT * FROM members WHERE age >= 65")

An example using Table, Query and Response Data.

#!/usesummit/sel/0.1a

"csv_table": =Table("https://docs.google.com/spreadsheets/d/e/2PACX-1vRg7CfZbL3V_h4fNzgl8KqTO7JKSuJ1NqJ03C3i4P6eK12jSDbOGPvTBPhzJMoQmkVEBLYUl2r5v5Sf/pub?gid=0&single=true&output=csv")

"csv_rows": =Query("SELECT * FROM $table LIMIT 2")

"my_csv_response": =Response("my_csv_data")

"csv_table" -> "csv_rows" -> "my_csv_response"