Requests

Fetching remote data, then parsing or querying it

The web is all about fetching data from remote sources. Fortunately, SEL provides an event type for making these requests, and it supports common data formats.

Request

A Request fetches data from a remote location (URL). If we want, for example, to pull in the contents of a published Google Sheet, we can make a GET or POST request like so:

=Request('get', 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSRSc0h4ZvFGaY8ZmCSRHjc5nGE80tNKKvPwyAgivd35eprIS/pub?gid=944422473&single=true&output=csv')

This event fetches the contents of the URL and stores them in the event. The value of the event in the results table and log is set to the status code of the response. All responses with a status code below 400 are considered successful.

We can also pass the common arguments params, cookies, json, and headers to Request. Let's look at an example in context:

#!/usesummit/sel/0.1a

# Ping wayback machine to get page closest to now, scrape it, and clean it.
"url_now": =Object({"params": {"url": "glideapps.com"<customer_url>}})

"wayback_machine_today": =Request('get', "http://archive.org/wayback/available")

# Pass the parameters to the request.
"url_now" -> "wayback_machine_today"

In this example, we create an Object to store the parameters that need to be passed to the Request.
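
To make the role of params concrete, here is a minimal Python sketch of how a params dictionary is conventionally appended to a URL as a query string. It assumes SEL follows the standard URL-encoding convention, which the docs do not state explicitly. The helper name build_url is hypothetical and not part of SEL.

```python
from urllib.parse import urlencode


def build_url(base_url, params=None):
    """Append params to base_url as a URL-encoded query string (hypothetical helper)."""
    if not params:
        return base_url
    # Use "&" when the URL already carries a query string, "?" otherwise.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


# The Wayback Machine example: one query parameter named "url".
full_url = build_url("http://archive.org/wayback/available", {"url": "glideapps.com"})
print(full_url)  # http://archive.org/wayback/available?url=glideapps.com
```

Keeping the parameters in a separate Object, rather than hand-writing them into the URL, means the encoding is handled for you and the input variable can change without touching the Request.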

❗️

API key security

Many endpoints and APIs require keys, usernames, passwords, and/or tokens to authenticate. To use these in SEL while keeping them secure, place them in your account Vault. You can then refer to them in your SEL code using Liquid syntax, e.g. {{ MY_VENDOR_API_KEY }}, instead of inserting them as text. This keeps the keys encrypted, out of code, and out of any database.

Since a lot of endpoints (like the Wayback Machine itself) don't like being barraged by requests, we can set an additional configuration option, sleep, on the Object we pass to the request:

# Let's be nice and honor their desire to not get hit with a lot of requests.
"url_now": =Object({"params": {"url": "glideapps.com"<customer_url>}, "sleep": 2})

"wayback_machine_today": =Request('get', "http://archive.org/wayback/available")

# Pass the parameters and sleep setting to the request.
"url_now" -> "wayback_machine_today"

This will cause the model to pause (sleep) for 2 seconds before making the request. This helps to prevent response codes like 429 (Too Many Requests).

Example:

# Describe configuration.
"request_config": =Object({"sleep": 2, "timeout": 10, "cache_duration": 3600, "params": {...}})

# Define the request.
"my_request": =Request("get", "https://api.vendor.com/v1/docs")

# Pass the config to the request.
"request_config" -> "my_request"

Screenshots

If you'd like to take a screenshot of a webpage, use img as the method:

=Request('img', 'https://stripe.com')

This returns a base64-encoded image/png of the full contents of the web page specified by the second argument. The image can be passed to a vision-aware AI like gpt-4o or POSTed to a service that accepts image uploads.
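
If you handle the screenshot outside of SEL, you will need to decode it first. The following Python sketch decodes base64 text and checks that the result starts with the standard PNG signature. The helper name decode_png is hypothetical, and the sample payload is only the 8-byte PNG signature rather than a real screenshot.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_png(b64_text):
    """Decode base64 text and confirm the result starts with the PNG signature."""
    raw = base64.b64decode(b64_text)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw


# Stand-in payload: just the PNG signature, encoded the way a screenshot would be.
sample = base64.b64encode(PNG_SIGNATURE).decode("ascii")
image_bytes = decode_png(sample)
```

Checking the signature before saving or uploading catches the case where the request failed and the payload is not an image at all.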

Web Scraping

The request event can be used as a web scraper by defining scrape rules (see below) in an Object that you pass to your request. Each rule is a key-value pair: the key names the attribute where the content will be stored, and the value is a CSS selector or XPath expression that locates the element.

"request_options": =Object({"scrape": {
    "title": "h1",
    "subtitle": "#subtitle"
}})

"request": =Request("get", "https://stripe.com")

"request_options" -> "request"

This extracts the text of the h1 element and returns it under a "title" attribute. It also extracts the text inside the DOM element whose id attribute is subtitle and returns it under a "subtitle" attribute.
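
To illustrate what a scrape rule does conceptually, here is a small Python sketch built on the standard library's html.parser. It is not SEL's implementation: it only understands a bare tag name (such as "h1") or an id selector (such as "#subtitle"), it keeps only the first match per rule (an assumption), and the class name RuleScraper and the sample HTML are invented for the example.

```python
from html.parser import HTMLParser


class RuleScraper(HTMLParser):
    """Collect text for rules that are a bare tag name ("h1") or an id ("#subtitle")."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.results = {}
        self.active = []  # [rule name, tag, nesting depth] for elements being captured

    def handle_starttag(self, tag, attrs):
        # Track same-tag nesting so the capture ends on the matching close tag.
        for entry in self.active:
            if entry[1] == tag:
                entry[2] += 1
        element_id = dict(attrs).get("id")
        for name, selector in self.rules.items():
            if name in self.results:
                continue  # keep only the first match per rule
            if selector == tag or (selector.startswith("#") and selector[1:] == element_id):
                self.results[name] = ""
                self.active.append([name, tag, 1])

    def handle_endtag(self, tag):
        for entry in self.active:
            if entry[1] == tag:
                entry[2] -= 1
        self.active = [entry for entry in self.active if entry[2] > 0]

    def handle_data(self, data):
        # Text is appended to every rule whose element is currently open.
        for entry in self.active:
            self.results[entry[0]] += data


sample_html = (
    "<html><body><h1>Example <em>title</em></h1>"
    '<p id="subtitle">Example subtitle</p><h1>Second heading</h1></body></html>'
)
scraper = RuleScraper({"title": "h1", "subtitle": "#subtitle"})
scraper.feed(sample_html)
scraped = {name: text.strip() for name, text in scraper.results.items()}
print(scraped)
```

The result is a dictionary keyed by the rule names, which mirrors the shape described above: one attribute per rule, holding the text of the matched element.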

Request configuration options

The table below lists every option that can be sent to Request using a preceding Object. The Option column gives the object key.

| Option | Purpose | Value | Default |
| --- | --- | --- | --- |
| params | Querystring arguments. | A dictionary of key-value pairs. | None |
| headers | Request headers. | A dictionary of key-value pairs. | None |
| json | JSON body arguments. | A dictionary of key-value pairs. | None |
| cookies | Cookies to include. | A dictionary of key-value pairs. | None |
| sleep | Place a pause before making the request. | Any number or decimal (seconds) | 0 |
| timeout | Tell the request how long to wait before considering the request a failure. | Any number of seconds. | 10 |
| cache_duration | Choose how long you'd like Summit to cache the response. Useful for higher performance when the underlying data doesn't change very often, or when you want to limit your unique requests to the endpoint. | Any number of seconds. | 60 |
| scrape | A set of rules to extract HTML nodes (DOM entities) from a web page. | A dictionary of key-value pairs. | None |
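
One way to read the defaults in the table above is as a base configuration that your Object overrides key by key. The Python sketch below models that reading. The merge behavior is an assumption about how SEL applies defaults, and resolve_options is a hypothetical helper.

```python
# Defaults copied from the table above; None means the option is not set.
REQUEST_DEFAULTS = {
    "params": None,
    "headers": None,
    "json": None,
    "cookies": None,
    "sleep": 0,
    "timeout": 10,
    "cache_duration": 60,
    "scrape": None,
}


def resolve_options(user_options):
    """Overlay user-supplied options on the documented defaults (hypothetical helper)."""
    return {**REQUEST_DEFAULTS, **user_options}


# Only sleep and cache_duration are overridden; everything else keeps its default.
options = resolve_options({"sleep": 2, "cache_duration": 3600})
print(options["sleep"], options["timeout"], options["cache_duration"])  # 2 10 3600
```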

📘

Formats, File Size, and Timeout Limitations

Currently, JSON, CSV, and HTML are the only data formats (or content types) that can be retrieved by a Request. Other formats will be rejected and the output of the event will be set to empty.

Both JSON and CSV responses will be converted to a JSON format on receipt to enable parsing using a Parser event. CSV files are automatically turned into a list of dictionaries (objects with key-value pairs) where the column header is the key.

SEL also imposes a 5 MB limit on the size of the content that can be retrieved.
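
The CSV conversion described above (a list of dictionaries keyed by column header) can be pictured with Python's standard csv module. This is a sketch of the resulting shape, not SEL's own code. The sample data is invented, and values stay as strings here because the docs do not say whether SEL converts numeric columns.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Turn CSV text into a list of dicts keyed by the column headers."""
    return list(csv.DictReader(io.StringIO(csv_text)))


sample_csv = "plan,price\nstarter,10\npro,25\n"
records = csv_to_records(sample_csv)
print(json.dumps(records))
# [{"plan": "starter", "price": "10"}, {"plan": "pro", "price": "25"}]
```

Because the result has the same shape as a JSON array of objects, the same JSONPath expressions work on CSV and JSON responses alike.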

Proxy

The SEL editor hosted at usesummit.com provides the option to connect to remote services such as HubSpot using integrations based on OAuth. In short, you can sign in to these remote services from Summit and use those authenticated connections in your SEL models to send and receive data.

To implement one of these authenticated connections in SEL, we use a Proxy event:

=Proxy('hubspot', ["get", "/crm/v3/objects/contacts"])

The second argument is a list that defines the method and the API endpoint path of the remote service you'd like to access. Always omit the protocol (e.g. https://) and host (e.g. api.hubspot.com) from the path.

Since Summit already possesses an authenticated connection, you do not need to provide this proxy event with an Authorization header. However, you almost always still need an upstream Object event to populate other request arguments, such as the JSON body or query parameters.

Here is an example of a proxy in action using .sel notation:

# Define the input variable for the HS record ID to update.
"hs_record_id": =String("123456789"<hs_id>)

# Set the data that will be sent to HubSpot.
"request_config": =Object({
  "data": {
    "properties": {
      "icebreaker_email": "{{ extract_result | json_safe }}"
    }
  }
})

# Create a request to HS using our authenticated connection.
"hubspot_conn": =Proxy("hubspot", ["PATCH", "/crm/v3/objects/companies/{{ hs_record_id }}"])

# Show the results.
"hs_response": =Response("company_data")

# Pass the record_id and data to the Proxy request, then output the results.
"hs_record_id" -> "request_config" -> "hubspot_conn" -> "hs_response"

Parser

If you want to use the response of your request, you'll need to create a Parser event.

These events take a JSONPath argument that defines a search to retrieve the value you want from the response data, like so:

=Parser('$.pricing.express_overnight')

The Parser event extracts every value that matches your expression and passes the matches along as a list. The Parser does not store the values itself (it is not a container).

To operate on the list output of Parser, you can use a Transform (for example, to sum the list), a Pool to store the list, or Matches to search the list.
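
The "list of matches" behavior can be sketched in plain Python. The function below is a deliberately simplified stand-in that handles only dotted keys such as $.pricing.express_overnight. It is not the JSONPath engine SEL uses and does not support wildcards, filters, or indexes. The function name and the sample response are invented for illustration.

```python
def find_simple_path(data, path):
    """Resolve a '$.key.subkey' path and return the matches as a list.

    Simplified stand-in: plain dotted keys only, unlike full JSONPath.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    keys = [key for key in path[1:].split(".") if key]
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return []  # no match yields an empty list
        current = current[key]
    return [current]


# Invented response data shaped like the expression used above.
response = {"pricing": {"express_overnight": 42.5, "ground": 9.0}}
matches = find_simple_path(response, "$.pricing.express_overnight")
total = sum(matches)  # the kind of step a downstream Transform might perform
print(matches, total)  # [42.5] 42.5
```

Note that even a single match comes back wrapped in a list, which is why a downstream event is needed to reduce, store, or search it. With the jsonpath-ng module itself, the equivalent lookup is parse(path).find(data), reading .value from each match.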

📘

JSONPath resources

SEL uses the jsonpath-ng Python module to search your data using this expression. You can read more about this module here: https://pypi.org/project/jsonpath-ng/

Writing a complex JSONPath expression can be challenging. Besides asking ChatGPT, you can test a JSONPath expression against a JSON object at https://jsonpath.com/.

Tables & Queries

👍

Our data tables and queries docs have moved.

Check out Tables & Queries!