Requests
Fetching remote data, then parsing or querying it
The web is all about fetching data from remote sources. Fortunately, SEL provides an event type for making these requests that supports common data formats.
Request
A Request fetches data from a remote location (URL). For example, to pull in the contents of a published Google Sheet, we can make a GET or POST request like so:
=Request('get', 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSRSc0h4ZvFGaY8ZmCSRHjc5nGE80tNKKvPwyAgivd35eprIS/pub?gid=944422473&single=true&output=csv')
This event will fetch the contents of this URL and store them in the event. The value of the event in the results table and log will be set to the status code of the response. All responses with a status code below 400 are considered successful.
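The success rule can be written as a one-line check. This is a minimal Python sketch rather than SEL code; the function name is_successful is hypothetical, and the only thing it assumes is the documented rule that status codes below 400 count as success.

```python
def is_successful(status_code: int) -> bool:
    """Mirror the documented rule: any status code below 400 is a success."""
    return status_code < 400


# Try a few common status codes.
for code in (200, 302, 404, 429):
    print(code, is_successful(code))
```

Note that redirects (3xx) count as successful under this rule, while client and server errors (4xx and 5xx) do not.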
We can also pass the common arguments params, cookies, json, and headers to Request. Let's look at an example in context:
#!/usesummit/sel/0.1a
# Ping wayback machine to get page closest to now, scrape it, and clean it.
"url_now": =Object({"params": {"url": "glideapps.com"<customer_url>}})
"wayback_machine_today": =Request('get', "http://archive.org/wayback/available")
# Pass the parameters to the request.
"url_now" -> "wayback_machine_today"
In this example, we create an Object to store the parameters that need to be passed to the Request.
API key security
Many endpoints and APIs require keys, usernames, passwords, and/or tokens in order to authenticate. To use these in SEL while keeping them secure, you should place them in your account Vault. This allows you to refer to them in your SEL code using liquid syntax, e.g. {{ MY_VENDOR_API_KEY }}, instead of inserting them as text. This keeps these keys encrypted, out of code, and out of any database.
Since a lot of endpoints (like the Wayback Machine itself) don't like being barraged by requests, we can set an additional configuration option, sleep, on the object we pass in to the request:
# Let's be nice and honor their desire to not get hit with a lot of requests.
"url_now": =Object({"params": {"url": "glideapps.com"<customer_url>}, "sleep": 2})
"wayback_machine_today": =Request('get', "http://archive.org/wayback/available")
# Pass the parameters and the sleep option to the request.
"url_now" -> "wayback_machine_today"
This will cause the model to pause (sleep) for 2 seconds before making the request. This helps to prevent response codes like 429 (Too Many Requests).
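To make the pause-then-request behavior concrete, here is a minimal Python sketch of the idea. It is an illustration only and says nothing about how Summit implements sleep internally; polite_fetch and fake_fetch are hypothetical names, and the fetch is a stand-in rather than a real HTTP call.

```python
import time
from typing import Callable


def polite_fetch(fetch: Callable[[], int], sleep_seconds: float = 0,
                 sleeper: Callable[[float], None] = time.sleep) -> int:
    """Pause for sleep_seconds, then run fetch and return its status code."""
    if sleep_seconds > 0:
        sleeper(sleep_seconds)
    return fetch()


def fake_fetch() -> int:
    """Hypothetical stand-in for a real HTTP call; always reports 200."""
    return 200


pauses = []  # record the requested pauses instead of actually waiting
status = polite_fetch(fake_fetch, sleep_seconds=2, sleeper=pauses.append)
print(status, pauses)
```

The sleeper is injectable so the example runs instantly; with the default time.sleep, the call would really wait two seconds before fetching.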
Example:
# Describe configuration.
"request_config": =Object({"sleep": 2, "timeout": 10, "cache_duration": 3600, "params": {...}})
# Define the request.
"my_request": =Request("get", "https://api.vendor.com/v1/docs")
# Pass the config to the request.
"request_config" -> "my_request"
Screenshots
If you'd like to take a screenshot of a webpage, use img as the method:
=Request('img', 'https://stripe.com')
This will return a base64-encoded image/png of the full contents of the web page specified by the second argument. The image can be passed to a vision-aware AI like gpt-4o or POSTed to a service that allows you to upload images.
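If you handle the screenshot outside of SEL, decoding it might look like the Python sketch below. It assumes the value is a plain base64 string with no data-URI prefix, which the text above does not confirm; decode_png is a hypothetical helper, and the sample payload is a placeholder rather than a real screenshot.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_png(encoded: str) -> bytes:
    """Decode a base64 string and confirm the result starts like a PNG."""
    raw = base64.b64decode(encoded)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return raw


# Placeholder payload: a PNG signature plus filler bytes, not a real image.
sample = base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii")
image_bytes = decode_png(sample)
print(len(image_bytes))
```

Checking the signature before uploading or saving is a cheap way to catch an empty or failed screenshot early.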
Web Scraping
The request event can be used as a web scraper by defining scrape rules (see below) in an Object that you pass to your request. Each rule is a key-value pair: the key names the attribute where the content will be stored, and the value is a CSS selector or XPath expression that locates the element.
"request_options": =Object({"scrape": {
"title": "h1",
"subtitle": "#subtitle"
}})
"request": =Request("get", "https://stripe.com")
"request_options" -> "request"
This will extract the text of the h1 element and return it under a "title" attribute, and extract the text of the DOM element whose id attribute is subtitle and return it under a "subtitle" attribute.
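To show what a rule set like this does, the Python sketch below applies the same two rules to a small HTML string using only the standard library. It is an illustration, not Summit's scraper: RuleScraper and scrape are hypothetical names, the sample HTML is invented, and the sketch understands only bare tag names and #id selectors, not full CSS or XPath.

```python
from html.parser import HTMLParser


class RuleScraper(HTMLParser):
    """Collect the text of the first element matching each rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules      # e.g. {"title": "h1", "subtitle": "#subtitle"}
        self.results = {}
        self._active = None     # (key, tag, nesting depth) while capturing
        self._parts = []

    @staticmethod
    def _matches(selector, tag, attrs):
        if selector.startswith("#"):            # id selector
            return dict(attrs).get("id") == selector[1:]
        return tag == selector                  # bare tag selector

    def handle_starttag(self, tag, attrs):
        if self._active is not None:
            key, active_tag, depth = self._active
            if tag == active_tag:               # same tag nested inside itself
                self._active = (key, active_tag, depth + 1)
            return
        for key, selector in self.rules.items():
            if key not in self.results and self._matches(selector, tag, attrs):
                self._active = (key, tag, 1)
                self._parts = []
                return

    def handle_endtag(self, tag):
        if self._active is None or tag != self._active[1]:
            return
        key, active_tag, depth = self._active
        if depth > 1:
            self._active = (key, active_tag, depth - 1)
        else:
            self.results[key] = "".join(self._parts).strip()
            self._active = None

    def handle_data(self, data):
        if self._active is not None:
            self._parts.append(data)


def scrape(html, rules):
    """Apply the rules to an HTML string and return {key: extracted text}."""
    parser = RuleScraper(rules)
    parser.feed(html)
    parser.close()
    return parser.results


page = '<h1>Payments <em>infrastructure</em></h1><p id="subtitle">For the internet</p>'
print(scrape(page, {"title": "h1", "subtitle": "#subtitle"}))
```

The result is a dictionary keyed by the rule names, which matches the shape described above: one attribute per rule, each holding the text of the matched element.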
Request configuration options
The following is a full list of options that can be sent to Request using a preceding Object. The Option column in the table is the object key.
Option | Purpose | Value | Default |
---|---|---|---|
params | Querystring arguments. | A dictionary of key-value pairs. | None |
headers | Request headers. | A dictionary of key-value pairs. | None |
json | JSON body arguments. | A dictionary of key-value pairs. | None |
cookies | Cookies to include. | A dictionary of key-value pairs. | None |
sleep | Place a pause before making the request. | Any number of seconds (decimals allowed). | 0 |
timeout | Tell the request how long to wait before considering the request a failure. | Any number of seconds. | 10 |
cache_duration | Choose how long you'd like Summit to cache the response. Useful for higher performance when the underlying data doesn't change very often, or when you want to limit your unique requests to the endpoint. | Any number of seconds. | 60 |
scrape | A set of rules to extract HTML nodes (DOM entities) from a web page. | A dictionary of key-value pairs. | None |
Formats, File Size, and Timeout Limitations
Currently, JSON, CSV, and HTML are the only data formats (or content types) that can be retrieved by a Request. Other formats will be rejected and the output of the event will be set to empty.
Both JSON and CSV responses will be converted to a JSON format on receipt to enable parsing using a Parser event. CSV files are automatically turned into a list of dictionaries (objects with key-value pairs) where the column header is the key.
SEL also imposes a 5 MB limit on the size of the content that can be retrieved.
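The CSV conversion described above produces the same shape as Python's csv.DictReader. The sketch below shows that shape; it is not Summit's conversion code, csv_to_records is a hypothetical name, and the sample rows are invented. Whether SEL converts numeric strings to numbers is not stated here, so the sketch leaves every value as a string.

```python
import csv
import io


def csv_to_records(text: str) -> list:
    """Turn CSV text into a list of dictionaries keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


sample_csv = "name,plan\nAda,pro\nGrace,free\n"
records = csv_to_records(sample_csv)
print(records)
```

Each row becomes one dictionary, so a JSONPath expression such as $[*].plan can then select a single column across all rows.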
Proxy
The SEL editor hosted at usesummit.com provides the option to connect to remote services such as HubSpot using integrations based on OAuth. In short, you can sign in to these remote services from Summit and use those authenticated connections in your SEL models to send and receive data.
To implement one of these authenticated connections in SEL, we use a Proxy event:
=Proxy('hubspot', ["get", "/crm/v3/objects/contacts"])
The second argument is a list whose two items define the method and the API endpoint of the remote service you'd like to access. You should always omit the protocol (e.g. https://) and host (e.g. api.hubspot.com) from the endpoint.
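A quick way to check that an endpoint follows this rule is to parse it and confirm it carries no protocol or host. The Python sketch below does that with the standard library; is_relative_endpoint is a hypothetical helper, and requiring a leading slash is an assumption based on the examples in this section.

```python
from urllib.parse import urlsplit


def is_relative_endpoint(endpoint: str) -> bool:
    """True when the endpoint has no protocol or host and starts with '/'."""
    parts = urlsplit(endpoint)
    return parts.scheme == "" and parts.netloc == "" and endpoint.startswith("/")


for candidate in (
    "/crm/v3/objects/contacts",
    "https://api.hubspot.com/crm/v3/objects/contacts",
    "api.hubspot.com/crm/v3/objects/contacts",
):
    print(candidate, is_relative_endpoint(candidate))
```

Only the first candidate passes, since the other two include a protocol, a host, or both.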
Since Summit already possesses an authenticated connection, you do not need to provide this proxy event with an Authorization header. However, you almost always still need to use an upstream Object event to populate other request arguments, such as the json body or query parameters.
Here is an example of a proxy in action using .sel notation:
# Define the input variable for the HS record ID to update.
"hs_record_id": =String("123456789"<hs_id>)
# Set the data that will be sent to HubSpot.
"request_config": =Object({
"data": {
"properties": {
"icebreaker_email": "{{ extract_result | json_safe }}"
}
}
})
# Create a request to HS using our authenticated connection.
"hubspot_conn": =Proxy("hubspot", ["PATCH", "/crm/v3/objects/companies/{{ hs_record_id }}"])
# Show the results.
"hs_response": =Response("company_data")
# Pass the record_id and data to the Proxy request, then output the results.
"hs_record_id" -> "request_config" -> "hubspot_conn" -> "hs_response"
Parser
If you want to use the response of your request, you'll need to create a Parser event.
These events take a JSONPath argument that defines a search to retrieve the value you want from the response data, like so:
=Parser('$.pricing.express_overnight')
The parser event extracts the values that match your expression and passes them along as a list, but it does not store them (it is not a container).
To operate on the list output of Parser, you can use a Transform to, for example, sum the list, a Pool to store the list, or Matches to search the list.
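The Python sketch below illustrates why the output is always a list and how a downstream step might sum it. To stay runnable with the standard library alone, it uses a small hand-written stand-in rather than a real JSONPath engine: find_values is a hypothetical function that understands only dotted keys and the [*] wildcard, and the sample data is invented.

```python
def find_values(data, path):
    """Return a list of values for a simple '$.key.key' or '$.key[*].key' path."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = [data]
    # Treat '[*]' as its own step so '$.a[*].b' splits into a, *, b.
    for step in path[1:].replace("[*]", ".*").split("."):
        if not step:
            continue
        matched = []
        for node in current:
            if step == "*" and isinstance(node, list):
                matched.extend(node)            # fan out over list items
            elif isinstance(node, dict) and step in node:
                matched.append(node[step])      # follow one key
        current = matched
    return current


rates = {"pricing": {"express_overnight": 42.5, "ground": 9.0}}
orders = {"orders": [{"total": 10}, {"total": 5.5}]}

print(find_values(rates, "$.pricing.express_overnight"))   # a single match is still a list
print(sum(find_values(orders, "$.orders[*].total")))        # sum the matches, as a Transform might
```

Even when the expression matches exactly one value, the result is a one-item list, which is why a follow-up event is needed to reduce, store, or search it.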
JSONPath resources
SEL uses the jsonpath-ng Python module to search your data using this expression. You can read more about this module here: https://pypi.org/project/jsonpath-ng/
Writing a complex JSONPath expression can be challenging. In addition to asking ChatGPT, this site is a handy way to test a JSONPath expression against a JSON object: https://jsonpath.com/.
Tables & Queries
Our data tables and queries docs have moved.
Check out Tables & Queries!