Connectors
Audiences
Syncs
Resources
📓 API Docs
Octolis uses Nunjucks, the Javascript implementation of Jinja, a widely used templating language.
This means that all the templating syntax from Jinja is available inside of Octolis SQL Audiences to generate powerful queries.
e.g. if/else conditional syntax, for loops...
Octolis also extends this templating syntax with some functions often inspired by DBT (which also uses Jinja).
Especially, please note the {{ ref("Source name") }}
syntax that should be used instead of directly referring to the Source table name.
This way, users don't have to know the actual table name of the Source to write queries.
$scope
variable of a SQL Audience execution$scope='standalone'
:
The SQL query runs in the context of either an entireAudience
or a sql
execution.
All data from all Sources is fetched, and the SQL query runs againts them.
All you need to do is using the usual SELECT ... FROM {{ ref("Source name") }}
syntax.
$scope='increment'
:
The SQL query runs in the context of a connectionSource
, syncSource
, or webhookSource
execution.
The incremental data of only one Source enters the SQL Audience.
Here it becomes interesting as this is THE way of adapting/improving your query.
Why run a query against all data of all Sources when you know there are only a few new incremental changes?
$origin_name
variable and {{ ref($origin) }}
table of a SQL Audience execution$origin_name
is a variable equal to name of the Source that emitted the incremental changes.{{ ref($origin) }}
is the table containing the incremental changes.<aside>
⚠️ This is only applicable when $scope='increment'
</aside>
Let's say you compute some orders aggregates in a SQL Audience. A usual query would look like the following:
SELECT orders.contact_masterid,
orders.total_amount,
orders.__modified_at__ as modified_at
FROM {{ ref("Orders source") }} orders
GROUP BY contact_masterid
Whatever the kind of execution, this will always work, as it always fetches all data from the orders Source and computes aggregates based on it.
Now let's take a step back. This is not efficient at all in the case of a connectionSource
execution.
Instead of only processing the incremental changes, it will work against all the Source data.
Let's adapt the query:
SELECT orders.contact_masterid,
orders.total_amount,
orders.__modified_at__ as modified_at
{% if $scope === "increment" %}
FROM {{ ref($origin) }}
{% else %}
FROM {{ ref("Orders source") }}
{% endif %}
GROUP BY contact_masterid
Which is of course equivalent to:
SELECT orders.contact_masterid,
orders.total_amount,
orders.__modified_at__ as modified_at
FROM {% if $scope === "increment" %}{{ ref($origin) }}{% else %}{{ ref("Orders source") }}{% endif %}
GROUP BY contact_masterid
This is already way better.
This simple syntax only works because there is only one source, otherwise, you would have to use:
{% if $scope === "increment" and $origin_name === 'Contacts source' %}
instead of only {% if $scope === "increment" %}
.
Now let's say you have two contacts and order Sources that can trigger your SQL Audience.
A basic query would look like the following:
SELECT contacts.__masterid__::text as contact_masterid,
contacts.first_name,
contacts.last_name,
sum(orders.total_amount) as ltv,
greatest(contacts.__modified_at__, max(orders.__modified_at__)) as modified_at
FROM {{ ref('Contacts source') }} contacts
LEFT JOIN {{ ref('Orders source') }} orders
ON contacts.__masterid__::text = orders.contact_masterid;
Now let's optimize it so that we don't compute aggregates for all contacts again when there are only a few incremental changes coming from the contacts Source:
SELECT contacts.__masterid__::text as contact_masterid,
contacts.first_name,
contacts.last_name,
sum(orders.total_amount) as ltv,
greatest(contacts.__modified_at__, max(orders.__modified_at__)) as modified_at
{% if $scope === "increment" and $origin_name === 'Contacts source' %}
FROM {{ ref($origin) }} contacts
{% else %}
FROM {{ ref("Orders source") }}
{% endif %}
LEFT JOIN {{ ref('Orders source') }} orders
ON contacts.__masterid__::text = orders.contact_masterid;
Now let's optimize it also for the case when there are incremental changes coming from the orders Source:
SELECT contacts.__masterid__::text as contact_masterid,
contacts.first_name,
contacts.last_name,
sum(orders.total_amount) as ltv,
greatest(contacts.__modified_at__, max(orders.__modified_at__)) as modified_at
FROM {% if $scope === "increment" and $origin_name === 'Contacts source' %}{{ ref($origin) }}{% else %}{{ ref("Contacts source") }}{% endif %} contacts
LEFT JOIN {% if $scope === "increment" and $origin_name === 'Orders source' %}{{ ref($origin) }}{% else %}{{ ref('Orders source') }}{% endif %} orders
ON contacts.__masterid__::text = orders.contact_masterid
{% if $scope === "increment" and $origin_name === 'Orders source' %}
WHERE contacts.__masterid__::text IN (SELECT origin.contact_masterid FROM {{ ref($origin) }} AS origin)
{% endif %}
And that's it! This way your query will always run in an optimized way over the mimimum required number of records.
<aside> 💡 This is how Octolis works internally in "builder" Audiences to ensure performance.
</aside>
When you read this you maybe thought it makes sense to implement those optimizations into each and every SQL Audience. But this does not always make sense...
Let's take the RFM segment for instance: it should always be computed again each time there is a new order or a new contact.
👉 Join or merge your Audience with another Source
On this page