A Dataset is a combination of one or more data sources, to which you can apply data preparation rules and scoring before syncing the result to other tools.
For example, you could create datasets:
Datasets can be built from your raw data sources, or from another dataset.
You can create a dataset with no code or in SQL.
This page focuses on “No code” dataset creation.
Step 1 - Create a dataset
From the “Dataset” menu on the left, click “Build Dataset” at the top right.
You will be asked to choose between “No code Builder” and “SQL Builder”, and to give your dataset a name.
Step 2 - Choose the first source
The first step is to choose the data source of your dataset.
You can choose a table from one of the Sources defined in the “Connectors” menu. It is also possible to choose an existing dataset as a source.
<aside> 💡 For files on your FTP, there is an option to select several files at once by applying a regular expression to the file name.
</aside>
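As an illustration of selecting several files by name, here is a minimal Python sketch. The file names and the pattern are hypothetical, and the exact regular expression syntax accepted by Octolis is not specified here, so treat this only as an example of the idea.

```python
import re

# Hypothetical file names as they might appear on an FTP server.
file_names = [
    "orders_2023-01.csv",
    "orders_2023-02.csv",
    "customers.csv",
    "orders_archive.zip",
]

# Match every monthly orders export: "orders_", a year, a month, ".csv".
pattern = re.compile(r"^orders_\d{4}-\d{2}\.csv$")

# Keep only the file names that match the pattern.
selected = [name for name in file_names if pattern.match(name)]
print(selected)
```

Anchoring the pattern with `^` and `$` avoids accidentally picking up files such as archives that merely contain the same prefix.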
Step 3 - Describe the first source
You are invited to:
<aside> 💡 For real-time use cases, datasets can be fed by API or webhooks.
</aside>
Step 4 - Define the fields to import
The objective here is to choose the fields from your data source that will be imported into your dataset. You can choose all fields or select only some of them.
Octolis automatically detects most data “Types”, but the detection can be wrong. Please take the time to check that each column has the right data “Type”, because a wrong “Type” may cause issues when a data preparation recipe is applied to that column.
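To see why a wrong type matters, consider a postal code that is read as a number instead of text. The snippet below is a generic Python illustration with a made-up value, not a description of how Octolis stores data.

```python
# Hypothetical postal code as it appears in the source file.
postal_code = "01234"

# Treated as a number, the leading zero is lost.
as_number = int(postal_code)

# Treated as text, the value is preserved exactly.
as_text = str(postal_code)

print(as_number, as_text)
```

Any later rule that relies on the full five-character code would behave differently on the numeric version.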
In the “Advanced settings”, you can choose whether Octolis imports all the data source files each time, or only the records updated since the last import. For performance reasons, the default and recommended option is to import only the records updated since the last import. This requires a reliable “Updated at” column in your source file.
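The idea behind importing only updated records can be sketched in a few lines of Python. The record layout, the `updated_at` field name, and the `select_updated` helper are assumptions made for this example; they are not part of Octolis.

```python
from datetime import datetime

# Hypothetical source records; "updated_at" stands in for the "Updated at" column.
records = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 5)},
    {"id": 2, "email": "b@example.com", "updated_at": datetime(2024, 2, 10)},
    {"id": 3, "email": "c@example.com", "updated_at": datetime(2024, 3, 1)},
]

def select_updated(records, last_import_at):
    """Keep only records modified strictly after the previous import."""
    return [r for r in records if r["updated_at"] > last_import_at]

# Assumed timestamp of the previous import.
last_import_at = datetime(2024, 2, 1)

to_import = select_updated(records, last_import_at)
print([r["id"] for r in to_import])
```

If the “Updated at” value is missing or not refreshed when a record changes, a filter like this silently skips the change, which is why the column needs to be reliable.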
Step 5 - Map the fields with your dataset + Define the dedupe key
By default, your dataset columns have the same names as the columns of the data source. You can rename them in this step.
In the “Advanced settings” menu, you can define the “Dedupe” key. It could be the main ID of your data source records, an “Email” column, or a combination like “Firstname” x “Lastname” x “Postal code”.
When two imported records have the same “Dedupe key”, they are merged when imported into the audience.
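The following Python sketch illustrates merging on a composite dedupe key. The field names, the key normalization (trimming and lowercasing), and the merge policy (the first non-empty value wins, later records only fill in missing fields) are all assumptions chosen for the example; the actual merge rules in Octolis may differ.

```python
# Hypothetical imported records.
records = [
    {"firstname": "Ada", "lastname": "Lovelace", "postal_code": "75001",
     "email": "ada@example.com", "phone": ""},
    {"firstname": "ada ", "lastname": "LOVELACE", "postal_code": "75001",
     "email": "", "phone": "0102030405"},
    {"firstname": "Alan", "lastname": "Turing", "postal_code": "69001",
     "email": "alan@example.com", "phone": ""},
]

def dedupe_key(record):
    """Build a "Firstname" x "Lastname" x "Postal code" key (assumed normalization)."""
    fields = ("firstname", "lastname", "postal_code")
    return tuple(record[f].strip().lower() for f in fields)

def merge_records(records):
    """Merge records sharing a dedupe key; earlier non-empty values are kept."""
    merged = {}
    for record in records:
        existing = merged.setdefault(dedupe_key(record), {})
        for field, value in record.items():
            # Assumed policy: only fill fields that are still missing.
            if value not in ("", None) and field not in existing:
                existing[field] = value
    return list(merged.values())

merged = merge_records(records)
print(len(merged))
```

Here the two “Ada Lovelace” rows collapse into one record that carries both the email and the phone number, while “Alan Turing” stays separate.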
👉 More info on dedupe
👉 Join or merge your Audience with another Source