This guide explains how to use Dagster with Sigma through the dagster-sigma library. Your Sigma assets, including datasets and workbooks, can be represented in the Dagster asset graph, allowing you to track lineage and dependencies between Sigma assets and the upstream data assets you already model in Dagster.
To load Sigma assets into the Dagster asset graph, you must first construct a SigmaOrganization resource, which allows Dagster to communicate with your Sigma organization. You'll need to supply your client ID and client secret alongside the base URL. See Identify your API request URL in the Sigma documentation for more information on how to find your base URL.
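Because the examples in this guide read the client ID and client secret from environment variables, it can help to confirm that they are set before Dagster loads your definitions. The following is a minimal sketch using only the Python standard library. The helper name `missing_sigma_env_vars` is hypothetical and not part of dagster-sigma, and the variable names `SIGMA_CLIENT_ID` and `SIGMA_CLIENT_SECRET` simply mirror the ones used in this guide's examples.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumption: these names mirror the examples in this guide; adjust to your setup.
REQUIRED_SIGMA_ENV_VARS = ("SIGMA_CLIENT_ID", "SIGMA_CLIENT_SECRET")


def missing_sigma_env_vars(
    names: Sequence[str] = REQUIRED_SIGMA_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; not part of dagster-sigma.
    """
    # Fall back to the process environment when no mapping is passed in.
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

You might call `missing_sigma_env_vars()` in a deployment check and fail early with a clear message if the returned list is not empty.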
Dagster can automatically load all datasets and workbooks from your Sigma workspace as asset specs. Call the load_sigma_asset_specs function, which returns a list of AssetSpecs representing your Sigma assets. You can then include these asset specs in your Definitions object.
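As a minimal sketch of an unfiltered load, the snippet below builds the resource and passes the resulting specs to `Definitions`. It uses only names that appear elsewhere in this guide. The `SigmaBaseUrl.AWS_US` value and the `SIGMA_CLIENT_ID` and `SIGMA_CLIENT_SECRET` environment variable names are assumptions to replace with your own. Running it requires the dagster-sigma package and valid Sigma credentials, because the specs are fetched from your Sigma organization.

```python
import dagster as dg
from dagster_sigma import SigmaBaseUrl, SigmaOrganization, load_sigma_asset_specs

# Resource that lets Dagster talk to your Sigma organization.
# Assumption: AWS_US is only a placeholder; pick the base URL that matches your organization.
sigma_organization = SigmaOrganization(
    base_url=SigmaBaseUrl.AWS_US,
    client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
    client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
)

# Load every dataset and workbook as an AssetSpec (no filter applied).
sigma_specs = load_sigma_asset_specs(sigma_organization)

# Expose the specs and the resource to Dagster.
defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
```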
You can load a subset of your Sigma assets by passing a SigmaFilter to the load_sigma_asset_specs function. The SigmaFilter object lets you specify the folders from which to load Sigma workbooks, and it also lets you configure which datasets are represented as assets.
Note that the content and size of your Sigma organization may affect the performance of your Dagster deployments. Restricting the set of workbooks from which your Sigma assets are loaded is particularly useful for improving loading times.
    from dagster_sigma import (
        SigmaBaseUrl,
        SigmaFilter,
        SigmaOrganization,
        load_sigma_asset_specs,
    )

    import dagster as dg

    sigma_organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
        client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
    )

    sigma_specs = load_sigma_asset_specs(
        organization=sigma_organization,
        sigma_filter=SigmaFilter(
            # Filter down to only the workbooks in these folders
            workbook_folders=[
                ("my_folder", "my_subfolder"),
                ("my_folder", "my_other_subfolder"),
            ],
            # Specify whether to include datasets that are not used in any workbooks
            # default is True
            include_unused_datasets=False,
        ),
    )
    defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
Customize asset definition metadata for Sigma assets
By default, Dagster generates an asset key for each Sigma asset based on its type and name, and populates default metadata. You can further customize asset properties by passing a custom DagsterSigmaTranslator subclass to the load_sigma_asset_specs function. This subclass can implement methods to customize the asset keys or specs for each Sigma asset type.
    from dagster_sigma import (
        DagsterSigmaTranslator,
        SigmaBaseUrl,
        SigmaOrganization,
        SigmaWorkbook,
        load_sigma_asset_specs,
    )

    import dagster as dg

    sigma_organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=dg.EnvVar("SIGMA_CLIENT_ID"),
        client_secret=dg.EnvVar("SIGMA_CLIENT_SECRET"),
    )


    # A translator class lets us customize properties of the built Sigma assets, such as the owners or asset key
    class MyCustomSigmaTranslator(DagsterSigmaTranslator):
        def get_asset_spec(self, data: SigmaWorkbook) -> dg.AssetSpec:
            # Adds a custom team owner tag for all Sigma assets
            return super().get_asset_spec(data)._replace(owners=["team:my_team"])


    sigma_specs = load_sigma_asset_specs(
        sigma_organization, dagster_sigma_translator=MyCustomSigmaTranslator
    )
    defs = dg.Definitions(assets=[*sigma_specs], resources={"sigma": sigma_organization})
Definitions from multiple Sigma organizations can be combined by instantiating multiple SigmaOrganization resources and merging their specs. This lets you view all your Sigma assets in a single asset graph:
    from dagster_sigma import SigmaBaseUrl, SigmaOrganization, load_sigma_asset_specs

    import dagster as dg

    sales_team_organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=dg.EnvVar("SALES_SIGMA_CLIENT_ID"),
        client_secret=dg.EnvVar("SALES_SIGMA_CLIENT_SECRET"),
    )

    marketing_team_organization = SigmaOrganization(
        base_url=SigmaBaseUrl.AWS_US,
        client_id=dg.EnvVar("MARKETING_SIGMA_CLIENT_ID"),
        client_secret=dg.EnvVar("MARKETING_SIGMA_CLIENT_SECRET"),
    )

    sales_team_specs = load_sigma_asset_specs(sales_team_organization)
    marketing_team_specs = load_sigma_asset_specs(marketing_team_organization)

    # Merge the specs into a single set of definitions
    defs = dg.Definitions(
        assets=[*sales_team_specs, *marketing_team_specs],
        resources={
            "marketing_sigma": marketing_team_organization,
            "sales_sigma": sales_team_organization,
        },
    )