Sigma
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Asset Containers | ✅ | Enabled by default | 
| Descriptions | ✅ | Enabled by default | 
| Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion.remove_stale_metadata | 
| Extract Ownership | ✅ | Enabled by default, configured using ingest_owner | 
| Extract Tags | ✅ | Enabled by default | 
| Platform Instance | ✅ | Enabled by default | 
| Schema Metadata | ✅ | Enabled by default | 
| Table-Level Lineage | ✅ | Enabled by default |
This plugin extracts the following:
- Sigma Workspaces and Workbooks as Containers.
- Sigma Datasets.
- Pages as Dashboards and their Elements as Charts.
Integration Details
This source extracts the following:
- Workspaces, and the workbooks within those workspaces, as Containers.
- Sigma Datasets as DataHub Datasets.
- Pages as DataHub Dashboards, and the elements inside pages as Charts.
Configuration Notes
- Refer to the doc to generate API client credentials.
- Provide the generated Client ID and Secret in the recipe.
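
One way to keep the generated credentials out of the recipe file itself is to reference environment variables, which DataHub recipes expand using the `${VAR}` syntax. The snippet below is a minimal sketch; the variable names `SIGMA_CLIENT_ID` and `SIGMA_CLIENT_SECRET` are assumptions chosen for illustration, not names defined by the connector.

```yaml
source:
  type: sigma
  config:
    # SIGMA_CLIENT_ID and SIGMA_CLIENT_SECRET are assumed variable names;
    # export them in the environment that runs the ingestion.
    client_id: "${SIGMA_CLIENT_ID}"
    client_secret: "${SIGMA_CLIENT_SECRET}"
```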
Concept mapping
| Sigma | DataHub | Notes |
|---|---|---|
| Workspace | Container | SubType "Sigma Workspace" | 
| Workbook | Container | SubType "Sigma Workbook" | 
| Page | Dashboard | |
| Element | Chart | |
| Dataset | Dataset | SubType "Sigma Dataset" | 
| User | User (a.k.a. CorpUser) | Optionally extracted |
Advanced Configurations
Chart source platform mapping
To specify platform details (platform name, platform instance, and env) for all of a chart's external upstream data sources, use chart_sources_platform_mapping as shown below:
Example - For just one specific chart's external upstream data sources
    chart_sources_platform_mapping:
      'workspace_name/workbook_name/chart_name_1': 
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD
      'workspace_name/folder_name/workbook_name/chart_name_2': 
        data_source_platform: postgres
        platform_instance: cloud_instance
        env: DEV
Example - For all charts within one specific workbook
    chart_sources_platform_mapping:
      'workspace_name/workbook_name_1': 
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD
      
      'workspace_name/folder_name/workbook_name_2': 
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD
Example - For all charts in all workbooks within one specific workspace
    chart_sources_platform_mapping:
      'workspace_name': 
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD
Example - All workbooks use the same connection
    chart_sources_platform_mapping:
      '*': 
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD
CLI based Ingestion
Install the Plugin
The sigma source works out of the box with acryl-datahub.
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: sigma
  config:
    # Coordinates
    api_url: "https://aws-api.sigmacomputing.com/v2"
    # Credentials
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"
    
    # Optional - filter for certain workspace names instead of ingesting everything.
    # workspace_pattern:
    #   allow:
    #     - workspace_name
    ingest_owner: true
    
    # Optional - maps a Sigma workspace/workbook/chart folder path to the platform details of the data sources of all charts under that path.
    # chart_sources_platform_mapping:
    #   folder_path:
    #     data_source_platform: postgres
    #     platform_instance: cloud_instance
    #     env: DEV
sink:
  # sink configs
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
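
As an illustration of the dotted notation, a field listed as workspace_pattern.allow corresponds to an `allow` key nested under `workspace_pattern` in the recipe. The following is a minimal sketch; the workspace name and the workbook pattern are made-up placeholders, and it assumes workbook_lineage_pattern is matched against workbook names.

```yaml
source:
  type: sigma
  config:
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"
    # Field `workspace_pattern.allow` becomes `allow` nested under `workspace_pattern`.
    workspace_pattern:
      allow:
        - "Marketing"      # hypothetical workspace name
      ignoreCase: true
    # Field `workbook_lineage_pattern.deny` becomes `deny` nested under `workbook_lineage_pattern`.
    workbook_lineage_pattern:
      deny:
        - "Scratch.*"      # hypothetical pattern, assumed to match workbook names
```
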
| Field | Description | 
|---|---|
| client_id ✅ string | Sigma Client ID | 
| client_secret ✅ string | Sigma Client Secret | 
| api_url string | Sigma API hosted URL. Default: https://aws-api.sigmacomputing.com/v2 |
| extract_lineage boolean | Whether to extract lineage of workbook's elements and datasets or not. Default: True | 
| ingest_owner boolean | Ingest Owner from source. This will override Owner info entered from UI. Default: True | 
| ingest_shared_entities boolean | Whether to ingest the shared entities or not. Default: False | 
| platform_instance string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. | 
| env string | The environment that all assets produced by this connector belong to. Default: PROD |
| chart_sources_platform_mapping map(str,PlatformDetail) | A mapping of the Sigma workspace/workbook/chart folder path to the platform details of the data sources of all charts under that folder path. |
| chart_sources_platform_mapping.key.env string | The environment that all assets produced by this connector belong to. Default: PROD |
| chart_sources_platform_mapping.key.data_source_platform ❓ string | The platform name of a chart's data sources. |
| chart_sources_platform_mapping.key.platform_instance string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. |
| workbook_lineage_pattern AllowDenyPattern | Regex patterns to filter the lineage of a workbook's elements and datasets during ingestion. Requires extract_lineage to be enabled. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| workbook_lineage_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| workbook_lineage_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] |
| workbook_lineage_pattern.allow.string string | |
| workbook_lineage_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] | 
| workbook_lineage_pattern.deny.string string | |
| workspace_pattern AllowDenyPattern | Regex patterns to filter Sigma workspaces during ingestion. Include 'My documents' if personal entities should also be ingested. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| workspace_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| workspace_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] |
| workspace_pattern.allow.string string | |
| workspace_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] | 
| workspace_pattern.deny.string string | |
| stateful_ingestion StatefulStaleMetadataRemovalConfig | Sigma Stateful Ingestion Config. | 
| stateful_ingestion.enabled boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False |
| stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True | 
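
The stateful ingestion options above can be combined as in the following sketch. Because the table notes that stateful ingestion depends on a pipeline_name and a datahub-rest sink (or datahub_api), the example sets both; the pipeline name and the server address are placeholders assumed for illustration.

```yaml
pipeline_name: sigma_prod_ingestion   # placeholder name; keep it stable across runs
source:
  type: sigma
  config:
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"
    stateful_ingestion:
      enabled: true
      # Soft-delete entities seen in the last successful run but missing in this one.
      remove_stale_metadata: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"   # assumed address of a local DataHub instance
```

Keeping pipeline_name unchanged between runs matters in this setup, since the comparison with "the last successful run" is only meaningful when both runs belong to the same pipeline.
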
The JSONSchema for this configuration is inlined below.
{
  "title": "SigmaSourceConfig",
  "description": "Base configuration class for stateful ingestion for source configs to inherit from.",
  "type": "object",
  "properties": {
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "platform_instance": {
      "title": "Platform Instance",
      "description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.",
      "type": "string"
    },
    "stateful_ingestion": {
      "title": "Stateful Ingestion",
      "description": "Sigma Stateful Ingestion Config.",
      "allOf": [
        {
          "$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
        }
      ]
    },
    "api_url": {
      "title": "Api Url",
      "description": "Sigma API hosted URL.",
      "default": "https://aws-api.sigmacomputing.com/v2",
      "type": "string"
    },
    "client_id": {
      "title": "Client Id",
      "description": "Sigma Client ID",
      "type": "string"
    },
    "client_secret": {
      "title": "Client Secret",
      "description": "Sigma Client Secret",
      "type": "string"
    },
    "workspace_pattern": {
      "title": "Workspace Pattern",
      "description": "Regex patterns to filter Sigma workspaces in ingestion.Mention 'My documents' if personal entities also need to ingest.",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "ingest_owner": {
      "title": "Ingest Owner",
      "description": "Ingest Owner from source. This will override Owner info entered from UI.",
      "default": true,
      "type": "boolean"
    },
    "ingest_shared_entities": {
      "title": "Ingest Shared Entities",
      "description": "Whether to ingest the shared entities or not.",
      "default": false,
      "type": "boolean"
    },
    "extract_lineage": {
      "title": "Extract Lineage",
      "description": "Whether to extract lineage of workbook's elements and datasets or not.",
      "default": true,
      "type": "boolean"
    },
    "workbook_lineage_pattern": {
      "title": "Workbook Lineage Pattern",
      "description": "Regex patterns to filter workbook's elements and datasets lineage in ingestion.Requires extract_lineage to be enabled.",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "chart_sources_platform_mapping": {
      "title": "Chart Sources Platform Mapping",
      "description": "A mapping of the sigma workspace/workbook/chart folder path to all chart's data sources platform details present inside that folder path.",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/PlatformDetail"
      }
    }
  },
  "required": [
    "client_id",
    "client_secret"
  ],
  "additionalProperties": false,
  "definitions": {
    "DynamicTypedStateProviderConfig": {
      "title": "DynamicTypedStateProviderConfig",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "description": "The type of the state provider to use. For DataHub use `datahub`",
          "type": "string"
        },
        "config": {
          "title": "Config",
          "description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
          "default": {},
          "type": "object"
        }
      },
      "required": [
        "type"
      ],
      "additionalProperties": false
    },
    "StatefulStaleMetadataRemovalConfig": {
      "title": "StatefulStaleMetadataRemovalConfig",
      "description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "default": false,
          "type": "boolean"
        },
        "remove_stale_metadata": {
          "title": "Remove Stale Metadata",
          "description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    },
    "AllowDenyPattern": {
      "title": "AllowDenyPattern",
      "description": "A class to store allow deny regexes",
      "type": "object",
      "properties": {
        "allow": {
          "title": "Allow",
          "description": "List of regex patterns to include in ingestion",
          "default": [
            ".*"
          ],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "deny": {
          "title": "Deny",
          "description": "List of regex patterns to exclude from ingestion.",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "ignoreCase": {
          "title": "Ignorecase",
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    },
    "PlatformDetail": {
      "title": "PlatformDetail",
      "description": "Any source that connects to a platform should inherit this class",
      "type": "object",
      "properties": {
        "env": {
          "title": "Env",
          "description": "The environment that all assets produced by this connector belong to",
          "default": "PROD",
          "type": "string"
        },
        "platform_instance": {
          "title": "Platform Instance",
          "description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.",
          "type": "string"
        },
        "data_source_platform": {
          "title": "Data Source Platform",
          "description": "A chart's data sources platform name.",
          "type": "string"
        }
      },
      "required": [
        "data_source_platform"
      ],
      "additionalProperties": false
    }
  }
}
Code Coordinates
- Class Name: datahub.ingestion.source.sigma.sigma.SigmaSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Sigma, feel free to ping us on our Slack.