Preset
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion | 
| Domains | ✅ | Enabled via the domain config, which assigns a domain_key | 
| Table-Level Lineage | ✅ | Supported by default | 
Variation of the Superset plugin that works with Preset.io (Apache Superset SaaS).
CLI based Ingestion
Install the Plugin
The preset source works out of the box with acryl-datahub.
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Description | 
|---|---|
| api_key string | Preset.io API key. | 
| api_secret string | Preset.io API secret. | 
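| connect_uri string | Preset workspace URL. Default: "" (empty string) | 
| database_alias map(str,string) | Can be used to change the mapping of database names in Superset to the names you have in DataHub. Default: {} | 
| display_uri string | Optional URL to use in links (if connect_uri is only for ingestion). | 
| manager_uri string | Preset.io API URL. Default: https://api.app.preset.io | 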
| options object | Default: {} | 
| password string | Superset password. | 
| platform_instance string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. | 
| provider string | Superset provider. Default: db | 
| username string | Superset username. | 
| env string | Environment to use in namespace when constructing URNs. Default: PROD | 
| domain map(str,AllowDenyPattern) | Regex patterns for tables to filter to assign domain_key. Each map value is an AllowDenyPattern (a class to store allow/deny regexes). Default: {} | 
| domain.key.allow array | List of regex patterns to include in ingestion. Default: ['.*'] | 
| domain.key.allow.string string | |
| domain.key.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| domain.key.deny array | List of regex patterns to exclude from ingestion. Default: [] | 
| domain.key.deny.string string | |
| stateful_ingestion StatefulStaleMetadataRemovalConfig | Preset Stateful Ingestion Config. | 
| stateful_ingestion.enabled boolean | Whether or not to enable stateful ingestion. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False (the schema default is False). | 
| stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True | 
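To make the dotted notation in the table concrete, here is a minimal sketch of a recipe written as a Python dictionary with the same shape as a YAML recipe. Only the field names (and the preset source type and datahub-rest sink type) come from this page; the pipeline name, URLs, credentials, and the `get_nested` helper are hypothetical placeholders for illustration.

```python
from typing import Any, Dict

# All values below are placeholders; only the field names come from the table.
recipe: Dict[str, Any] = {
    "pipeline_name": "preset_example",  # hypothetical name
    "source": {
        "type": "preset",
        "config": {
            "connect_uri": "https://example.app.preset.io",  # placeholder URL
            "api_key": "<your-api-key>",
            "api_secret": "<your-api-secret>",
            "env": "PROD",
            # Nested block: written as stateful_ingestion.enabled in the table.
            "stateful_ingestion": {
                "enabled": True,
                "remove_stale_metadata": True,
            },
        },
    },
    # Placeholder sink; the server address is an assumption.
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}


def get_nested(config: Dict[str, Any], dotted_path: str) -> Any:
    """Resolve a dotted field name such as 'stateful_ingestion.enabled'."""
    node: Any = config
    for part in dotted_path.split("."):
        node = node[part]
    return node


source_config = recipe["source"]["config"]
print(get_nested(source_config, "stateful_ingestion.enabled"))
```

Each segment of a dotted field name corresponds to one level of nesting under the source's config block.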
The JSONSchema for this configuration is inlined below.
{
  "title": "PresetConfig",
  "description": "Base configuration class for stateful ingestion for source configs to inherit from.",
  "type": "object",
  "properties": {
    "platform_instance": {
      "title": "Platform Instance",
      "description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.",
      "type": "string"
    },
    "env": {
      "title": "Env",
      "description": "Environment to use in namespace when constructing URNs",
      "default": "PROD",
      "type": "string"
    },
    "stateful_ingestion": {
      "title": "Stateful Ingestion",
      "description": "Preset Stateful Ingestion Config.",
      "allOf": [
        {
          "$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
        }
      ]
    },
    "connect_uri": {
      "title": "Connect Uri",
      "description": "Preset workspace URL.",
      "default": "",
      "type": "string"
    },
    "display_uri": {
      "title": "Display Uri",
      "description": "optional URL to use in links (if `connect_uri` is only for ingestion)",
      "type": "string"
    },
    "domain": {
      "title": "Domain",
      "description": "regex patterns for tables to filter to assign domain_key. ",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/AllowDenyPattern"
      }
    },
    "username": {
      "title": "Username",
      "description": "Superset username.",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Superset password.",
      "type": "string"
    },
    "api_key": {
      "title": "Api Key",
      "description": "Preset.io API key.",
      "type": "string"
    },
    "api_secret": {
      "title": "Api Secret",
      "description": "Preset.io API secret.",
      "type": "string"
    },
    "manager_uri": {
      "title": "Manager Uri",
      "description": "Preset.io API URL",
      "default": "https://api.app.preset.io",
      "type": "string"
    },
    "provider": {
      "title": "Provider",
      "description": "Superset provider.",
      "default": "db",
      "type": "string"
    },
    "options": {
      "title": "Options",
      "default": {},
      "type": "object"
    },
    "database_alias": {
      "title": "Database Alias",
      "description": "Can be used to change mapping for database names in superset to what you have in datahub",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    }
  },
  "additionalProperties": false,
  "definitions": {
    "DynamicTypedStateProviderConfig": {
      "title": "DynamicTypedStateProviderConfig",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "description": "The type of the state provider to use. For DataHub use `datahub`",
          "type": "string"
        },
        "config": {
          "title": "Config",
          "description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
          "default": {},
          "type": "object"
        }
      },
      "required": [
        "type"
      ],
      "additionalProperties": false
    },
    "StatefulStaleMetadataRemovalConfig": {
      "title": "StatefulStaleMetadataRemovalConfig",
      "description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "default": false,
          "type": "boolean"
        },
        "remove_stale_metadata": {
          "title": "Remove Stale Metadata",
          "description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    },
    "AllowDenyPattern": {
      "title": "AllowDenyPattern",
      "description": "A class to store allow deny regexes",
      "type": "object",
      "properties": {
        "allow": {
          "title": "Allow",
          "description": "List of regex patterns to include in ingestion",
          "default": [
            ".*"
          ],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "deny": {
          "title": "Deny",
          "description": "List of regex patterns to exclude from ingestion.",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "ignoreCase": {
          "title": "Ignorecase",
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    }
  }
}
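The AllowDenyPattern definition in the schema above can be illustrated with a short standalone sketch. It is not DataHub's implementation: it assumes that deny patterns take precedence over allow patterns and that patterns are matched from the start of the name. The class `AllowDenySketch`, the function `pick_domain`, and the sample domain key and table names are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AllowDenySketch:
    """Illustrative stand-in mirroring the allow, deny, and ignoreCase fields."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)
    ignoreCase: bool = True

    def allowed(self, name: str) -> bool:
        flags = re.IGNORECASE if self.ignoreCase else 0
        # Assumption: a single matching deny pattern rejects the name.
        if any(re.match(pattern, name, flags) for pattern in self.deny):
            return False
        return any(re.match(pattern, name, flags) for pattern in self.allow)


def pick_domain(domains: Dict[str, AllowDenySketch], name: str) -> Optional[str]:
    """Return the first domain key whose pattern accepts the name, if any."""
    for domain_key, pattern in domains.items():
        if pattern.allowed(name):
            return domain_key
    return None


# Hypothetical domain map: one domain key with one allow and one deny regex.
domains = {"sales": AllowDenySketch(allow=["sales_.*"], deny=[".*_tmp"])}

for table in ["SALES_orders", "sales_orders_tmp", "hr_people"]:
    print(table, "->", pick_domain(domains, table))
```

With the default ignoreCase of true, `SALES_orders` is accepted by the `sales_.*` pattern, while `sales_orders_tmp` is rejected by the deny list and `hr_people` matches no allow pattern.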
Code Coordinates
- Class Name: datahub.ingestion.source.preset.PresetSource
Questions
If you've got any questions on configuring ingestion for Preset, feel free to ping us on our Slack.