Elasticsearch
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Platform Instance | ✅ | Enabled by default | 
This plugin extracts the following:
- Metadata for indexes
- Column types associated with each index field

CLI based Ingestion
Install the Plugin
pip install 'acryl-datahub[elasticsearch]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: "elasticsearch"
  config:
    # Coordinates
    host: 'localhost:9200'
    # Credentials
    username: user # optional
    password: pass # optional
    # SSL support
    use_ssl: False
    verify_certs: False
    ca_certs: "./path/ca.cert"
    client_cert: "./path/client.cert"
    client_key: "./path/client.key"
    ssl_assert_hostname: False
    ssl_assert_fingerprint: "./path/cert.fingerprint"
    # Options
    url_prefix: "" # optional url_prefix
    env: "PROD"
    index_pattern:
      allow: [".*some_index_name_pattern*"]
      deny: [".*skip_index_name_pattern*"]
    ingest_index_templates: False
    index_template_pattern:
      allow: [".*some_index_template_name_pattern*"]
sink:
  # sink configs
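
The `index_pattern` and `index_template_pattern` options in the recipe take lists of regular expressions. The following is a minimal sketch of how allow/deny filtering of this kind can behave. It assumes patterns are matched from the start of the name with `re.match` and that deny rules take precedence over allow rules; `filter_names` and the sample index names are hypothetical and are not part of DataHub's API.

```python
import re
from typing import Iterable, List


def filter_names(
    names: Iterable[str],
    allow: List[str],
    deny: List[str],
    ignore_case: bool = True,
) -> List[str]:
    """Keep names that match an allow pattern and no deny pattern.

    Assumption: patterns are anchored at the start of the name (re.match),
    and a deny match always wins over an allow match.
    """
    flags = re.IGNORECASE if ignore_case else 0
    kept = []
    for name in names:
        if any(re.match(pattern, name, flags) for pattern in deny):
            continue  # denied names are dropped first
        if any(re.match(pattern, name, flags) for pattern in allow):
            kept.append(name)
    return kept


# Hypothetical index names, for illustration only.
indexes = ["orders-2023", "Orders-archive", "_internal", "ilm-history-5", "logs-app"]

# The documented defaults: allow everything, deny "^_.*" and "^ilm-history.*".
default_kept = filter_names(indexes, allow=[".*"], deny=["^_.*", "^ilm-history.*"])

# A narrower allow list; matching ignores case by default.
orders_only = filter_names(indexes, allow=["^orders-.*"], deny=["^_.*", "^ilm-history.*"])

print(default_kept)
print(orders_only)
```

Anchoring a pattern explicitly with `^` and ending it with `.*` keeps the intent clear whichever matching mode is in use. In a pattern such as `name*`, the trailing `*` applies only to the final character rather than acting as a wildcard.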
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Description |
|---|---|
| ca_certs (string) | Path to a certificate authority (CA) certificate. |
| client_cert (string) | Path to the file containing the private key and the certificate, or cert only if using client_key. |
| client_key (string) | Path to the file containing the private key if using separate cert and key files. |
| host (string) | The elastic search host URI. Default: localhost:9200 |
| ingest_index_templates (boolean) | Ingests ES index templates if enabled. Default: False |
| password (string) | The password credential. |
| platform_instance (string) | The instance of the platform that all assets produced by this recipe belong to |
| ssl_assert_fingerprint (string) | Verify the supplied certificate fingerprint if not None. |
| ssl_assert_hostname (boolean) | Use hostname verification if not False. Default: False |
| url_prefix (string) | There are cases where an enterprise would have multiple elastic search clusters. One way for them to manage is to have a single endpoint for all the elastic search clusters and use url_prefix for routing requests to different clusters. Default: "" |
| use_ssl (boolean) | Whether to use SSL for the connection or not. Default: False |
| username (string) | The username credential. |
| verify_certs (boolean) | Whether to verify SSL certificates. Default: False |
| env (string) | The environment that all assets produced by this connector belong to Default: PROD |
| index_pattern (AllowDenyPattern) | regex patterns for indexes to filter in ingestion. Default: {'allow': ['.*'], 'deny': ['^_.*', '^ilm-history.*... |
| index_pattern.allow (array(string)) | |
| index_pattern.deny (array(string)) | |
| index_pattern.ignoreCase (boolean) | Whether to ignore case sensitivity during pattern matching. Default: True |
| index_template_pattern (AllowDenyPattern) | The regex patterns for filtering index templates to ingest. Default: {'allow': ['.*'], 'deny': ['^_.*'], 'ignoreCase': ... |
| index_template_pattern.allow (array(string)) | |
| index_template_pattern.deny (array(string)) | |
| index_template_pattern.ignoreCase (boolean) | Whether to ignore case sensitivity during pattern matching. Default: True |
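
To make the dotted notation concrete, the sketch below expands dotted field names such as `index_pattern.allow` into the nested mapping that would appear under `config:` in a YAML recipe. The helper `nest` and the sample values are hypothetical, written only to illustrate the naming convention; they are not part of DataHub.

```python
from typing import Any, Dict


def nest(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand keys like 'a.b' into nested dictionaries {'a': {'b': ...}}."""
    nested: Dict[str, Any] = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for key in parents:
            # Create the intermediate mapping on first use.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


# Field names written as they appear in the options table (sample values).
flat_options = {
    "host": "localhost:9200",
    "index_pattern.allow": ["^orders-.*"],
    "index_pattern.deny": ["^_.*"],
    "index_pattern.ignoreCase": True,
}

recipe_config = nest(flat_options)
print(recipe_config)
```

The nested result has a single `index_pattern` key whose value holds `allow`, `deny`, and `ignoreCase`, which mirrors how these options are indented in the recipe.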
The JSONSchema for this configuration is inlined below.
{
  "title": "ElasticsearchSourceConfig",
  "description": "Any source that connects to a platform should inherit this class",
  "type": "object",
  "properties": {
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "platform_instance": {
      "title": "Platform Instance",
      "description": "The instance of the platform that all assets produced by this recipe belong to",
      "type": "string"
    },
    "host": {
      "title": "Host",
      "description": "The elastic search host URI.",
      "default": "localhost:9200",
      "type": "string"
    },
    "username": {
      "title": "Username",
      "description": "The username credential.",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "The password credential.",
      "type": "string"
    },
    "use_ssl": {
      "title": "Use Ssl",
      "description": "Whether to use SSL for the connection or not.",
      "default": false,
      "type": "boolean"
    },
    "verify_certs": {
      "title": "Verify Certs",
      "description": "Whether to verify SSL certificates.",
      "default": false,
      "type": "boolean"
    },
    "ca_certs": {
      "title": "Ca Certs",
      "description": "Path to a certificate authority (CA) certificate.",
      "type": "string"
    },
    "client_cert": {
      "title": "Client Cert",
      "description": "Path to the file containing the private key and the certificate, or cert only if using client_key.",
      "type": "string"
    },
    "client_key": {
      "title": "Client Key",
      "description": "Path to the file containing the private key if using separate cert and key files.",
      "type": "string"
    },
    "ssl_assert_hostname": {
      "title": "Ssl Assert Hostname",
      "description": "Use hostname verification if not False.",
      "default": false,
      "type": "boolean"
    },
    "ssl_assert_fingerprint": {
      "title": "Ssl Assert Fingerprint",
      "description": "Verify the supplied certificate fingerprint if not None.",
      "type": "string"
    },
    "url_prefix": {
      "title": "Url Prefix",
      "description": "There are cases where an enterprise would have multiple elastic search clusters. One way for them to manage is to have a single endpoint for all the elastic search clusters and use url_prefix for routing requests to different clusters.",
      "default": "",
      "type": "string"
    },
    "index_pattern": {
      "title": "Index Pattern",
      "description": "regex patterns for indexes to filter in ingestion.",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [
          "^_.*",
          "^ilm-history.*"
        ],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "ingest_index_templates": {
      "title": "Ingest Index Templates",
      "description": "Ingests ES index templates if enabled.",
      "default": false,
      "type": "boolean"
    },
    "index_template_pattern": {
      "title": "Index Template Pattern",
      "description": "The regex patterns for filtering index templates to ingest.",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [
          "^_.*"
        ],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    }
  },
  "additionalProperties": false,
  "definitions": {
    "AllowDenyPattern": {
      "title": "AllowDenyPattern",
      "description": "A class to store allow deny regexes",
      "type": "object",
      "properties": {
        "allow": {
          "title": "Allow",
          "description": "List of regex patterns to include in ingestion",
          "default": [
            ".*"
          ],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "deny": {
          "title": "Deny",
          "description": "List of regex patterns to exclude from ingestion.",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "ignoreCase": {
          "title": "Ignorecase",
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    }
  }
}
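
Two properties of the schema above are worth illustrating: fields with a `default` can be omitted from a recipe, and `"additionalProperties": false` means unknown keys are rejected. The sketch below applies both rules to a hand-copied subset of the schema using only the standard library. The function `apply_schema` and the host value are hypothetical, the sketch does no type checking, and DataHub performs its own validation, so this is only an approximation of the behavior the schema describes.

```python
from typing import Any, Dict

# A hand-copied subset of the schema's properties, for illustration.
SCHEMA_SUBSET: Dict[str, Any] = {
    "properties": {
        "env": {"default": "PROD", "type": "string"},
        "host": {"default": "localhost:9200", "type": "string"},
        "use_ssl": {"default": False, "type": "boolean"},
        "username": {"type": "string"},
    },
    "additionalProperties": False,
}


def apply_schema(config: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Reject unknown keys, then fill in defaults for omitted fields."""
    properties = schema["properties"]
    unknown = sorted(set(config) - set(properties))
    if unknown and schema.get("additionalProperties") is False:
        raise ValueError(f"unknown config keys: {unknown}")
    # Start from the declared defaults, then let the user's values override.
    resolved = {
        name: spec["default"]
        for name, spec in properties.items()
        if "default" in spec
    }
    resolved.update(config)
    return resolved


# Only the host is supplied (hypothetical value); the rest comes from defaults.
resolved = apply_schema({"host": "es.example.internal:9200"}, SCHEMA_SUBSET)
print(resolved)

# A misspelled key is rejected rather than silently ignored.
try:
    apply_schema({"hosts": "localhost:9200"}, SCHEMA_SUBSET)
except ValueError as error:
    print(error)
```

Rejecting unknown keys is useful in practice because a typo in an option name fails loudly instead of leaving the default in effect.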
Code Coordinates
- Class Name: datahub.ingestion.source.elastic_search.ElasticsearchSource

Questions
If you've got any questions on configuring ingestion for Elasticsearch, feel free to ping us on our Slack.