Datasource

This section is about operating datasources in Timbr. Datasources are used as the execution engine for all the queries run on an ontology. Timbr connects to a datasource using a JDBC driver, and can theoretically connect to any datasource that has a JDBC driver. The types of datasources that are currently supported out-of-the-box are:

Supported Datasources

Amazon Athena with driver: com.simba.athena.jdbc.Driver
Amazon Redshift with driver: com.amazon.redshift.jdbc.Driver
Apache Drill with driver: org.apache.drill.jdbc.Driver
Apache Hive with driver: org.apache.hive.jdbc.HiveDriver
Apache Spark with driver: org.apache.hive.jdbc.HiveDriver
Azure Blob Storage with JSON connection template
Azure Datalake Storage with driver: org.apache.drill.jdbc.Driver
Clickhouse with driver: com.clickhouse.jdbc.ClickHouseDriver
Databricks with driver: com.simba.spark.jdbc.Driver
Google BigQuery with driver: com.simba.googlebigquery.jdbc42.Driver
Google Cloud Storage with JSON connection template
Microsoft SQL Server with driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
MySQL with driver: com.mysql.jdbc.Driver
Oracle with driver: oracle.jdbc.driver.OracleDriver
Oracle 11 with driver: oracle.jdbc.driver.OracleDriver
PostgreSQL with driver: org.postgresql.Driver
Presto with driver: com.facebook.presto.jdbc.PrestoDriver
S3 with driver: org.apache.drill.jdbc.Driver
SAP Hana with driver: com.sap.db.jdbc.Driver
Snowflake with driver: net.snowflake.client.jdbc.SnowflakeDriver
Trino with driver: io.trino.jdbc.TrinoDriver
Vertica Analytics Platform with driver: com.vertica.jdbc.Driver

Create a datasource

Required information when querying to create a new datasource (the curly brackets {} should not be an input, they are used only as a variable substitution):

{db_name} - The name of the datasource as it will be presented in Timbr
{target_url} - The JDBC URL of the datasource, usually including a protocol, hostname, port and sometimes database/catalog
{datasource_type} - The type of the datasource, as its presented in the list of supported datasources above
{user} - The username credentials for the datasource
{password} - The password credentials for the datasource

To create a datasource you can run the command:

CREATE DATASOURCE `{db_name}` DESCRIPTION '{description}' OPTIONS(URL '{target_url}', DRIVER '{datasource_driver}', TYPE '{datasource_type}', USER '{user}', PASSWORD '{password}') WITH VIRTUALIZATION

Optional keys are:

DESCRIPTION '{description}' - Provide a string with description for the datasource
WITH VIRTUALIZATION - Works only in the cases where you have the Timbr Virtualization Service or if the datasource type is Apache Spark or Databricks. This setting indicates whether the datasource serves as a virtualization engine. If you are using the Timbr Virtualization Service then your datasource type is Apache Spark

Create a datasource based on a json string

Some datasources like S3, Azure Datalake Storage, Google Cloud Storage, Azure Blob Storage require a JSON connection configuration template.

Required information when querying to create a new datasource (the curly brackets {} should not be an input, they are used only as a variable substitution):

{db_name} - The name of the datasource as it will be presented in Timbr
{json} - A stringify version of the template JSON used for the connection
{datasource_type} - The type of the datasource, as its presented in the list of supported datasources above
{json_secret} - The password credentials for the datasource

If you want to create such datasource requiring a JSON string, you can do it programmatically by running

CREATE DATASOURCE `{db_name}` DESCRIPTION '{description}' OPTIONS(JSON '{json}', TYPE '{datasource_type}', PASSWORD '{json_secret}') WITH VIRTUALIZATION

Optional keys are:

DESCRIPTION '{description}' - Provide a string with description for the datasource
WITH VIRTUALIZATION - Works only in the cases where you have the Timbr Virtualization Service or if the datasource type is Apache Spark or Databricks. This setting indicates whether the datasource serves as a virtualization engine. If you are using the Timbr Virtualization Service then your datasource type is Apache Spark

Show all datasources

Shows all the datasources associated with an ontology

SHOW DATASOURCES

Change the description of a datasource

Required information when changing the description of a datasource (the curly brackets {} should not be an input, they are used only as a variable substitution):

{datasource} - The name of the datasource as it will be presented in Timbr
{description} - The description given to the datasource

When connected to a current ontology that has the datasource set as active, you can run the command

ALTER DATASOURCE `{datasource}` SET DESCRIPTION = '{description}'

Setting the virtualization of a datasource on / off

If a datasource was created with a WITH VIRTUALIZATION option, you can set it ON or OFF

Required information when setting the virutalization of a datasource (the curly brackets {} should not be an input, they are used only as a variable substitution):

{datasource} - The name of the datasource as its presented in Timbr
{value} - 1 will set it ON and 0 will set it OFF

ALTER DATASOURCE `{datasource}` SET VIRTUALIZATION = {value}

Removing a datasource

Removes a datasource from Timbr.

:::danger Important

In order to remove a datasource from timbr you need to make sure it is not associated/connected to any ontology. If the datasource is associated/connected to an ontology, you should first remove it from that ontology and then remove it from Timbr.

:::

Required information for removing a datasource (the curly brackets {} should not be an input, they are used only as a variable substitution):

{datasource} - The name of the datasource as its presented in Timbr

DROP DATASOURCE `{datasource}`

Create a datasource​

Create a datasource based on a json string​

Show all datasources​

Change the description of a datasource​

Setting the virtualization of a datasource on / off​

Removing a datasource​