
Writing Data Drivers

As an entity-centric platform, Octostar supports integrating data from external providers, on demand, into your local workspaces. You may, for example, search the internet for additional information about a person, an IP address, or an email address, or query a proprietary service that is available inside your organization and that Octostar can connect to.

To map external data onto the ontology Octostar uses, and therefore make it fully compatible with dashboards, apps, templates and the filesystem, we provide a utility library that converts raw API responses.

The API Parser Module

The API Parser module is available in streamlit-octostar-utils with no additional dependencies. It contains an engine that parses dict-like data (e.g. JSON) into a set of entities and relationships. Parsing is driven by user-defined rulesets, which are read top-to-bottom and written in a declarative style: each ruleset is a collection of pairs, a rule and the consequence to apply when that rule matches.

The output of the engine is a list of entities and relationships that Octostar can interpret according to the business ontology in use. The parser offers a rich, customizable set of Rules and Matches that let you:

  • Match fuzzy keys into entities and entity properties
  • Define nested entities with associated relationships if a pattern is matched in the data
  • Unroll nested lists and dictionaries
  • Combine multiple fields into the same entity property (e.g. merge first and last name into a name property via string concatenation)
  • Consolidate similar entities into a single one
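
To make the idea of ordered rule/consequence pairs more concrete, here is a toy sketch in plain Python. It does not use the API Parser module, and every name in it (RULES, parse_record) is hypothetical. It assumes that the first matching rule wins, which is a simplification chosen for the sketch and not a statement about how the engine resolves matches. It shows fuzzy key matching, discarding a field, and merging two fields into one property by string concatenation.

```python
import re

# Toy stand-in for ordered rule/consequence pairs.
# None of these names come from the API Parser module.
RULES = [
    # (pattern matched against the key, target property; None means discard)
    (re.compile(r"photo_id", re.I), None),
    (re.compile(r"^(first|given)[_ ]?name$", re.I), "name"),
    (re.compile(r"^(last|family)[_ ]?name$", re.I), "name"),
    (re.compile(r"^e-?mail", re.I), "email"),
]


def parse_record(record):
    """Apply RULES top-to-bottom to every key; the first matching rule wins."""
    properties = {}
    for key, value in record.items():
        for pattern, target in RULES:
            if pattern.search(key):
                if target is not None:
                    # Values mapped to the same property are concatenated.
                    previous = properties.get(target)
                    properties[target] = f"{previous} {value}" if previous else str(value)
                break  # stop at the first matching rule
    return properties


raw = {"First_Name": "Ada", "last name": "Lovelace", "E-Mail": "ada@example.org", "photo_id": 17}
print(parse_record(raw))  # {'name': 'Ada Lovelace', 'email': 'ada@example.org'}
```

In this sketch, keys that match no rule are silently dropped. The real rulesets express the same kind of decision with Rule and Match objects rather than tuples.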

Minimal Usage Example

from api_parser.entities_parser import EntitiesParser, EntitiesParserRuleset
from api_parser.parameters import ConsolidationParameters  # not used below: the consolidation step is omitted from this minimal example
import json
import os

# Developer-provided functions that choose the rulesets to use, extract the
# entries from the JSON, and guess the likely entity type of each entry
from vetric_functions import guess_raw_vetric_ruleset as guess_ruleset_fn, \
    guess_vetric_entries as guess_entries_fn, \
    guess_raw_vetric_entity_type as guess_entity_fn

# Placeholder: identifier of the API endpoint that produced the response
endpoint = "my_endpoint"

with open("my_raw_data.json", "r") as file:
    response = json.load(file)

entities = list()
relationships = list()
entries = guess_entries_fn(endpoint, response)
for entry in entries:
    ruleset_files, ruleset_parameters = guess_ruleset_fn(endpoint, entry)
    guessed_entity = guess_entity_fn(endpoint, entry)
    # Build one EntitiesParserRuleset per (file, parameters) pair
    parser = EntitiesParser([
        EntitiesParserRuleset(os.path.join("rulesets", ruleset_file), parameters)
        for ruleset_file, parameters in zip(ruleset_files, ruleset_parameters)
    ])
    new_entities, new_relationships = parser.apply_rules(entry, endpoint, guessed_entity)
    entities.extend(new_entities)
    relationships.extend(new_relationships)

# Print the parsed results (no consolidation is applied in this example)
print(f"ENTITIES ({len(entities)})")
for entity in entities:
    print(entity.__dict__)
print()
print("RELATIONSHIPS")
for rel in relationships:
    print(rel.__dict__)
print()

The example assumes the developer has defined:

  • A function to return the list of entries to parse from a JSON response (because, for example, the raw entities may be contained in a list under the key results)
  • A function to return the ruleset (and params) to apply per entry
  • A function to optionally guess the entity type from the entry
  • The rulesets themselves
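
As a rough illustration of the first three items, the helper functions could look like the sketch below. The function names, the endpoint strings, the ruleset file names, and the use of None as a parameters placeholder are all hypothetical. The signatures and return shapes are inferred only from how the usage example calls these helpers, so treat them as assumptions and not as a documented interface.

```python
# Hypothetical helpers: names, signatures, and return shapes are inferred
# from how the usage example calls them, not taken from a published API.

def guess_entries(endpoint, response):
    """Return the list of raw entries to parse from an API response."""
    if isinstance(response, dict) and isinstance(response.get("results"), list):
        return response["results"]  # entities stored in a list keyed "results"
    if isinstance(response, list):
        return response
    return [response]  # treat a single object as one entry


def guess_ruleset(endpoint, entry):
    """Return two parallel lists: ruleset file names and their parameters."""
    if endpoint.startswith("instagram"):
        ruleset_files = ["instagram_ruleset.py"]  # hypothetical file name
    else:
        ruleset_files = ["generic_ruleset.py"]  # hypothetical file name
    # Placeholder: the real type of the parameters is an assumption here.
    ruleset_parameters = [None for _ in ruleset_files]
    return ruleset_files, ruleset_parameters


def guess_entity_type(endpoint, entry):
    """Optionally guess the entity type; return None when unsure."""
    if endpoint.startswith("instagram") and "username" in entry:
        return "instagram_account"
    return None


response = {"results": [{"username": "octo"}, {"username": "star"}]}
for entry in guess_entries("instagram/users", response):
    print(guess_ruleset("instagram/users", entry), guess_entity_type("instagram/users", entry))
```

Keeping these decisions in small, separate functions means a new provider can be supported by adding branches here and a new ruleset file, without touching the parsing loop.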

Minimal Ruleset Example

An example of a ruleset file is as follows:

from api_parser.matches import EntityMatch, DiscardMatch
from api_parser.rules import BoolRule, RegexRule

def _is_instagram_account(field, entry):
    conditions = list()
    entry = entry.value
    conditions.append('class' in entry and 'instagram' in entry['class'].lower())
    conditions.append('guessed_entity' in entry and entry['guessed_entity'] == 'instagram_account')
    return any(conditions)

CONCEPTS = {
    BoolRule(_is_instagram_account): EntityMatch(
        concept_name="instagram_account",
        inherits_rules_from_concepts=["account"],
        is_root=True,
        keep=False,
        push_context=True,
        parse_rules={
            RegexRule("photo_id"): DiscardMatch()
        }
    )
}

The file states that an entry is processed by this ruleset when _is_instagram_account() returns True. In that case the parser creates an entity of type instagram_account, whose fields are obtained by parsing the entry as follows:

  • discarding the field named photo_id, if present
  • inheriting the rules defined for the more generic account concept (in another ruleset) and applying them to the entry

Advanced Ruleset Features + Customizations

TODO