Methods and Apparatus for a Semantic Multi-Database Data Lake.

U.S. Patent No. 10,901,973. Washington, DC: U.S. Patent and Trademark Office, 2021.

Recommended citation: Rodrigo Dias Arruda Senra, Karin Breitman, Adriana Bechara Prado, and Victor S. Bursztyn. 2021. Methods and Apparatus for a Semantic Multi-Database Data Lake. U.S. Patent No. 10,901,973. Washington, DC: U.S. Patent and Trademark Office. https://patents.google.com/patent/US10901973B1/en

Methods and apparatus are provided for integrating a plurality of different database types in a semantic multi-database data lake. An exemplary method comprises providing a plurality of databases having different database types; translating ontology definition language database commands obtained from a user into a plurality of data definition language and/or data manipulation language commands supported by the different database types, in order to replicate data from the user to each of the different database types; obtaining a query specified in a query language of a given database; and delegating the query to the given database. A plurality of cluster gateways optionally manage a corresponding plurality of clusters of database instances, in which case queries are delegated to a given database instance by way of the appropriate cluster gateway. Dark data, meaning data that was not queried through any supported query language within a predefined period of time, can also be detected.
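The flow described in the abstract (translate, replicate, delegate, detect dark data) can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Backend`, `translate`, `DataLake`, `dark_data`) is hypothetical, the backends are in-memory stubs, the generated command strings are illustrative rather than exact dialect syntax, and the caller is assumed to state which class a query targets instead of the lake parsing the query. Cluster gateways are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Hypothetical stand-in for one database type in the lake."""
    name: str
    query_language: str                            # e.g. "sql" or "sparql"
    commands: list = field(default_factory=list)   # translated commands received

    def execute(self, command):
        self.commands.append(command)

    def query(self, text):
        # A real backend would run the query; this stub only echoes it.
        return f"{self.name} answered: {text}"


def translate(definition, backend):
    """Turn one ontology-level class definition into a backend-specific command."""
    cls, props = definition["class"], definition["properties"]
    if backend.query_language == "sql":
        columns = ", ".join(f"{p} TEXT" for p in props)
        return f"CREATE TABLE {cls} ({columns})"
    if backend.query_language == "sparql":
        return f"INSERT DATA {{ :{cls} a owl:Class }}"
    raise ValueError(f"no translator for {backend.query_language}")


class DataLake:
    def __init__(self, backends):
        # One backend per supported query language (an assumption of this sketch).
        self.backends = {b.query_language: b for b in backends}
        self.last_access = {}   # class name -> time of definition or last query

    def define(self, definition, now):
        """Replicate one definition to every backend in its own dialect."""
        for backend in self.backends.values():
            backend.execute(translate(definition, backend))
        self.last_access[definition["class"]] = now

    def query(self, language, target, text, now):
        """Delegate a query, unchanged, to the backend that speaks its language."""
        result = self.backends[language].query(text)
        self.last_access[target] = now
        return result

    def dark_data(self, now, max_idle):
        """Classes not queried through any language for more than max_idle."""
        return [c for c, t in self.last_access.items() if now - t > max_idle]


lake = DataLake([Backend("relational", "sql"), Backend("triple-store", "sparql")])
lake.define({"class": "Person", "properties": ["name", "email"]}, now=0.0)
lake.define({"class": "Invoice", "properties": ["total"]}, now=0.0)
lake.query("sql", "Person", "SELECT name FROM Person", now=50.0)
stale = lake.dark_data(now=100.0, max_idle=60.0)   # classes idle for over 60 time units
```

Timestamps are passed in explicitly so the sketch stays deterministic. A deployed system would more likely read a clock and record access times per data item rather than per class.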