Our metadata crew is in the middle of some serious design work on how best to document our data holdings, and I was wondering whether there are any examples I could see of data lineage being represented in Aristotle?
Suppose we harvest information from our internal systems showing that tables W and X in system A feed a landing table Y in system B, and that table Y’s content is then pulled into table Z in system C. How could we best represent this information?
As far as I know, these tables themselves would be represented in Aristotle as Distributions, and we even have a custom field on Distributions to hold the Physical Table Name of each of tables W, X, Y and Z. Each physical column of each table would be represented in these distributions by the field path_name (or, in the API, ‘logical_path’ inside distributiondataelementpath_set inside distribution).
But how is the data flow itself shown, with distributions W and X feeding into distribution Y, which in turn feeds distribution Z?
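To make the flow concrete, here is a minimal sketch in Python of the lineage I am describing, written as a plain edge list. This is not an Aristotle structure or API call; the `EDGES` list and the `upstream` helper are hypothetical names I have made up purely to illustrate the relationships we would want to record and query.

```python
# Hypothetical edge list: (source system, source table) -> (target system, target table).
# These tuples only illustrate the flow described above; they are not Aristotle objects.
EDGES = [
    (("A", "W"), ("B", "Y")),
    (("A", "X"), ("B", "Y")),
    (("B", "Y"), ("C", "Z")),
]


def upstream(target, edges):
    """Return every table that directly or indirectly feeds `target`."""
    # Map each table to the set of tables that feed it directly.
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)

    # Walk backwards from the target, collecting each ancestor once.
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(upstream(("C", "Z"), EDGES)))
# prints [('A', 'W'), ('A', 'X'), ('B', 'Y')]
```

Whatever Aristotle uses to hold these relationships, the question is really whether it can store the equivalent of each edge and let us walk them like this.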
Also, how are systems A, B and C themselves best documented?
My feeling is that we would use links, for example via the API endpoint http://dss.aristotlecloud.io/api/v4/links/, but I have not tested this.
Lastly, to locate table W in Aristotle, can we search via the API by Distribution physical name? (Physical name is a custom field we have set up.)
So to sum up, my questions are:
- How do we best represent data flow between tables on different data platforms?
- Is Path name the best place to document the physical column name of a physical table distribution? (I’ve even seen a CSV file documented as a Distribution, with Path name then used to specify cell ranges, e.g. B1:B10.)
- How can we best check whether a physical table is actually documented in Aristotle when all I know right now is its physical name? Is this a GraphQL call, or must we cycle through all distributions via v4’s metadata_distribution_list?
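
For the last question, the fallback I have in mind is a client-side filter over distribution records that have already been retrieved from the API. A rough sketch in Python follows. The `physical_table_name` key and the flat record shape are my assumptions, not the actual response format, and the sample records are made up; only `distributiondataelementpath_set` and `logical_path` are names I have seen in the API.

```python
def find_by_physical_name(distributions, physical_name, field="physical_table_name"):
    """Return the distribution records whose custom field matches `physical_name`.

    `field` is a hypothetical key for our custom Physical Table Name field.
    The comparison ignores case and surrounding whitespace.
    """
    wanted = physical_name.strip().lower()
    return [
        d for d in distributions
        if str(d.get(field, "")).strip().lower() == wanted
    ]


def column_names(distribution):
    """List the physical column names recorded on one distribution record."""
    paths = distribution.get("distributiondataelementpath_set", [])
    return [p.get("logical_path") for p in paths]


# Made-up records standing in for what a distribution listing might return.
SAMPLE = [
    {
        "physical_table_name": "W",
        "distributiondataelementpath_set": [
            {"logical_path": "customer_id"},
            {"logical_path": "created_at"},
        ],
    },
    {"physical_table_name": "Y", "distributiondataelementpath_set": []},
]

matches = find_by_physical_name(SAMPLE, " w ")
print(len(matches), column_names(matches[0]))
```

This works, but it means pulling every distribution just to find one, which is why I am hoping there is a server-side search instead.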