Documenting metadata lineage/provenance query

heather · September 30, 2022, 2:13am

Our team are wondering if there is a way to document multiple levels of table provenance. We have a source table (A) which moves into a different database, table (B) and is then transformed to its final state, table (C).

Currently only table B shows the relationship between all 3 tables as it shows the source distribution, table A and then the generated distribution, table C. When looking at the data lineage graph for table B - all 3 tables are represented and the transformation from source through to the final table is clear.

From table C, when both table A and B are added to the provenance field, they appear at the same level in the data lineage graph, even though table A is the original source table and B follows it.

Because table A cannot be documented from table C as a higher-level source, table C is also not shown as a generated distribution when viewing the original source table A. Only table B appears.

Is there a way to document or represent different levels of provenance?

Thank you,

Heather

Michael · September 30, 2022, 6:17am

So in the graph view of this situation, is it at all possible to be in the record of Table C and see not only the parent table, but also the grandparent?

Being able to specify more levels, beyond immediate children or parents, would be a great addition to the graphic view here.

Being able to view lineage information in the main screen would also be really useful. Something like this:

AndrewB · October 5, 2022, 5:40am

I agree that more of the information entered into the Editor should be available in that table. Users shouldn’t have to edit the item to see that information, then very carefully exit without making any changes.

It would also be useful to be able to turn on and off higher-level parents (e.g. grandparents) and lower-level children (e.g. grandchildren) in the data lineage graphs.
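To make the show/hide idea concrete, here is a minimal sketch of a depth-limited walk over a lineage graph. It is only an illustration and not how Aristotle stores or renders lineage: the `SOURCES` mapping, the `invert` and `walk` helpers, and the `max_depth` parameter are all hypothetical names. It assumes only direct links are recorded (A to B, B to C) and derives the levels from them.

```python
from collections import deque

# Hypothetical direct provenance links: each table maps to its immediate sources.
SOURCES = {
    "Table B": ["Table A"],
    "Table C": ["Table B"],
}


def invert(links):
    """Build the reverse mapping (source -> tables generated from it)."""
    reverse = {}
    for target, sources in links.items():
        for source in sources:
            reverse.setdefault(source, []).append(target)
    return reverse


def walk(links, start, max_depth):
    """Breadth-first walk from start; returns {table: level}, level 1 = direct link."""
    levels = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of levels
        for neighbour in links.get(node, []):
            if neighbour != start and neighbour not in levels:
                levels[neighbour] = depth + 1
                queue.append((neighbour, depth + 1))
    return levels


# Grandparents hidden: only the direct parent of Table C.
parents_only = walk(SOURCES, "Table C", max_depth=1)

# Grandparents shown: Table A appears one level above Table B.
with_grandparents = walk(SOURCES, "Table C", max_depth=2)

# Same walk in the other direction: children and grandchildren of Table A.
descendants_of_a = walk(invert(SOURCES), "Table A", max_depth=2)
```

Because each table keeps its level in the result, a graph view built this way could draw Table A above Table B rather than beside it, and toggling grandparents or grandchildren would just change `max_depth`.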

sam · October 12, 2022, 10:52am

Hey all - thanks for all of the feedback!

@heather - would you be able to draw a graph of what you are describing here? I think I understand the multi-layers of provenance, but I think the challenge here is how we differentiate between how table A and table B link into table C, especially if it is A → B → C. As @AndrewB can probably share, some of the provenance information can get quite lengthy, so adding more in can become difficult to read.

@Michael - it should be possible to load in more information to the graphs by clicking the info icon (i) to load extra source or destination tables. I’ll see if we can get a demo that demonstrates this so you can see how this works.

@AndrewB - I think we’ve discussed this in the past and I agree with you and @Michael and will work with the team to resolve the issues to show more information in the table. If you are a fan of @Michael’s mock up we can get that turned around quite quickly.

AndrewB · October 20, 2022, 11:32pm

Hi all,

Sorry I’m late replying to this conversation thread.

As @Sam noted the lineage graphs can get very complex. Some of the graphs we’ve been looking at have lines going everywhere (and that’s without grandparents or grandchildren being visible). It would be really important to be able to reduce the level of complexity to make the graphs readable and interpretable.

Table A → Table B → Table C mockup

The approach in this mockup (Table A → B → C mockup) would be useful.

It would help to be able to show/hide the grandparents/grandchildren. So if you are in the lineage graph for Table C, you could show/hide the lineage to Table A.

This could be done through another option box here:

Mockup of the Lineage Information in the Path Table in Distributions

I agree with @Michael that the Lineage Information needs to be accessible. Currently it can only be viewed by opening the distribution in the editor (then remembering to close it without saving).

However I’m not sure whether including it in the Path Table in the Distribution is the right approach (as per the second mockup). In some cases there may be a fair bit of Lineage Information that is included. This may make the Path table very long and hard to navigate.

It may be helpful to be able to either show the more detailed information for each path in a separate window or make it so the detailed information can be toggled on and off.

Lineage within a distribution

In some cases a path may have a lineage to one or more paths in the same distribution. Aristotle doesn’t currently support this type of lineage relationship. Is this something that should be supported?

Information about methods used in the lineage

On another matter related to lineages, some of our stakeholders have been asking about cases where a standard method is used to produce an item from other items (e.g. the standard methods used to code ANZSIC or ANZSCO from input items).

Would it make sense to have a metadata object to describe the methods used that could then be connected to the lineage path?

The benefit would be that the method could be described once then reused rather than needing to be described in the Lineage Information each time.
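As a rough sketch of the "describe once, reuse" idea, the method could be its own object that lineage links point to. This is only an illustration under assumptions: `Method`, `LineageLink`, `links_using` and the path names are hypothetical and are not existing Aristotle object types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """A reusable description of a standard method (hypothetical object type)."""
    name: str
    description: str


@dataclass
class LineageLink:
    """One lineage relationship between a source path and a derived path."""
    source_path: str
    derived_path: str
    method: Method  # reference to the shared description, not a copy


# Described once...
anzsco_coding = Method(
    name="ANZSCO coding",
    description="Standard method used to code ANZSCO from input items.",
)

# ...then reused by every lineage link that applies it (path names are made up).
links = [
    LineageLink("survey_a.occupation_text", "output_a.anzsco_code", anzsco_coding),
    LineageLink("survey_b.job_title", "output_b.anzsco_code", anzsco_coding),
]


def links_using(method, all_links):
    """Return every lineage link that references the given method object."""
    return [link for link in all_links if link.method is method]
```

A side effect of linking rather than copying is that it becomes easy to list every lineage path that depends on a given method, for example with `links_using(anzsco_coding, links)`.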

For example:

sam · November 13, 2022, 11:05am

Hi @AndrewB @Michael I’ll have a few longer responses for you this week, but wanted to share progress on the lineage review tooling.

We’ve added an update that allows users to view the source links for distributions. We did a few mockups of seeing impacts - but these could get quite long, so we are still looking at options here.

In the interim, I’ve got a preview that will be going out this week:

For this chain with two linked distributions:

Where links exist between logical paths, users will be able to expand to see the lineage (if it exists) along with a description of the lineage calculation and links directly to the source lineage logical paths.

Michael · November 13, 2022, 11:35am

Looking like an awesome update Sam.

AndrewB · November 14, 2022, 1:02am

Thanks very much Sam

heather · November 16, 2022, 2:56am

Love it Sam! This is great

AndrewB · December 2, 2022, 1:18am

I had a quick look at the update for this today and it is great, thank you.

The only feedback I have is it would be good to have a heading for the Lineage Relationships under the Toggled information. It would help to clearly distinguish the Lineage Relationships from the Lineage Details.
