Hey all,
We have an upcoming change to datasets to improve how distributions are grouped within them. We’ve also been working with people from the New South Wales Department of Communities and Justice, who have given us approval to use an example from their registry to showcase the improvements.
Because this is a major change to dataset visualisation, we are gathering feedback to ensure this is the ideal solution across all clients and that all options are explored. If you have questions, recommendations or feedback, or would like to offer your approval of the change, please comment below.
In summary, they have a large dataset called the “Human Services Dataset (HSDS)” that includes over 100 distributions, and are looking for solutions to capture a data model for these tables and files.
However, the two-layer Dataset / Distribution model from DCAT and ISO11179 isn’t sufficient for recording the complexity of some large data assets.
The challenge:
The HSDS currently contains 136 distributions, which are shown sequentially, and takes between 5-10 to load.
To resolve this, the Family and Community Services Insights and Analytics area has developed a collection that holds the hierarchy of the dataset to make it easier to browse. However, this puts critical information about the dataset outside the governance process.
A further challenge is that when viewing a dataset within a Tablion Data Portal, the hierarchy cannot be synced from a collection.
The suggestion: adding a dataset grouping browser
We are going to add a “Dataset grouping” hierarchy to Datasets, similar to the one used by Data Set Specifications, to capture a hierarchy of distributions in a way that is easy to manage.
This will add a hierarchical grouping that allows distributions to be structured within a Dataset to capture the semantics of the structure within the page. This will also improve discovery of distributions, overall page load speeds and page size.
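To make the proposed structure a bit more concrete, here is a rough sketch of how a Dataset with nested groupings and distributions could be modelled. This is only an illustration and not the actual implementation: the class names, field names and the example groupings and file names below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Distribution:
    # A single table or file within the dataset (hypothetical shape).
    name: str


@dataclass
class Grouping:
    # A grouping carries its own metadata and may contain
    # further groupings as well as distributions.
    name: str
    description: str = ""
    groupings: List["Grouping"] = field(default_factory=list)
    distributions: List[Distribution] = field(default_factory=list)

    def walk(self) -> Iterator[Distribution]:
        """Yield this grouping's distributions, then those of its sub-groupings."""
        yield from self.distributions
        for child in self.groupings:
            yield from child.walk()


@dataclass
class Dataset:
    name: str
    groupings: List[Grouping] = field(default_factory=list)

    def all_distributions(self) -> List[Distribution]:
        """Flatten the hierarchy into a single list of distributions."""
        return [d for g in self.groupings for d in g.walk()]


# Hypothetical example data, not taken from the HSDS.
example = Dataset(
    name="Example dataset",
    groupings=[
        Grouping(
            name="Database A",
            description="Hypothetical top-level grouping",
            groupings=[
                Grouping(
                    name="Subtable A1",
                    distributions=[Distribution("table_a1.csv")],
                )
            ],
            distributions=[Distribution("overview.csv")],
        ),
        Grouping(
            name="Database B",
            distributions=[Distribution("table_b1.csv")],
        ),
    ],
)
```

In this sketch, a page for a single grouping would only need to walk that grouping's subtree rather than every distribution in the dataset, which is the idea behind the shorter pages described in this proposal.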
Proposed change: Screenshot 1: The HSDS is now able to bring all details into the description within a unified item page.
Proposed change: Screenshot 2: This will move the distributions further down the page into their own “browse panel” within the dataset. Each grouping can have its own metadata, such as a description of a subtable, view or database, and viewing an individual grouping will show all containing groups and distributions.
Proposed change: Screenshot 3: This will also allow single distributions to be rendered on their own, making pages shorter and page loads faster.
Proposed change: Screenshot 4: Adding a hierarchy to the dataset will also allow this to be synced across to a Tablion Data Portal to improve variable selection and discovery:
Alternate options
Alternate option 1: Nested Datasets
DCAT and ISO11179 allow datasets to have relations to other datasets to build a hierarchy. However, we have decided not to implement this for two major reasons.
Firstly, nested Datasets would introduce the usability challenge of managing and syncing governance, registration and permissions between parent and child datasets. Adding this relation would have made it difficult to know who controlled the hierarchy and what was within the tree. Secondly, this would add technical complexity when syncing between systems, as some datasets in a hierarchy may not be able to be synced.