Fun with APIs - Part 1

Hello Community Members!

If you’ve ever been interested in Aristotle’s sweet little API toolkit and wondered what it could do, this post is for you :smiley:

Look at your Aristotle dashboard. Imagine metadata being put in, removed, or changed. It could be anything. As long as you know how to write code, the Aristotle API can probably do the job without you ever touching the website’s menu options.

All you need is some code writing know-how and Aristotle’s nifty API toolkit.

Don’t get me wrong, Aristotle has a lovely interface and a lot of smart work gets done there. But sometimes code is the better option. People are smart, but what computer code lacks in brains it makes up for in speed and blind obedience!


Imagine a highly manual task, the sort that needs to be repeated hundreds or thousands of times. Boring, mundane, but necessary. You could have your team draw straws to see who gets stuck with it… but it just might be the perfect job for code and Aristotle’s API.

Here’s one such scenario. You have been given an enormous list of tables and columns from a database owner, and not much else. Before you can begin linking those columns to data elements, you need to grind out a set of bare-bones distributions.


The list below is only 5 tables long and you could easily type it up by hand. But what about the day your friendly neighbourhood Aristotle Admin hands you a list of 1000 tables, each with a dozen columns? Not much fun… but an API loves this sort of work.

Below I’ll put in some Python code that does this job. It will read a CSV file, which I’ve called ‘distribution_payload.csv’ and shown under ‘The Code Files’ below. Once read, the Python code will build a distribution object from each CSV line and pass them to Aristotle’s API to create for you. Fun!

Foolish Assumptions:

  1. You know how to write code in Python. Or at least know someone who does.
  2. You have a passing familiarity with Aristotle’s API and token management pages. If you don’t, the folks at Aristotle can point you in the right direction.

The Code Files:
File ‘distribution_payload.csv’ - a CSV (Comma Separated Values) file where every line is a table name followed by its column names, each separated by commas.

DONORS_TABLE,donid,fname,lname,address,city,postcode
DONATIONS_TABLE,amount,donid,regid
CURRENCY_TABLE,ccode,cname
REGISTRANTS_TABLE,regid,type,fname,lname,address,city,postcode,phone,email,membership,listasdonor
INDIVIDUALS_TABLE,regid,age,tshirtsize

File ‘distribution_common_fields.json’ - this contains the bare-bones JSON structure of an Aristotle distribution.

{
    "name": "Our code will add the Distribution name here",
    "distributiondataelementpath_set": [ ]
}
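If there are other fields you want every distribution in the batch to share, this template is the place to put them. The code comments below suggest workgroup and stewardship_organisation as good candidates; a sketch of an expanded template might look like the following (the values here are placeholders, and you should check the exact field names and formats your registry expects against the examples at https://aristotle.cloud/api/v4/metadata/distribution):

{
    "name": "Our code will add the Distribution name here",
    "stewardship_organisation": "your-stewardship-organisation-uuid-here",
    "workgroup": "your-workgroup-id-here",
    "distributiondataelementpath_set": [ ]
}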

File ‘csvDistributionLoader.py’ - where the Python is! I’ve heavily commented it to explain the steps therein.

# @author Michael, Mad Writer of API Code
# @date 20/08/2022
#
# This program is an example of how to load a csv of table/column values 
# as distributions into Aristotle using its native API.
# To make this work you'll need to create your own authority token 
# in Aristotle's token management page: https://aristotle.cloud/api/token/list/
# Sorry, you can't use mine :)
# So make your own and add it into the 'your_token_value' spot in the code below.
# 
# The CSV simply contains a distribution in each line in the following 
# format.
# <Distribution Name>,<Column A Name>,<Column B Name>, etc
#
# A template json file is also read. This contains other distribution 
# values that are common to all the distributions to be loaded. Good 
# candidates for this are fields like workgroup and stewardship_organisation.
#
# NOTE: While it does not allow the bulk import of all possible Distribution 
# fields, it does allow for: 
#   1. Multiple Distributions
#   2. Multiple Columns per distribution
#
# TODO: For a challenge, expand this to allow for column links to existing 
# data elements to be added.
#
# More API information can be found at: https://aristotle.cloud/api/
#
# To run "python3 csvDistributionLoader.py <csv filename>"
import sys
import json
import requests

# Some basic colours for console output, because it isn't the 1980's anymore
red = "\033[1;31;40m"
green = "\033[1;32;40m"
yellow = "\033[1;33;40m"
black = "\033[0;0m"


# Need 1 argument provided by user - the source csv file containing all our distribution metadata
if len(sys.argv) != 2: # we expect the script name plus exactly one argument
    print(red + "Specify a csv filename." + black)
    print(red + "To run: 'python3 csvDistributionLoader.py <csv filename>'" + black)    
    sys.exit()

# Load our template distribution file in json format
# You can find examples at https://aristotle.cloud/api/v4/metadata/distribution
# Depending on where you put this json file, you may need to fiddle with the path
f = open('distribution_common_fields.json')

# This template has the basic structure we need to load a Distribution into Aristotle
# But there are fields unique to each distribution which we load from our csv file (the
# table and column names). We will add these into this record before sending to Aristotle. 
# Fields that stay the same throughout our entire API load are good candidates to enter
# into distribution_common_fields.json. For example, consider stewardship_organisation.
distributionBaseRecordJSON = json.load(f)


# OK so let's get that CSV file with all our unique distribution name and column information
# The format of the CSV is a simple <Distribution Name>,<Column A Name>,<Column B Name>, etc
csvFile = open(sys.argv[1], 'r')

# Alternative hardcoding option of the csv file
# Depending on where you put this file, you may need to fiddle with the path
# Commented out in this instance.
# csvFile = open('distribution_payload.csv', 'r')

distributionCount = 0 # Let's keep track of how many distributions we've loaded.

# For every line in our CSV file, let's do some processing to find all the bits we 
# need for our distribution
for line in csvFile.readlines():

    # rstrip removes the trailing newline (\n character) and any other whitespace at the end of the string we just read.
    # split creates a Python list out of every item in your CSV line between each comma (,)
    csvLineValueList = line.rstrip().split(',') 
       
    # Let's loop through the list of CSV items for this distribution and process those values
    for csvItemNumber, textValue in enumerate(csvLineValueList):
      
        # Now we add each CSV value to our copy of distributionBaseRecordJSON, making an
        # Aristotle API compliant Distribution record ready to load into the API.

        if (csvItemNumber == 0) : # Our first item in the CSV line must be the Distribution Name
            distributionBaseRecordJSON['name'] = textValue
        else : # Everything afterward is a logical path name for one of this distribution's columns
            distributionBaseRecordJSON['distributiondataelementpath_set'].append({'logical_path' : textValue, 'order' : csvItemNumber, 'specialisation_classes': []})
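    # For the first CSV line above (DONORS_TABLE,donid,fname,...), the record built by this
    # loop ends up looking roughly like this (plus whatever common fields came from
    # distribution_common_fields.json):
    #   {"name": "DONORS_TABLE",
    #    "distributiondataelementpath_set": [
    #       {"logical_path": "donid", "order": 1, "specialisation_classes": []},
    #       {"logical_path": "fname", "order": 2, "specialisation_classes": []},
    #       ...and so on for the remaining columns
    #   ]}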

    # Now we have one Distribution all prepped and ready for the Aristotle API
    # Let's send him home.

    # The URL of the Distribution call of the Aristotle API
    apiDistributionPostUrl = 'https://aristotle.cloud/api/v4/metadata/distribution' 

    ###### Let's set up an API connection ######
    # Some parameters on a post request to consider. 
    #  -proxies - add this if your organisation routes API requests through them
    #  -json - here we add our complete copy of distributionBaseRecordJSON
    #  -headers - you need a token with write authority. Create one on the token page: https://aristotle.cloud/api/token/list/

    # Let's now post our json distribution to the Aristotle API's url with our authority token
    # proving we are who we say we are:
    # The text 'your_token_value' will need to be swapped for your own token
    r = requests.post(apiDistributionPostUrl, json=distributionBaseRecordJSON, headers={'Authorization': 'Token your_token_value'})

    if (r.status_code != 200 and r.status_code != 201): # something went wrong...
        print(red + "Error processing distribution: " + distributionBaseRecordJSON['name'] + " : " + r.reason + black)
        distributionBaseRecordJSON['distributiondataelementpath_set'].clear() # reset the column list so it doesn't leak into the next distribution
        continue # let's skip this one and try the next [NOTE: you might want to abort instead]
    
    apiResponseDictionary = json.loads(r.text)
    
    # Aristotle will send you a json package in reply containing all the information about 
    # the distribution you just created, including its assigned uuid
    # (find it here under apiResponseDictionary['uuid']) 
    print(green + 'Aristotle API has created distribution ' + apiResponseDictionary['uuid'] + ':' + apiResponseDictionary['name'] + ' successfully.' + black)
    

    # TODO: the above call to the API would be so much better if it were in its own library
    # of functions. You don't want to be typing this stuff over and over.

    # Now before we loop back to the next Distribution, let's reset the list of distributiondataelementpath_set path items
    distributionBaseRecordJSON['distributiondataelementpath_set'].clear()

    # Notch up another Distribution as loaded
    distributionCount += 1

    
print(" ") # Add a line to space out our console output a bit

# Let's publish a final tally for our load. Those manager types love stats. 
print(green + "--== DISTRIBUTION CSV BATCH INSERT COMPLETE ==--"  + black)
print(green + "Created " + str(distributionCount) +  " new distributions" + black)
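As the TODO in the code above says, the API call itself is a good candidate for its own function once you start writing more than one of these loaders. Here’s a rough sketch of what that might look like, using the same endpoint and token approach as the loader (the function name and return value are just my own choices):

import requests

# A minimal helper that posts one distribution record to the Aristotle API and
# returns the parsed response, or None if the API rejected it.
# Swap 'your_token_value' for your own token from https://aristotle.cloud/api/token/list/
def post_distribution(record):
    url = 'https://aristotle.cloud/api/v4/metadata/distribution'
    r = requests.post(url, json=record, headers={'Authorization': 'Token your_token_value'})
    if r.status_code not in (200, 201):
        print("Error processing distribution: " + record['name'] + " : " + r.reason)
        return None
    return r.json()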

Possibilities:
If you’ve made it this far and understand how this API function works, consider the implications for your own work patterns.

  • Metadata record loaders - This could easily be expanded to allow for column->data element links in the same CSV file. Better yet, you could read an Excel sheet instead of a CSV for better flexibility in design. Distributions aren’t the limit, as any metadata entity can be loaded, read or modified using the Aristotle API.
  • Metadata quality checks and updates - You can read existing records as easily as you can write new ones (see the sketch after this list). If you are considering some metadata quality checks on your holdings, API code is your best friend.
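To give a taste of the read side, here’s a sketch of pulling distributions back out of the API with the same requests library. I’m assuming here that a plain GET on the same distribution endpoint returns a JSON page of results with a 'results' list; check https://aristotle.cloud/api/ for the exact response shape and any paging parameters before relying on it:

import requests

# Fetch a page of existing distributions so we can run quality checks over them.
# Swap 'your_token_value' for your own token with read access.
url = 'https://aristotle.cloud/api/v4/metadata/distribution'
r = requests.get(url, headers={'Authorization': 'Token your_token_value'})

if r.status_code == 200:
    page = r.json()
    # Assumption: the response holds its records in a 'results' list.
    for distribution in page.get('results', []):
        print(distribution.get('uuid'), distribution.get('name'))
else:
    print("Could not read distributions: " + r.reason)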

A full suite of API functions can be found at: https://aristotle.cloud/api/

A Final Warning:
With the power to manage thousands of records through your API code, you can save a great deal of time and effort. Just be careful and test thoroughly beforehand, as you can also ruin records with a lot less time and effort :rofl:


Great post, thanks Michael. Looking forward to Part 2 :grinning:
