JANUS Pathway Generator Variables

By: Fergus Bisset

Published: 18th July, 2022

This post is currently in draft

Content is not final and may change at any time. The diagrams, which are drawn programmatically, are currently in beta and should not be considered final; there are known issues with their legibility on non-desktop devices.

Background to this work

Over the last month or so I have been doing a bit of data archaeology on the Pathway Generator Patient Vector and Pathway Generator Service Vector.

The published articles from the Pathway Generator team detailing the previous work at JANUS in Iceland (e.g. Siggeirsdottir et al., 2016; Thomson et al., 2022) make clear, by my interpretation, that the patient data variables are as follows:

Updated JANUS Model

However, further investigation of the Patient Vector in the Python code indicates that there are in fact more variables than these, and that some of them potentially overlap or are conceptually similar. For example, two discernible sets of variables appear to represent the psychological phenomenon of depression: one prefixed 'ht_dep', and another among the DASS variables (e.g. 'dass_first_measure_depression').

The inference that some elements of the Patient Vector are themselves derived from standardised, research-informed data-gathering protocols (as in the example above, which implements the Depression Anxiety Stress Scales, or DASS; Lovibond and Lovibond, 1995) has not, to my knowledge, been outlined in any previous project documentation or publications, or widely discussed amongst the project team.

For a designer joining the project, developing as clear a conceptual and contextual schema of the work so far as possible is fundamental: it helps me orient myself in relation to that work, identify how my professional background and competence can best assist the project moving forward, and identify how my research can integrate with that of colleagues and, with it, how we can develop collaborative working relationships.

I also hypothesise that identifying all of the variables, and with them the protocols that underpin the processes by which such data is gathered, is fundamental to exploring both the practicalities and the user experience of implementing a version of the JANUS Pathway Generator in a Swedish context.

Thus, in this post, I aim to explore and discuss how conceptualisation and visualisation, and with them the production of project documentation, may help us first to arrive at a generalised model of the impact of vocational rehabilitation, and thereafter to gather similarly standardised data in a Swedish context. This could be done by adopting evidence-informed or standardised data-gathering protocols or, where they exist, by identifying and acquiring existing data repositories that were gathered by such means, or by means that can be deemed conceptually equivalent or similar to those used in Iceland.

More generally, differentiating and disambiguating the variables in the Patient Vector is necessary to implement a scalable variant of the Pathway Generator beyond its original context and its present implementation as a script running on a single machine. It is further hypothesised that conceptualising, visualising and contextualising aspects of the existing Python script will help with the development and implementation of a more generalised and servitised variant of the Pathway Generator, and with the teamwork, collaboration and shared mental models this demands.

Beyond this need to identify the different variables, to support the translation and technological implementation of the Pathway Generator, and the different layers of conceptual abstraction of the data underlying the Patient Vector, it is also not clear, to me at least, from the published literature, project documentation and communication with members of the Pathway Generator team to date, what form the raw data for each vector point takes, how such data is stored, or what pre-processing occurs in the data pipeline that gathers data for this algorithm or implements its results.

To help scope and implement a variant of the Pathway Generator in Sweden (or, for that matter, in Scotland), and given the particular need in Sweden to start gathering data before we can use data science or other means to test the validity, reliability or viability of a Swedish version of the Pathway Generator, I considered it helpful to interrogate, and attempt to model, the data contained in the JANUS variant.

Aims of this work

To summarise, this work has a number of aims:

  1. To help understand the JANUS approach to modelling the health and social impacts of workplace / vocational / occupational rehabilitation.

  2. To help assess how much comparable data we are likely to be able to gather or interface with in a Swedish context, or what processes and stakeholder engagement activities we will likely need to undertake there in order to access comparable data.

  3. To open up a broader dialogue amongst the project team and the wider stakeholders in our local contexts about the modelling of biopsychosocial health and health-related quality of life measures, how these interface with occupational rehabilitation in general, and specifically how such models can support the implementation of data science as part of the vocational rehabilitation process.

  4. Related to the above, to open up conversation about the ethics of these various sources of data, and whether it would be possible or ethical, in a Swedish implementation, to gather all the same data points that JANUS has been able to use in Iceland.

  5. To identify alternative sources of data, or specific workplace rehabilitation data, that might be preferable or more contextually appropriate in a Swedish context, and to support the project team in replacing specific variables in the Pathway Generator implementation, if appropriate.

  6. Point 4 above also relates to a need, in a Swedish context, to integrate the Pathway Generator with the implementation of BIP (Beskæftigelses Indikator Projektet, the Employment Indicator Project) data (Væksthuset and New Insight, 2012; Væksthuset, 2019). There is no known prior discussion either of using BIP data for the purposes of data science, or of assessing the parity of BIP data with JANUS data. This work aims to begin addressing that need.

Abstracting the JANUS Model

The aim of this stage of the work was to derive an understanding of the concepts and symbolic knowledge that underpin, explicitly or implicitly, the Pathway Generator Patient Vector and the Pathway Generator more generally.

A full elaboration of the role of a number of design tools in 'reverse engineering' conceptual models from data models and code artefacts such as these is forthcoming. [I will add a link to this separate post in due course].

The first step in identifying and analysing the variables used in the Icelandic implementation of the Pathway Generator was to compile a list of all the raw variables from the Python code.

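// Placeholder aliases so that the listing type-checks as TypeScript: the
// Python source does not confirm how IDs, timestamps or durations are represented.
type UUID = string
type DateTime = Date
type DateInterval = number // assumed: a duration, e.g. in days

interface PatientVector {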
	
	patient_id : UUID // or number or String?

	// Is this gender?
	person : string // (gender?) enum male|female|neither?

	age_start : number
	date_added : Date

	// Presumably their family status, in terms of children
	children_total : number
	children_in_care : number
	
	// Presumed boolean - although the protocol / assessment threshold for this is unclear
	// Recreational drug use?
	// Prescription drug use?
	// Including alcohol? - At what point is 'use' relevant, or problematic for work
	// rehabilitation / referral?
	drugs : boolean

	// Presumed boolean as to whether dyslexic or not - similar questions around diagnosis
	// process are relevant here, should all participants be assessed, are they, what protocol?
	dyslexic : boolean

	// Hard to interpret this... 
	// a boolean on whether they consider they are a victim of being bullied?? 
	// - what protocol is used to assess this?
	// Really interesting as a potential proxy for ADHD / autism for eg. but how do we see it
	// in relation to the other variables / assessment protocols for these things specifically?
	// What other duty of care do we have / around asking this question?
	bullying : boolean

	// Relationship Measures
	relationship_sambud : boolean // sambúð - cohabiting
	relationship_osambud : boolean // single
	relationship_giftur : boolean // married

	// Education Measures - level of education
	education_none : boolean
	education_starfsr : boolean // In employment
	education_secondary : boolean // In secondary education
	education_grunn : boolean // Primary
	education_byrjar_framh : boolean // Begun further education
	education_byrjar_uni : boolean // Begun uni
	education_university : boolean // Uni
	education_annad : boolean // Other
	education_byrjar_idn : boolean // Begin industrial / apprenticeship

	// Income Measures
	income_social : boolean
	income_rehab : boolean
	income_unemployement : boolean
	income_tekjulaus : boolean // without income
	income_sick_union : boolean
	income_sickpay : boolean
	income_salary : boolean
	income_disability : boolean
	income_other : boolean
	income_husaleigubaetur : boolean // housing benefit
	income_pension : boolean
	income_family : boolean
	income_vidbotarlifeyrissparnadur : boolean // Private pension
	income_faedingarorlof : boolean // Maternity Leave
	income_edu_loan : boolean // Student Loan

	// Housing Measures
	housing_none : boolean
	housing_alm_leiga : boolean // Rental accommodation
	housing_afangaheimili : boolean // Care home
	housing_eigid : boolean // Own home
	housing_husnaedi : boolean // Shared accommodation
	housing_felo : boolean // unknown
	housing_husnaedislaus : boolean // homeless

	// Whether they have participated before
	rehab_before : boolean

	// Tengla - 'links' in Icelandic?? So, 'social connectedness', 
	// or maybe the number of 'contacts' the ind. has had with the service?
	tengla_change_count : number

	// 'JE' Janus Engagement - "90% sure, this is whether they have engaged with JANUS before"
	je_before : boolean

	// Those below all seem to be some sort of metadata on effects 
	// of rehab / service use / interventions

	// "This is the variable which is the crux of everything, 
	// because it’s what we are predicting in the machine 
	// learning models."
	rehab_result : boolean
	current_rehab_duration : DateInterval

	// Again, like most of these variables, unclear if this is self-assessed...?
	// Or administered through some other means...? 
	// And, whether a boolean, or some more complex measure / assessment protocol

	// Length of Unemployment
	unemployed : number

	// Education Engagement Metrics
	edu_difficulty : boolean

	// In education
	starts_edu : boolean
	starts_edu_time : DateTime

	// In work
	starts_work : boolean
	starts_work_time : DateTime

	// In training
	starts_training : boolean
	starts_training_time : DateTime

	// Looking for work
	starts_seeking : boolean
	starts_seeking_time : DateTime

	// Service Engagement / Attendance Metadata - somewhat hazy...

	// Some sort of score of their current engagement?
	attendance : number
	dropout : boolean

	// Baseline Service Effect Variables

	// Physical Health - unclear what protocol for gathering
	mean_physical : number // mean.physical
	stddev_physical : number // stddev.physical
	range_physical : number // range.physical
	linear_gradient_physical : number // linear.gradient.physical

	// Psychological Health - unclear what protocol for gathering
	mean_psychological : number // mean.psychological
	stddev_psychological : number // stddev.psychological
	range_psychological : number // range.psychological

	// Social Health or Social Competence, 
	// or Happiness/satisfaction with Social Situation?
	mean_social : number // mean.social
	stddev_social : number // stddev.social
	range_social : number // range.social

	// Financial Situation or Financial Competence, self assessed? 
	// or more objective?, unclear...
	mean_financial : number // mean.financial
	stddev_financial : number // stddev.financial
	range_financial : number // range.financial
	linear_gradient_financial : number // linear.gradient.financial

	// Self Discipline - unclear what protocol for gathering
	mean_self_discipline : number // mean.self.discipline
	stddev_self_discipline : number // stddev.self.discipline
	range_self_discipline : number // range.self.discipline

	// Self Reflection - unclear what protocol for gathering
	mean_self_reflection : number // mean.self.reflection
	stddev_self_reflection : number // stddev.self.reflection
	range_self_reflection : number // range.self.reflection

	// Creativity - unclear what protocol for gathering
	mean_creativity : number // mean.creativity
	stddev_creativity : number // stddev.creativity
	range_creativity : number // range.creativity

	// Technical - unclear what protocol for gathering
	mean_technical : number // mean.technical
	stddev_technical : number // stddev.technical
	range_technical : number // range.technical
	linear_gradient_technical : number // linear.gradient.technical

	// Time Management - unclear what protocol for gathering
	mean_time_mgmt : number // mean.time.mgmt
	stddev_time_mgmt : number // stddev.time.mgmt
	range_time_mgmt : number // range.time.mgmt
	linear_gradient_time_mgmt : number // linear.gradient.time.mgmt

	// Empowerment - unclear what protocol for gathering
	mean_empowerment : number // mean.empowerment
	stddev_empowerment : number // stddev.empowerment
	range_empowerment : number // range.empowerment
	linear_gradient_empowerment : number // linear.gradient.empowerment

	// Resourcefulness - unclear what protocol for gathering
	mean_resourcefulness : number // mean.resourcefulness
	stddev_resourcefulness : number // stddev.resourcefulness
	range_resourcefulness : number // range.resourcefulness

	// is a pathway feature which takes the variance 
	// from the vector of points for each service aspect, 
	// e.g. [1,0,2] (simplified example) might tell us that the pathway has 
	// 1 psychological point, 0 physical points, and 2 technical points. 
	//The variance_between_facets is the variance on that vector.
	variance_between_facets : number // variance.between.facets
	
	// Pathway variables which mention “gaps” or “overlapping” days 
	// - a gap is a section of someone’s pathway where they have no 
	// current service running; an overlap is where they have more than one.

	rangeratio_between_lengthservices_lengthdays : number //rangeratio.between.lengthservices.lengthdays
	mean_length_between_gaps : number // mean.length.between.gaps
	number_overlapping_days : number // number.overlapping.days
	mean_length_overlapping_days : number // mean.length.overlapping.days

	// Mael & Ashforth (1992) Organizational Identification Scale? 
	// - just guessing, and that ae vowel combo could be something Icelandic 
	// - maelir - measure?
	
	// If it is, and much like the COPM measures of occupational fit, 
	// this would then be highly contextualised, 
	// ie. need to be reassessed after every "work placement" or work experience. 
	
	// Or is this measure used to assess fit between the individual and 
	// the intervention...? rather than the individual and their current organisation?

	ht_mael_first_measure : number
	ht_mael_last_measure : number
	ht_mael_change_eval_start : number
	ht_mael_change_start_last : number
	ht_mael_change_last : number

	// Líkamsrækt - Exercise / Physical Health 
	// - but in which case how is this different to ht heils
	// Presumably SF-36's Physical Health assessment scores 
	// could be a suitable proxy, or alternative?

	ht_likams_last_measure : number
	ht_likams_change_last : number
	ht_likams_change_start_last : number
	ht_likams_change_eval_start : number
	ht_likams_first_measure : number

	// Pain? 
	// Again, unclear what this is, or which protocol is used.
	ht_verkir_first_measure : number
	ht_verkir_last_measure : number
	ht_verkir_change_eval_start : number
	ht_verkir_change_last : number
	ht_verkir_change_start_last : number

	// General Health, presumably self assessed overall (physical) health, 
	// or combined overall physical and psychological health?
	// Again, unclear what this is, and how it is measured.
	ht_heils_change_eval_start : number
	ht_heils_change_start_last : number
	ht_heils_first_measure : number
	ht_heils_last_measure : number
	ht_heils_change_last : number

	// There is perhaps a need for greater clarity in our own future implementations 
	// about how this assessment differs from physical health, i.e. 'ht_likams' or 'mean_physical', 
	// or even how 'ht_heils' is distinguished from measures of 'psychological health'. 
	// Do we consciously embrace a model of health that separates 'physical' and 'psychological', 
	// especially if 'pain' is also another separate metric?

	// We also need to ensure that our questions / data gathering with the participant on this front are clear, 
	// and, if we are using these overlapping metrics, perhaps test whether they are correlated.

	// Can 'health' really usefully be reduced to a single metric, and if so, on what scale? 

	// More generic health metadata, how many times ht has been assessed?
	ht_last_measure_time : Date
	ht_count : number
	ht_before_start : number
	ht_mean_between : number // Could also be a DateInterval

	// Depression
	// Assessment protocol unclear - and correlation / relationship to DASS Depression unclear
	ht_dep_first_measure : number
	ht_dep_last_measure : number
	ht_dep_change_start_last : number
	ht_dep_change_eval_start : number
	ht_dep_change_last : number

	// Ein - alone? loneliness?, assessed using which protocol?
	// Assessment protocol unclear
	ht_einb_first_measure : number
	ht_einb_last_measure : number
	ht_einb_change_start_last : number
	ht_einb_change_eval_start : number
	ht_einb_change_last : number

	// lið - team(?) an - without(?) - autonomy? / independence	
	// Or something else teamwork/collaboration-related? Just guessing..
	// Assessment protocol unclear
	ht_lidan_first_measure : number
	ht_lidan_last_measure : number
	ht_lidan_change_start_last : number
	ht_lidan_change_eval_start : number
	ht_lidan_change_last : number

	// Sjálfsagi (self-discipline) or sjalfseing (self-reflection)
	// - we name both of these as variables in some documentation, 
	// yet there is only one variable with 'sjalfs' so unclear which one this is.
	// Assessment protocol unclear
	ht_sjalfs_first_measure : number
	ht_sjalfs_last_measure : number
	ht_sjalfs_change_start_last : number
	ht_sjalfs_change_eval_start : number
	ht_sjalfs_change_last : number

	// Kvíði - Anxiety
	// Assessment protocol unclear
	ht_kvidi_first_measure : number
	ht_kvidi_last_measure : number
	ht_kvidi_change_start_last : number
	ht_kvidi_change_eval_start : number
	ht_kvidi_change_last : number

	// Svefn - Sleep
	// Assessment protocol unclear
	ht_svefn_first_measure : number
	ht_svefn_last_measure : number
	ht_svefn_change_eval_start : number
	ht_svefn_change_start_last : number
	ht_svefn_change_last : number

	// þrek/threk - (Physical) Endurance / Stamina?
	// Assessment protocol unclear
	ht_threk_first_measure : number
	ht_threk_last_measure : number
	ht_threk_change_eval_start : number
	ht_threk_change_start_last : number
	ht_threk_change_last : number

	// Struggling on this one, but maybe samsköpun? - collaboration?
	// Assessment protocol unclear
	ht_samsk_first_measure : number
	ht_samsk_last_measure : number
	ht_samsk_change_eval_start : number
	ht_samsk_change_start_last : number
	ht_samsk_change_last : number

	// Financial Situation? or Financial Self Competence
	// Assessment protocol unclear
	ht_fjarh_first_measure : number
	ht_fjarh_last_measure : number
	ht_fjarh_change_last : number
	ht_fjarh_change_start_last : number
	ht_fjarh_change_eval_start : number

	// Disability Measures

	// Unclear how this is assessed?
	disability_proportion : number
	icd_count_other : number
	
	// Relatively easy to discern what these are, but less so how
	// the data is measured or summarised for these purposes.

	// M - Musculoskeletal system and connective tissue
	icd_count_M : number
	// J - Respiratory diseases and symptoms
	icd_count_J : number
	// I - Circulatory diseases and symptoms
	icd_count_I : number
	// F - Mental and behavioural disorders
	icd_count_F : number
	
	// Don't know what this is, or how it is derived.
	icd_count_variation : number

	// SPS - Social Phobia Scale
	sps_before_start : number
	sps_first_measure : number
	sps_mean_between : number
	sps_last_measure_time : Date
	sps_count : number
	sps_last_measure : number

	// Young Person ADHD Test Score - Conners-Wells’ Adolescent Self-Report Scale?
	adhd_young_total_score : number
	adhd_young_total_sdev : number
	// athygli - attention
	adhd_young_athygli_score : number
	adhd_young_athygli_sdev : number
	// ofvirkni - hyperactivity
	adhd_young_ofvirkni_sdev : number
	adhd_young_ofvirkni_score : number

	// Adult ADHD Test Score
	adhd_adult_total_score : number
	adhd_adult_total_sdev : number
	// athygli - attention
	adhd_adult_athygli_score : number
	adhd_adult_athygli_sdev : number
	// ofvirkni - hyperactivity
	adhd_adult_ofvirkni_sdev : number
	adhd_adult_ofvirkni_score : number

	// DASS - Depression Anxiety Stress Scales
	// - does that mean ht_dep is not depression?
	dass_count : number
	dass_mean_between : number
	dass_before_start : number
	dass_last_measure_time : Date

	// DASS - Stress
	dass_first_measure_stress : number
	dass_last_measure_stress : number
	dass_change_eval_start_stress : number
	dass_change_start_last_stress : number

	// DASS - Anxiety
	dass_first_measure_anxiety : number
	dass_last_measure_anxiety : number
	dass_change_eval_start_anxiety : number
	dass_change_start_last_anxiety : number

	// DASS - Depression
	dass_first_measure_depression : number
	dass_last_measure_depression : number
	dass_change_eval_start_depression : number
	dass_change_start_last_depression : number

	// SIAS - Social Interaction Anxiety Scale
	sias_last_measure : number
	sias_mean_between : number
	sias_first_measure : number
	sias_before_start : number
	sias_last_measure_time : Date
	sias_count : number

	// Rosenborg Self-Efficacy Scale
	rose_first_measure : number
	rose_last_measure : number
	rose_count : number
	rose_before_start : number
	rose_change_eval_start : number
	rose_change_start_last : number
	rose_last_measure_time : DateTime
	rose_change_last : number
	rose_mean_between : number // Or possibly DateInterval
 
	// DSIM - Autism Assessment Score
	dsim_score : number
	dsim_sdev : number

}
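The comments on variance_between_facets, and on "gaps" and "overlapping" days, describe pathway features that can be computed from service data. The sketch below shows one possible reading. It assumes a hypothetical ServiceSpell record (a facet name plus inclusive start and end day numbers), and the function names are illustrative. The real service data format, whether the variance is a population or sample variance, and exactly what mean_length_between_gaps and mean_length_overlapping_days average over are all unconfirmed, so this should not be read as the JANUS definition.

```typescript
// Hypothetical record of one service in a pathway. Days are whole-number
// offsets (e.g. from referral); endDay is inclusive. Not the JANUS format.
interface ServiceSpell {
	facet: string // e.g. "psychological", "physical", "technical"
	startDay: number
	endDay: number
}

const mean = (xs: number[]): number =>
	xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length

// Variance of per-facet point counts, e.g. [1, 0, 2].
// Assumption: population variance (divide by n), as numpy.var does by default.
function varianceBetweenFacets(points: number[]): number {
	const m = mean(points)
	return mean(points.map((p) => (p - m) ** 2))
}

// Lengths of the maximal runs of consecutive `true` values.
function runLengths(flags: boolean[]): number[] {
	const runs: number[] = []
	let current = 0
	for (const f of flags) {
		if (f) {
			current++
		} else if (current > 0) {
			runs.push(current)
			current = 0
		}
	}
	if (current > 0) runs.push(current)
	return runs
}

// Gap day: no service running; overlap day: more than one service running.
// Only days between the first start and the last end are considered.
function gapOverlapFeatures(spells: ServiceSpell[]) {
	if (spells.length === 0) return { overlappingDays: 0, meanOverlapLength: 0, meanGapLength: 0 }
	const first = spells.reduce((m, s) => Math.min(m, s.startDay), Infinity)
	const last = spells.reduce((m, s) => Math.max(m, s.endDay), -Infinity)
	// coverage[i] = number of services running on day first + i
	const coverage: number[] = []
	for (let d = first; d <= last; d++) coverage.push(0)
	for (const s of spells) {
		for (let d = s.startDay; d <= s.endDay; d++) coverage[d - first]++
	}
	return {
		overlappingDays: coverage.filter((c) => c > 1).length,
		meanOverlapLength: mean(runLengths(coverage.map((c) => c > 1))),
		meanGapLength: mean(runLengths(coverage.map((c) => c === 0))),
	}
}
```

With the simplified example [1, 0, 2], this gives a population variance of 2/3 (a sample variance would give 1), which is exactly the kind of detail that would need confirming against the Python code. For spells covering days 0 to 9, 5 to 12 and 20 to 24, it reports 5 overlapping days (a single run of 5) and one gap of 7 days (days 13 to 19).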

With the distinct variables listed, a number of patterns and potential abstractions in the vector became apparent.

Chief amongst these was the recurrence of mean, stddev (standard deviation) and range across many of the underlying variables, some of which also carry a linear_gradient.

interface Statistical_Variable {
	mean : number
	stddev : number
	range : number
	linear_gradient? : number
}

A second recurring pattern is the set first_measure, last_measure, change_last, change_eval_start and change_start_last:

interface Temporal_Variable {
	first_measure : number
	last_measure : number
	change_last : number
	change_eval_start : number
	change_start_last : number
}
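These two patterns can be made concrete by gathering the flat, prefixed fields of the vector into the structures above. The following is a minimal sketch, assuming a patient's row is available as a plain record keyed by the field names in the listing; FlatVector, statisticalFromFlat and temporalFromFlat are hypothetical names, not part of the JANUS code, and the interfaces are repeated so the snippet stands alone.

```typescript
interface Statistical_Variable {
	mean: number
	stddev: number
	range: number
	linear_gradient?: number
}

interface Temporal_Variable {
	first_measure: number
	last_measure: number
	change_last: number
	change_eval_start: number
	change_start_last: number
}

// One patient's row, keyed by the flat Patient Vector field names.
type FlatVector = Record<string, number | undefined>

// Gathers mean_<facet>, stddev_<facet>, range_<facet> and, if present,
// linear_gradient_<facet>; returns undefined if a mandatory field is missing.
function statisticalFromFlat(v: FlatVector, facet: string): Statistical_Variable | undefined {
	const mean = v[`mean_${facet}`]
	const stddev = v[`stddev_${facet}`]
	const range = v[`range_${facet}`]
	if (mean === undefined || stddev === undefined || range === undefined) return undefined
	const gradient = v[`linear_gradient_${facet}`]
	return gradient === undefined ? { mean, stddev, range } : { mean, stddev, range, linear_gradient: gradient }
}

const TEMPORAL_SUFFIXES = [
	"first_measure", "last_measure", "change_last", "change_eval_start", "change_start_last",
] as const

// Gathers <stem>_first_measure ... <stem>_change_start_last (e.g. stem "ht_dep");
// returns undefined unless all five fields are present.
function temporalFromFlat(v: FlatVector, stem: string): Temporal_Variable | undefined {
	const out: Partial<Temporal_Variable> = {}
	for (const suffix of TEMPORAL_SUFFIXES) {
		const value = v[`${stem}_${suffix}`]
		if (value === undefined) return undefined
		out[suffix] = value
	}
	return out as Temporal_Variable
}
```

In the listing, only physical, financial, technical, time_mgmt and empowerment have a linear_gradient_ field, which is why that member is optional. The ht_* stems and rose fit temporalFromFlat directly. The DASS fields place the subscale after the suffix (e.g. dass_first_measure_stress) and have no change_last, and the sps_ and sias_ groups have no change_* fields at all, so those groups would need their own mappings.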

Thus, most of the data points discussed and visualised below could implement or extend shared parent types that derive these statistical or temporal representations from the underlying raw dataset or database. Alternatively, any subsequent re-creation or implementation of the Pathway Generator in another context would need to provide functions and APIs that produce and deliver these representations as part of pulling user data.

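class Physical_Health implements Statistical_Variable {
	constructor(
		public mean: number,
		public stddev: number,
		public range: number,
		public linear_gradient?: number
	) {}
}

// ...

class DSIM_Autism_Score implements Statistical_Variable {
	constructor(public mean: number, public stddev: number, public range: number) {}
}

// OR

class Financial_Situation implements Temporal_Variable {
	constructor(
		public first_measure: number,
		public last_measure: number,
		public change_last: number,
		public change_eval_start: number,
		public change_start_last: number
	) {}
}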
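As noted above, a re-implementation in another context would need functions that derive these statistical representations from raw data. Since the form of that raw data is not yet known, the following is only a sketch under stated assumptions: each facet is recorded as a series of numeric scores, each tagged with a day number (the Observation type and deriveStatistical function are hypothetical); stddev is taken as the population standard deviation, range as maximum minus minimum, and linear_gradient as the ordinary least-squares slope of score against day. None of these definitions is confirmed by the JANUS code.

```typescript
interface Statistical_Variable {
	mean: number
	stddev: number
	range: number
	linear_gradient?: number
}

// Hypothetical raw observation: one score for one facet, recorded on a given day.
interface Observation {
	day: number
	score: number
}

// Derives a Statistical_Variable from one facet's observations.
// Assumptions (unconfirmed): population standard deviation, range = max - min,
// and linear_gradient = least-squares slope of score against day.
function deriveStatistical(obs: Observation[], withGradient: boolean): Statistical_Variable {
	const n = obs.length
	if (n === 0) throw new Error("no observations")
	const scores = obs.map((o) => o.score)
	const mean = scores.reduce((a, b) => a + b, 0) / n
	const stddev = Math.sqrt(scores.reduce((acc, s) => acc + (s - mean) ** 2, 0) / n)
	const max = scores.reduce((a, b) => Math.max(a, b), -Infinity)
	const min = scores.reduce((a, b) => Math.min(a, b), Infinity)
	if (!withGradient) return { mean, stddev, range: max - min }
	const meanDay = obs.reduce((acc, o) => acc + o.day, 0) / n
	let sxy = 0
	let sxx = 0
	for (const o of obs) {
		sxy += (o.day - meanDay) * (o.score - mean)
		sxx += (o.day - meanDay) ** 2
	}
	// If all observations fall on the same day the slope is undefined; 0 is an arbitrary choice here.
	const linear_gradient = sxx === 0 ? 0 : sxy / sxx
	return { mean, stddev, range: max - min, linear_gradient }
}
```

The withGradient flag reflects that only some facets in the vector carry a linear_gradient field. Comparing the output of a function like this against values in the existing JANUS data would be one way to test whether these assumed definitions match the original pipeline.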

Visualising the JANUS Model

Once these statistically derived variables have been stripped out, and having identified the included data variables and how they are derived, the next step is to move up several conceptual levels, and with that to draw on more domain-specific knowledge, in order to explore the different facets of service user data, or user capability assessment data, that form the vector.

To make this flat data structure easier to understand, communicate and navigate, I have explored clustering a number of explicitly or implicitly similar variables into thematically coherent groups. It is hoped that this can help identify overlapping and conceptually similar clusters of data and potential sources of that data, and thereafter help us build engagement and data-gathering processes around those sources.
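One way to make such clusters explicit, and to check them against the full variable list, is to describe each theme as a set of patterns over field names. The sketch below is illustrative only: the Cluster type, CLUSTERS table and clusterFields function are hypothetical, and the theme assignments are drawn from the guesses in the listing's comments (e.g. kvíði as anxiety, líkamsrækt and þrek as physical), not from a confirmed mapping. Fields matching no theme are returned so they can be reviewed by hand.

```typescript
interface Cluster {
	theme: string
	patterns: RegExp[] // matched against Patient Vector field names
}

// Hypothetical clusters, illustrating the idea only.
const CLUSTERS: Cluster[] = [
	{ theme: "Depression", patterns: [/^ht_dep_/, /^dass_.*_depression$/] },
	{ theme: "Anxiety", patterns: [/^ht_kvidi_/, /^dass_.*_anxiety$/] },
	{ theme: "Physical health", patterns: [/_physical$/, /^ht_likams_/, /^ht_threk_/] },
	{ theme: "Financial situation", patterns: [/_financial$/, /^ht_fjarh_/, /^income_/] },
]

// Assigns each field to every theme it matches. A field matching several
// themes is listed under each, which itself flags a possible overlap.
function clusterFields(fields: string[]) {
	const byTheme: Record<string, string[]> = {}
	const unassigned: string[] = []
	for (const field of fields) {
		const themes = CLUSTERS.filter((c) => c.patterns.some((p) => p.test(field)))
		if (themes.length === 0) unassigned.push(field)
		for (const c of themes) {
			if (!byTheme[c.theme]) byTheme[c.theme] = []
			byTheme[c.theme].push(field)
		}
	}
	return { byTheme, unassigned }
}
```

Running the full list of field names through a function like this would show, for example, ht_dep_* and the DASS depression fields landing in the same theme, and would leave items such as dyslexic or bullying in the unassigned list until a theme is agreed for them.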

The identification of such thematic clusters could also form the basis of a distributed or federated data architecture, and is thus (I hope) a key step in identifying where, given my own research interests, I can personally contribute to The Pathway Generator (TPG) project moving forward.

This data representation contains many apparently overlapping and duplicated variables, or at least variables that seem to assess similar conceptual facets of users' health-related quality of life or biopsychosocial health.

So if we remove variables that are clearly duplicates, and nest certain variables within others that clearly encompass them (Anxiety, Stress and Depression can be nested in DASS, for example), we can 'simplify' the model to the following:

Simplified Representation of TPG Patient Vector
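Deciding which variables are "clearly duplicates" could also be supported empirically, as the listing's comment on overlapping health metrics suggests, by checking how strongly candidate pairs (for example ht_dep_first_measure and dass_first_measure_depression) are correlated across patients. The sketch below computes a Pearson correlation using only patients who have both values. The row format and the correlateFields helper are assumptions for illustration, and a high correlation alone would not show that two variables measure the same construct.

```typescript
type Row = Record<string, number | undefined>

// Pearson correlation coefficient of two equal-length columns.
// Returns NaN if either column is constant (correlation undefined).
function pearson(xs: number[], ys: number[]): number {
	if (xs.length !== ys.length || xs.length < 2) {
		throw new Error("need two equal-length columns with at least two values")
	}
	const n = xs.length
	const mx = xs.reduce((a, b) => a + b, 0) / n
	const my = ys.reduce((a, b) => a + b, 0) / n
	let sxy = 0
	let sxx = 0
	let syy = 0
	for (let i = 0; i < n; i++) {
		const dx = xs[i] - mx
		const dy = ys[i] - my
		sxy += dx * dy
		sxx += dx * dx
		syy += dy * dy
	}
	if (sxx === 0 || syy === 0) return NaN
	return sxy / Math.sqrt(sxx * syy)
}

// Correlates two fields using only rows (patients) where both are present;
// throws if fewer than two such rows exist.
function correlateFields(rows: Row[], a: string, b: string): { r: number; n: number } {
	const xs: number[] = []
	const ys: number[] = []
	for (const row of rows) {
		const x = row[a]
		const y = row[b]
		if (x !== undefined && y !== undefined) {
			xs.push(x)
			ys.push(y)
		}
	}
	return { r: pearson(xs, ys), n: xs.length }
}
```

Reporting n alongside r matters here, since the different instruments may have been administered to different subsets of participants.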

Visualising the Patient Vector

TPG Patient Vector Modelled Using Yang et al.'s Horizontal Federated Learning Perspective (diagram by Fergus Bisset). Axes: Samples and Features. Variable types: Patient Variable, Service Variable, Outcome Variable. Feature spaces: Physical Capabilities, Life Situation, Psychological Capabilities, Social Capabilities. Features shown: Physical Activity, Physical Disabilities, Sleep, Stress, Financial Situation, Employment Situation, Rehab. Situation, Housing, Relationship Status, Technical Skills, Depression, Autonomy, Anxiety, Resilience, Self-Reflective Capability, Loneliness, Creativity, Time Management, Empowerment, Resourcefulness, Social Activity, Bullying, Collaboration, Self-Discipline.

Next Steps

More to come...

Review of Comparable Models of Vocational Rehabilitation

Work ongoing, see emergent results and discussion here.

Identification of Data Sources

Development of Synthetic Data

References

Lovibond, S.H. and Lovibond, P.F. (1995). Manual for the Depression Anxiety Stress Scales (2nd ed.). Sydney: Psychology Foundation. (Available from The Psychology Foundation, Room 1005 Mathews Building, University of New South Wales, NSW 2052, Australia.)

Siggeirsdottir, K., Brynjolfsdottir, R.D., Haraldsson, S.O., Vidar, S., Gudmundsson, E.G., Brynjolfsson, J.H., Jonsson, H., Hjaltason, O. and Gudnason, V. (2016). Determinants of outcome of vocational rehabilitation. Work, Vol. 55, pp. 577-583. IOS Press. DOI: 10.3233/wor-162436. http://doi.org/10.3233/wor-162436