JANUS Pathway Generator Variables
Published: 18th July, 2022
This post is currently in draft
Content is not final, and may change at any time. The diagrams, which are drawn programmatically, are currently in beta - and should not be considered final. Known issues exist around their legibility on non-desktop devices.
Background to this work
Over the last month or so I have been doing a bit of data archaeology on the Pathway Generator Patient Vector and Pathway Generator Service Vector.
The published articles from the Pathway Generator team detailing the previous work at JANUS in Iceland (e.g. Siggeirsdottir et al., 2016; Thomson et al., 2022) make clear, by my interpretation, that the patient data variables are as follows:
But further investigation of the Patient Vector in the Python code indicates that there are in fact more variables than just these, and that some of the variables potentially overlap or are conceptually similar. For example, there are two discernible variables that appear to represent the psychological phenomenon of depression: one marked 'ht_dep', and another marked 'dass_dep'.
This inference, that some of the elements in the Patient Vector are themselves derived from standardised data-gathering and research-informed protocols (in the aforementioned example, an implementation of the Depression Anxiety Stress Scales, DASS; Lovibond and Lovibond, 1995), is something that, to my knowledge, has not been outlined in any of the previous project documentation or publications, or widely discussed amongst the project team.
For a designer joining the project, developing as clear a conceptual and contextual schema as possible of the work so far is fundamental: both to help orient myself in relation to that work, and to identify how my professional background and competence can best assist the project moving forward, how my research can integrate with that of colleagues, and how we can develop collaborative working relationships.
I also hypothesise that identifying all of the variables, and with that the protocols that underpin the data-gathering processes behind them, is fundamental in exploring both the practicalities and the user experience of implementing a version of the JANUS Pathway Generator in a Swedish context.
Thus, in this post, I aim to explore and discuss how conceptualisation and visualisation, and with them the production of project documentation, may help us to gather, in the first instance, a generalised model of the impact of vocational rehabilitation, and thereafter similarly standardised data in a Swedish context through the adoption of evidence-informed or standardised data-gathering protocols. Alternatively, where they exist, this could be done by identifying and acquiring existing repositories of data that have been gathered by such means, or that can be deemed conceptually equivalent or similar to those used in Iceland.
More generally, differentiating and disambiguating the variables in the Patient Vector is necessary so that it is possible to implement a scalable variant of the Pathway Generator beyond its original context and its present implementation as a script running on a single machine. It is further hypothesised that conceptualising, visualising and contextualising aspects of the existing Python script will help us with the development and implementation of a more generalised and servitised variant of the Pathway Generator, and with the teamwork, collaboration and shared mental models this demands.
In addition to this need to identify the different variables, and the different layers of conceptual abstraction of the data underlying the Patient Vector, in order to support the translation and technological implementation of the Pathway Generator, it is not clear, in my mind at least, from the published literature, project documentation and communication with members of the Pathway Generator team to date, what form the raw data for each vector point takes, how such data is stored, or what pre-processing occurs as part of the data pipeline that gathers data for, or implements the results of, this algorithm.
To help with the process of scoping and implementing a variant of The Pathway Generator in Sweden, or for that matter in Scotland, and given the particular need in Sweden to start gathering data before we can begin any data science or otherwise test the validity, reliability or viability of a version of the Pathway Generator in a Swedish context, it seemed helpful to interrogate and attempt to model the data contained in the JANUS variant.
Aims of this work
To summarise, this work has a number of aims:
To help understand the JANUS approach to modelling the health and social impacts of workplace / vocational / occupational rehabilitation.
To help assess how much comparable data we are likely to be able to gather or interface with in a Swedish context, and what processes and stakeholder engagement activities we are likely to need in order to have access to comparable data.
To open up a broader dialogue amongst the project team, and the wider stakeholders in our local contexts about the modelling of bio-psychosocial health, and health related quality of life measures, and how these interface with occupational rehabilitation in general, and specifically how such models can support the implementation of data-science as part of the vocational rehabilitation process.
Related to the above, to open up conversation about the ethics of these various sources of data, and whether we feel that in a Swedish implementation, it would be possible or ethical to gather all the same data points as it has been possible for JANUS to use in Iceland.
To identify alternative sources of data, or specific workplace rehabilitation data that might be preferable or more contextually appropriate to use in a Swedish context, and to support a process of the project team being able to replace specific variables in the Pathway Generator implementation, if appropriate.
The above point 4 also relates to a need, in a Swedish context, to integrate the Pathway Generator with the implementation of BIP (Beskæftigelses Indikator Projektet) data (Væksthuset og New Insight, 2012; Væksthuset, 2019). To my knowledge there has been no prior discussion of either using BIP data for the purposes of data science or assessing the parity of the BIP data with the JANUS data. This work aims to begin the process of addressing that need.
Abstracting the JANUS Model
The aim of this stage of the work was to derive an understanding of the concepts and symbolic knowledge that underpin, explicitly or implicitly, the Pathway Generator Patient Vector and the Pathway Generator more generally.
A full elaboration of the role of a number of design tools in 'reverse engineering' conceptual models from data models and code artefacts such as these is forthcoming. [I will add a link to this separate post in due course].
The first step in analysing or identifying the relevant variables used in the Icelandic implementation of the Pathway Generator has been to develop a list of all the raw variables from the Python code.
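// Placeholder aliases, assumed here only so that the interface type-checks;
// the actual storage types in the Python code are not yet known.
type UUID = string
type DateInterval = number // e.g. a duration in days
type DateTime = Date
interface PatientVector {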
patient_id : UUID // or number or String?
// Is this gender?
person : string // (gender?) enum male|female|neither?
age_start : number
date_added : Date
// Presumably their family status, in terms of children
children_total : number
children_in_care : number
// Presumed boolean - although the protocol / assessment threshold for this is unclear
// Recreational drug use?
// Prescription drug use?
// Including alcohol? - At what point is 'use' relevant, or problematic for work
// rehabilitation / referral?
drugs : boolean
// Presumed boolean as to whether dyslexic or not - similar questions around diagnosis
// process are relevant here, should all participants be assessed, are they, what protocol?
dyslexic : boolean
// Hard to interpret this...
// a boolean on whether they consider they are a victim of being bullied??
// - what protocol is used to assess this?
// Really interesting as a potential proxy for ADHD / autism for eg. but how do we see it
// in relation to the other variables / assessment protocols for these things specifically?
// What other duty of care do we have / around asking this question?
bullying : boolean
// Relationship Measures
relationship_sambud : boolean // sambúð - cohabiting
relationship_osambud : boolean // single
relationship_giftur : boolean // married
// Education Measures - level of education
education_none : boolean
education_starfsr : boolean // In employment
education_secondary : boolean // In secondary education
education_grunn : boolean // Primary
education_byrjar_framh : boolean // Begun further education
education_byrjar_uni : boolean // Begun uni
education_university : boolean // Uni
education_annad : boolean // Other
education_byrjar_idn : boolean // Begin industrial / apprenticeship
// Income Measures
income_social : boolean
income_rehab : boolean
income_unemployement : boolean
income_tekjulaus : boolean // without income
income_sick_union : boolean
income_sickpay : boolean
income_salary : boolean
income_disability : boolean
income_other : boolean
income_husaleigubaetur : boolean // housing benefit
income_pension : boolean
income_family : boolean
income_vidbotarlifeyrissparnadur : boolean // Private pension
income_faedingarorlof : boolean // Maternity Leave
income_edu_loan : boolean // Student Loan
// Housing Measures
housing_none : boolean
housing_alm_leiga : boolean // Rental accommodation
housing_afangaheimili : boolean // Care home
housing_eigid : boolean // Own home
housing_husnaedi : boolean // Shared accommodation
housing_felo : boolean // unknown
housing_husnaedislaus : boolean // homeless
// Whether they have participated before
rehab_before : boolean
// Tengla - 'links' in Icelandic?? So, 'social connectedness',
// or maybe the number of 'contacts' the ind. has had with the service?
tengla_change_count : number
// 'JE' Janus Engagement - "90% sure, this is whether they have engaged with JANUS before"
je_before : boolean
// Those below all seem to be some sort of metadata on effects
// of rehab / service use / interventions
// "This is the variable which is the crux of everything,
// because it’s what we are predicting in the machine
// learning models."
rehab_result : boolean
current_rehab_duration : DateInterval
// Again, like most of these variables, unclear if this self-assessed...?
// Or administered through some other means...?
// And, whether a boolean, or some more complex measure / assessment protocol
// Length of Unemployment
unemployed : number
// Education Engagement Metrics
edu_difficulty : boolean
// In education
starts_edu : boolean
starts_edu_time : DateTime
// In work
starts_work : boolean
starts_work_time : DateTime
// In training
starts_training : boolean
starts_training_time : DateTime
// Looking for work
starts_seeking : boolean
starts_seeking_time : DateTime
// Service Engagement / Attendance Metadata - somewhat hazy...
// Some sort of score of their current engagement?
attendance : number
dropout : boolean
// Baseline Service Effect Variables
// Physical Health - unclear what protocol for gathering
mean_physical : number // mean.physical
stddev_physical : number // stddev.physical
range_physical : number // range.physical
linear_gradient_physical : number // linear.gradient.physical
// Psychological Health - unclear what protocol for gathering
mean_psychological : number // mean.psychological
stddev_psychological : number // stddev.psychological
range_psychological : number // range.psychological
// Social Health or Social Competence,
// or Happiness/satisfaction with Social Situation?
mean_social : number // mean.social
stddev_social : number // stddev.social
range_social : number // range.social
// Financial Situation or Financial Competence, self assessed?
// or more objective?, unclear...
mean_financial : number // mean.financial
stddev_financial : number // stddev.financial
range_financial : number // range.financial
linear_gradient_financial : number // linear.gradient.financial
// Self Discipline - unclear what protocol for gathering
mean_self_discipline : number // mean.self.discipline
stddev_self_discipline : number // stddev.self.discipline
range_self_discipline : number // range.self.discipline
// Self Reflection - unclear what protocol for gathering
mean_self_reflection : number // mean.self.reflection
stddev_self_reflection : number // stddev.self.reflection
range_self_reflection : number // range.self.reflection
// Creativity - unclear what protocol for gathering
mean_creativity : number // mean.creativity
stddev_creativity : number // stddev.creativity
range_creativity : number // range.creativity
// Technical - unclear what protocol for gathering
mean_technical : number // mean.technical
stddev_technical : number // stddev.technical
range_technical : number // range.technical
linear_gradient_technical : number // linear.gradient.technical
// Time Management - unclear what protocol for gathering
mean_time_mgmt : number // mean.time.mgmt
stddev_time_mgmt : number // stddev.time.mgmt
range_time_mgmt : number // range.time.mgmt
linear_gradient_time_mgmt : number // linear.gradient.time.mgmt
// Empowerment - unclear what protocol for gathering
mean_empowerment : number // mean.empowerment
stddev_empowerment : number // stddev.empowerment
range_empowerment : number // range.empowerment
linear_gradient_empowerment : number // linear.gradient.empowerment
// Resourcefulness - unclear what protocol for gathering
mean_resourcefulness : number // mean.resourcefulness
stddev_resourcefulness : number // stddev.resourcefulness
range_resourcefulness : number // range.resourcefulness
// is a pathway feature which takes the variance
// from the vector of points for each service aspect,
// e.g. [1,0,2] (simplified example) might tell us that the pathway has
// 1 psychological point, 0 physical points, and 2 technical points.
// The variance_between_facets is the variance on that vector.
variance_between_facets : number // variance.between.facets
// Pathway variables which mention “gaps” or “overlapping” days
// - a gap is a section of someone’s pathway where they have no
// current service running; an overlap is where they have more than one.
rangeratio_between_lengthservices_lengthdays : number // rangeratio.between.lengthservices.lengthdays
mean_length_between_gaps : number // mean.length.between.gaps
number_overlapping_days : number // number.overlapping.days
mean_length_overlapping_days : number // mean.length.overlapping.days
// Mael & Ashforth (1992) Organizational Identification Scale?
// - just guessing, and that ae vowel combo could be something Icelandic
// - maelir - measure?
// If it is, and much like the COPM measures of occupational fit,
// this would then be highly contextualised,
// ie. need to be reassessed after every "work placement" or work experience.
// Or is this measure used to assess fit between the individual and
// the intervention...? rather than the individual and their current organisation?
ht_mael_first_measure : number
ht_mael_last_measure : number
ht_mael_change_eval_start : number
ht_mael_change_start_last : number
ht_mael_change_last : number
// Líkamsrækt - Exercise / Physical Health
// - but in which case how is this different to ht heils
// Presumably SF-36's Physical Health assessment scores
// could be a suitable proxy, or alternative?
ht_likams_last_measure : number
ht_likams_change_last : number
ht_likams_change_start_last : number
ht_likams_change_eval_start : number
ht_likams_first_measure : number
// Pain?
// Again, unclear what this is, or which protocol is used.
ht_verkir_first_measure : number
ht_verkir_last_measure : number
ht_verkir_change_eval_start : number
ht_verkir_change_last : number
ht_verkir_change_start_last : number
// General Health, presumably self assessed overall (physical) health,
// or combined overall physical and psychological health?
// Again, unclear what this is, and how it is measured.
ht_heils_change_eval_start : number
ht_heils_change_start_last : number
ht_heils_first_measure : number
ht_heils_last_measure : number
ht_heils_change_last : number
// There is a need perhaps for greater clarity in our own future implementations
// about how this assessment differs from physical health, i.e. 'ht_likams' or 'mean_physical'
// or even how 'ht_heils' is distinguished from measures of 'psychological health'
// do we consciously embrace a model of health that separates 'physical' and 'psychological',
// especially if 'pain' is also another separate metric
// We also need to ensure that our questions / data gathering with the participant on this front are clear,
// and if we are using these overlapping metrics that maybe we test to see if they are correlated?
// Can 'health' really usefully be reduced to a single metric, and if so, on what scale?
// More generic health metadata, how many times ht has been assessed?
ht_last_measure_time : Date
ht_count : number
ht_before_start : number
ht_mean_between : number // Could also be a DateInterval
// Depression
// Assessment protocol unclear - and correlation / relationship to DASS Depression unclear
ht_dep_first_measure : number
ht_dep_last_measure : number
ht_dep_change_start_last : number
ht_dep_change_eval_start : number
ht_dep_change_last : number
// Ein - alone? loneliness?, assessed using which protocol?
// Assessment protocol unclear
ht_einb_first_measure : number
ht_einb_last_measure : number
ht_einb_change_start_last : number
ht_einb_change_eval_start : number
ht_einb_change_last : number
// lið - team(?) an - without(?) - autonomy? / independence
// Or something else teamwork/collaboration-related? Just guessing..
// Assessment protocol unclear
ht_lidan_first_measure : number
ht_lidan_last_measure : number
ht_lidan_change_start_last : number
ht_lidan_change_eval_start : number
ht_lidan_change_last : number
// Sjálfsagi (self-discipline) or sjalfseing (self-reflection)
// - we name both of these as variables in some documentation,
// yet there is only one variable with 'sjalfs' so unclear which one this is.
// Assessment protocol unclear
ht_sjalfs_first_measure : number
ht_sjalfs_last_measure : number
ht_sjalfs_change_start_last : number
ht_sjalfs_change_eval_start : number
ht_sjalfs_change_last : number
// Kvíði - Anxiety
// Assessment protocol unclear
ht_kvidi_first_measure : number
ht_kvidi_last_measure : number
ht_kvidi_change_start_last : number
ht_kvidi_change_eval_start : number
ht_kvidi_change_last : number
// Svefn - Sleep
// Assessment protocol unclear
ht_svefn_first_measure : number
ht_svefn_last_measure : number
ht_svefn_change_eval_start : number
ht_svefn_change_start_last : number
ht_svefn_change_last : number
// þrek/threk - (Physical) Endurance / Stamina?
// Assessment protocol unclear
ht_threk_first_measure : number
ht_threk_last_measure : number
ht_threk_change_eval_start : number
ht_threk_change_start_last : number
ht_threk_change_last : number
// Struggling on this one, but maybe samsköpun? - collaboration?
// Assessment protocol unclear
ht_samsk_first_measure : number
ht_samsk_last_measure : number
ht_samsk_change_eval_start : number
ht_samsk_change_start_last : number
ht_samsk_change_last : number
// Financial Situation? or Financial Self Competence
// Assessment protocol unclear
ht_fjarh_first_measure : number
ht_fjarh_last_measure : number
ht_fjarh_change_last : number
ht_fjarh_change_start_last : number
ht_fjarh_change_eval_start : number
// Disability Measures
// Unclear how this is assessed?
disability_proportion : number
icd_count_other : number
// Relatively easy to discern what these are, but less so how
// the data is measured or summarised for these purposes.
// M - Musculoskeletal system and connective tissue
icd_count_M : number
// J - Respiratory diseases and symptoms
icd_count_J : number
// I - Circulatory diseases and symptoms
icd_count_I : number
// F - Mental and behavioural disorders
icd_count_F : number
// Don't know what this is, or how it is derived.
icd_count_variation : number
// SPS - Social Phobia Scale
sps_before_start : number
sps_first_measure : number
sps_mean_between : number
sps_last_measure_time : Date
sps_count : number
sps_last_measure : number
// Young Person ADHD Test Score - Conners-Wells’ Adolescent Self-Report Scale?
adhd_young_total_score : number
adhd_young_total_sdev : number
// athygli - attention
adhd_young_athygli_score : number
adhd_young_athygli_sdev : number
// ofvirkni - hyperactivity
adhd_young_ofvirkni_sdev : number
adhd_young_ofvirkni_score : number
// Adult ADHD Test Score
adhd_adult_total_score : number
adhd_adult_total_sdev : number
// athygli - attention
adhd_adult_athygli_score : number
adhd_adult_athygli_sdev : number
// ofvirkni - hyperactivity
adhd_adult_ofvirkni_sdev : number
adhd_adult_ofvirkni_score : number
// DASS - Depression Anxiety Stress Scales
// - does that mean ht_dep is not depression?
dass_count : number
dass_mean_between : number
dass_before_start : number
dass_last_measure_time : Date
// DASS - Stress
dass_first_measure_stress : number
dass_last_measure_stress : number
dass_change_eval_start_stress : number
dass_change_start_last_stress : number
// DASS - Anxiety
dass_first_measure_anxiety : number
dass_last_measure_anxiety : number
dass_change_eval_start_anxiety : number
dass_change_start_last_anxiety : number
// DASS - Depression
dass_first_measure_depression : number
dass_last_measure_depression : number
dass_change_eval_start_depression : number
dass_change_start_last_depression : number
// SIAS - Social Interaction Anxiety Scale
sias_last_measure : number
sias_mean_between : number
sias_first_measure : number
sias_before_start : number
sias_last_measure_time : Date
sias_count : number
// Rosenberg Self-Esteem Scale?
rose_first_measure : number
rose_last_measure : number
rose_count : number
rose_before_start : number
rose_change_eval_start : number
rose_change_start_last : number
rose_last_measure_time : DateTime
rose_change_last : number
rose_mean_between : number // Or possibly DateInterval
// DSIM - Autism Assessment Score
dsim_score : number
dsim_sdev : number
}
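The two families of pathway features described in the comments above (the variance across service facets, and the 'gap' and 'overlap' days) can be sketched roughly as follows. This is only an illustration of my reading of those comments, not the JANUS implementation: the `ServiceSpan` shape, the function names, the representation of days as whole-number offsets from the start of the pathway, and the choice of population variance (dividing by n rather than n - 1) are all assumptions of mine.

```typescript
// Assumption: population variance (divide by n). The Python script may
// well use the sample variance (divide by n - 1) instead.
function variance(values: number[]): number {
  if (values.length === 0) return NaN
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length
  const squared = values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0)
  return squared / values.length
}

// Hypothetical representation: a service runs on days start..end inclusive,
// counted as whole days from the start of the pathway.
interface ServiceSpan {
  start: number
  end: number
}

// For every day of the pathway, count how many services are running.
function servicesPerDay(spans: ServiceSpan[], pathwayDays: number): number[] {
  const counts: number[] = []
  for (let day = 0; day < pathwayDays; day++) {
    counts.push(spans.filter(s => s.start <= day && day <= s.end).length)
  }
  return counts
}

// The simplified [1, 0, 2] facet vector from the comment above.
const varianceBetweenFacets = variance([1, 0, 2])

// Three made-up services over a ten-day pathway.
const perDay = servicesPerDay(
  [{ start: 0, end: 3 }, { start: 2, end: 5 }, { start: 8, end: 9 }],
  10
)
const overlappingDays = perDay.filter(n => n > 1).length // days 2 and 3
const gapDays = perDay.filter(n => n === 0).length // days 6 and 7
```

With the [1, 0, 2] example, the population variance is 2/3, whereas a sample variance would give 1, so which convention the Python script uses is worth confirming with the team. The mean-length and range-ratio variables are not attempted here, since their exact definitions are not clear from the comments.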
After identifying the distinct variables it was possible to identify a number of patterns, and potential abstractions, in the vector.
Chief amongst these was the recurrence of mean, stddev and range amongst many of the underlying variables.
interface Statistical_Variable {
mean : number
stddev : number
range : number
linear_gradient? : number
}
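As a rough sketch of how this abstraction might be applied, the function below pulls one facet's statistical fields out of a flat record using the mean_, stddev_, range_ and linear_gradient_ naming seen in the vector. The `FlatVector` type, the `statisticalFacet` name and the sample numbers are hypothetical and purely illustrative.

```typescript
interface Statistical_Variable {
  mean: number
  stddev: number
  range: number
  linear_gradient?: number
}

// Hypothetical flat record: variable name -> numeric value.
type FlatVector = { [key: string]: number | undefined }

// Collect the statistical fields of one facet, e.g. "physical" or "social".
function statisticalFacet(vector: FlatVector, facet: string): Statistical_Variable {
  const mean = vector["mean_" + facet]
  const stddev = vector["stddev_" + facet]
  const range = vector["range_" + facet]
  if (mean === undefined || stddev === undefined || range === undefined) {
    throw new Error("Incomplete statistical facet: " + facet)
  }
  const result: Statistical_Variable = { mean, stddev, range }
  // Only some facets carry a linear gradient in the vector.
  const gradient = vector["linear_gradient_" + facet]
  if (gradient !== undefined) result.linear_gradient = gradient
  return result
}

// Made-up values with no clinical meaning.
const sampleVector: FlatVector = {
  mean_physical: 3.2,
  stddev_physical: 0.8,
  range_physical: 2,
  linear_gradient_physical: 0.1,
  mean_social: 2.5,
  stddev_social: 1.1,
  range_social: 3
}
const physicalFacet = statisticalFacet(sampleVector, "physical")
const socialFacet = statisticalFacet(sampleVector, "social")
```

Keeping linear_gradient optional mirrors the vector itself, where only some facets (physical, financial, technical, time management and empowerment) appear to have one.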
And also that of first_measure, last_measure, change_last, change_eval_start and change_start_last:
interface Temporal_Variable {
first_measure : number
last_measure : number
change_last : number
change_eval_start : number
change_start_last : number
}
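One wrinkle worth sketching is that the vector does not name these fields consistently: the 'ht' variables put the field last (ht_dep_first_measure), the DASS variables put it in the middle (dass_first_measure_depression), and the DASS subscales have no change_last at all. The hypothetical helper below takes a key-building function so that both naming patterns can be read into the same shape, and returns a partial record so that missing fields stay visible. All names and sample values are my own assumptions.

```typescript
interface Temporal_Variable {
  first_measure: number
  last_measure: number
  change_last: number
  change_eval_start: number
  change_start_last: number
}

const TEMPORAL_FIELDS = [
  "first_measure",
  "last_measure",
  "change_last",
  "change_eval_start",
  "change_start_last"
] as const
type TemporalField = typeof TEMPORAL_FIELDS[number]

// keyFor maps a field name to the flat variable name for one measure.
function temporalFacet(
  vector: { [key: string]: number | undefined },
  keyFor: (field: TemporalField) => string
): Partial<Temporal_Variable> {
  const result: Partial<Temporal_Variable> = {}
  for (const field of TEMPORAL_FIELDS) {
    const value = vector[keyFor(field)]
    if (value !== undefined) result[field] = value
  }
  return result
}

// Made-up values with no clinical meaning.
const sampleRecord: { [key: string]: number | undefined } = {
  ht_dep_first_measure: 6,
  ht_dep_last_measure: 4,
  ht_dep_change_last: -1,
  ht_dep_change_eval_start: 0,
  ht_dep_change_start_last: -2,
  dass_first_measure_depression: 18,
  dass_last_measure_depression: 12,
  dass_change_eval_start_depression: 0,
  dass_change_start_last_depression: -6
}
const htDep = temporalFacet(sampleRecord, field => "ht_dep_" + field)
const dassDep = temporalFacet(sampleRecord, field => "dass_" + field + "_depression")
```

Here dassDep.change_last comes back undefined, which is one way the gaps between otherwise similar measures could be surfaced rather than hidden.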
Thus, most of the subsequently discussed and visualised data points could inherit from or extend parent classes, which derive these statistical or temporal data representations from the underlying raw dataset / database. Alternatively, any subsequent re-creation / implementation of the Pathway Generator in another context would need to provide functions and APIs that handle and deliver these data representations as part of the process of pulling user data.
class Physical_Health implements Statistical_Variable {
...
}
...
class DSIM_Autism_Score implements Statistical_Variable {
...
}
// OR
class Financial_Situation implements Temporal_Variable {
...
}
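To make concrete what 'deriving' a statistical representation from raw data might involve, here is a minimal sketch that summarises a series of raw scores into the Statistical_Variable shape. How JANUS actually computes these values is not documented as far as I can tell, so the `summarise` name, the use of population standard deviation, and the reading of linear_gradient as a least-squares slope against measurement order are all assumptions.

```typescript
interface Statistical_Variable {
  mean: number
  stddev: number
  range: number
  linear_gradient?: number
}

// Summarise raw scores, assumed to be in the order they were measured.
function summarise(scores: number[]): Statistical_Variable {
  if (scores.length === 0) throw new Error("No scores to summarise")
  const n = scores.length
  const mean = scores.reduce((sum, s) => sum + s, 0) / n
  // Assumption: population standard deviation (divide by n).
  const stddev = Math.sqrt(
    scores.reduce((sum, s) => sum + (s - mean) * (s - mean), 0) / n
  )
  const max = scores.reduce((a, b) => Math.max(a, b))
  const min = scores.reduce((a, b) => Math.min(a, b))
  const summary: Statistical_Variable = { mean, stddev, range: max - min }
  if (n > 1) {
    // Assumption: gradient = least-squares slope of score against index 0..n-1.
    const xMean = (n - 1) / 2
    let numerator = 0
    let denominator = 0
    scores.forEach((s, x) => {
      numerator += (x - xMean) * (s - mean)
      denominator += (x - xMean) * (x - xMean)
    })
    summary.linear_gradient = numerator / denominator
  }
  return summary
}

// Made-up scores that rise by one at each measurement.
const physicalSummary = summarise([1, 2, 3, 4])
```

A single measurement yields no gradient, which fits the optional linear_gradient field above, although whether that is why some facets lack one in the vector is unknown.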
Visualising the JANUS Model
Once these statistically derived variables have been stripped out, and following the process of identifying the included data variables and how they are derived, the next step is to move up several conceptual levels, and with that address more domain-specific knowledge, to explore the different facets of service user data, or user capability assessment data, that form the vector.
To assist with converting this flat data structure into something easier to understand, communicate and navigate, I have explored clustering a number of explicitly or implicitly similar variables into thematically coherent groups. This, it is hoped, can assist in identifying overlapping and conceptually similar clusters of data and potential sources of said data, and thereafter help us to build engagement / data-gathering processes around these data sources.
The identification of such thematic clusters could also form the basis of some sort of distributed or federated data architecture, and is thus a key step (I hope) in helping identify where, given my own research interests, I can personally contribute to The Pathway Generator (TPG) project moving forward.
This data representation contains a lot of apparently overlapping and duplicated variables, or at least, variables that seem to assess similar conceptual facets of users' health-related quality of life / biopsychosocial health.
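One way to check whether two apparently overlapping variables really do measure much the same thing would be to test how strongly they are correlated across patients. The sketch below computes a Pearson correlation coefficient over paired values, for instance one patient's ht_dep_last_measure alongside their dass_last_measure_depression. The `pearson` name and the paired-tuple input are my own choices, and the numbers are invented purely to exercise the function.

```typescript
// Each pair holds two measures taken from the same patient.
function pearson(pairs: Array<[number, number]>): number {
  if (pairs.length < 2) throw new Error("Need at least two pairs")
  const n = pairs.length
  const meanX = pairs.reduce((sum, p) => sum + p[0], 0) / n
  const meanY = pairs.reduce((sum, p) => sum + p[1], 0) / n
  let covariance = 0
  let varianceX = 0
  let varianceY = 0
  pairs.forEach(([x, y]) => {
    covariance += (x - meanX) * (y - meanY)
    varianceX += (x - meanX) * (x - meanX)
    varianceY += (y - meanY) * (y - meanY)
  })
  // Returns NaN if either measure is constant across all pairs.
  return covariance / Math.sqrt(varianceX * varianceY)
}

// Invented pairs: the second value is always double the first.
const agreeing = pearson([[1, 2], [2, 4], [3, 6]])
// Invented pairs: the second value falls as the first rises.
const opposing = pearson([[1, 3], [2, 2], [3, 1]])
```

A coefficient close to 1 or -1 would suggest the two variables carry largely the same information, although a real comparison would also need to account for differing scales, missing measurements and the times at which each was taken.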
So if we remove variables that are clearly duplicates, and nest certain variables in others that are clearly overarching - Anxiety, Stress and Depression can be nested in DASS for example - we can 'simplify' the model to the following:
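The nesting step can be approximated mechanically as a first pass, by grouping the flat variable names under a set of chosen prefixes. The sketch below is a hypothetical helper, not part of the Pathway Generator code: the `clusterByPrefix` name, the "unclustered" bucket and the choice of prefixes are assumptions, and deciding which clusters are thematically meaningful remains a matter of judgement.

```typescript
type Clusters = { [prefix: string]: string[] }

// Group variable names under the longest prefix they start with.
function clusterByPrefix(variableNames: string[], prefixes: string[]): Clusters {
  // Try longer prefixes first so that "ht_dep_" wins over "ht_".
  const ordered = prefixes.slice().sort((a, b) => b.length - a.length)
  const clusters: Clusters = {}
  variableNames.forEach(name => {
    const match = ordered.filter(prefix => name.indexOf(prefix) === 0)[0]
    const cluster = match === undefined ? "unclustered" : match
    const members = clusters[cluster] || []
    members.push(name)
    clusters[cluster] = members
  })
  return clusters
}

const clustered = clusterByPrefix(
  [
    "dass_first_measure_stress",
    "dass_first_measure_anxiety",
    "dass_first_measure_depression",
    "ht_dep_first_measure",
    "ht_kvidi_first_measure",
    "dropout"
  ],
  ["dass_", "ht_dep_", "ht_"]
)
```

In this example the three DASS subscales land together under "dass_", while ht_dep and ht_kvidi stay separate, which is where a manual decision about merging them with their DASS counterparts would come in.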
Visualising the Patient Vector
Next Steps
More to come...
Review of Comparable Models of Vocational Rehabilitation
Work ongoing, see emergent results and discussion here.
Identification of Data Sources
Development of Synthetic Data
References
Lovibond, S.H. and Lovibond, P.F. (1995). Manual for the Depression Anxiety Stress Scales (2nd ed.). Sydney: Psychology Foundation. (Available from The Psychology Foundation, Room 1005 Mathews Building, University of New South Wales, NSW 2052, Australia.)
Siggeirsdottir, K., Brynjolfsdottir, R.D., Haraldsson, S.O., Vidar, S., Gudmundsson, E.G., Brynjolfsson, J.H., Jonsson, H., Hjaltason, O. and Gudnason, V. (2016). Determinants of outcome of vocational rehabilitation. Work, Vol. 55, pp. 577-583. IOS Press. DOI: 10.3233/wor-162436, http://doi.org/10.3233/wor-162436