// *****************************************************************************
// PROGRAM - master OMCA
// Created by Jane Fry and Clare Boulton, Productivity Commission, Melbourne
// August, 2013.
// *****************************************************************************

// THIS PROGRAM IS A MASTER DO FILE THAT LINKS WITH THREE OTHER DO FILES TO
// EXTRACT THE HILDA CALENDAR DATA INTO A SINGLE DATA FILE, CONDUCT OPTIMAL
// MATCHING AND CLUSTER ANALYSIS (OMCA), AND PRODUCE PLOTS AND CHARTS FOR THE
// RESULTING PATHWAYS.

// NOTE: THIS CODE IS DESIGNED TO WORK WITH HILDA RELEASE 10.0 (CONFIDENTIALISED)
// DATA FILES. THE CODE WILL NEED TO BE ADAPTED TO EXTRACT THE CALENDAR DATA INTO
// A SINGLE DATA FILE FOR OTHER RELEASES OF HILDA DATA.

// The OMCA1 and OMCA2 programs use commands from the SQ suite written by
// Brzinsky-Fay, C., Kohler, U. and Luniak, M. 2006, 'Sequence analysis with Stata',
// The Stata Journal, vol. 6, no. 4, pp. 435-60.
// Running the following command will download the most recent version of this
// Stata ado-package:
//     ssc install sq
// Alternatively, the SQ program suite can be downloaded from the following website
// (maintained by Brendan Halpin):
//     http://teaching.sociology.ul.ie/seqanal/

// Before running the programs, users must specify the directories and the file
// pathway for the HILDA data.

// Specify location of datafiles, programs and results
cd "H:\Hilda R10\100c"
log using "Master_OMCA.log", replace

// The clustering commands below and the subsequent descriptive analysis are set
// up for youths with 5 pathways. For a different age group, or with a different
// number of pathways, changes are needed as indicated below and in the other
// programs (OMCA1 and OMCA2).
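
// Optional sketch (not part of the original program): check whether the SQ
// suite is already installed before any of the linked do files run. This
// assumes that finding sqom.ado on the ado-path is a sufficient test for the
// whole package, and that the machine can reach the SSC archive if it is not.
capture which sqom
if _rc {
    ssc install sq
}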

// *****************************************************************************
// Aggregate HILDA data into monthly data, then merge into a single wide dataset
// *****************************************************************************

do "10yrs HILDA calendar"
// This program creates the file "activity_month_wide.dta"
// containing the calendar data for analysis.

// *****************************************************************************
// Optimal matching and cluster analysis example: YOUTHS
// *****************************************************************************

use "activity_month_wide.dta", clear

// Restrict sample to youths (for example)
keep if w1age==1

// ssc install sq    // run this command to install SQ program suite

// Perform optimal matching and generate a dendrogram
do "OMCA1"

// Save and analyse dendrogram for youths
graph save "youth dendrogram", replace

// Decide on the number of clusters and generate indicator
// variable for the pathway number (5 pathways in this example)
cluster generate groupWARD_5=groups(5), name(distWARD)
sqclusterdat, return

// Create pathway labels to use
label define pathno 1 "pathway 1" 2 "pathway 2" 3 "pathway 3" ///
    4 "pathway 4" 5 "pathway 5"
label values groupWARD_5 pathno

// Descriptive analysis (sequence index plots, determine the concentration of
// sequences in the data & produce data for chronographs)
do "OMCA2"

log close