// *****************************************************************************
// PROGRAM - OMCA1
// Created by Jane Fry and Clare Boulton, Productivity Commission, Melbourne
// August, 2013.
// *****************************************************************************
// USING THE HILDA CALENDAR DATA EXTRACTED IN "activity_month_wide.dta",
// THIS PROGRAM PERFORMS OPTIMAL MATCHING AND THE FIRST STAGE OF CLUSTER
// ANALYSIS, RESULTING IN THE PRODUCTION OF A DENDROGRAM. THE DENDROGRAM MUST BE
// ANALYSED TO DETERMINE THE NUMBER OF PATHWAYS BEFORE THE OMCA2 PROGRAM CAN
// BE RUN.

// Reshape the data from wide to long format
// ("activity_month_wide.dta" must already be in memory).
reshape long activity, i(xwaveid) j(order)

// Declare the data as sequence data. The SQ suite of programs must be
// installed for this command to work (available from SSC: ssc install sq).
sqset activity xwaveid order

// Generate a measure of dissimilarity using optimal matching (OM)
//*************************
// If sqom (below) doesn't run, you need to rebuild the Mata library index
// by uncommenting and running the next 3 lines of code. This is usually
// necessary if not using the latest version of Stata available.
//mata
//mata mlib index
//end
//*************************
sqom, name(om1)   // OM between each sequence and the most frequent sequence
                  // in the dataset. The om1 variable is used in sequence
                  // index plots.

// OM using a substitution-cost matrix determined by the probability of
// transitions occurring in the data
sqom, subcost(meanprobdistance) full k(2)   // OM between each unique pairing
                                            // of sequences; this is what is
                                            // used in clustering

// The distance matrix can be saved so that sqom does not have to be re-run
// before redoing the dendrogram or cluster analysis. Use the following
// save/use commands as appropriate.
// sqom save "youths.mmat", replace
// sqom save "young adults.mmat", replace
// sqom save "mature adults.mmat", replace
// sqom save "seniors.mmat", replace

// Use a saved distance matrix:
// sqom use "youths.mmat", replace
// sqom use "young adults.mmat", replace
// sqom use "mature adults.mmat", replace
// sqom use "seniors.mmat", replace

// Prepare the data for Stata's cluster commands
sqclusterdat

// Clustering based on Ward's method:
clustermat wardslinkage SQdist, name(distWARD) add

// *****************************************************************************
// DENDROGRAM
// *****************************************************************************
// Draw the dendrogram. The cutnumber() option specifies how many of the final
// stages of the agglomerative clustering process are shown on the dendrogram;
// 20 is usually all that is required.
cluster tree distWARD, cutnumber(20)

// Inspect the dendrogram to decide on the number of clusters (pathways), then
// return to the master OMCA file.
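
// Optional addition (a sketch, not part of the original workflow): save the
// dendrogram currently in memory in Stata's native .gph format so that it can
// be re-inspected later with -graph use- without re-running OM. The file names
// are hypothetical and mirror the .mmat names above; uncomment the line for
// the relevant age group.
// graph save "youths dendrogram.gph", replace
// graph save "young adults dendrogram.gph", replace
// graph save "mature adults dendrogram.gph", replace
// graph save "seniors dendrogram.gph", replace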