Manipulating data: Deriving variables, handling missing data, and cleaning data - practices, services and standards

Paul Lambert (Dept. Applied Social Science, Univ. Stirling)
Vernon Gayle (Dept. Applied Social Science, Univ. Stirling and ISER, Univ. Essex)
27th January 2009
Presented to the workshop ‘The significance of data management for social survey research’, University of Essex, a workshop organised by the Economic and Social Data Service (www.esds.ac.uk) and the ‘Data Management through e-Social Science’ research Node of the National Centre for e-Social Science (www.dames.org.uk).

Manipulating data
  • Operations performed on datasets by researchers and/or data distributors
  • At any stage of the research lifecycle
  • Of considerable consequence to analytical results
  • DAMES Node:
  • ‘Data Management’ = manipulation of data, and documenting/assisting the processes of manipulation
  • E-Social Science approach to facilitating data manipulation (metadata resources; data access facilities; ‘workflow models’)
  • Deriving variables, handling missing data, and cleaning data: especially common types of data manipulation
  • Deriving variables = computing new measures for purposes of analysis
  • E.g. recoding complex categorical variables; standardising scores; linking micro- and macro-data
  • {Creating composite vars., e.g. selection model hazards, propensity scores, weights}
  • Handling missing data = strategies for item or case non-response
  • E.g. imputation approaches; listwise/pairwise deletion
  • {deriving ‘missing variables’ via ‘data fusion’}
  • Clarifying, stating & documenting assumptions (see www.missingdata.org.uk)
  • Cleaning data = monitoring and adjusting responses across a given set of variables
  • E.g. extreme values; erroneous values; re-scaling distributions;
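
To make the difference between deletion strategies concrete, here is a minimal sketch in Python (standard library only). The records, variable names and values are hypothetical, and "available-case" is used here as the single-variable analogue of pairwise deletion; it is an illustration under those assumptions, not a recommended procedure.

```python
from statistics import mean

# Hypothetical toy records; None marks item non-response.
records = [
    {"age": 30, "income": 20000},
    {"age": 60, "income": None},
    {"age": None, "income": 60000},
    {"age": 50, "income": 40000},
]
variables = ["age", "income"]


def listwise(rows, names):
    """Keep only cases with no missing value on any of the named variables."""
    return [r for r in rows if all(r[n] is not None for n in names)]


def available_case_mean(rows, name):
    """Mean of one variable using every case that has a value for it."""
    return mean(r[name] for r in rows if r[name] is not None)


# Listwise deletion: only the two fully observed cases contribute.
complete = listwise(records, variables)
listwise_means = {n: mean(r[n] for r in complete) for n in variables}

# Available-case analysis: each variable uses all of its observed values.
available_means = {n: available_case_mean(records, n) for n in variables}

print("listwise :", listwise_means)
print("available:", available_means)
```

The two strategies give different means from the same records, which is why the choice of strategy, and the assumptions behind it, need to be stated and documented.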
  • In this talk: practices, services and standards for deriving variables, handling missing data, and cleaning data
  • Practices
  • Key, or common, features of current approaches
  • Services
  • Resources available/conceivable
  • Standards
  • Preliminary thoughts on standards setting
  • (i) A brief illustrative example from the UK RAE 2008
  • Research Assessment Exercise data published Dec 2008
  • Extended reporting on basic data by media/within HE sector, e.g.
  • Cambridge leads the way
  • Nursing raises its status
  • Numerous enhancements/amendments to data & analysis could be easily generated, and often lead to a different story
  • Lambert, P.S. and Gayle, V. (2008). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008, University of Stirling: Technical Paper 2008-3 of the Data Management through e-Social Science Research Node (www.dames.org.uk).
  • Extending analysis of the 2008 RAE using data manipulations
  • Deriving variables
  • Commonly used RAE ‘Grade point average’
  • GPA = [4 × (%4*) + 3 × (%3*) + 2 × (%2*) + 1 × (%1*)] / 100
  • Calculate alternative GPA measures
  • Standardise GPA within Units of Assessment
  • Rate Units of Assessment by external measures of relative ‘prestige’
  • Link with 2001 standard thresholds
  • Other external data – e.g. Univ. typologies; RAE panel membership
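
The grade point average and the within-UoA standardisation listed above can be sketched as follows. This is a minimal Python illustration, not the Stata syntax from the technical paper: the quality profiles are invented, the field names are hypothetical, and the use of the population standard deviation for the z-score is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def gpa(pct4, pct3, pct2, pct1):
    """Grade point average from the percentages rated 4*, 3*, 2* and 1*."""
    return (4 * pct4 + 3 * pct3 + 2 * pct2 + 1 * pct1) / 100


# Hypothetical quality profiles (illustrative values, not real RAE results).
submissions = [
    {"hei": "A", "uoa": "Sociology", "profile": (20, 40, 30, 10)},
    {"hei": "B", "uoa": "Sociology", "profile": (10, 30, 40, 20)},
    {"hei": "C", "uoa": "Sociology", "profile": (30, 40, 20, 10)},
    {"hei": "A", "uoa": "Physics", "profile": (10, 50, 30, 10)},
    {"hei": "B", "uoa": "Physics", "profile": (20, 50, 20, 10)},
]

# Derive the GPA for each submission.
for s in submissions:
    s["gpa"] = gpa(*s["profile"])

# Collect GPAs by Unit of Assessment.
by_uoa = defaultdict(list)
for s in submissions:
    by_uoa[s["uoa"]].append(s["gpa"])

# Standardise within UoA: z = (GPA - UoA mean) / UoA standard deviation.
# Assumption: population SD (pstdev); a UoA with no spread gets z = 0.
for s in submissions:
    scores = by_uoa[s["uoa"]]
    spread = pstdev(scores)
    s["z_within_uoa"] = (s["gpa"] - mean(scores)) / spread if spread > 0 else 0.0

for s in submissions:
    print(s["hei"], s["uoa"], round(s["gpa"], 2), round(s["z_within_uoa"], 2))
```

A standardised score compares each submission only with others in the same subject, so a raw GPA that looks modest overall can still sit well above its own UoA's average.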
  • Cleaning data
  • Of 159 HEIs, 27 have only 1 UoA
  • cf. mean of 15 UoAs per HEI, max 53 (Manchester)
  • The single-UoA HEIs often have outlying GPAs
  • Analyses of averages might exclude these HEIs
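
A minimal Python sketch of this cleaning step: count UoAs per HEI, flag the single-UoA institutions, and compare the average with and without them. The rows below are invented for illustration and the variable names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical (HEI, UoA, GPA) rows; not real RAE results.
rows = [
    ("A", "Sociology", 2.7),
    ("A", "Physics", 2.6),
    ("B", "Sociology", 2.3),
    ("B", "Physics", 2.8),
    ("C", "Art", 3.4),  # HEI with a single UoA and an outlying GPA
]

# Count how many UoAs each HEI submitted to.
uoa_counts = Counter(hei for hei, _, _ in rows)
single_uoa_heis = {hei for hei, n in uoa_counts.items() if n == 1}

# Average GPA per HEI.
gpas_by_hei = defaultdict(list)
for hei, _, score in rows:
    gpas_by_hei[hei].append(score)
hei_means = {hei: mean(scores) for hei, scores in gpas_by_hei.items()}

# Average of HEI-level means, with and without single-UoA HEIs.
mean_all = mean(hei_means.values())
mean_excluding = mean(
    m for hei, m in hei_means.items() if hei not in single_uoa_heis
)

print("single-UoA HEIs:", sorted(single_uoa_heis))
print("mean (all HEIs):", round(mean_all, 3))
print("mean (excluding single-UoA HEIs):", round(mean_excluding, 3))
```

Flagging rather than silently dropping the cases keeps the exclusion rule visible, so it can be documented and reversed.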
  • Handling missing data
  • Less conventionally missing data (admin dataset)
  • But not all HEI staff are included within the RAE; consider analysis accounting for the number of excluded staff?
  • [Figure: Conventional RAE 2008 results for Univ. Essex]
  • [Figure: Alternative RAE 2008 measures for Univ. Essex (within- and between-subject standardisations)]
  • RAE data manipulations example: practices, services and standards
  • Practices
  • Media/HEI announcements concentrate upon simplistic, unweighted, unstandardised rankings/averages
  • Various alternative measures tell different stories. We found:
  • LSE outranks Cambridge
  • Nursing ranks as the 6th least prestigious UoA out of 67
  • Services
  • Raw data available online: www.rae.ac.uk
  • Relevant supplementary data: www.hesa.ac.uk ; www.dames.org.uk
  • Standards
  • RAE level documentation on grading criteria and approach, www.rae.ac.uk
  • Software based Workflow approach (cf. Scott Long, 2009)
  • In our paper we show Stata syntax for derived variables (www.dames.org.uk)
  • (ii) Some wider thoughts on data manipulation practices, services and standards
  • Currently:
  • Practices are messy and painful
  • Lack of replication and consistency in data manipulation tasks with complex survey data
  • Few people relish data manipulations!
  • Services exist but are under-exploited
  • Standards are not agreed
  • Ignoring standards is no barrier to publication
  • Practices: apparent trends (deriving variables, handling missing data, cleaning data)
  • More interest in harmonisation and comparability
  • Longitudinal and cross-national data
  • Documentation challenges encourage simplifying approaches
  • New data and analytical opportunities
  • Increasing opportunities for enhancing data by linking at micro- or aggregate level
  • Increasing availability of routines for missing values, extreme values
  • Raising standards in secondary analysis of large scale surveys
  • Inadequacy of simple analyses which ignore multivariate relations, missing data, multiprocess systems, hierarchical structures
  • Data manipulations often conducted outside these considerations
  • Desirability of replication
  • Services: key challenges (deriving variables, handling missing data, cleaning data)
  • Software issues
  • Dominance of major proprietary database packages
  • Other specialist/minority packages (e.g. MLwiN)
  • Documentation / replication between packages?
  • Data security
  • Few services can offer to let experts take over a dataset
  • Approaches to reviewing data ought to avoid inspecting individual cases or making duplicate copies
  • Keeping up-to-date?
  • Finding data - need for search facilities [via metadata]
  • Updating specialist advice
  • E.g. GEODE: occupational data were out of date before completion
  • NSIs’ strict focus on contemporary data
  • Standards: key requirements (deriving variables, handling missing data, cleaning data)
  • Need for documentation for replication
  • Detailed accounts of process
  • Citation of sources
  • DAMES aims to facilitate this with metadata and process tools
  • Resolving some difficult debates
  • Approaches to comparative research (measurement equivalence vs. meaning equivalence)
  • Necessary standards for analysis/reporting on missing data
  • Appropriate approaches to extreme values, e.g. robust regressions
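
As one illustration of a robust regression, the sketch below compares ordinary least squares with the Theil-Sen estimator (the median of all pairwise slopes) on invented data containing a single extreme value. Theil-Sen is chosen here only because it is simple to write with the Python standard library; the talk does not name a specific estimator, and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def ols(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def theil_sen(xs, ys):
    """Theil-Sen estimator: median pairwise slope, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Hypothetical data: y = 2x + 1, except for one extreme final value.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 50]

ols_slope, ols_intercept = ols(xs, ys)
ts_slope, ts_intercept = theil_sen(xs, ys)

print("OLS       slope, intercept:", round(ols_slope, 2), round(ols_intercept, 2))
print("Theil-Sen slope, intercept:", ts_slope, ts_intercept)
```

The least squares slope is pulled well away from the trend in the other five points, while the median-based estimate is not. Whichever approach is used, reporting it alongside the treatment of extreme values supports replication.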
  • Forthcoming DAMES contributions
  • Summer workshops on documenting manipulation and analysis of complex survey data
  • ‘To Stata and beyond..’
  • Services for improving data manipulation activities
  • Specialist data on occupations, ethnicity, education
  • Specialist data on social care, mental health
  • Tools for performing data manipulations (linking data and operationalising variables)
  • Services for recording data manipulation activities
  • Workflow modelling tools
  • Metadata records for data linkages and variables
  • Citation information