State and Local Agency Digital Geospatial Data Preservation


State and Local Agency Digital Geospatial Data Preservation. The North Carolina Experience. Earth Sciences Information Partners (ESIP) Workshop July 8, 2009. Steve Morris NCSU Libraries. NC Geospatial Data Archiving Project (NCGDAP).
State and Local Agency Digital Geospatial Data Preservation: The North Carolina Experience
Earth Sciences Information Partners (ESIP) Workshop, July 8, 2009
Steve Morris, NCSU Libraries
NC Geospatial Data Archiving Project (NCGDAP)
  • One of eight initial collection building projects in the Library of Congress NDIIPP (National Digital Information Infrastructure and Preservation Program)
  • Lead organizations: North Carolina State University Libraries and North Carolina Center for Geographic Information & Analysis (NCCGIA)
  • Focus:
  • State and local government geospatial data in NC
  • Repository development as catalyst for discussion
  • Goal: Engage spatial data infrastructure in data archiving
  • Initial 3 year project extended to Dec. 2009
  • NCGDAP Data Types – Raster
  • Digital orthophotography
  • Satellite imagery
  • Static data
  • NCGDAP Data Types – Vector Data
  • Point, line, and polygon
  • Attached attribute data
  • Often updated
  • Imagery = Durable: static, simple structure, mostly open formats
  • Vector data = Volatile: frequent update, complex structure, mostly proprietary (commercial) formats
  • Figure: Downtown Raleigh, NC, near State Capitol (2005 Wake County Ortho)
  • Note: Percentages based on the actual number of respondents to each question
  • NCGDAP Data Types – Spatial Databases
  • Vector and raster data
  • Relationships
  • Behaviors
  • Annotation
  • Data Models
  • Geospatial Data: Compelling Issues
  • Dynamic content
  • Constantly updated information
  • Data versioning
  • Digital object complexity
  • Spatially-enabled databases
  • Complicated, multi-component formats
  • Proprietary formats
  • Ingest Challenges: General
  • Data consists of multi-file, multi-format objects
  • Ancillary data files can be shared by datasets
  • Some format conversions involve one-to-many relationships
  • Compressed archive files are common and behave unpredictably
  • And all the usual challenges: format validation, validity checking, threat scanning,…
  • Where is the Dataset? Here’s One!
  • Files
  • Multi-file dataset
  • Georeferencing
  • Metadata file
  • Symbolization file
  • Additional documentation
  • License
  • Disclaimer
  • More
  • Metadata
  • FGDC
  • Acquisition metadata
  • Transfer metadata
  • Ingest metadata
  • Archive rights
  • Archive processes
  • Collection metadata
  • Series metadata
  • Ingest Challenges: Metadata
  • Metadata is encoded in a variety of ways
  • The FGDC content standard for metadata predates XML and lacked an encoding standard; this is addressed in the ISO 19115/19139 North American Profile implementation
  • XML (varied schemas), TXT, HTML
  • Metadata is missing
  • Only about 25% of local agencies use FGDC
  • Metadata is wrong
  • Metadata is commonly asynchronous with the data
  • Inconsistent use of dataset naming, etc.
  • e.g., “Streets” vs. “Wake County Streets”
  • NCGDAP Metadata Summary
  • Existing geospatial metadata often needs:
  • Remediation – to fix errors or omissions
  • Normalization – to adhere to a standard structure
  • Synchronization – so that the data at hand matches the metadata
  • If no metadata then:
  • Can build minimal metadata using templates and auto-extraction
  • Lose key information such as data quality, lineage, data dictionaries
  • Automating metadata for repository ingest
  • Raster data is easy – large sets of consistently structured files
  • Vector data is hard – each dataset is a different story
  • Many additional administrative and technical metadata elements not accommodated by FGDC
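
The "templates and auto-extraction" approach noted above can be sketched as follows. This is a minimal illustration, not NCGDAP's actual tooling: the template field names and the `build_minimal_record` function are hypothetical and are not FGDC element names. It shows that values readable from the file itself can be filled in automatically, while quality and lineage information stays unknown.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical template: field names are illustrative, not FGDC elements.
TEMPLATE = {
    "title": None,
    "file_name": None,
    "size_bytes": None,
    "sha256": None,
    "modified_utc": None,
    # These cannot be auto-extracted, so they stay explicitly unknown.
    "data_quality": "unknown",
    "lineage": "unknown",
}


def build_minimal_record(path: Path) -> dict:
    """Fill the template with values that can be read from the file itself."""
    data = path.read_bytes()
    modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    record = dict(TEMPLATE)
    record["title"] = path.stem.replace("_", " ")
    record["file_name"] = path.name
    record["size_bytes"] = len(data)
    record["sha256"] = hashlib.sha256(data).hexdigest()
    record["modified_utc"] = modified.isoformat()
    return record
```

A record built this way only documents the file as received; anything the producer knew about accuracy or processing history still has to come from the producer.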
  • Extended Curation: Feedback and Outreach
  • Diagram: ingest processes paired with the communities that receive feedback
  • Data Receipt: Content Producers
  • Format Processing: Industry
  • Metadata Processing: Standards Organizations
  • Spatial Data Infrastructure and Archiving
  • Metadata standards and outreach
  • Metadata quality, best practices
  • Inventories
  • Reduce “contact fatigue”, shareable information store
  • Content exchange networks
  • Leverage more compelling business reasons to put data in motion
  • Automate process, add technical & administrative metadata
  • Framework data communities
  • Snapshot frequency, schemas, format strategies
  • Content Packaging Issues
  • Geospatial datasets are typically complex, multi-file objects
  • Data are often accompanied by ancillary data, which must be associated with the data item
  • Rights information and licenses must be associated with the item
  • Various implementations in different domains (METS, IMS-CP, XFDU, etc.)
  • Simpler .zip-based packages also used (MEF, KMZ, etc.)
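
A simple .zip-based package of the kind mentioned above might look like the following sketch. It does not follow the MEF, KMZ, METS, or any other named specification: the `manifest.json` layout and the function names are assumptions made for illustration. The point is only that data files, ancillary files, and rights information travel together, with checksums so the package can be verified later.

```python
import hashlib
import io
import json
import zipfile


def build_package(files: dict[str, bytes], rights: str) -> bytes:
    """Return zip bytes holding the given files plus a manifest.json.

    The manifest format is hypothetical: a rights statement and a
    SHA-256 checksum for every packaged file.
    """
    manifest = {
        "rights": rights,
        "files": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(files.items())
        },
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, data in sorted(files.items()):
            package.writestr(name, data)
        package.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def verify_package(package_bytes: bytes) -> bool:
    """Check every file in the package against its manifest checksum."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        manifest = json.loads(package.read("manifest.json"))
        return all(
            hashlib.sha256(package.read(name)).hexdigest() == digest
            for name, digest in manifest["files"].items()
        )
```

For example, `build_package({"streets.shp": ..., "streets.dbf": ..., "license.txt": ...}, rights="public")` keeps the license in the same object as the data it governs.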
  • Spatial Database Approaches
  • Manage database forward over time
  • Extract data layers to preservable form
  • Set aside archival snapshot of database
  • GeoMAPP: Geospatial Multistate Archival and Preservation Partnership
  • Partners (NC, KY, UT, Library of Congress, NCSU):
  • State geospatial organizations
  • State Archives
  • State-to-state and geo-to-Archives collaboration
  • Organizational and technical diversity across states
  • Archives as part of spatial data infrastructure
  • Selection and appraisal processes
  • Retention schedule development
  • Data transfer to archives
  • Development of enhanced business cases
  • NCGDAP Learning Outcomes
  • Preservation of GIS projects is needed to support re-creation of past work
  • Preservation of data representations is needed to document decision-making processes
  • Validation, remediation, and conversion of data and metadata is expensive: push for improvements upstream
  • Some repositories handle “items”: can result in “atomization” of data
  • For vendors, frame data preservation as a “customer problem”: must build the business case
  • Thank You!
  • Steve Morris, Head, Digital Library Initiatives, North Carolina State University Libraries
  • steven_morris@ncsu.edu
  • North Carolina Geospatial Data Archiving Project: http://www.lib.ncsu.edu/ncgdap
  • GeoMAPP: http://www.geomapp.net
  • Draft of Utah’s GIS to Archives Data Flow
  • All Metadata is completed to FGDC Standards
  • AGRC creates geoPDF files of individual datasets, plus ZIP files of the native format.
  • One ZIP file would contain all the pieces belonging to one shapefile or, alternatively, the file would contain a geodatabase.
  • Geodatabases would not be just one big database with everything in it (multiple series and years).
  • Instead, the native files would be composed of a single downloadable file per series per year.
  • AGRC exports data from SGID and splits out datasets by series. Metadata occasionally incomplete.
  • Local governments supply GIS datasets on CD/DVD to AGRC. Metadata often missing.
  • AGRC copies these files to Archives’ FTP server.
  • Example FTP Site Structure:
  • ftp.archives-agrc.utah.gov/Archives (metadata harvested to populate Archive’s Finding Aids)
    • Biota (Dublin Core Metadata)
    • Boundaries (Dublin Core Metadata)
      • MunicipalityRecords-Series-26846 (Dublin Core Metadata)
        • 2000
          • MunicipalBoundaries.zip (FGDC Metadata)
          • MunicipalBoundaries.pdf (FGDC Metadata)
        • 2001
        • 2002
        • 2003
      • CountyBoundaries-Series-26845 (Dublin Core Metadata)
        • 2003
        • 2004
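
The "one file per series per year" layout in the example structure above can be expressed as a small path-building helper. This is a sketch under the assumption that the hierarchy is category, then series folder, then year, then file; the function name `archive_path` and the year range check are illustrative and not part of the Utah draft.

```python
from pathlib import PurePosixPath


def archive_path(category: str, series_name: str, series_number: int,
                 year: int, file_name: str) -> PurePosixPath:
    """Build a path relative to the FTP root for one series in one year.

    Assumed layout: <category>/<name>-Series-<number>/<year>/<file>.
    """
    if not 1900 <= year <= 2100:
        # Illustrative sanity check only; the bounds are an assumption.
        raise ValueError(f"implausible year: {year}")
    series_folder = f"{series_name}-Series-{series_number}"
    return PurePosixPath(category) / series_folder / str(year) / file_name


path = archive_path("Boundaries", "MunicipalityRecords", 26846, 2000,
                    "MunicipalBoundaries.zip")
print(path)
```

Generating paths from the series number and year, rather than typing them by hand, keeps the native ZIP and its geoPDF rendition side by side in the same year folder.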
  • Kentucky Metadata Workflow into DSpace and iRODS Environment
  • Diagram labels, in source order:
  • UNC, other, KDLA
  • Database with Dublin Core Descriptive and Administrative Metadata
  • Metadata & content entered by agencies using template and modified by Archivist
  • DSpace
  • Single item & batch ingest into DSpace by Archivist
  • Database with Administrative & Preservation Metadata
  • Content Files
  • iRODS
  • Batch metadata extraction using iRODS rules
  • Preservation metadata from iRODS rules
  • Distributed Storage Layer
  • Source Metadata Translation
  • Hub-and-spoke model a la Echo DEPository
  • repository agnostic
  • modular conversion hub
  • facilitate repository software migration & inter-archive exchange
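
The hub-and-spoke idea above can be illustrated with a small sketch: each metadata format supplies one converter into a neutral hub form and one out of it, so adding a format needs two converters rather than one per existing format. The format names `fgdc_like` and `dc_like`, the three-field hub record, and all function names are simplified assumptions for illustration; they are not the Echo DEPository implementation or a complete crosswalk.

```python
from typing import Callable

Record = dict[str, str]
Converter = Callable[[Record], Record]

# One converter into the hub form and one out of it per format (spoke).
TO_HUB: dict[str, Converter] = {}
FROM_HUB: dict[str, Converter] = {}


def rename_keys(mapping: dict[str, str]) -> Converter:
    """Build a converter that renames known keys and drops unknown ones."""
    def convert(record: Record) -> Record:
        return {mapping[k]: v for k, v in record.items() if k in mapping}
    return convert


def register(fmt: str, to_hub_map: dict[str, str]) -> None:
    """Register a format by its key mapping into the hub form."""
    TO_HUB[fmt] = rename_keys(to_hub_map)
    FROM_HUB[fmt] = rename_keys({hub: src for src, hub in to_hub_map.items()})


def translate(record: Record, source: str, target: str) -> Record:
    """Convert source -> hub -> target; formats never map to each other."""
    return FROM_HUB[target](TO_HUB[source](record))


# Hypothetical, heavily simplified field mappings.
register("fgdc_like", {"title": "title", "abstract": "description",
                       "origin": "creator"})
register("dc_like", {"dc:title": "title", "dc:description": "description",
                     "dc:creator": "creator"})

result = translate(
    {"title": "Streets", "abstract": "Road centerlines",
     "origin": "Wake County"},
    source="fgdc_like", target="dc_like",
)
```

Because the hub form is independent of any repository, the same converters could serve both a migration to new repository software and an exchange between archives.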
  • GeoMAPP: Geospatial Multistate Archival and Preservation Partnership
  • Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA), State Archives of NC, with Library of Congress
  • Partners:
  • State geospatial organizations of Kentucky and Utah
  • State Archives of Kentucky and Utah
  • NCSU Libraries in catalytic/advisory role
  • State-to-state and geo-to-Archives collaboration
  • 2 year project: Nov. 2007-Dec. 2009
  • Archives as part of Spatial Data Infrastructure
  • GeoMAPP: Project Components
  • Introduce GIS organizations and State Archives to each other
  • Archival selection and appraisal processes
  • Retention schedule development
  • Data transfer to archives
  • Development of enhanced business case
  • NC Geospatial Data Archiving Project (NCGDAP)
  • Repository Goal
  • Capture at-risk data
  • Explore technical and organizational challenges
  • Project End Goal
  • Data Producers: Improved temporal data management practices
  • Archives: More efficient means of acquiring and preserving data
  • Progress towards best practices
  • Temporal data management vs. long-term preservation
  • Geospatial Data Preservation Challenges
  • Data capture
  • Backups are common, but not long-term archives
  • Producer focus on current data
  • Shift to web services-based access
  • Inadequate or non-existent metadata
  • Consistent NC survey statistics: Only 40% of data producers create and maintain metadata
  • Existing metadata often needs to be normalized, synchronized with the data, and remediated
  • Loss of memory about the data is also a problem
  • Ongoing Challenges
  • When to automate and when not to
  • Learn first from human intervention
  • Minimizing risk of error related to human intervention
  • Accepting that ingest packages used will evolve over time (implications for archive?)
  • Handling post-ingest migrations
  • Challenge: Preservation Metadata
  • Results from a 2006 survey of all 100 NC counties and 25 largest NC municipalities
  • Some Key Metadata Decisions
  • Capture “transfer set” metadata
  • Normalize, synchronize, and remediate existing metadata, and retain original metadata record
  • Treat contact information as archival
  • Update metadata with format conversions
  • Use ESRI Profile of FGDC
  • added technical and administrative elements
  • Has an XML schema
  • ArcCatalog tool support
  • Use simple rights encoding scheme
  • Record metadata in a workflow management database
  • Digital Preservation in State Government - Wilmington
  • SIP Item Creation: Workflow
  • Submission Information Package grouping
  • Ontology logic based on defined multi-file complex format components and directory structure
  • Repository-agnostic item grouping
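
Grouping files into items based on known multi-file format components and directory structure can be sketched as below. This is an illustration only, not the project's actual ontology logic: it handles a single format (shapefile sidecar files), and the component list, the `.shp.xml` metadata convention, and the function name `group_items` are assumptions chosen for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Common shapefile component extensions (illustrative, not exhaustive).
SHAPEFILE_PARTS = {".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg"}
METADATA_SUFFIX = ".shp.xml"  # sidecar metadata named after the .shp file


def group_items(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths into items keyed by directory plus base name.

    Files that are not recognized components become single-file items.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        name = path.name
        if name.lower().endswith(METADATA_SUFFIX):
            base = name[: -len(METADATA_SUFFIX)]
        elif path.suffix.lower() in SHAPEFILE_PARTS:
            base = path.stem
        else:
            base = name  # unrecognized file: keep it as its own item
        groups[str(path.parent / base)].append(raw)
    return {key: sorted(members) for key, members in groups.items()}


items = group_items([
    "wake/streets.shp", "wake/streets.dbf", "wake/streets.shx",
    "wake/streets.shp.xml", "wake/readme.txt", "durham/streets.shp",
])
```

Keying on the directory as well as the base name keeps two counties' `streets` datasets apart, and the result is a plain mapping that does not depend on any particular repository's item model.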
  • Metadata Overview
  • Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata
  • Version one (1994) mandated for use by federal agencies
  • Descriptive metadata, plus some administrative and technical
  • Extensive use at state level, spotty use at local level
  • Problem: content standard without an encoding spec
  • FGDC profiles: ESRI, NBII, Remote Sensing, etc.
  • ISO Standards
  • ISO 19115: Geographic information – Metadata (2003)
  • ISO 19139: Geographic information – Metadata – XML schema implementation (2007)
  • North American Profile of ISO to replace FGDC CSDGM