Biodata Management Guide Reference

23 December 2013 by Dan Randow. Licensed under CC BY 3.0, with financial assistance from the Terrestrial and Freshwater Biodiversity Information System (TFBIS) Programme (Project 263: Biodata Management Framework: Phase Two), and support from Horizons Regional Council and Dataversity.

This document explains and provides overall maturity criteria for the following: Maturity Levels, Data Management Activities and Maturity Factors.

Maturity Levels

Definition

Levels of effectiveness and efficiency of data management.

Description

Mature data management aims to achieve two goals.

  1. Data is fit for purpose (effectiveness). The data available is useful for the purpose for which it was collected, and for other purposes that may arise.
  2. Costs are managed (efficiency). The costs of managing data are minimised.

Effective management can provide useful data, even with basic tools. Once fitness for purpose is consistently being achieved, investments in greater maturity can increase efficiency.

The maturity level of data management should be no higher than the level necessary to ensure adequate fitness for purpose of data, within available budgets.

The overall level of data management maturity is determined by the lowest level of maturity of any data management activity. A consistent level of maturity should usually also be achieved across all maturity factors.
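As a minimal illustration of this rule, the overall level can be computed as the lowest level reached across the activities. The sketch below uses the Guide's five levels; the function name and example scores are illustrative only.

    # Sketch: overall maturity is the lowest maturity level reached by any
    # data management activity. The example input values are illustrative.
    LEVELS = ["Fragmented", "Improvised", "Managed", "Automated", "Integrated"]

    def overall_maturity(activity_levels):
        """activity_levels maps each activity (Capture, Ingest, ...) to a level name."""
        return LEVELS[min(LEVELS.index(level) for level in activity_levels.values())]

    print(overall_maturity({
        "Capture": "Managed", "Ingest": "Automated", "Store": "Improvised",
        "Share": "Managed", "Analyse": "Managed",
    }))  # prints "Improvised"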

This Guide defines the following Maturity Levels: Fragmented, Improvised, Managed, Automated and Integrated.

Fragmented

Definition

Data management is ad hoc and inconsistent.

Description

Fragmented data management arises in the absence of planning and efforts to ensure consistent data management. Data sets are difficult to find and to use.

Maturity Criteria

Maturity criteria for Fragmented across Data Management Activities and Maturity Factors.

Maturity Level: Fragmented
Data management is ad hoc and inconsistent.
Fragmented Capture
Data capture is ad hoc and inconsistent.
Fragmented Ingest
Ingestion is ad hoc and inconsistent.
Fragmented Store
Storage is ad hoc and inconsistent.
Fragmented Share
Sharing is ad hoc and inconsistent.
Fragmented Analyse
Analysis is ad hoc and inconsistent.
Fragmented Processes
Processes are ad hoc and inconsistent.
Data is seldom captured following documented processes. Data is seldom ingested following documented processes. Data is seldom stored following documented processes. Data is seldom shared following documented processes. Data is seldom analysed following documented processes.
Fragmented Tools
The use of tools is ad hoc and inconsistent.
Data is captured using varying digital and non-digital tools. No tools are used to control the process of ingesting data. Data is stored in diverse, ad hoc repositories with no record of its location. Data is only shared on request and using ad hoc tools such as emailing a file. Data is analysed using varying digital and non-digital tools.
Fragmented Formats
Unstructured analogue and proprietary digital formats are used.
Capture: Data capture is memory-reliant with verbal location description and no written record. Ingest: Data is ingested in unstructured and analogue or proprietary digital formats. Store: N/A. Share: Data is shared in unstructured and analogue or proprietary digital formats. Analyse: The output of analysis uses unstructured and analogue or proprietary digital formats.
Fragmented Licensing
Licence information is seldom recorded and permissions are inconsistently enforced.
Licence information is not known or recorded at time of capture. No licence information is recorded when data is ingested. Storage system does not record or enforce licence constraints. Data is shared without enforcing constraints or including licence information. Licence information relating to source data is often lost during analysis.
Fragmented Reliability
Reliability is not explicitly managed.
Data provenance information is not recorded when data is captured. Data provenance information is not recorded when data is ingested. Data provenance information is not associated with stored data. Data provenance information is not provided when data is shared. Data provenance information is not provided in the output of data analysis.
Fragmented Standards
No data standard is used.
Data is captured without using standards. Data is ingested without using standards. Data is stored without using standards. Data is shared without using standards. The output of data analysis does not use standards.

Improvised

Definition

Some data management is planned and consistent.

Description

Some data is managed in a planned and consistent way. Those data sets are easy to find and to use.

Data management is expert-driven. In some areas, repeatable processes ensure that all maturity factors are attended to.

Maturity Criteria

Maturity criteria for Improvised across Data Management Activities and Maturity Factors.

Maturity Level: Improvised
Some data management is planned and consistent.
Improvised Capture
Some data capture is planned and consistent.
Improvised Ingest
Some ingestion is planned and consistent.
Improvised Store
Some storage is planned and consistent.
Improvised Share
Some sharing is planned and consistent.
Improvised Analyse
Some analysis is planned and consistent.
Improvised Processes
Some documented processes are followed.
Data is sometimes captured following documented processes. Data is sometimes ingested following documented processes. Data is sometimes stored following documented processes. Data is sometimes shared following documented processes. Data is sometimes analysed following documented processes.
Improvised Tools
The use of some (usually generic) tools is planned and consistent.
Some data is captured using managed paper forms or digital tools. Data is manually synchronised or migrated. Some data is stored using designated generic tools such as spreadsheets or cloud storage. Some data is shared by migration to a designated shared repository. Some data is analysed using a managed set of tools, usually spreadsheets.
Improvised Formats
Structured analogue and proprietary digital formats are used.
Capture: Data is captured using paper forms or forms in a proprietary digital format. Ingest: Data is ingested in a structured form and a proprietary format. Store: N/A. Share: Data is shared in a structured form using proprietary formats. Analyse: The output of analysis uses structured proprietary formats.
Improvised Licensing
Licence information is sometimes recorded and permissions are sometimes enforced.
Capture: Licence information is sometimes recorded at the time of capture. Ingest: Some licence information is included when data is ingested. Store: Some licence information is explicitly associated with stored data. Share: Permissions are enforced manually. Licence information is sometimes included with shared data. Analyse: Licence constraints and explicit licences are sometimes managed manually during analysis.
Improvised Reliability
Reliability is sometimes managed.
Data provenance information is sometimes recorded when data is captured. Data provenance information is sometimes recorded when data is ingested. Data provenance information is sometimes associated with stored data. Data provenance information is sometimes provided when data is shared. Data provenance information is sometimes provided in the output of data analysis.
Improvised Standards
Internal data standards are used.
Data captured uses internal standards. Data ingested uses internal standards. Data stored uses internal standards. Data shared uses internal standards. The output of data analysis uses internal standards.

Managed

Definition

Data is consistently managed.

Description

Data is managed consistently. Data is consistently easy to find and to use. The consistency of data management is measured. Data management may involve time-consuming manual work.

Maturity Criteria

Maturity criteria for Managed across Data Management Activities and Maturity Factors.

Maturity Level: Managed
Data is consistently managed.
Managed Capture
Data capture is consistently managed.
Managed Ingest
Data ingestion is consistently managed.
Managed Store
Data storage is consistently managed.
Managed Share
Data sharing is consistently managed.
Managed Analyse
Data analysis is consistently managed.
Managed Processes
Documented processes are consistently followed. Adoption of processes is measured.
Data is consistently captured following a documented process. Data is consistently ingested following a documented process. Management processes consistently provide for availability, backup and archiving. Data is consistently shared following a documented process. Data is consistently analysed following a documented process.
Managed Tools
Generic tools or legacy custom systems are consistently managed.
Data is consistently captured using survey-specific paper or digital tools that support validation. Data is consistently ingested using managed data synchronisation or manual transcription or migration. Data is stored and catalogued using a controlled set of generic tools or a proprietary dedicated tool. Data is consistently shared using well-managed generic tools, or a proprietary dedicated tool. Data is analysed using controlled generic tools or a proprietary dedicated tool.
Managed Formats
Structured open formats are used.
Capture: Data is captured using open digital structured data or media formats. Ingest: Data is ingested in structured open formats. Store: N/A. Share: Data is shared in structured and open formats. Analyse: The output of data analysis is in structured and open formats.
Managed Licensing
Licence information is consistently recorded and permissions are consistently enforced.
Capture: Licence information is consistently recorded at the time of capture. Ingest: Licence information is consistently included when data is ingested. Store: Licence information is consistently associated with stored data. Share: Permissions are managed consistently using manual processes. Licence is explicit on shared data. Analyse: Licence constraints and explicit licences are consistently managed manually during analysis.
Managed Reliability
Reliability is consistently managed.
Data provenance information is consistently recorded when data is captured. Data provenance information is consistently recorded when data is ingested. Data provenance information is consistently associated with stored data. Data provenance information is consistently provided with shared data. Data provenance information is consistently provided with the output of data analysis.
Managed Standards
A data standard, external where one exists, is consistently used and referenced.
Data is captured using referenced external standards. Data ingested uses, or is transformed to use, referenced external standards. Data stored uses referenced external standards. Data shared uses referenced external standards. The outputs of data analysis use referenced external standards.

Automated

Definition

Tools enable consistent and efficient data management.

Description

Tools are used to increase the efficiency of consistent data management. Data can be consistently accessed and used with little manual work.

Maturity Criteria

Maturity criteria for Automated across Data Management Activities and Maturity Factors.

Maturity Level: Automated
Tools enable consistent and efficient data management.
Automated Capture
Tools enable consistent and efficient data capture.
Automated Ingest
Tools enable consistent and efficient data ingestion.
Automated Store
Tools enable consistent and efficient data storage.
Automated Share
Tools enable consistent and efficient data sharing.
Automated Analyse
Tools enable consistent and efficient data analysis.
Automated Processes
Tools enable consistent adherence to documented processes. Processes are maintained.
Tools support standard data capture processes. Tools support the ingestion process by tracking and controlling changes. Data catalogue and repository are managed as part of the organisation's processes. Tools support data sharing following standard processes. Tools support data analysis following standard processes.
Automated Tools
Recognised off-the-shelf or well-maintained custom tools are used.
Data is captured using specialised digital tools with integrated GPS. Tools support a controlled ingestion workflow with authorisation, auditing, and roll-back. Data and metadata are held in well-maintained specialised systems. Data and metadata are shared dynamically by data storage systems. Data analysis tools are dynamically linked to the data repository.
Automated Formats
Individual rows are dynamically available with universally unique IDs (UUIDs).
Capture: Records are assigned a unique ID on capture. Ingest: IDs are verified as globally unique when data is ingested. Store: N/A. Share: Data shared uses UUIDs. Data catalogue can be federated. Analyse: UUIDs are tracked through the data analysis process.
Automated Licensing
Licence information is handled and permissions are enforced automatically.
Licence information is recorded at row level at the time of capture. Licence information is recorded at row level when data is ingested. Licence information is stored at row level. Licence information is recorded at row level and permissions are enforced automatically when data is shared. Analysis systems dynamically maintain explicit row level licence information.
Automated Reliability
Reliability indices are used.
Data is captured using a process with a reliability index. A reliability index is given to each dataset ingested. A reliability index is associated with each stored dataset. A reliability index is provided with shared data. A reliability index is provided with the output of data analysis.
Automated Standards
References to standards used are machine-readable.
Data is captured using terms that are linked to authorities. Data is ingested using terms that are linked to authorities. Data is stored using terms that are linked to authorities. Data is shared using terms that are linked to authorities. Outputs of data analysis use terms that are linked to authorities.

Integrated

Definition

All systems support consistent and efficient data management.

Description

Effective and efficient data management is built into the regular functioning of all systems. Data is easily exchanged.

Processes and the systems that support them are under ongoing review and improvement.

Maturity Criteria

Maturity criteria for Integrated across Data Management Activities and Maturity Factors.

Maturity Level: Integrated
All systems support consistent and efficient data management.
Integrated Capture
Data capture systems are integrated with other systems.
Integrated Ingest
Data ingestion systems are integrated with other systems.
Integrated Store
Data storage systems are integrated with other systems.
Integrated Share
Data sharing systems are integrated with other systems.
Integrated Analyse
Data analysis systems are integrated with other systems.
Integrated Processes
Processes and their support by tools are under ongoing review and improvement.
Data capture processes and their support by tools are under ongoing review and improvement. Data ingestion processes and their support by tools are under ongoing review and improvement. Management of data catalogue and repository is under ongoing review. Data sharing processes and their support by tools are under ongoing review and improvement. Data analysis processes and their support by tools are under ongoing review and improvement.
Integrated Tools
Modular, interoperable, cross-platform tools are managed by organisational IT processes.
Field tools support automated measurement and identification. Ingestion system dynamically updates to reflect changes to remote datasets. Data storage tools integrate easily with others via open interfaces. Data is exposed for federation and can be externally harvested. Data is analysed using tools that are closely integrated with local and remote data systems.
Integrated Formats
Data uses RDF.
Capture: Data is captured as XML or JSON. Ingest: Data is ingested as XML or JSON. Store: N/A. Share: Data is shared using RDF. Analyse: Outputs of data analysis use RDF.
Integrated Licensing
All systems handle licences that vary with aggregation, and over elapsed time.
Resolution-dependent licence information is recorded at the time of capture. Resolution-dependent licence information is recorded as data is ingested. Resolution-dependent licence information is recorded with stored data. Resolution-dependent licence information is recorded with shared data. Analysis systems automate the enforcement of licence constraints at varying levels of aggregation.
Integrated Reliability
Data depreciation is supported.
Data is captured using a process with a depreciation profile. A depreciation profile is associated with all data ingested. The data storage system periodically recalculates reliability indices to reflect depreciation. Only data of high reliability is automatically shared with users who are not qualified to use unreliable data. Data analysis automatically aggregates the reliability indices of source data sets.
Integrated Standards
Only well-established standards are used.
Data captured uses well-established standards. Data ingested uses well-established standards. Stored data uses well-established standards. Shared data uses well-established standards. The outputs of data analysis use well-established standards.

Data Management Activities

Definition

Events in the data lifecycle when data is produced, transformed or consumed.

Description

Data management involves various activities that are carried out at different stages in the life of a dataset. These activities may be carried out in linear or cyclic ways. For example, analysis may produce new data that is ingested.

Maturity of biodata management should be consistent across all data management activities.

This Guide defines the following Data Management Activities: Capture, Ingest, Store, Share and Analyse.

Capture

Definition

Data is recorded in the field or at a desk.

Description

The aim of data capture is to record data representing observations or measurements completely and correctly.

Observations and measurements are made in ways ranging from a casual observation to a methodical regional survey. The survey method used is outside the scope of this Guide.

Data is often validated at the time of capture.

Most biodata is captured in the field. Some data is captured from samples or from aerial or satellite images.

Field data systems should support Field Staff to achieve the following.

  • Access information about the survey site.
  • Access information about the survey method, including what to observe, what data to record, and how to record it.
  • Easily identify species.
  • Validate data as it is recorded.
  • Capture data using standard organism names, ecosystem assessment criteria and notation.
  • Capture data at the most detailed level at which it is available.
  • Capture data about the purpose and method of the survey, who carried out the survey and the conditions under which it was carried out.
  • Capture data about the version of the data capture systems (paper, or electronic) used.
  • Understand and capture data about the copyright and sharing constraints applicable to the data.
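A minimal sketch of two of these supports follows: validating a record as it is captured, and recording who captured it and when. The field names, required-field set and species list are illustrative assumptions, not requirements of this Guide.

    # Sketch: validate a field record as it is captured and attach basic
    # provenance (identifier, timestamp, observer). Names and rules are examples.
    import uuid
    from datetime import datetime, timezone

    REQUIRED_FIELDS = {"species", "location", "observer", "method"}

    def validate_capture(record, known_species):
        """Return a list of problems; an empty list means the record passes."""
        problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
        if "species" in record and record["species"] not in known_species:
            problems.append(f"unrecognised species name: {record['species']}")
        return problems

    def new_capture_record(observer, data):
        """Create a record with an identifier, a capture timestamp and the observer."""
        return {"id": str(uuid.uuid4()),
                "captured_at": datetime.now(timezone.utc).isoformat(),
                "observer": observer, **data}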

Maturity Criteria

Maturity criteria for Capture across Maturity Levels and Maturity Factors.

Data Management Activity: Capture
Data is recorded in the field or at a desk.
Capture Fragmented
Data capture is ad hoc and inconsistent.
Capture Improvised
Some data capture is planned and consistent.
Capture Managed
Data capture is consistently managed.
Capture Automated
Tools enable consistent and efficient data capture.
Capture Integrated
Data capture systems are integrated with other systems.
Capture Processes
Processes ensure that data is captured accurately and comprehensively.
Data is seldom captured following documented processes. Data is sometimes captured following documented processes. Data is consistently captured following a documented process. Tools support standard data capture processes. Data capture processes and their support by tools are under ongoing review and improvement.
Capture Tools
Tools are used to facilitate data capture.
Data is captured using varying digital and non-digital tools. Some data is captured using managed paper forms or digital tools. Data is consistently captured using survey-specific paper or digital tools that support validation. Data is captured using specialised digital tools with integrated GPS. Field tools support automated measurement and identification.
Capture Formats
Structured and open formats are used for captured data.
Data capture is memory-reliant with verbal location description and no written record. Data is captured using paper forms or forms in a proprietary digital format. Data is captured using open digital structured data or media formats. Records are assigned a unique ID on capture. Data is captured as XML or JSON.
Capture Licensing
Licence information is recorded as data is captured.
Licence information is not known or recorded at time of capture. Licence information is sometimes recorded at the time of capture. Licence information is consistently recorded at the time of capture. Licence information is recorded at row level at the time of capture. Resolution-dependent licence information is recorded at the time of capture.
Capture Reliability
Data provenance information is recorded at the time of capture.
Data provenance information is not recorded when data is captured. Data provenance information is sometimes recorded when data is captured. Data provenance information is consistently recorded when data is captured. Data is captured using a process with a reliability index. Data is captured using a process with a depreciation profile.
Capture Standards
Standard terms are used to describe captured data.
Data is captured without using standards. Data captured uses internal standards. Data is captured using referenced external standards. Data is captured using terms that are linked to authorities. Data captured uses well-established standards.

Resources

Also see Data Management Resources for the Regional Council Indicators.

Ingest

Definition

Data is introduced into a primary repository and catalogued.

Description

The aim of ingesting data is to make it easy to find and use the data.

Ingesting data involves two key steps.

  1. Data is introduced into a primary repository.
  2. Data is catalogued.

The data being ingested may be newly captured, or it may be data that already exists in another repository. In some cases, the data itself may not be moved or copied but simply catalogued with metadata describing it and pointing to its location.

Data ingested from a remote repository may be duplicated when it is ingested. In some cases a dynamic connection with a remote repository automatically allows new data in the remote system to become available to the local system.

The ingestion process provides an opportunity to enhance the data in the following ways.

  • Validation, "cleaning up" and standardisation of data content.
  • Verification of species identifications.
  • Standardisation of data definitions and structure.
  • Addition of metadata.
  • Expert authorisation of the data for ingestion.

All changes that are made to a dataset should be made in a controlled way, with copies of earlier versions kept and catalogued to ensure that the data can be traced back to its origins.

Data may be ingested using manual processes and ad hoc tools, such as a series of spreadsheets. More sophisticated systems have an ingestion workflow integrated with the primary data repository and catalogue.

The data may be ingested with the intention of long term storage, or for the purpose of a specific task. Even in the latter case, a persistent record of the data that was used means that the analysis can be audited, if required.
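As a sketch of the simplest form of such a workflow, ingestion can keep the incoming data unchanged, place it in the primary repository and record it in the catalogue so it can be traced back to its origin. The dataset identifier scheme, the in-memory repository and catalogue, and the field names are all illustrative assumptions.

    # Sketch: a minimal ingestion step that stores the incoming dataset in the
    # primary repository and catalogues it with basic provenance. Real systems
    # add validation, authorisation and version control.
    import hashlib, json
    from datetime import datetime, timezone

    def ingest(dataset_rows, source, repository, catalogue):
        raw = json.dumps(dataset_rows, sort_keys=True).encode("utf-8")
        dataset_id = hashlib.sha256(raw).hexdigest()[:12]   # stable id for this version
        repository[dataset_id] = dataset_rows                # primary repository keeps the data
        catalogue.append({                                    # catalogue describes what was ingested
            "dataset_id": dataset_id,
            "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "row_count": len(dataset_rows),
        })
        return dataset_id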

Maturity Criteria

Maturity criteria for Ingest across Maturity Levels and Maturity Factors.

Data Management Activity: Ingest
Data is introduced into a primary repository and catalogued.
Ingest Fragmented
Ingestion is ad hoc and inconsistent.
Ingest Improvised
Some ingestion is planned and consistent.
Ingest Managed
Data ingestion is consistently managed.
Ingest Automated
Tools enable consistent and efficient data ingestion.
Ingest Integrated
Data ingestion systems are integrated with other systems.
Ingest Processes
Processes ensure that data is ingested consistently.
Data is seldom ingested following documented processes. Data is sometimes ingested following documented processes. Data is consistently ingested following a documented process. Tools support the ingestion process by tracking and controlling changes. Data ingestion processes and their support by tools are under ongoing review and improvement.
Ingest Tools
Tools are used to facilitate ingesting data.
No tools are used to control the process of ingesting data. Data is manually synchronised or migrated. Data is consistently ingested using managed data synchronisation or manual transcription or migration. Tools support a controlled ingestion workflow with authorisation, auditing, and roll-back. Ingestion system dynamically updates to reflect changes to remote datasets.
Ingest Formats
Structured and open formats are used during ingestion.
Data is ingested in unstructured and analogue or proprietary digital formats. Data is ingested in a structured form and a proprietary format. Data is ingested in structured open formats. IDs are verified as globally unique when data is ingested. Data is ingested as XML or JSON.
Ingest Licensing
Licence information is recorded as data is ingested.
No licence information is recorded when data is ingested. Some licence information is included when data is ingested. Licence information is consistently included when data is ingested. Licence information is recorded at row level when data is ingested. Resolution-dependent licence information is recorded as data is ingested.
Ingest Reliability
Data provenance information is associated with data when it is ingested.
Data provenance information is not recorded when data is ingested. Data provenance information is sometimes recorded when data is ingested. Data provenance information is consistently recorded when data is ingested. A reliability index is given to each dataset ingested. A depreciation profile is associated with all data ingested.
Ingest Standards
Standards-compliance is ensured as data is ingested.
Data is ingested without using standards. Data ingested uses internal standards. Data ingested uses, or is transformed to use, referenced external standards. Data is ingested using terms that are linked to authorities. Data ingested uses well-established standards.

Resources

Also see Data Management Resources for the Regional Council Indicators.

Store

Definition

Data and metadata are retained for the required term.

Description

The aim of data storage is to ensure that data and metadata are available when they are needed.

Storage requires a data repository and a data catalogue.

The data repository stores the data. It can be made up of a number of separate repositories. Ideally, a limited and explicitly defined set of repositories is designated as a primary repository.

A data catalogue contains a record of each dataset that is stored in the repository, with information describing the dataset.

The technology used for the data repository could be shelves, filing cabinets, folders in a file system or an online repository such as Google Drive or DropBox. It could be an EDMS, custom database, digital asset management system or a combination of these.

Permissions may be handled by keeping data in separate repositories with different privacy settings, or by a single repository that supports different privacy settings.

The data catalogue may use a spreadsheet or a specialised data catalogue tool, or it may be integrated into the data repository.
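Whatever tool is used, each catalogue record describes one dataset. The sketch below shows the kind of fields such a record might hold; the field names and example values are illustrative, not a schema defined by this Guide.

    # Sketch: one data catalogue entry describing a stored dataset.
    # The fields and the example values are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class CatalogueEntry:
        dataset_id: str   # identifier of the dataset in the primary repository
        title: str
        location: str     # where the data is held (path, URL or repository name)
        data_format: str  # file or data format, e.g. "CSV"
        licence: str      # licence or sharing constraint applying to the dataset
        provenance: str   # who collected the data, when, how and why

    entry = CatalogueEntry("ds-001", "Wetland bird counts",
                           "primary-repository/wetlands/", "CSV",
                           "CC BY 3.0", "Annual survey using five-minute counts")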

Maturity Criteria

Maturity criteria for Store across Maturity Levels and Maturity Factors.

Data Management Activity: Store
Data and metadata are retained for the required term.
Store Fragmented
Storage is ad hoc and inconsistent.
Store Improvised
Some storage is planned and consistent.
Store Managed
Data storage is consistently managed.
Store Automated
Tools enable consistent and efficient data storage.
Store Integrated
Data storage systems are integrated with other systems.
Store Processes
Processes ensure that data is reliably stored.
Data is seldom stored following documented processes. Data is sometimes stored following documented processes. Management processes consistently provide for availability, backup and archiving. Data catalogue and repository are managed as part of the organisation's processes. Management of data catalogue and repository is under ongoing review.
Store Tools
Tools are used to facilitate data storage.
Data is stored in diverse, ad hoc repositories with no record of its location. Some data is stored using designated generic tools such as spreadsheets or cloud storage. Data is stored and catalogued using a controlled set of generic tools or a proprietary dedicated tool. Data and metadata are held in well-maintained specialised systems. Data storage tools integrate easily with others via open interfaces.
Store Formats
N/A
N/A N/A N/A N/A N/A
Store Licensing
Licence information is recorded with stored data.
Storage system does not record or enforce licence constraints. Some licence information is explicitly associated with stored data. Licence information is consistently associated with stored data. Licence information is stored at row level. Resolution-dependent licence information is recorded with stored data.
Store Reliability
Data provenance information is associated with stored data.
Data provenance information is not associated with stored data. Data provenance information is sometimes associated with stored data. Data provenance information is consistently associated with stored data. A reliability index is associated with each stored dataset. The data storage system periodically recalculates reliability indices to reflect depreciation.
Store Standards
Standard terms are used to describe stored data.
Data is stored without using standards. Data stored uses internal standards. Data stored uses referenced external standards. Data is stored using terms that are linked to authorities. Stored data uses well-established standards.

Resources

Also see Data Management Resources for the Regional Council Indicators.

Share

Definition

Data and metadata are available to the required internal and external people and systems.

Description

The aim of data sharing is to ensure that data is easily available to people who are authorised to access it.

Data may be shared with people within or outside the organisation managing the data, or shared directly to other computer systems.

Data may be shared using manual processes and ad hoc tools such as email. More sophisticated systems have human and machine readable interfaces for data-sharing integrated with the primary data repository and catalogue.

Metadata should make it easy to discover the data, for example by browsing a data catalogue or by viewing links to the data in GIS layers.

Data-sharing systems should ideally capture information about what data has been shared, with whom, and any feedback offered by those who have accessed the data.
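A minimal sketch of that last point, recording what has been shared and with whom, follows; the log structure and field names are illustrative assumptions.

    # Sketch: keep a simple log of sharing events so the organisation knows what
    # was shared, with whom and when, and can attach any feedback received.
    from datetime import datetime, timezone

    share_log = []

    def record_share(dataset_id, recipient, licence, feedback=None):
        share_log.append({
            "dataset_id": dataset_id,
            "recipient": recipient,
            "licence": licence,     # licence information included with the shared data
            "shared_at": datetime.now(timezone.utc).isoformat(),
            "feedback": feedback,   # feedback offered by the recipient, if any
        })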

Maturity Criteria

Maturity criteria for Share across Maturity Levels and Maturity Factors.

Data Management Activity: Share
Data and metadata are available to the required internal and external people and systems.
Share Fragmented
Sharing is ad hoc and inconsistent.
Share Improvised
Some sharing is planned and consistent.
Share Managed
Data sharing is consistently managed.
Share Automated
Tools enable consistent and efficient data sharing.
Share Integrated
Data sharing systems are integrated with other systems.
Share Processes
Processes ensure that data is shared consistently.
Data is seldom shared following documented processes. Data is sometimes shared following documented processes. Data is consistently shared following a documented process. Tools support data sharing following standard processes. Data sharing processes and their support by tools are under ongoing review and improvement.
Share Tools
Tools are used to facilitate data sharing.
Data is only shared on request and using ad hoc tools such as emailing a file. Some data is shared by migration to a designated shared repository. Data is consistently shared using well-managed generic tools, or a proprietary dedicated tool. Data and metadata are shared dynamically by data storage systems. Data is exposed for federation and can be externally harvested.
Share Formats
Structured and open formats are used for shared data.
Fragmented: Data is shared in unstructured and analogue or proprietary digital formats. Improvised: Data is shared in a structured form using proprietary formats. Managed: Data is shared in structured and open formats. Automated: Data shared uses UUIDs. Data catalogue can be federated. Integrated: Data is shared using RDF.
Share Licensing
Licence information is enforced and recorded with shared data.
Fragmented: Data is shared without enforcing constraints or including licence information. Improvised: Permissions are enforced manually. Licence information is sometimes included with shared data. Managed: Permissions are managed consistently using manual processes. Licence is explicit on shared data. Automated: Licence information is recorded at row level and permissions are enforced automatically when data is shared. Integrated: Resolution-dependent licence information is recorded with shared data.
Share Reliability
Data provenance information is associated with shared data.
Data provenance information is not provided when data is shared. Data provenance information is sometimes provided when data is shared. Data provenance information is consistently provided with shared data. A reliability index is provided with shared data. Only data of high reliability is automatically shared with users who are not qualified to use unreliable data.
Share Standards
Standard terms are used to describe shared data.
Data is shared without using standards. Data shared uses internal standards. Data shared uses referenced external standards. Data is shared using terms that are linked to authorities. Shared data uses well-established standards.

Resources

Also see Data Management Resources for the Regional Council Indicators.

Analyse

Definition

Datasets are combined, compared, summarised and presented.

Description

The aim of analysis is to use data, often originating from more than one source, to answer specific questions while maintaining the integrity of the data.

The output of analysis is a new dataset that should be ingested.

The output of data analysis can only be as fit for purpose as its input datasets. Mature data management ensures that the fitness for purpose of the data is not degraded in the analysis process.

Common data analysis tasks include the following.

  • Compare two or more datasets with each other or with other datasets to determine relationships between them. Data about species or ecosystems may, for example, be related to environmental or climatic data, or to status under a designation, set of priorities or work programme.
  • Compare data using spatial, temporal and taxonomic criteria.
  • Compare data at varying levels of spatial, temporal and taxonomic resolution.
  • Build and save queries for routine analyses.
  • Automatically interpret data using specific indicators such as ecological priority.
  • Statistically interpolate data.
  • Generate visualisations, legends, layers, maps and interactive models.
  • Integrate a biodata system with a decision-support system.
  • Compare datasets that have limited comparability due to variations in sampling methods and data standards.
  • Distinguish duplicate records across datasets.
  • Review the process and original data sets underlying analysed data to determine their fitness for purpose.
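As a sketch of one of these tasks, comparing data at varying levels of spatial resolution, records can be aggregated to grid cells before comparison. The field names and the cell size are illustrative assumptions.

    # Sketch: aggregate occurrence records to a coarser spatial grid so that two
    # datasets can be compared cell by cell. Field names and cell size are examples.
    from collections import Counter

    def grid_cell(lat, lon, cell_size_deg=0.1):
        """Snap a coordinate to the south-west corner of its grid cell."""
        return (round(lat // cell_size_deg * cell_size_deg, 4),
                round(lon // cell_size_deg * cell_size_deg, 4))

    def counts_per_cell(records, cell_size_deg=0.1):
        return Counter(grid_cell(r["lat"], r["lon"], cell_size_deg) for r in records)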

Maturity Criteria

Maturity criteria for Analyse across Maturity Levels and Maturity Factors.

Data Management Activity: Analyse
Datasets are combined, compared, summarised and presented.
Analyse Fragmented
Analysis is ad hoc and inconsistent.
Analyse Improvised
Some analysis is planned and consistent.
Analyse Managed
Data analysis is consistently managed.
Analyse Automated
Tools enable consistent and efficient data analysis.
Analyse Integrated
Data analysis systems are integrated with other systems.
Analyse Processes
Processes ensure that data is analysed consistently.
Data is seldom analysed following documented processes. Data is sometimes analysed following documented processes. Data is consistently analysed following a documented process. Tools support data analysis following standard processes. Data analysis processes and their support by tools are under ongoing review and improvement.
Analyse Tools
Tools are used to facilitate data analysis.
Data is analysed using varying digital and non-digital tools. Some data is analysed using a managed set of tools, usually spreadsheets. Data is analysed using controlled generic tools or a proprietary dedicated tool. Data analysis tools are dynamically linked to the data repository. Data is analysed using tools that are closely integrated with local and remote data systems.
Analyse Formats
Structured and open formats are used for the outputs of analysis.
The output of analysis uses unstructured and analogue or proprietary digital formats. The output of analysis uses structured proprietary formats. The output of data analysis is in structured and open formats. UUIDs are tracked through the data analysis process. Outputs of data analysis use RDF.
Analyse Licensing
Licence information is recorded with the data resulting from analysis.
Licence information relating to source data is often lost during analysis. Licence constraints and explicit licences are sometimes managed manually during analysis. Licence constraints and explicit licences are consistently managed manually during analysis. Analysis systems dynamically maintain explicit row level licence information. Analysis systems automate the enforcement of licence constraints at varying levels of aggregation.
Analyse Reliability
Data provenance information is associated with the outputs of data analysis.
Data provenance information is not provided in the output of data analysis. Data provenance information is sometimes provided in the output of data analysis. Data provenance information is consistently provided with the output of data analysis. A reliability index is provided with the output of data analysis. Data analysis automatically aggregates the reliability indices of source data sets.
Analyse Standards
Standard terms are used to describe the output of data analysis.
The output of data analysis does not use standards. The output of data analysis uses internal standards. The outputs of data analysis use referenced external standards. Outputs of data analysis use terms that are linked to authorities. The outputs of data analysis use well-established standards.

Maturity Factors

Definition

Factors that determine the fitness for purpose of biodata.

Description

Mature data management requires that a variety of different but equally important factors are attended to. The Maturity Factors defined here encompass the full scope of the job of data management, regardless of which activity or activities are carried out.

The way that each maturity factor is attended to depends on the requirements of each data management system.

This Guide defines the following Maturity Factors: Processes, Tools, Formats, Licensing, Reliability and Standards.

Processes

Definition

Processes for managing data are maintained and followed.

Description

The aim of processes is to ensure that data is consistently managed in the best possible way.

Good processes can be used to achieve a high level of data fitness for purpose, even if generic tools are used. They reduce the impact of staff turnover on the long term consistency of data management.

Processes should be used to determine the following.

  • How data management activities are carried out, usually with step by step instructions.
  • The roles and responsibilities of people carrying out data management.

Process documentation should follow a defined format that includes a change log and version numbering.
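A sketch of such a document represented as a simple structured record, with a version number and change log, is shown below; the title, roles, steps and change entries are invented for illustration.

    # Sketch: a process document as a structured record with version numbering
    # and a change log. All content shown is illustrative.
    process_document = {
        "title": "Ingesting field survey data",
        "version": "1.1",
        "roles": {"Data administrator": "runs the ingestion workflow",
                  "Ecologist": "authorises data for ingestion"},
        "steps": ["Validate the incoming dataset",
                  "Record provenance and licence information",
                  "Catalogue the dataset in the primary repository"],
        "change_log": [{"version": "1.0", "change": "Initial version"},
                       {"version": "1.1", "change": "Added licence check"}],
    }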

Maturity Criteria

Maturity criteria for Processes across Maturity Levels and Data Management Activities.

Maturity Factor: Processes
Processes for managing data are maintained and followed.
Processes Fragmented
Processes are ad hoc and inconsistent.
Processes Improvised
Some documented processes are followed.
Processes Managed
Documented processes are consistently followed. Adoption of processes is measured.
Processes Automated
Tools enable consistent adherence to documented processes. Processes are maintained.
Processes Integrated
Processes and their support by tools are under ongoing review and improvement.
Processes Capture
Processes ensure that data is captured accurately and comprehensively.
Data is seldom captured following documented processes. Data is sometimes captured following documented processes. Data is consistently captured following a documented process. Tools support standard data capture processes. Data capture processes and their support by tools are under ongoing review and improvement.
Processes Ingest
Processes ensure that data is ingested consistently.
Data is seldom ingested following documented processes. Data is sometimes ingested following documented processes. Data is consistently ingested following a documented process. Tools support the ingestion process by tracking and controlling changes. Data ingestion processes and their support by tools are under ongoing review and improvement.
Processes Store
Processes ensure that data is reliably stored.
Data is seldom stored following documented processes. Data is sometimes stored following documented processes. Management processes consistently provide for availability, backup and archiving. Data catalogue and repository are managed as part of the organisation's processes. Management of data catalogue and repository is under ongoing review.
Processes Share
Processes ensure that data is shared consistently.
Data is seldom shared following documented processes. Data is sometimes shared following documented processes. Data is consistently shared following a documented process. Tools support data sharing following standard processes. Data sharing processes and their support by tools are under ongoing review and improvement.
Processes Analyse
Processes ensure that data is analysed consistently.
Data is seldom analysed following documented processes. Data is sometimes analysed following documented processes. Data is consistently analysed following a documented process. Tools support data analysis following standard processes. Data analysis processes and their support by tools are under ongoing review and improvement.

Tools

Definition

Tools are used to facilitate data management.

Description

Tools are used to improve the efficiency, reliability and reproducibility of biodata management.

Six main sets of tools are used for biodata management:

  1. Field Data Capture Tools. Tools to support field surveys, capture of the data collected and delivery of that data for ingestion.
  2. Ingestion Tools. Tools to manage the review, validation and correction of data, and tools for automated discovery.
  3. A Data Repository. One or more physical or digital containers where data resides permanently.
  4. A Metadata Repository. A catalogue of datasets containing data about location of the data as well as the purpose, methods, measurements and formats used.
  5. Sharing Tools. Tools for sharing data with both humans and machines.
  6. Analysis Tools. Tools for combining, comparing, summarising and visualising data.

The data repository, data catalogue and analysis tools may be separate or integrated. More than one repository may be used to support sharing.

The following considerations apply to digital data management tools.

  • Proven – Uses proven technologies with a good life-expectancy that is aligned with in-house policies.
  • Modular – Architecture is scalable, flexible, extensible, modular and open.
  • Cross-Platform – Has desktop, web and mobile interfaces.
  • Usable – Uses web-standards, and follows accessibility and usability guidelines.
  • Interoperable – Has machine interfaces (ie web services and an API). Supports integration with in-house and external systems that carry out specialised functions that are beyond its scope. Integration should handle authentication and permissions.
  • Licensing – Tools are available cost-effectively under a licence that meets business needs.
  • Support – Support is available from a reliable source.
  • Maintenance – Regular updates are available.
  • Hosting – Hosting provides adequate performance, security, integrity and archiving.

Maturity Criteria

Maturity criteria for Tools across Maturity Levels and Data Management Activities.

Maturity Factor: Tools
Tools are used to facilitate data management.
Tools Fragmented
The use of tools is ad hoc and inconsistent.
Tools Improvised
The use of some (usually generic) tools is planned and consistent.
Tools Managed
Generic tools or legacy custom systems are consistently managed.
Tools Automated
Recognised off-the-shelf or well-maintained custom tools are used.
Tools Integrated
Modular, interoperable, cross-platform tools are managed by organisational IT processes.
Tools Capture
Tools are used to facilitate data capture.
Data is captured using varying digital and non-digital tools. Some data is captured using managed paper forms or digital tools. Data is consistently captured using survey-specific paper or digital tools that support validation. Data is captured using specialised digital tools with integrated GPS. Field tools support automated measurement and identification.
Tools Ingest
Tools are used to facilitate ingesting data.
No tools are used to control the process of ingesting data. Data is manually synchronised or migrated. Data is consistently ingested using managed data synchronisation or manual transcription or migration. Tools support a controlled ingestion workflow with authorisation, auditing, and roll-back. Ingestion system dynamically updates to reflect changes to remote datasets.
Tools Store
Tools are used to facilitate data storage.
Data is stored in diverse, ad hoc repositories with no record of its location. Some data is stored using designated generic tools such as spreadsheets or cloud storage. Data is stored and catalogued using a controlled set of generic tools or a proprietary dedicated tool. Data and metadata are held in well-maintained specialised systems. Data storage tools integrate easily with others via open interfaces.
Tools Share
Tools are used to facilitate data sharing.
Data is only shared on request and using ad hoc tools such as emailing a file. Some data is shared by migration to a designated shared repository. Data is consistently shared using well-managed generic tools, or a proprietary dedicated tool. Data and metadata are shared dynamically by data storage systems. Data is exposed for federation and can be externally harvested.
Tools Analyse
Tools are used to facilitate data analysis.
Data is analysed using varying digital and non-digital tools. Some data is analysed using a managed set of tools, usually spreadsheets. Data is analysed using controlled generic tools or a proprietary dedicated tool. Data analysis tools are dynamically linked to the data repository. Data is analysed using tools that are closely integrated with local and remote data systems.

Formats

Definition

Structured and open formats are used.

Description

The aim of data formats is to enable the data to be easily accessed and manipulated.

The format in which data is stored determines how easy it is to access and manipulate the data. Data in paper notebooks is not as easily used as data on structured paper forms. Data in proprietary formats can only be accessed using particular tools and can be lost altogether when the format and its tools become obsolete. Data published in an open format such as a CSV file, for example, is more easily accessed than data published using an old version of the MS Access file format.

Tabular data that is published as an image or even in PDF format is not as easily manipulated as data published in a spreadsheet or CSV file. The more machine-comprehensible information is available, the more easily data is used and exchanged. This is achieved more easily when the meaning (semantic value) of the data is encoded into the structure, and not in the data itself. JSON is ideal for data serialisation.

Ideally each record has a universally unique identifier (UUID). All terms used should be linked to an authority. For example, rather than colour=brown, a reference is given to an authoritative definition for "colour", and a defined set of values for "colour" including "brown".
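A sketch of the colour example as a JSON record with a universally unique identifier and a term linked to an authority is shown below; the example.org URIs are placeholders, not references to a real vocabulary.

    # Sketch: a record serialised as JSON, with a UUID and the term and its value
    # given as references to an authority. The URIs are placeholders only.
    import json, uuid

    record = {
        "id": str(uuid.uuid4()),
        "https://example.org/terms/colour": "https://example.org/terms/colour/brown",
    }
    print(json.dumps(record, indent=2))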

Maturity Criteria

Maturity criteria for Formats across Maturity Levels and Data Management Activities.

Maturity Factor: Formats
Structured and open formats are used.
Formats Fragmented
Unstructured analogue and proprietary digital formats are used.
Formats Improvised
Structured analogue and proprietary digital formats are used.
Formats Managed
Structured open formats are used.
Formats Automated
Individual rows are dynamically available with universally unique IDs (UUIDs).
Formats Integrated
Data uses RDF.
Formats Capture
Structured and open formats are used for captured data.
Data capture is memory-reliant with verbal location description and no written record. Data is captured using paper forms or forms in a proprietary digital format. Data is captured using open digital structured data or media formats. Records are assigned a unique ID on capture. Data is captured as XML or JSON.
Formats Ingest
Structured and open formats are used during ingestion.
Data is ingested in unstructured and analogue or proprietary digital formats. Data is ingested in a structured form and a proprietary format. Data is ingested in structured open formats. IDs are verified as globally unique when data is ingested. Data is ingested as XML or JSON.
Formats Store
N/A
N/A N/A N/A N/A N/A
Formats Share
Structured and open formats are used for shared data.
Fragmented: Data is shared in unstructured and analogue or proprietary digital formats. Improvised: Data is shared in a structured form using proprietary formats. Managed: Data is shared in structured and open formats. Automated: Data shared uses UUIDs. Data catalogue can be federated. Integrated: Data is shared using RDF.
Formats Analyse
Structured and open formats are used for the outputs of analysis.
The output of analysis uses unstructured and analogue or proprietary digital formats. The output of analysis uses structured proprietary formats. The output of data analysis is in structured and open formats. UUIDs are tracked through the data analysis process. Outputs of data analysis use RDF.

Resources

Also see Data Management Resources for the Regional Council Indicators.

Licensing

Definition

Applicable copyright and licence information is recorded and permissions are enforced.

Description

The aim of licensing is to ensure that data is only used in ways that are consistent with the rights granted by the owner of the copyright to the data.

Maturity with respect to licensing does not require that any particular licence is used. It requires that only authorised people can access the data, and that people who access the data can find out who owns copyright to the data and what uses of the data are permitted.

The following permissions information should be associated with all datasets.

  • Copyright Owner – The agency, client, landowner or other party who owns the copyright to the data.
  • Data Licence – The uses of the data that are permitted by the copyright owner.

Almost all biodata is of potential value in protecting biodiversity. Most biodata can be shared with the public without constraints. In some cases, however, it is necessary to restrict access to biodata.

Data may be collected under an agreement with a land-owner that restricts sharing. Some data relates to the occurrence of species that could be the target of rare species trafficking or other types of over-exploitation. Some data may be sensitive to stakeholders in ways that are not anticipated.

Resolution-dependent licence constraints vary as data is aggregated spatially, taxonomically or temporally. Access to data may be restricted when specific species, places or times are referred to but unrestricted when viewed at a lower spatial, taxonomic or temporal resolution. In other cases, access to data may expire after a defined period of time.

Where the data itself cannot be disclosed, the existence of the data should be discoverable.

Data sharing agreements should be used to make the licence explicit when data is captured or shared.
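As a sketch of enforcing a resolution-dependent constraint, a sensitive record can be generalised spatially before it is shared. The sensitivity flag, field names and the one-decimal-place generalisation are illustrative assumptions.

    # Sketch: before sharing, generalise the location of records whose licence
    # restricts them at fine spatial resolution. Names and thresholds are examples.
    def prepare_for_sharing(record):
        shared = dict(record)
        if shared.get("restricted_at_fine_resolution"):
            shared["lat"] = round(shared["lat"], 1)   # ~0.1 degree, roughly 10 km in latitude
            shared["lon"] = round(shared["lon"], 1)
            shared["note"] = "Location generalised under licence constraints"
        return shared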

Maturity Criteria

Maturity criteria for Licensing across Maturity Levels and Data Management Activities.

Maturity Factor: Licensing
Applicable copyright and licence information is recorded and permissions are enforced.
Licensing Fragmented
Licence information is seldom recorded and permissions are inconsistently enforced.
Licensing Improvised
Licence information is sometimes recorded and permissions are sometimes enforced.
Licensing Managed
Licence information is consistently recorded and permissions are consistently enforced.
Licensing Automated
Licence information is handled and permissions are enforced automatically.
Licensing Integrated
All systems handle licences that vary with aggregation, and over elapsed time.
Licensing Capture
Licence information is recorded as data is captured.
Licence information is not known or recorded at time of capture. Licence information is sometimes recorded at the time of capture. Licence information is consistently recorded at the time of capture. Licence information is recorded at row level at the time of capture. Resolution-dependent licence information is recorded at the time of capture.
Licensing Ingest
Licence information is recorded as data is ingested.
No licence information is recorded when data is ingested. Some licence information is included when data is ingested. Licence information is consistently included when data is ingested. Licence information is recorded at row level when data is ingested. Resolution-dependent licence information is recorded as data is ingested.
Licensing Store
Licence information is recorded with stored data.
Storage system does not record or enforce licence constraints. Some licence information is explicitly associated with stored data. Licence information is consistently associated with stored data. Licence information is stored at row level. Resolution-dependent licence information is recorded with stored data.
Licensing Share
Licence information is enforced and recorded with shared data.
Fragmented: Data is shared without enforcing constraints or including licence information. Improvised: Permissions are enforced manually. Licence information is sometimes included with shared data. Managed: Permissions are managed consistently using manual processes. Licence is explicit on shared data. Automated: Licence information is recorded at row level and permissions are enforced automatically when data is shared. Integrated: Resolution-dependent licence information is recorded with shared data.
Licensing Analyse
Licence information is recorded with the data resulting from analysis.
Licence information relating to source data is often lost during analysis. Licence constraints and explicit licences are sometimes managed manually during analysis. Licence constraints and explicit licences are consistently managed manually during analysis. Analysis systems dynamically maintain explicit row level licence information. Analysis systems automate the enforcement of licence constraints at varying levels of aggregation.

Resources

Also see Data Management Resources for the Regional Council Indicators.

Reliability

Definition

The user has sufficient information to determine how reliable the data is.

Description

The aim of managing reliability is to enable the user of the data to determine what purpose the data is fit for.

All data is of limited reliability. Almost all data is of some use.

Good data-management is needed, whatever the method used to make observations.

To determine the reliability of a dataset, the user of the data should be able to determine the provenance of the data, that is, to find out when, how and why the data was collected, and who collected it.

Some systems use crowd-sourced ways to improve reliability, for example where experts can verify citizen observations (NatureWatch), or where data users can report errors (ALA).

Data management systems should provide a mechanism whereby users can provide feedback and corrections, and a process (similar to that used for ingestion) for tracking those data changes.

In some systems a reliability index is associated with each dataset. Only data of higher reliability should be shown to users who are not qualified to interpret data of low reliability. Where data of low reliability is not shown, the existence of the data should be discoverable. More sophisticated systems provide an index of reliability for determining current state.

Data Depreciation

The data depreciation model developed by Bay of Plenty Regional Council functions as follows.

The reliability of data depends on the method used to collect it and the rigour with which that method was followed. The reliability of data as an indicator of current state depends on those things and on the age of the data. Data that is highly volatile (eg bird counts) loses reliability faster than data of low volatility (eg GPS references).

Each dataset has a current reliability value denoting its usefulness to non-specialists for determining current state. The system uses the current reliability value to determine the sort order of datasets or to filter out data of low reliability.

When each dataset is ingested, it is assigned an initial reliability value. A casual observation of a bird in a wetland has a lower reliability than a survey carried out by the same person using a rigorous methodology.

The dataset is also assigned a depreciation profile that determines the rate at which the dataset depreciates over time, and how that rate changes. Data does not depreciate in a straight line. It may depreciate slowly for the first ten years, and then more quickly for the ten years after that.

Data depreciation profiles are determined by technical experts for specific dataset types depending on the species group, purpose, methodology and conditions of the collection.

Current values for all datasets are recalculated nightly.
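The sketch below shows, in schematic Python, how such a nightly recalculation might work: each dataset carries an initial reliability value and a stepped depreciation profile, and its current value is derived from the dataset's age and then used for sorting or filtering. The profile rates, field names and example datasets are illustrative assumptions, not the Bay of Plenty Regional Council implementation.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Dataset:
    name: str
    collected_on: date
    initial_reliability: float        # assigned at ingestion, eg 0.9 for a rigorous survey
    # Depreciation profile: (age limit in years, fraction of reliability lost per year in that band).
    # Stepped rates model data that depreciates slowly at first, then faster (illustrative values).
    profile: tuple = ((10, 0.02), (20, 0.06))

def current_reliability(ds: Dataset, today: date) -> float:
    """Recalculate the dataset's current reliability from its age and depreciation profile."""
    age_years = (today - ds.collected_on).days / 365.25
    value = ds.initial_reliability
    prev_limit = 0
    for limit, annual_rate in ds.profile:
        years_in_band = min(age_years, limit) - prev_limit
        if years_in_band <= 0:
            break
        value -= years_in_band * annual_rate
        prev_limit = limit
    return max(value, 0.0)

# Nightly job: recalculate, then sort (or filter) datasets by current reliability.
datasets = [
    Dataset("casual wetland bird observation", date(2001, 5, 1), 0.5),
    Dataset("five-minute bird count survey", date(2010, 5, 1), 0.9),
]
today = date(2013, 12, 23)
for ds in sorted(datasets, key=lambda d: current_reliability(d, today), reverse=True):
    print(f"{ds.name}: {current_reliability(ds, today):.2f}")
```

A stepped profile of this kind captures the point made above: data may depreciate slowly at first and then more quickly, and highly volatile data can simply be assigned steeper rates.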

Maturity Criteria

Maturity criteria for Reliability across Maturity Levels and Data Management Activities.

Maturity Factor: Reliability
The user has sufficient information to determine how reliable the data is.
Reliability Fragmented
Reliability is not explicitly managed.
Reliability Improvised
Reliability is sometimes managed.
Reliability Managed
Reliability is consistently managed.
Reliability Automated
Reliability indices are used.
Reliability Integrated
Data depreciation is supported.
Reliability Capture
Data provenance information is recorded at the time of capture.
Data provenance information is not recorded when data is captured. Data provenance information is sometimes recorded when data is captured. Data provenance information is consistently recorded when data is captured. Data is captured using a process with a reliability index. Data is captured using a process with a depreciation profile.
Reliability Ingest
Data provenance information is associated with data when it is ingested.
Data provenance information is not recorded when data is ingested. Data provenance information is sometimes recorded when data is ingested. Data provenance information is consistently recorded when data is ingested. A reliability index is given to each dataset ingested. A depreciation profile is associated with all data ingested.
Reliability Store
Data provenance information is associated with stored data.
Data provenance information is not associated with stored data. Data provenance information is sometimes associated with stored data. Data provenance information is consistently associated with stored data. A reliability index is associated with each stored dataset. The data storage system periodically recalculates reliability indices to reflect depreciation.
Reliability Share
Data provenance information is associated with shared data.
Data provenance information is not provided when data is shared. Data provenance information is sometimes provided when data is shared. Data provenance information is consistently provided with shared data. A reliability index is provided with shared data. Only data of high reliability is automatically shared with users who are not qualified to use unreliable data.
Reliability Analyse
Data provenance information is associated with the outputs of data analysis.
Data provenance information is not provided in the output of data analysis. Data provenance information is sometimes provided in the output of data analysis. Data provenance information is consistently provided with the output of data analysis. A reliability index is provided with the output of data analysis. Data analysis automatically aggregates the reliability indices of source data sets.

Resources

Useful resources.

Also see Data Management Resources for the Regional Council Indicators.

Standards

Definition

Data and metadata use standard terms.

Description

The goal of standards is to enable datasets to be combined or compared. That requires the data to use the same terms to describe the same phenomena.

To facilitate data-sharing internally, with other NZ agencies and via international mechanisms, data should use international standards where they exist.

The following types of data standards are important for biodata management.

  • Data Structure Standards – How the survey protocol is represented in a structure. How the elements relate to each other determines how accurately the structure represents (or fails to represent) the protocol. These vary from highly structured and narrow in purpose (eg NVS) to unstructured and versatile (eg NatureWatch).
  • Data Definition (or Descriptor) Standards – These define the data elements (eg the names of fields in a table, such as "size"). Data capture practices should reflect what you call the data elements you are collecting. Darwin Core provides data definitions for species occurrence data (see the sketch after this list).
  • Data Content Standards – These determine what goes in the fields. They may be standards such as ISO dates, Unicode, and latitude and longitude, or specific vocabularies, pick-lists or species names. With biodata, it is particularly important that standard species names are used.
  • Metadata Standards – These determine how datasets are described. Darwin Core Archive and Ecological Metadata Language (EML) define key metadata standards for biodata. The use of standard metadata makes it easy to list a dataset in standards-compliant data catalogues.
  • Service Interfaces (and Data Exchange Standards) – These define how computers exchange data, and include application programming interfaces (APIs) and geospatial data standards such as the OGC's WFS and WMS, and the OGC Catalogue Service.
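To make these categories concrete, the snippet below sketches a single species occurrence record expressed with Darwin Core term names (a data definition standard), using an ISO 8601 date, decimal coordinates and a standard scientific name (data content standards), written out as CSV ready for packaging. The record values and identifier are invented for illustration.

```python
import csv
import io

# A single occurrence expressed with Darwin Core term names; all values are invented.
occurrence = {
    "occurrenceID": "urn:example:obs:0001",    # hypothetical identifier
    "basisOfRecord": "HumanObservation",
    "scientificName": "Anthornis melanura",    # bellbird / korimako
    "eventDate": "2013-11-02",                 # ISO 8601 date
    "recordedBy": "J. Smith",
    "decimalLatitude": -40.355,
    "decimalLongitude": 175.611,
    "geodeticDatum": "EPSG:4326",
}

# Writing the record as CSV with Darwin Core headers makes it straightforward to package
# as a Darwin Core Archive or to map into an aggregator such as GBIF.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=occurrence.keys())
writer.writeheader()
writer.writerow(occurrence)
print(buffer.getvalue())
```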

Maturity Criteria

Maturity criteria for Standards across Maturity Levels and Data Management Activities.

Maturity Factor: Standards
Data and metadata use standard terms.
Standards Fragmented
No data standard is used.
Standards Improvised
Internal data standards are used.
Standards Managed
A data standard, external where one exists, is consistently used and referenced.
Standards Automated
References to standards used are machine-readable.
Standards Integrated
Only well-established standards are used.
Standards Capture
Standard terms are used to describe captured data.
Data is captured without using standards. Data captured uses internal standards. Data is captured using referenced external standards. Data is captured using terms that are linked to authorities. Data captured uses well-established standards.
Standards Ingest
Standards-compliance is ensured as data is ingested.
Data is ingested without using standards. Data ingested uses internal standards. Data ingested uses, or is transformed to use referenced external standards. Data is ingested using terms that are linked to authorities. Data ingested uses well-established standards.
Standards Store
Standard terms are used to describe stored data.
Data is stored without using standards. Data stored uses internal standards. Data stored uses referenced external standards. Data is stored using terms that are linked to authorities. Stored data uses well-established standards.
Standards Share
Standard terms are used to describe shared data.
Data is shared without using standards. Data shared uses internal standards. Data shared uses referenced external standards. Data is shared using terms that are linked to authorities. Shared data uses well-established standards.
Standards Analyse
Standard terms are used to describe the output of data analysis.
The output of data analysis does not use standards. The output of data analysis uses internal standards. The output of data analysis uses referenced external standards. The output of data analysis uses terms that are linked to authorities. The output of data analysis uses well-established standards.

Resources

Also see Data Management Resources for the Regional Council Indicators.

Data Management Resources for the Regional Council Indicators

Also known as the Tier One Indicators, or simply the Indicators, these provide a framework for monitoring and reporting on terrestrial biodiversity. The Indicators are based on the report Recommended monitoring framework for regional councils assessing biodiversity outcomes in terrestrial ecosystems by William G. Lee and Robert B. Allen of Landcare Research. That report is in turn based on an earlier report that is the foundation for the Department of Conservation's Natural Heritage Management System (NHMS) and Tier One monitoring programme.

This is a list of data management resources for the Regional Council Terrestrial Biodiversity Indicators in the following categories:

State and Condition

Indicator 1: Land under indigenous vegetation

Measure 1 Indigenous land cover (ha, %) of cover classes, habitat types, across LENZ and Ecological District units requires analysis of data obtained from LCDB, LENZ and other external systems.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Indicator 2: Biodiversity condition

Measure 2 Vegetation structure and composition requires data on the native species present in vegetation survey plots.

Measure 3 Avian representation requires data on native bird species.

Measure 5 Vulnerable ecosystems requires extent maps of wetlands, dunes and naturally rare ecosystems.

  • Use in-house geospatial capacity to build spatial layers. The key is to use agreed standards on classification of habitat type and condition. Good starting points for these are the NZ Wetlands Trust and Susan Wiser's factsheet on naturally uncommon ecosystems.
  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Threats and pressure

Indicator 3: Weed and animal pests

Measure 6 Number of new naturalisations requires analysis of data on alien plants that may flourish and/or reproduce in an area.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Measure 7 Distribution and abundance requires analysis of data on species occupancy.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Indicator 4: Habitat loss

Measure 8 Change in area under intensive land use.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Measure 9 Habitat and vegetation loss.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Indicator 5: Climate change

Measure 11 Change in temperature and precipitation (ppt.).

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Effectiveness of policy and management

Indicator 6: Biodiversity protection

Measure 12 Change in extent and protection of indigenous cover or habitats or naturally uncommon ecosystems.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Measure 13 Threatened species habitat.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Measure 14 Vegetation consents compliance.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Indicator 7: Pest management

Measure 15 Indigenous ecosystems released from pests requires analysis of data on the spatial extent of land under intensive vertebrate pest control or exclusion that reduces the measured pest levels to [some] target level.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Measure 16 Change in the abundance of indigenous plants and animals susceptible to introduced herbivores and carnivores.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Indicator 8: Ecosystem services

Measure 17 Extent of indigenous cover in water catchment.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Community engagement

Indicator 9: Protection and restoration

Measure 18 Area and type of biodiversity protection achieved on private land.

  • Consider use of the MAIN Trust DataMap online data collection and analysis system. It is designed to fit with DOC's statistical trap-catch analysis. The online GIS is based upon the DOC Taranaki Excel workbooks and can be customised for each project, since baits and traps differ, as do the pest species caught. View the DataMap user manual.
  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Measure 19 Contribution of initiatives to (i) species translocations and (ii) habitat restoration.

  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Indicator 10: Weed and pest control

Measure 20 Community contribution to weed and animal pest control and reductions.

  • Possible systems to use include NatureSpace.
  • Request for resources: if you are willing to share documented procedures, tools or other useful data management resources for this Measure, please contribute them.

Glossary

5 Star Open Data Measure

The 5 * Open Data scheme developed by Tim Berners-Lee.

Atlas of Living Australia

The Atlas of Living Australia.

Audubon Core

Audubon Core defines metadata for biodiversity multimedia resources and collections.

Australian National Data Service

The Australian National Data Service provides a comprehensive set of Data Management Guides and Resources.

Biodiversity Information Standards (TDWG)

Biodiversity Information Standards (TDWG), also known as the Taxonomic Databases Working Group, is an international collaboration among biological database projects that develops and promotes standards for the exchange of biological/biodiversity data.

CKAN

CKAN is a fully-featured, mature, open source data portal and data management solution.

Darwin Core

Darwin Core and Darwin Core Occurrence

Data Management Maturity Assessment (DMMA) model

The Capability Maturity Model (CMM), developed originally by Carnegie Mellon University, has been adapted by April Reeve for use with the Data Management Association International (DAMA) Data Management Body of Knowledge (DMBOK). In the Data Management Maturity Assessment (DMMA) model, Reeve defines the following five levels for data management process maturity.

  1. Immature (Initial). The best practice activities are not performed by the organization. The best practice tools are not available or not used.
  2. Repeatable (Repeatable). Some parts of the organization are using recommended tools and processes, while other parts are not.
  3. Managed (Defined). The organization has a documented standard for performing the assessed activity or activities consistently and using applicable tools effectively.
  4. Monitored (Managed). The process in question is established, tracked and monitored. Recommended tools are in place and being used consistently across the organization.
  5. Continuous Improvement (Optimizing). The activity is continually reassessed, improved upon, tracked and built into the process.

data.govt.nz

data.govt.nz is a catalogue of New Zealand government sector datasets.

Digitisation: A strategic approach for natural history collections

The Atlas of Living Australia provides guidance on various aspects of digitisation. In particular, Digitisation: A strategic approach for natural history collections, by Bryan Kalms, outlines in detail a strategic approach to the digitisation of natural history collections. While this resource is oriented towards legacy collections, it is applicable to all biodata management projects. It places particular emphasis on governing, planning, managing and monitoring digitisation initiatives.

Five Minute Bird Count Resources

The Department of Conservation Five Minute Bird Count Resources include a spreadsheet template.

FORMAK

The Forest Monitoring and Assessment Kit (FORMAK) is a monitoring kit designed for immediate use by land owners, landcare groups, community groups and other "hands on" users interested in assessing the condition of New Zealand native forest ecosystems.

Fulcrum

Fulcrum is a generic iOS and Android field data collection application that can be configured for geotagged field capture of biodata.

GeoNetwork

GeoNetwork is a widely-used, spatially oriented, open source data catalogue. It provides metadata editing and search functions as well as a map view.

Global Biodiversity Information Facility (GBIF)

GBIF is an initiative to globally aggregate species occurrence data.

National Vegetation Survey (NVS) Databank

The National Vegetation Survey (NVS) Databank is hosted by Landcare Research. The NVS Express Data Entry, Validation and Analysis Tool is a Windows-based tool that can be used to locally validate, store and analyse 20m x 20m permanent plot data. It also facilitates data export to NVS.

NatureWatch NZ

NatureWatch NZ is a community-oriented website for species observation data.

NatureWatch provides an iPhone app that can be used for field data capture of species observations. The app supports offline use and uses GPS to geotag a single point for each observation. The user can use narrative to describe an area or a quantity of individuals. Images can be associated with an observation. Species names are obtained from NZOR. The user can ask experts for species identifications. Data privacy and licence can be set on capture.

New Zealand Government Open Access Licensing framework

The New Zealand Government Open Access Licensing framework (NZGOAL) provides Guidance Note 2: File Formats, which offers practical advice for agencies when selecting formats for releasing public information and data for re-use.