The following information will guide you through the process of preparing your data sets for long-term archive. Objectives & factors driving this process:
- LBA-ECO final data, fully quality-assured and documented, must be archived at the ORNL DAAC, the designated archive for NASA’s Earth Observing System Biogeochemical Dynamics data. The LBA-ECO Project Office will send copies of the final data and documentation to LBA DIS in Brazil for archive as well.
- The ORNL DAAC is one member of a network of NASA archives comprising the Earth Observing System DIS (EOS DIS), and is committed to observing archive and metadata standards adopted by all participants of EOS DIS.
The LBA-ECO science team and project management decided early in the project not to attempt to create an integrated, consolidated database containing all LBA-ECO data. Instead, the investigator has considerable latitude with regard to internal data management practices. Similarly, the LBA-ECO Project avoided a top-down approach with regard to data quality assurance. It is assumed that you have processed your data and applied quality assurance protocols consistent with those accepted and expected in your peer research community. Project-level standards exist only for metadata and documentation with an eye toward assimilating these data into the NASA archive system.
Keep in mind that these metadata and documentation requirements are designed to support NASA’s data archive goal, i.e. to ensure availability and usefulness of the LBA-ECO data long after the project has ended. Everything you have done to date regarding your data has brought you closer to the project objective of archiving the LBA-ECO data in the NASA long-term archive system. For example, the metadata you registered earlier in the project, which were used to facilitate searching in the project’s metadata search engine, Beija-flor, will become part of the final data set documentation at the ORNL DAAC. Once you have prepared the final data set for archive and used the LBA Metadata Editor (LME) to complete all updates to the related metadata file, that metadata file will be used to generate the official data set document, i.e. the Data Set User’s Guide, that will accompany your data set when downloaded from the archive. The Data Set User’s Guide will help others understand your data (the experiment design, parameters measured, data collection conditions, instrumentation, calibration and QA procedures, and known problems) so that they can apply your data appropriately to new research questions and objectives.
This website is a resource to help you prepare data for archive. If you have questions or need assistance, contact firstname.lastname@example.org
In addition, we encourage you to consult "Best Practices for Preparing Ecological and Ground-Based Data Sets to Share and Archive".
1.1.1 You may want to change the “granularity” of your data files
Data file granularity is the unit defining what you include in a single data file, e.g.
- All data for a single day, or month, or year
- All data for an individual site (all years)
- All data for all sites (all years)
- All data of a given type for a day, or month, or year
- All of the various types of data collected for a day, or month, or year
You may want to combine the contents of many similar small-granule data files into a single data file, e.g. if you have hundreds of daily files in the same file format, it would be best to combine these files into monthly or annual files.
Do not try to combine data files containing different sets of variables into a single file. A data file should contain a header section and column headings that label each field contained in the data file. Do not change the headings or variables within the body of a file.
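As an illustration of combining small-granule files, here is a minimal sketch that merges daily comma-delimited files into monthly files. It rests on several assumptions that are not part of the project guidelines: the file-name pattern `soiltemp_YYYYMMDD.csv` is hypothetical, each file has a single row of column headings with no separate header section, and all files use the same headings. Files whose headings differ are rejected rather than merged, in keeping with the advice above.

```python
import csv
from collections import defaultdict
from pathlib import Path


def combine_daily_to_monthly(daily_dir, out_dir):
    """Merge daily files named soiltemp_YYYYMMDD.csv into monthly files.

    The file-name pattern is hypothetical. Every input file must start
    with the same row of column headings; a mismatch raises ValueError.
    """
    daily_dir, out_dir = Path(daily_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the daily files by the YYYYMM part of the file name.
    groups = defaultdict(list)
    for path in sorted(daily_dir.glob("soiltemp_????????.csv")):
        month = path.stem.split("_")[1][:6]
        groups[month].append(path)

    written = []
    for month, paths in sorted(groups.items()):
        out_path = out_dir / f"soiltemp_{month}.csv"
        expected = None
        with out_path.open("w", newline="") as out_file:
            writer = csv.writer(out_file)
            for path in paths:
                with path.open(newline="") as in_file:
                    reader = csv.reader(in_file)
                    heading = next(reader)
                    if expected is None:
                        expected = heading
                        writer.writerow(heading)  # headings written once
                    elif heading != expected:
                        raise ValueError(f"{path.name}: column headings differ")
                    writer.writerows(reader)
        written.append(out_path)
    return written
```

Calling `combine_daily_to_monthly("daily", "monthly")` would write one file per month into the `monthly` directory, each with a single row of column headings followed by the records from that month's daily files in date order.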
1.1.2 You may need to reconsider what constitutes a “data set”
What you considered a logical data set entity during the early stages of the project may no longer make sense. You probably now have data covering a longer time period, data from additional study sites, and complementary data of different types for a particular intensively studied area. Preliminary data have probably also been further processed, derived, integrated, and/or synthesized to generate unique, new products.
Each data set in the final archive will have a Data Set User’s Guide to document it. The Data Set User’s Guide is the web-ready, html-formatted documentation exported directly from your LME file. Each metadata file corresponds to one data set and to one Data Set User’s Guide.
Data Set => LME file => Data Set User’s Guide
In the early stages of your project, you may have created a separate metadata file to describe each type of data you were collecting or from each study site where you were working, resulting in several almost identical metadata files describing almost identical data sets, e.g. Soil temperature data at SiteA, Soil temperature data at SiteB…. Now that you are preparing to document the data for archive, you may want to consider using a single metadata file (and therefore a single Data Set User’s Guide) to describe these closely related data and to associate the data files as a single data set. This may eliminate some repetitious cutting & pasting and will produce a more cohesive and useful product at the same time.
Be cautioned, however, that if you attempt to combine too broad a range of data and consider it a “data set”, it may become difficult for you to produce a useful, descriptive document. Similarity of the data should be the determining factor in whether to consolidate or divide data sets. If you find yourself frequently cutting & pasting from one metadata file to another, you could probably consider combining these into a single, related data set. If you find yourself uncertain what to put in the LME data set documentation fields, you may be trying to combine too many disparate data types into a “one-size-fits-all” data set.
For example, you could consider a data set to be all of the data collected during the same intensive field campaign (after all, they are related), but you would probably find it difficult to describe the data processing, data quality measures, instrument calibration, and data characteristics because several types of data were collected and processed. It would make more sense to define the data sets according to the types of data collected during the intensive field campaign, e.g. Data Set 1 = soil chemistry; Data Set 2 = biomass; Data Set 3 = climate.
Data sets could also be defined by:
- Grouping similar data collected from a given study site,
- Grouping similar data collected from multiple sites, or
- Association with a specific version of a model, e.g. code, input and output.
Note that a data set can consist of one or more data files. It is not necessary for the data files to be identical in format or in content (as long as the files are internally consistent), but the data files should relate to each other and to the data set theme.
1.1.3 Archive Directory structure
All LBA-ECO data will be archived within an LBA area on the ORNL DAAC ftp site. Within the LBA area, directories will be arranged by science theme (based on the investigation team ID), and each data set will reside in its own subdirectory within the science theme, e.g.
ORNL DAAC ftp site > LBA data > Science Theme > Data Set
As mentioned, all files related to each data set will be contained within a subdirectory reserved for that data set. Each data set directory will contain 2 subdirectories:
/data (containing all data files and/or data subdirectories) and
/comp (containing “companion” files, i.e. the additional information that will help a user understand and utilize the data set).
The data set will be given a unique data set ID (a concatenation of Team ID + Data Set Code). This unique Data Set ID will be used as the directory name.
- /data subdirectory contains all actual data files
- /comp subdirectory contains all "companion" files: read-me files, Data Set User’s Guide, and other supporting information
If you have a preferred directory structure for your data, this structure can be preserved in the archived location at the ORNL DAAC, as long as the /data and /comp conventions are preserved.
Remember to provide the necessary information in your Data Set User’s Guide (Data Characteristics section) to help a user understand the naming conventions you've used in your data directories, particularly if you have coded your file names.
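If you would like to stage your files locally in the same layout before submitting them, the following minimal sketch creates a data set directory with its /data and /comp subdirectories. The function name and arguments are hypothetical, and the underscore used to join the Team ID and Data Set Code is an assumption: the guidelines say only that the ID is a concatenation of the two.

```python
from pathlib import Path


def make_data_set_dirs(theme_dir, team_id, data_set_code):
    """Create <theme_dir>/<data set ID>/data and .../comp.

    Assumption: the data set ID joins the team ID and the data set code
    with an underscore; the actual separator is not specified.
    """
    data_set_id = f"{team_id}_{data_set_code}"
    data_set_dir = Path(theme_dir) / data_set_id
    for sub in ("data", "comp"):
        # data: all data files; comp: read-me files, User's Guide, etc.
        (data_set_dir / sub).mkdir(parents=True, exist_ok=True)
    return data_set_dir
```

Any preferred directory structure of your own would then go beneath the `data` subdirectory.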
1.2.1 Data set titles must conform to NASA and ORNL DAAC guidelines:
- Title must be 85 characters or fewer.
- Title must begin with “LBA-ECO”, followed by the team ID, e.g. LBA-ECO CD-99.
- The remainder of the 85 characters should include (a) the type of data, (b) study area, site, or region, and (c) time frame, if any of these are applicable, as space allows.
Example: LBA-ECO CD-99 CO2 Flux in Undisturbed Forests in the Amazon Basin, Brazil, 2004
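The two mechanical title rules (length and prefix) can be checked with a few lines of code. This is only a sketch: the function name is hypothetical, and the pattern assumes that team IDs always consist of two capital letters, a hyphen, and two digits, as in the CD-99 example.

```python
import re

MAX_TITLE_LENGTH = 85
# "LBA-ECO", a space, a team ID, and a space.
# Assumption: team IDs look like CD-99 (two capitals, hyphen, two digits).
TITLE_PREFIX = re.compile(r"^LBA-ECO [A-Z]{2}-\d{2} ")


def check_title(title):
    """Return a list of problems with a data set title (empty if none)."""
    problems = []
    if len(title) > MAX_TITLE_LENGTH:
        problems.append(
            f"title is {len(title)} characters; limit is {MAX_TITLE_LENGTH}"
        )
    if not TITLE_PREFIX.match(title):
        problems.append('title must begin with "LBA-ECO" and the team ID')
    return problems
```

The example title above passes both checks. Whether a title adequately names the data type, study area, and time frame still needs a human reader.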
1.2.2 Data and supporting documentation, read-me files, etc. must be written in English.
This includes the contents of data files as well, e.g. data values, header information, and column labels. Information may also be provided in Portuguese if you choose to do so.
1.2.3 Data file contents
You must provide information to help a user understand the content of your data files, e.g. the variables included, their labels, the organization of columns in your data files, units of measurement, possible values (for coded fields), and interpretations of any coded values. This information should be provided as header information included in the data file itself, e.g. the first few lines of a spreadsheet. A sample of the data record should be provided in the Data Set User’s Guide, Data Characteristics section.
Avoid using special characters in column headings and data values if at all possible. Users all over the world will be accessing these data files and different computers may misinterpret special characters, causing errors in the data.
File contents should be internally consistent. Provide column headings to describe every data column and do not switch to a different set of column headings within the same file. If you have a different set of columns to report, they should be included in a separate data file.
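A quick way to test a file for internal consistency is to confirm that every record has the same number of fields as the row of column headings, and that the headings contain no special characters. The sketch below makes simplifying assumptions: the function name is hypothetical, the first row holds the column headings, every later row is a data record (there is no separate multi-line header section), and "special characters" is taken to mean anything outside ASCII.

```python
import csv


def check_data_file(path, delimiter=","):
    """Return a list of problems found in a delimited text data file.

    Assumption: row 1 holds the column headings and all following rows
    are data records.
    """
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        headings = next(reader)
        for name in headings:
            if not name.strip():
                problems.append("blank column heading")
            elif not name.isascii():
                problems.append(f"non-ASCII character in heading {name!r}")
        for row in reader:
            if len(row) != len(headings):
                problems.append(
                    f"line {reader.line_num}: expected {len(headings)} "
                    f"fields, found {len(row)}"
                )
    return problems
```

An empty list means that the file passed these checks. A report of a changed field count partway through a file often indicates a second set of columns that belongs in a separate data file.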
Do not leave a missing value blank. Represent missing numeric values with a consistent placeholder, preferably -9999 (the notation recommended by ORNL), and note in the documentation which convention you have chosen.
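Blank numeric fields can be filled with the placeholder programmatically. The following is a minimal sketch, assuming a delimited file whose first row holds the column headings; the function name and the column names in the usage note are hypothetical. You still need to state in your documentation that -9999 denotes a missing value.

```python
import csv

MISSING = "-9999"  # placeholder recommended by ORNL


def fill_missing(in_path, out_path, numeric_columns, delimiter=","):
    """Copy a delimited file, writing -9999 into blank numeric fields.

    Returns the number of fields that were filled.
    """
    filled = 0
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, delimiter=delimiter
        )
        writer.writeheader()
        for row in reader:
            for col in numeric_columns:
                value = row[col]
                # None means the record was shorter than the heading row.
                if value is None or not value.strip():
                    row[col] = MISSING
                    filled += 1
            writer.writerow(row)
    return filled
```

For example, `fill_missing("raw.csv", "archive.csv", ["temp_c", "flux"])` would leave text columns untouched and fill only the two named numeric columns.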
If your data contain data classes, e.g. site characterization classes, size classes, soil composition, or other coded information, you must provide the user information to decode the values.
1.2.4 Data file formats
Tabular data should be character-delimited ASCII files, e.g. comma delimited, but you should avoid using a delimiter that is likely to be contained in any of your data values.
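One way to choose a safe delimiter is to scan your data values and pick the first candidate character that never occurs in them. This is a sketch only; the function name and the candidate list are assumptions, not project requirements.

```python
def pick_delimiter(rows, candidates=(",", "\t", ";", "|")):
    """Return the first candidate delimiter that appears in no data value.

    rows is an iterable of records, each a sequence of values.
    """
    rows = list(rows)  # allow more than one pass over the data
    for delim in candidates:
        if not any(delim in str(value) for row in rows for value in row):
            return delim
    raise ValueError("every candidate delimiter occurs in the data")
```

If your site names contain commas, for instance, the function skips the comma and returns the tab character, so the file can be written as tab-delimited ASCII instead.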
If you have spreadsheets that may lose valuable information if exported to ASCII, you may include both versions of the data in the archive.
If your spreadsheets contain multiple worksheets, you should export each worksheet to ASCII.
Image data should be provided in non-proprietary data formats, e.g. ASCII grid, and the necessary projection files must be included. Most proprietary software packages provide utilities that will export data to non-proprietary data exchange formats. The project office archive team is also working with the ORNL DAAC to develop a list of archive-friendly formats, which will be posted on the LBA-ECO website as soon as it becomes available.
2.1.1 Refer to Investigation Abstracts and Profiles [URL] to see that all of your LBA-ECO related publications are included in the listing.
2.1.2 To submit new publications or request changes to the publications list, please send an e-mail to the Website Administrator. If you have a pdf of the publication, send that, too.
2.2 Linking publications to data sets
2.2.1 Bibliographic citations
Are any of these publications related to the data set(s) you are preparing for archive? If so, add the bibliographic citation to the LME metadata in the section “Related Publications”.
2.2.2 The LBA-ECO Investigation Abstracts and Profiles web page
The Investigation Abstracts and Profiles web page for each team provides a list of all data sets registered by that team and shows any “Related Publications” that have been linked to the data sets (section 2.2.1 above). You may view a matrix display of the relationships between your team’s publications and data sets from each project's profile.
All LBA-ECO final data will be archived at the ORNL DAAC as well as within LBA DIS in Brazil. The ORNL DAAC is part of a network of data centers chosen to archive NASA’s data collections and to ensure the availability and usefulness of the data for 20 years into the future. To guarantee long-term usefulness, each data set must be accompanied by adequate documentation to help users understand the data long after the project is over.
To help you produce this documentation, i.e. the Data Set User’s Guide, we have added a relatively new section to the LBA Metadata Editor. Open the LME file for your data set and scroll down the page and you will see the section "Data Set Documentation". If you click on the "+" sign to expand the section, you will see the subsections:
Each of these fields is defined in the glossary (click on the field name in LME to access its glossary text).
The LME can output a Data Set User’s Guide, formatted for printing or posting on the web. The information in the User’s Guide is pulled directly from the information you have entered in the LME metadata fields. The LME also prompts you to supplement any information that is missing or insufficient.
To view the User’s Guide as it is being built, click the button labeled "Preview as Template for Data Set Doc" at the top of the LME page. A click will open a new window that displays a formatted web page containing the relevant documentation drawn from the LME data entry page. If you have not provided information in any of the required fields, you will see instructions in red text to provide the needed information. It is advisable to provide as much of this required information as you can within the LME editor. Note that you can cut & paste from other documents into the LME data entry screen, and even do some basic html formatting using the LME.
At this point, you have 2 options:
Option 1 - If the information you'd like to include in the Data Set User’s Guide is straightforward and is adequately represented by the document template produced by the LME / Documentation section, click your browser's "File" option, select "SAVE PAGE AS", and save the html file to your local system for your records. You should then notify us by email (email@example.com) that you've completed your documentation and the data set is ready for archive. You will also need to indicate where the final data and companion files can be accessed. Merilyn Gentry will then access your LME file, export the Data Set User’s Guide, and work with the ORNL DAAC staff to complete the archive process for your data set.
Option 2 - If you would like to incorporate graphs, jpgs, or tables into your data set document or do other html formatting, you may prefer to import the Data Set User’s Guide into an html editor on your desktop. If you choose this option, click your browser's "File" option and select "SAVE PAGE AS" and save the html file to your local system. You can then use your preferred html editor to complete the editing. When you’re satisfied with the Data Set User’s Guide, send Merilyn Gentry (firstname.lastname@example.org) a copy of the completed data set document as an email attachment, and let us know where we can access the final data and companion files.
After your data set and supporting information have been archived at the DAAC, LBA-ECO Project Office staff will ensure that a copy of the final data & documentation is sent to the LBA DIS for long-term archive.