U.S. Department of Agriculture

A substantial amount of still-valuable research data is preserved only on paper or outdated digital media. For example, the Forest Service’s Fort Valley Experimental Forest still has paper-based research data from over 100 years ago. Bringing such data into the digital 21st century can take substantial work, variously labeled “data rescue” (see, for example, this USGS fact sheet) or “data archaeology”. Unfortunately, even when a data collection represents a significant past investment, it does not always make sense to excavate a particular site: funding for data archaeology is limited and needs to be spent carefully, on projects that can demonstrate modern value. On this page, we provide some insights on deciding whether to initiate a data archaeology project and how to execute one.

We follow our own advice. For example, when we were considering whether to start a dig at the Penobscot Experimental Forest, we determined that:

On the basis of this prioritization activity, we ran a multi-year dig and created sets of digital data and administrative records that have served the research team and the public.

For additional information about preferred archival formats and specifications, see the NARA recommendations from 2014. Even where that guidance is now outdated, it can be a good starting point.


How to convert paper-based materials to electronic files

  1. Prioritize

    It can be costly to convert paper-based materials to electronic files, so prioritizing ensures the most important data/files are converted first. Here are some things to consider when prioritizing:

    • Do you have sufficient metadata (who, what, when, why, where, how, etc.) to make the content useful?
    • Are the media fragile and in danger of not being readable?
    • Are the media in an old format (e.g., punch cards) that will require additional work?
    • Is the content important (e.g., relevant to a current study) and/or frequently requested?
    • Are you in danger of losing important information about the files because a scientist is retiring or leaving?
    • Are these materials permanent Forest Service records?
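
The considerations above can be turned into a rough ranking aid. The following is a minimal sketch, not an official Forest Service method: the `Collection` fields mirror the six questions, but the weights are hypothetical placeholders, and treating an old format as a cost (rather than as a sign of urgency) is an assumption you may want to reverse.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str
    has_metadata: bool        # who, what, when, why, where, how are documented
    media_fragile: bool       # media are in danger of becoming unreadable
    old_format: bool          # e.g., punch cards; will require additional work
    in_demand: bool           # relevant to a current study or frequently requested
    knowledge_at_risk: bool   # a scientist who knows the files is retiring or leaving
    permanent_record: bool    # permanent Forest Service record


# Hypothetical weights, chosen only for illustration; adjust to local priorities.
WEIGHTS = {
    "has_metadata": 3,
    "media_fragile": 2,
    "old_format": -1,         # assumption: extra conversion work lowers priority
    "in_demand": 3,
    "knowledge_at_risk": 2,
    "permanent_record": 2,
}


def priority_score(collection: Collection) -> int:
    """Sum the weights of every consideration that applies to the collection."""
    return sum(w for field, w in WEIGHTS.items() if getattr(collection, field))


def rank(collections: list) -> list:
    """Return collections ordered from highest to lowest priority score."""
    return sorted(collections, key=priority_score, reverse=True)
```

A score like this is only a starting point for discussion; subject matter knowledge should override it whenever the two disagree.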
  2. Prepare

    Having the right tools and the right staff can make this job easier. Here are some recommendations:

    Tools
    • Scanner
      • High resolution is a must (minimum 600 dpi or significantly higher for slides)
      • Ability to auto feed (optional, but could be important)
      • Output files must be in an archival format (JPG, TIFF, PDF, etc.)
    • Optical character recognition (OCR) software
      • Enables a computer to “read” the scanned text/data
      • Can help make documents 508 compliant
      • Helps convert scanned data to a useable form (doesn’t work well for hand-written data in most cases)
    Staff
    • Meticulous
    • Good organizational skills
    • Subject matter knowledge extremely helpful
  3. Digitize

    How to digitize materials varies based on the content type. Here are some recommendations:

    • Pictures / Slides
      • Scan printed photos: 600 dpi (grayscale recommended for black/white)
      • Scan slides or negatives: minimum of 2400 dpi
      • Scan only 1 picture or slide per file
    • Documents / Maps / Other Files
      • Scan at 600 dpi (grayscale recommended for black/white)
      • Scan multiple pages of a single document as 1 file
    • Data
      • Scan at 600 dpi (grayscale recommended), run OCR software on the scan, then verify the accuracy of the OCR output
      • Consider hiring someone to hand-enter data
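
The resolution recommendations above can be captured in a small lookup so that scans are checked before they are filed. This is a minimal sketch: the dpi values come from the list above, while the content-type keys and the function name `meets_recommendation` are hypothetical.

```python
# Minimum scan resolution (dpi) per content type, taken from the recommendations above.
# The dictionary keys are illustrative labels, not an established vocabulary.
MIN_DPI = {
    "photo": 600,
    "slide": 2400,
    "negative": 2400,
    "document": 600,
    "map": 600,
    "data": 600,
}


def meets_recommendation(content_type: str, dpi: int) -> bool:
    """Return True if a scan at `dpi` meets the minimum for `content_type`."""
    if content_type not in MIN_DPI:
        raise ValueError(f"unknown content type: {content_type!r}")
    return dpi >= MIN_DPI[content_type]
```

For example, `meets_recommendation("slide", 1200)` is false, which would flag the slide for rescanning.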

    *Files and filenames should be as simple and transparent as possible. Folders can be used to break files into meaningful categories and help keep filenames shorter.
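
One way to keep filenames simple and let folders carry the categories is sketched below. The category/year/subject layout and the helper names `simple_name` and `scan_path` are hypothetical, chosen only to illustrate the idea; any scheme that stays short and transparent would serve.

```python
import re
from pathlib import PurePosixPath


def simple_name(text: str) -> str:
    """Lowercase the text and replace runs of other characters with one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def scan_path(category: str, year: int, subject: str, sequence: int,
              ext: str = "tif") -> PurePosixPath:
    """Build a relative path: folders hold the category and year, so the
    filename itself only needs a short subject and a sequence number."""
    filename = f"{simple_name(subject)}_{sequence:03d}.{ext}"
    return PurePosixPath(simple_name(category)) / str(year) / filename
```

With this layout, `scan_path("Photos", 1925, "Plot 12 overview", 1)` yields `photos/1925/plot_12_overview_001.tif`.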

  4. Archive

    Archived data and other files need proper documentation (metadata) if the information is to be useful in the future. Here are some examples of the type of information needed based on content type:
    • Pictures / Slides
      • Description, which should include what and where
      • When the photo was taken
      • Photographer (if known)
    • Documents / Maps / Other Files
      • Description
      • Author(s)
      • When written
    • Data
      • Description of data (needs to include complete description of each variable)
      • Who collected the data
      • Why data were collected
      • Where data were collected
      • Quality of the data

    *Important data/files, once electronic, should ideally be archived in an electronic data repository. It is important to think of stability, long-term preservation, discovery, and access capabilities when choosing where the data/files reside. Consider submitting data to the Forest Service Research Data Archive.
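
Before submitting files to a repository, it can help to check each record against the metadata listed above. The sketch below assumes records are plain dictionaries; the field names are hypothetical labels for the items in the list, and the photographer is left out of the required set because the list marks it "if known".

```python
# Required metadata per content type, following the examples listed above.
# Field names are illustrative, not a repository schema.
REQUIRED_FIELDS = {
    "picture": ["description", "date_taken"],
    "document": ["description", "authors", "date_written"],
    "data": ["description", "variables", "collected_by",
             "purpose", "location", "quality"],
}


def missing_fields(content_type: str, record: dict) -> list:
    """Return the required fields that are absent or empty in `record`.

    Raises KeyError if `content_type` is not one of the known types.
    """
    return [field for field in REQUIRED_FIELDS[content_type]
            if not record.get(field)]
```

An empty result means the record has at least the minimum documentation; it does not show that the descriptions themselves are complete or accurate, which still needs a knowledgeable reviewer.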

For more information on archiving and data management, contact the archive team.

https://www.fs.usda.gov/rds/archive/digitizing