
Data Handling and Storage

This page covers both the handling of files and the storage of files during research.

The page is divided into two boxes, each with its own tabbed subtopics:

  • Data Handling includes: File naming, Version Control, and Workflows
  • Data Storage includes: Basic Guidance, Choosing Storage, Large Scale Options, and Back-up Plans

Please contact Santi Thompson with any further questions.

Data Handling

Naming and Organizing Files

Organizing and structuring files at the beginning of a project will ease the research process and prevent losses and mix-ups.

Tips for creating file naming conventions

  • Choose more than one distinctive descriptor for the name. Distinctive descriptors include:

    • Experiment name
    • Location/spatial coordinates
    • Researcher name/initials
    • Date, time, date range
    • Type of data
    • Conditions
    • Version number

  • Be consistent; you can use batch naming software later as needed.
  • Practice simple version control: v1, v2 … (Don’t use Final because it probably isn’t!)
  • Use the international standard (ISO 8601) for dates: YYYY-MM-DD
  • Stick with letters, numbers, - and _. (Avoid other characters: ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' " and |)
  • Avoid spaces
  • Keep things short and concise
  • Create a master document (see template below) that describes your convention and folder contents.
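
The tips above can be sketched in a few lines of Python. This is only an illustration, not a prescribed tool: the function names (`clean`, `build_filename`), the choice of descriptors, and the underscore separator are all assumptions made for the example. It joins several descriptors, writes the date as YYYY-MM-DD, adds a version number, and replaces spaces and other disallowed characters with a hyphen.

```python
import re
from datetime import date

# Anything other than letters, numbers, hyphen and underscore is disallowed.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_-]+")


def clean(part: str) -> str:
    """Replace runs of disallowed characters (including spaces) with a hyphen."""
    return _DISALLOWED.sub("-", part.strip()).strip("-")


def build_filename(experiment: str, initials: str, data_type: str,
                   collected: date, version: int, extension: str) -> str:
    """Join the descriptors with underscores (hypothetical convention)."""
    parts = [
        clean(experiment),
        clean(initials),
        clean(data_type),
        collected.isoformat(),   # ISO 8601 date: YYYY-MM-DD
        f"v{version}",           # simple version control: v1, v2, ...
    ]
    return "_".join(parts) + "." + extension.lstrip(".")


# Example with made-up descriptors:
print(build_filename("soil pH", "JD", "raw", date(2024, 3, 5), 2, "csv"))
# prints soil-pH_JD_raw_2024-03-05_v2.csv
```

Whatever convention you settle on, record it in the master document so that collaborators can read and reproduce the names.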

For a list of best practices: Stanford University’s File-naming best practices

Visual Diagram example: Sample File-Naming Convention Visual 

Defining Version Control

Version control is the strategy used to keep track of changes to files over time.

During collaborative work, versioning is essential and more complex.

Data Management Plans will often include methods for managing versions of data. 

More about version control from Git 

Implementing Version Control

Manually - For basic small scale needs:

  • Add an ISO standard date (YYYY-MM-DD) or a version number (V1, V2…) to the end of each file name, and save a new version each time you make changes.
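
As a minimal sketch of the manual approach, the following Python function copies a file to the next unused version number instead of overwriting it. The function name `save_new_version` and the `_vN` suffix pattern are assumptions chosen for illustration; a date suffix would work the same way.

```python
import re
import shutil
from pathlib import Path


def save_new_version(path: Path) -> Path:
    """Copy `path` to the next free _vN name beside it and return the new path."""
    match = re.fullmatch(r"(.*)_v(\d+)", path.stem)
    base = match.group(1) if match else path.stem
    number = int(match.group(2)) + 1 if match else 1
    candidate = path.with_name(f"{base}_v{number}{path.suffix}")
    while candidate.exists():  # never overwrite an earlier version
        number += 1
        candidate = path.with_name(f"{base}_v{number}{path.suffix}")
    shutil.copy2(path, candidate)  # copy2 also preserves timestamps
    return candidate
```

Calling `save_new_version(Path("results.csv"))` would create `results_v1.csv` next to the original, and calling it again on `results_v1.csv` would create `results_v2.csv`.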

Tools for version control:

Dedicated tools are better suited to groups, larger projects, and work involving models, code, and similar materials.

What is a Workflow?

Workflows are the steps you take to move from start to finish in your research activities.

Things to consider:

  • Basic elements of work: 

    • Individual data collection

    • Data aggregation

    • Analysis processes

  • Parts of workflows may be computational processes automated via the use of scripts.

  • Environments and circumstances contextualize decisions and processes.
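
To make the idea of a scripted workflow concrete, here is a minimal Python sketch in which each basic element of work is one function. Everything in it is hypothetical: the function names, the site labels, and the readings are invented for illustration, and a real workflow would read from your own instruments or files.

```python
from statistics import mean


def collect() -> dict[str, list[float]]:
    """Stand-in for individual data collection (the readings are invented)."""
    return {"site_a": [7.1, 6.9, 7.3], "site_b": [6.2, 6.4]}


def aggregate(by_site: dict[str, list[float]]) -> list[float]:
    """Combine the per-site readings into one list."""
    return [value for values in by_site.values() for value in values]


def analyze(values: list[float]) -> dict[str, float]:
    """Summarize the aggregated readings."""
    return {"n": len(values), "mean": round(mean(values), 2)}


def run_workflow() -> dict[str, float]:
    """Run the steps in order so the whole process can be repeated."""
    return analyze(aggregate(collect()))


if __name__ == "__main__":
    print(run_workflow())
```

Keeping each step in its own function makes it easier to document what happens where and to rerun a single step when circumstances change.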

Documenting

Documenting the workflow aids your ability to pick up where you left off, and to communicate effectively with collaborators.

Three key practices - Justin Kitzes, The Basic Reproducible Workflow Template

Useful Tools

  • Jupyter Notebooks - an open-source application allowing researchers to generate and collaborate on documents containing live code, equations, visualizations and text.

  • Electronic Lab Notebooks (ELNs) - see the Harvard ELN matrix for information

  • Docker - containers for computational environments

Data Storage

Storing and Backing up Data

Storage is where data and research materials reside during the process of collection and analysis. 

Back-up strategies are ways to ensure that files are intact and up to date.

Guidance for Storage and Back-up of standard data files* 

  • Use the "3-2-1" Rule: 3 copies, 2 different media, 1 copy off-site. 

  • Don’t rely solely on the cloud; make sure you keep a local back-up.

  • Designate one copy as the working copy and sync or update at designated intervals.

  • Automate back-up whenever possible.

  • Test your backups periodically.

  • Document the storage locations and who is responsible for each.

  • Pay special attention to raw data files - they are most valuable.

  • Keep a sharp eye out for vulnerabilities, both internal and external.

*Data without sensitive content such as personal identifiers or proprietary information

Sample Storage and Back-up Table

Location | URL/filepath | Description | Responsible Party
Department Server | M:/user/somefile... | Working Copy | Jane Dough - Dept IT
UH MS OneDrive | http://sharepoint/file... | Copy 2 | John Smyth - Post doc
My External Hard Drive | E:/somefile/file…. | Copy 3 | Sal E. Mander - PI

For additional storage and back-up tips: Ways to Avoid a Data-Storage Disaster by Jeffrey Perkel, Nature 568, 131-132 (2019)
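
One way to act on the tip to test your backups periodically is to compare checksums between the working copy and each back-up copy. The Python sketch below is an illustration under stated assumptions: the function names `sha256_of` and `stale_copies` are made up for the example, and it checks a single file, so a real check would loop over every file you care about.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_copies(working: Path, copies: list[Path]) -> list[Path]:
    """Return the copies that are missing or differ from the working copy."""
    expected = sha256_of(working)
    return [copy for copy in copies
            if not copy.is_file() or sha256_of(copy) != expected]
```

An empty list from `stale_copies` means every copy matches the working copy byte for byte; anything returned needs to be synced or investigated.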

Storage Choices

UH researchers will want to reach out early and often to department IT with any specific questions and needs related to departmental storage.

  • Storage within the UH network ensures compliance. Information security adheres to specific protocols designed to keep university systems secure. 

  • Consider choosing an additional trusted cloud option for one of your storage solutions. (Do not rely on this as your sole storage.)

    • Free options you might consider: Box, Dropbox, Google Drive

Large Scale Data 

The growing scale of data is one of the biggest challenges we face in research and data services. 

First and foremost you will want to seek the advice of your department IT and others in your field who are encountering similar challenges.

Potential options include

  • Network attached storage (NAS)

These devices contain storage and associated management software, sort of like a small computer with a large amount of storage capacity. They are internet accessible, which allows you to centralize data collected in multiple ways and then access files for analysis in one spot. Most models contain multiple hard drives and are set up with RAID to protect against data loss in case of a hard drive failure. (Costs vary, at approximately $300-500.)

  • Cloud Storage Services

Beyond the free and institutional storage, cloud storage services are available at varying levels, some with additional back-up features.

Amazon Web Services is one of the most common choices, but there may be other options more suitable to your needs and budget.

Back-up Plans

We advise keeping a document that lays out the following:

  • A list of data files, average size, and format

  • Three storage locations

  • Medium of storage

  • Responsible party

  • Methods of back-up (Manual, automatic, software used, etc.)

  • Timing - daily, weekly, or monthly, depending on your output

  • Log of back-up dates (Verify that back-up is complete)

  • For Groups: Contingency plans should someone leave
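
If you prefer to keep the plan in a machine-readable form, the items above could be recorded as structured data and checked against the "3-2-1" rule. The Python sketch below is one possible layout, not a required format: the class name, field names, and the `check_3_2_1` function are assumptions, and the media and off-site values given to the sample locations are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageLocation:
    name: str
    medium: str        # e.g. "network server", "cloud", "external drive"
    off_site: bool
    responsible: str
    method: str        # "manual" or "automatic"
    frequency: str     # "daily", "weekly" or "monthly"


def check_3_2_1(locations: list[StorageLocation]) -> list[str]:
    """Return a list of problems; an empty list means the rule is met."""
    problems = []
    if len(locations) < 3:
        problems.append("fewer than 3 copies")
    if len({loc.medium for loc in locations}) < 2:
        problems.append("fewer than 2 different media")
    if not any(loc.off_site for loc in locations):
        problems.append("no off-site copy")
    return problems


# Hypothetical plan; media, methods and off-site flags are assumed values.
plan = [
    StorageLocation("Department Server", "network server", False,
                    "Dept IT", "automatic", "daily"),
    StorageLocation("UH MS OneDrive", "cloud", True,
                    "Post doc", "automatic", "daily"),
    StorageLocation("My External Hard Drive", "external drive", False,
                    "PI", "manual", "weekly"),
]
print(check_3_2_1(plan))  # prints []
```

The log of back-up dates and the contingency plan for departing group members are better kept as dated notes alongside this record.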