Introduction

Installation

For Windows and macOS users, there are installers available at the release page. The Windows installer also includes the command line interface (CLI) mpldc.exe. You might want to add the installation directory to your Windows PATH variable for conveniently using the CLI (see e.g. this guide).

If you have Python installed, you can install MPL-Data-Cast with:

pip install mpl_data_cast

The Command-line interface is available via:

mpldc
# or
python -m mpl_data_cast.cli

To start the GUI, run:

mpldc-gui
# or
python -m mpl_data_cast.gui

Issue tracker

If you encounter bugs or have suggestions for improvement, open an issue on GitHub.

Motivation

If you do a lot of experiments, you have a lot of data. If you want to analyze the data on a different computer, you have to make sure that the data are transferred correctly to that computer. MPL-Data-Cast can help you with that process and it also allows you to simplify the subsequent data analysis by reformatting your raw input data. The main features of MPL-Data-Cast are:

  • Copy a directory tree from a local file system to a network share (or any other mountable location) and verify that all files are copied correctly via comparison of MD5 checksums

  • Convert your raw data to a file format that your data analysis pipeline understands (e.g. convert TIF files to HDF5 files) or repack/compress your input data

  • Append additional meta data from your experiments that you were not able to set in your data acquisition software

MPL-Data-Cast is developed at the Max Planck Institute for the Science of Light, where we have a lot of such data. For us, it addresses several data management issues:

  • When you copy a directory tree containing large (~20GB) files from a local Windows machine to a remote network share using the built-in File Explorer, sometimes the files were corrupted (e.g. HDF5 files could not be opened anymore).

  • When you deal with RT-DC data, you always have to (losslessly) compress the data (or at least repack the file so that a dataset is not scattered over the entire HDF5 file) to properly work with it.

  • We also have very basic data acquisition pipelines that work with Micro-Manager which outputs TIF files and metadata files. We needed to merge these files into properly formatted (including metadata) HDF5 files.

Design

MPL-Data-Cast is a Python library with a command-line interface (CLI) and a very simple GUI on top that lets you apply a “recipe” to data files. A recipe defines how your acquisition data is transformed into the final raw data for your analysis pipeline. You can define your own recipes or use the recipes that come with MPL-Data-Cast.

See also:

Good to know

  • MPL-Data-Cast does not create a directory with the same name as the input directory in the output directory, it copies the content of the input directory into the output directory. That means that you have to be careful when entering the output path, or otherwise you might end up mixing up data.

  • Right now, it is not possible to copy only selected files or subfolders from a given directory, you can only transfer complete directories.