Introduction

ddf_utils is a Python library and command line tool for people working with Tabular Data Packages in the DDF model. It provides various functions for ETL tasks, including string formatting, data transformation, generating datapackage.json, reading data from DDF datasets, and running recipes (a declarative DSL designed to manipulate datasets in order to generate new ones), along with other functions we find useful in our daily work at Gapminder.

Installation

Python 3.6+ is required in order to run this module.
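
The version requirement above can be checked from Python before installing. The following is a minimal sketch and not part of ddf_utils; the names MIN_VERSION and python_is_supported are hypothetical and introduced only for illustration.

```python
import sys

# Minimum interpreter version stated in this README (major, minor).
MIN_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given version tuple meets the minimum requirement.

    Hypothetical helper for illustration; ddf_utils does not provide it.
    Only the (major, minor) components are compared.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("This interpreter meets the Python 3.6+ requirement.")
    else:
        print("Python 3.6+ is required; found %d.%d" % tuple(sys.version_info[:2]))
```

Comparing tuples rather than version strings avoids the common mistake of treating "3.10" as older than "3.6".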

To install this package from PyPI, run:

$ pip install ddf_utils

To install from the latest source, run:

$ pip3 install git+https://github.com/semio/ddf_utils.git

For Windows users

If you encounter a "failed to create process" error when you run the ddf command, please try updating setuptools to the latest version:

$ pip3 install -U setuptools

Usage

ddf_utils can be used both as a library and as a command line utility.

Library

ddf_utils’ helper functions are divided into a few modules based on their domain. See the documentation of each module for details.

Command line helper

We provide a command line utility, ddf, for common ETL tasks. The currently supported commands are:

$ ddf --help

Usage: ddf [OPTIONS] COMMAND [ARGS]...

Options:
  --debug / --no-debug
  --help                Show this message and exit.

Commands:
  build_recipe        create a complete recipe by expanding all...
  cleanup             clean up ddf files or translation files.
  create_datapackage  create datapackage.json
  diff                give a report on the statistical differences...
  from_csv            create ddfcsv dataset from a set of csv files
  merge_translation   merge all translation files from crowdin
  new                 create a new ddf project
  run_recipe          generate new ddf dataset with recipe
  split_translation   split ddf files for crowdin translation
  validate_recipe     validate the recipe

Run ddf <command> --help for detailed usage of each command.
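
The per-command help can also be collected from a Python script, for example when documenting a pipeline. The sketch below only shells out to the ddf executable with the --help flag shown above; it does not use any ddf_utils library API. The function name command_help is hypothetical, and the sketch assumes that ddf is on the PATH once the package is installed.

```python
import shutil
import subprocess


def command_help(command, executable="ddf"):
    """Return the help text for one subcommand, or None if the CLI is missing.

    Hypothetical helper for illustration; it runs `<executable> <command> --help`.
    """
    path = shutil.which(executable)
    if path is None:
        # The command line tool is not installed or not on the PATH.
        return None
    result = subprocess.run(
        [path, command, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return result.stdout


if __name__ == "__main__":
    text = command_help("run_recipe")
    print(text if text is not None else "ddf executable not found")
```

Returning None when the executable is absent lets the caller distinguish a missing installation from a command that printed nothing.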