Introduction¶
ddf_utils is a Python library and command-line tool for people working with Tabular Data Packages in the DDF model. It provides various functions for ETL tasks, including string formatting, data transformation, generating datapackage.json, reading data from DDF datasets, and running recipes (a declarative DSL for manipulating datasets to generate new ones), along with other functions we find useful in our daily work at Gapminder.
Installation¶
Python 3.6+ is required in order to run this module.
To install this package from PyPI, run:
$ pip install ddf_utils
To install from the latest source, run:
$ pip3 install git+https://github.com/semio/ddf_utils.git
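After installing, it can be handy to confirm that the interpreter meets the stated Python 3.6+ requirement and that the package is visible to it. The following is only a minimal sketch using the standard library; the helper names python_is_supported and package_is_installed are made up for illustration and are not part of ddf_utils.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this document


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter meets the documented minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def package_is_installed(name="ddf_utils"):
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("ddf_utils installed:", package_is_installed())
```

If the second line prints False, the package was probably installed for a different interpreter than the one running the script.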
For Windows users¶
If you encounter a "failed to create process" error when you run the ddf command, try updating setuptools to the latest version:
$ pip3 install -U setuptools
Usage¶
ddf_utils can be used both as a library and as a command-line utility.
Library¶
ddf_utils’ helper functions are divided into a few modules based on their domain, namely:
chef
: Recipe cooking functions. See Recipe Cookbook (draft) for how to write recipes
i18n
: Splitting/merging translation files
package
: Generating/updating datapackage.json
model.ddf / model.package
: Data models for dataset and datapackage
patch
: Applying patches in daff format
qa
: Functions for QA tasks
str
: Functions for string/number formatting
transformer
: Data transforming functions, such as column/row translation, trend bridge, etc.
See the links above for the documentation of each module.
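As a rough illustration of how the library is organised, the sketch below tries to import each of the modules named above and reports which ones are available. It assumes that every module lives directly under the ddf_utils package (for example ddf_utils.chef); that import path is an assumption, and load_modules is a hypothetical helper rather than part of the library.

```python
import importlib

# Module names as listed in this document; the import paths built from
# them ("ddf_utils.<name>") are an assumption.
MODULES = [
    "chef", "i18n", "package", "model.ddf", "model.package",
    "patch", "qa", "str", "transformer",
]


def load_modules(package="ddf_utils", names=MODULES):
    """Import each submodule; return (loaded modules by name, missing names)."""
    loaded, missing = {}, []
    for name in names:
        try:
            loaded[name] = importlib.import_module(f"{package}.{name}")
        except ImportError:
            # Covers both a missing package and a missing submodule.
            missing.append(name)
    return loaded, missing
```

Calling load_modules() returns a dictionary of the modules that imported successfully and a list of the names that did not, which is a quick way to see whether an installed version matches the layout described here.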
Command line helper¶
We provide a command-line utility, ddf, for common ETL tasks. The currently supported commands are:
$ ddf --help
Usage: ddf [OPTIONS] COMMAND [ARGS]...

Options:
  --debug / --no-debug
  --help                Show this message and exit.

Commands:
  build_recipe        create a complete recipe by expanding all...
  cleanup             clean up ddf files or translation files.
  create_datapackage  create datapackage.json
  diff                give a report on the statistical differences...
  from_csv            create ddfcsv dataset from a set of csv files
  merge_translation   merge all translation files from crowdin
  new                 create a new ddf project
  run_recipe          generate new ddf dataset with recipe
  split_translation   split ddf files for crowdin translation
  validate_recipe     validate the recipe
Run ddf <command> --help for detailed usage of each command.
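When the ddf tool is driven from a script, the help text can also be fetched programmatically. The sketch below only builds and runs the ddf --help and ddf <command> --help invocations shown above; the function names ddf_help_command and show_help are hypothetical, and no other options of the tool are assumed.

```python
import shutil
import subprocess


def ddf_help_command(command=None):
    """Build the argument list for `ddf --help` or `ddf <command> --help`."""
    args = ["ddf"]
    if command:
        args.append(command)
    args.append("--help")
    return args


def show_help(command=None):
    """Return the help text, or None if `ddf` is not found on PATH."""
    if shutil.which("ddf") is None:
        return None
    result = subprocess.run(
        ddf_help_command(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return result.stdout
```

For example, show_help("run_recipe") runs ddf run_recipe --help and returns whatever the tool prints, or None when ddf is not installed.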