ddf_utils.chef.procedure package

Available Procedures

extract_concepts procedure for recipes

ddf_utils.chef.procedure.extract_concepts.extract_concepts(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, join=None, overwrite=None, include_keys=False) → ddf_utils.chef.model.ingredient.ConceptIngredient

extract concepts from other ingredients.

Procedure format:

procedure: extract_concepts
ingredients:  # list of ingredient id
  - ingredient_id_1
  - ingredient_id_2
result: str  # new ingredient id
options:
  join:  # optional
    base: str  # base concept ingredient id
    type: {'full_outer', 'ingredients_outer'}  # default is full_outer
  overwrite:  # overwrite some concept types
    country: entity_set
    year: time
  include_keys: true  # if we should include the primaryKeys concepts
Parameters:

ingredients – any number of ingredients to extract concepts from

Keyword Arguments:
 
  • join (dict, optional) – the base ingredient to join
  • overwrite (dict, optional) – overwrite concept types for some concepts
  • include_keys (bool, optional) – whether to include the primaryKeys of the ingredients, defaults to false

See also

ddf_utils.transformer.extract_concepts() : related function in transformer module

Note

  • all concepts in ingredients in the ingredients parameter will be extracted to a new concept ingredient
  • join option is optional; if present then the base will merge with concepts from ingredients
  • full_outer join means get the union of concepts; ingredients_outer means only keep concepts from ingredients
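The two join types in the note above can be pictured with plain pandas. This is only a sketch of the set semantics, not the code chef runs: the helper join_concepts, the sample concepts, and the choice to let extracted concepts win over the base when both define the same concept are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical concept tables; real concept ingredients carry more columns.
base = pd.DataFrame({
    "concept": ["geo", "year", "population"],
    "concept_type": ["entity_domain", "time", "measure"],
})
extracted = pd.DataFrame({
    "concept": ["year", "gdp"],
    "concept_type": ["time", "measure"],
})


def join_concepts(base, extracted, how="full_outer"):
    """Illustrative helper, not part of ddf_utils."""
    # Extracted rows come first, so they win on conflicts (an assumption).
    merged = pd.concat([extracted, base], ignore_index=True)
    merged = merged.drop_duplicates(subset="concept", keep="first")
    if how == "ingredients_outer":
        # Keep only the concepts that came from the ingredients.
        merged = merged[merged["concept"].isin(extracted["concept"])]
    elif how != "full_outer":
        raise ValueError(f"unknown join type: {how}")
    return merged.reset_index(drop=True)


full = join_concepts(base, extracted, "full_outer")
only_ingredients = join_concepts(base, extracted, "ingredients_outer")
print(sorted(full["concept"]))
print(sorted(only_ingredients["concept"]))
```

With full_outer the result holds the union of both concept lists; with ingredients_outer the base only contributes to concepts that the ingredients already contain.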

filter procedure for recipes

ddf_utils.chef.procedure.filter.filter(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, **options) → ddf_utils.chef.model.ingredient.Ingredient

Filter items and rows, in the same way that value and filter do in an ingredient definition.

Procedure format:

- procedure: filter
  ingredients:
      - ingredient_id
  options:
      item:  # just as `value` in ingredient def
          $in:
              - concept_1
              - concept_2
      row:  # just as `filter` in ingredient def
          $and:
              geo:
                  $ne: usa
              year:
                  $gt: 2010

  result: output_ingredient

for more information, see the ddf_utils.chef.ingredient.Ingredient class.

Parameters:
  • chef (Chef) – the Chef instance
  • ingredients – list of ingredient ids in the DAG
  • result (str) – id of the result ingredient
Keyword Arguments:
 
  • item (list or dict, optional) – The item filter
  • row (dict, optional) – The row filter
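The row filter in the procedure format above reads like a MongoDB-style query. A rough pandas equivalent is sketched below. The operator table and the helper apply_row_filter are hypothetical, cover only the three operators that appear in this section, and assume that the conditions under $and are combined with a logical AND.

```python
import pandas as pd

# Hypothetical operator table covering only $ne, $gt and $in.
OPERATORS = {
    "$ne": lambda series, value: series != value,
    "$gt": lambda series, value: series > value,
    "$in": lambda series, value: series.isin(value),
}


def apply_row_filter(df, conditions):
    """Keep the rows that satisfy every column condition (illustrative only)."""
    mask = pd.Series(True, index=df.index)
    for column, checks in conditions.items():
        for operator, value in checks.items():
            mask &= OPERATORS[operator](df[column], value)
    return df[mask]


df = pd.DataFrame({
    "geo": ["usa", "swe", "swe"],
    "year": [2015, 2005, 2015],
    "population": [320, 9, 10],
})

# Mirrors the `row` block above: geo != usa AND year > 2010.
filtered = apply_row_filter(df, {"geo": {"$ne": "usa"}, "year": {"$gt": 2010}})
print(filtered)
```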

flatten procedure for recipes

ddf_utils.chef.procedure.flatten.flatten(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

Flatten some dimensions to create new indicators.

procedure format:

procedure: flatten
ingredients:
    - ingredient_to_run
options:
    flatten_dimensions:
        - entity_1
        - entity_2
    dictionary:
        "concept_name_wildcard": "new_concept_name_template"
    skip_totals_among_entities:
        - entity_1
        - entity_2

The dictionary can have multiple entries. For each entry, the concepts that match the key (by wildcard matching) will be flattened to the value, which should be a template string. The template variables are provided as a dictionary that contains concept and all columns from flatten_dimensions as keys.

Parameters:
  • chef (Chef) – the Chef instance
  • ingredients (list) – a list of ingredients
  • result (str) – id of result ingredient
  • skip_totals_among_entities (list) – a list of entities that represent totals, which are not added to the new indicator names
Keyword Arguments:
 
  • flatten_dimensions (list) – a list of dimensions to be flattened
  • dictionary (dict) – the dictionary for old name -> new name mapping
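How a dictionary entry turns into a new indicator name can be sketched with the standard library. This assumes the template uses Python str.format placeholders and that wildcard matching behaves like shell-style patterns; the helper flattened_name and the sample concept and dimension names are hypothetical.

```python
from fnmatch import fnmatchcase


def flattened_name(concept, dimension_values, dictionary):
    """Return the new indicator name for one concept (illustrative only)."""
    for pattern, template in dictionary.items():
        if fnmatchcase(concept, pattern):
            # Template variables: `concept` plus one key per flattened dimension.
            return template.format(concept=concept, **dimension_values)
    return None  # no dictionary entry matched this concept


dictionary = {"population_*": "{concept}_{gender}_{age_group}"}
name = flattened_name(
    "population_total",
    {"gender": "female", "age_group": "0_4"},
    dictionary,
)
print(name)
```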

groupby procedure for recipes

ddf_utils.chef.procedure.groupby.groupby(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

group ingredient data by column(s) and run aggregate function

Procedure format:

procedure: groupby
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  groupby: str or list  # column(s) to group
  aggregate: dict  # function block
  transform: dict  # function block
  filter: dict  # function block

The function block should have the following format:

aggregate:
  column1: func_name1
  column2: func_name2

or

aggregate:
  column1:
    function: func_name
    param1: foo
    param2: baz

Wildcards are supported in the column names, so aggregate: {"*": "sum"} will run on every indicator in the ingredient.

Keyword Arguments:
 
  • groupby (str or list) – the column(s) to group, can be a list or a string
  • insert_key (dict) – manually insert keys into the result. This is useful when we want to add back the aggregated column and set it to a single value. For example, geo: global inserts a geo column in which every value is “global”
  • aggregate
  • transform
  • filter (dict, optional) – the function to run. Only one of aggregate, transform and filter should be supplied.

Note

  • Only one of aggregate, transform or filter can be used in one procedure.
  • Any columns not mentioned in groupby or functions are dropped.
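As a rough picture of what an aggregate block combined with insert_key produces, here is the equivalent operation in plain pandas. It is a sketch under the assumption that the procedure behaves like a pandas groupby; the sample data and column names are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "geo": ["swe", "nor", "swe", "nor"],
    "year": [2000, 2000, 2001, 2001],
    "population": [9, 4, 10, 5],
    "note": ["a", "b", "c", "d"],  # not named anywhere, so it is dropped
})

# groupby: year / aggregate: {population: sum}
result = df.groupby("year", as_index=False).agg({"population": "sum"})

# insert_key: {geo: global} adds the aggregated-away column back as a constant.
result["geo"] = "global"
print(result)
```

The note column disappears because it is named neither in groupby nor in the function block, which matches the second note above.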

merge procedure for recipes

ddf_utils.chef.procedure.merge.merge(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, deep=False) → ddf_utils.chef.model.ingredient.Ingredient

merge a list of ingredients

The ingredients will be merged one by one in the order of how they are provided to this function. Later ones will overwrite the previous merged results.

Procedure format:

procedure: merge
ingredients:  # list of ingredient id
  - ingredient_id_1
  - ingredient_id_2
  - ingredient_id_3
  # ...
result: str  # new ingredient id
options:
  deep: bool  # use deep merge if true
Parameters:
  • chef (Chef) – a Chef instance
  • ingredients – Any number of ingredients to be merged
Keyword Arguments:
 

deep (bool, optional) – if True, then do deep merging. Default is False

Notes

Deep merge means that every datapoint is checked for existence.

  • if false, overwrite is on the file level. If the key-value pair (e.g. geo,year-population_total) exists, the whole file gets overwritten
  • if true, overwrite is on the row level. If the value (e.g. afr,2015-population_total) exists, it gets overwritten; if it doesn’t, it stays
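The difference between the two modes can be sketched by modelling an ingredient as a dict that maps each indicator to a DataFrame. This layout and the helper merge_pair are assumptions for illustration, not the ddf_utils data model.

```python
import pandas as pd


def merge_pair(left, right, keys, deep=False):
    """Merge two hypothetical {indicator: DataFrame} ingredients; `right` wins."""
    merged = dict(left)
    for indicator, df in right.items():
        if deep and indicator in merged:
            # Row level: later rows replace earlier rows that share the same keys.
            combined = pd.concat([merged[indicator], df], ignore_index=True)
            combined = combined.drop_duplicates(subset=keys, keep="last")
            merged[indicator] = combined.sort_values(keys).reset_index(drop=True)
        else:
            # File level: the later table replaces the earlier one entirely.
            merged[indicator] = df
    return merged


keys = ["geo", "year"]
left = {"population_total": pd.DataFrame(
    {"geo": ["afr", "afr"], "year": [2014, 2015], "population_total": [1, 2]})}
right = {"population_total": pd.DataFrame(
    {"geo": ["afr"], "year": [2015], "population_total": [3]})}

shallow = merge_pair(left, right, keys, deep=False)
deep = merge_pair(left, right, keys, deep=True)
print(shallow["population_total"])
print(deep["population_total"])
```

In the shallow result the 2014 row is lost because the whole table was replaced; in the deep result it survives and only the 2015 value changes.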

merge_entity procedure for recipes

ddf_utils.chef.procedure.merge_entity.merge_entity(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], dictionary, target_column, result, merged='drop') → ddf_utils.chef.model.ingredient.DataPointIngredient

merge entities

run_op procedure for recipes

ddf_utils.chef.procedure.run_op.run_op(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, op) → ddf_utils.chef.model.ingredient.DataPointIngredient

run math operation on each row of ingredient data.

Procedure format:

procedure: run_op
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  op: dict  # a dictionary describing the calculation for each column.
Keyword Arguments:
 op (dict) – a dictionary of concept_name -> function mapping

Examples

For example, if we want to add two columns, col_a and col_b, to create a new column, we can write

procedure: run_op
ingredients:
  - ingredient_to_run
result: new_ingredient_id
options:
  op:
    new_col_name: "col_a + col_b"
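For intuition, the op mapping in the example can be mimicked with pandas' DataFrame.eval, which evaluates a string expression against the columns of a DataFrame. Whether chef evaluates the expression in exactly this way is an assumption; the data below is made up.

```python
import pandas as pd

df = pd.DataFrame({"col_a": [1, 2, 3], "col_b": [10, 20, 30]})

# Same shape as the `op` option: new column name -> expression string.
op = {"new_col_name": "col_a + col_b"}

result = df.copy()
for name, expression in op.items():
    # Each expression is computed row by row from the existing columns.
    result[name] = result.eval(expression)
print(result)
```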

split_entity procedure for recipes

ddf_utils.chef.procedure.split_entity.split_entity(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], dictionary, target_column, result, splitted='drop') → ddf_utils.chef.model.ingredient.DataPointIngredient

split entities

translate_column procedures for recipes

ddf_utils.chef.procedure.translate_column.translate_column(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, dictionary, column, *, target_column=None, not_found='drop', ambiguity='prompt', ignore_case=False, value_modifier=None) → ddf_utils.chef.model.ingredient.Ingredient

Translate column values.

Procedure format:

procedure: translate_column
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  column: str  # the column to be translated
  target_column: str  # optional, the target column to store the translated data
  not_found: {'drop', 'include', 'error'}  # optional, the behavior when there are values not
                                           # found in the mapping dictionary, default is 'drop'
  ambiguity: {'prompt', 'skip', 'error'}  # optional, the behavior when there is ambiguity
                                          # in the dictionary
  dictionary: str or dict  # file name or mappings dictionary

If base is provided in dictionary, key and value should also be in dictionary. In this case chef will generate a mapping dictionary using the base ingredient. The dictionary format will be:

dictionary:
  base: str  # ingredient name
  key: str or list  # the columns to be the keys of the dictionary, can accept a list
  value: str  # the column to be the values of the dictionary, must be one column
Parameters:
  • chef (Chef) – The Chef the procedure will run on
  • ingredients (list) – A list of ingredient id in the dag to translate
Keyword Arguments:
 
  • dictionary (dict) – A dictionary of oldname -> newname mappings. If ‘base’ is provided in the dictionary, ‘key’ and ‘value’ should also be in the dictionary. See ddf_utils.transformer.translate_column() for more on how this is handled.
  • column (str) – the column to be translated
  • target_column (str, optional) – the target column to store the translated data. If this is not set then the column named by column will be replaced
  • not_found ({'drop', 'include', 'error'}, optional) – the behavior when there are values not found in the mapping dictionary, default is ‘drop’
  • ambiguity ({'prompt', 'skip', 'error'}, optional) – the behavior when there is ambiguity in the dictionary, default is ‘prompt’
  • value_modifier (str, optional) – a function to modify new column values, default is None

See also

ddf_utils.transformer.translate_column() : related function in transformer module
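The interaction between column, target_column and not_found can be sketched with pandas' Series.map. The helper translate below is hypothetical. In particular, the reading of 'include' as "keep the original value when no translation is found" is an assumption, so check ddf_utils.transformer.translate_column() for the authoritative behavior.

```python
import pandas as pd


def translate(df, column, mapping, target_column=None, not_found="drop"):
    """Illustrative stand-in for a column translation; not the ddf_utils code."""
    target = target_column or column
    translated = df[column].map(mapping)  # unmapped values become NaN
    missing = translated.isna()
    if not_found == "error" and missing.any():
        raise ValueError(f"values not found: {df.loc[missing, column].tolist()}")
    if not_found == "include":
        # Assumption: untranslated values fall back to the original value.
        translated = translated.fillna(df[column])
    out = df.copy()
    out[target] = translated
    if not_found == "drop":
        out = out[~missing]
    return out.reset_index(drop=True)


df = pd.DataFrame({"geo": ["Sweden", "Norway", "Atlantis"], "value": [1, 2, 3]})
mapping = {"Sweden": "swe", "Norway": "nor"}

dropped = translate(df, "geo", mapping)
included = translate(df, "geo", mapping, not_found="include")
separate = translate(df, "geo", mapping, target_column="geo_id")
print(dropped, included, separate, sep="\n\n")
```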

translate_header procedures for recipes

ddf_utils.chef.procedure.translate_header.translate_header(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, dictionary, duplicated='error') → ddf_utils.chef.model.ingredient.Ingredient

Translate column headers

Procedure format:

procedure: translate_header
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  dictionary: str or dict  # file name or mappings dictionary
Parameters:
  • chef (Chef) – The Chef the procedure will run on
  • ingredients (list) – A list of ingredient id in the dag to translate
  • dictionary (dict or str) – A dictionary for name mapping, or filepath to the dictionary
  • duplicated (str) – What to do when there are duplicated columns after renaming. Available options are error, replace
  • result (str) – The result ingredient id

See also

ddf_utils.transformer.translate_header() : Related function in transformer module
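A header translation is essentially a column rename followed by a duplicate check. The sketch below models only duplicated='error', because the source does not spell out what 'replace' keeps; the helper rename_headers and the sample columns are hypothetical.

```python
import pandas as pd


def rename_headers(df, dictionary):
    """Rename columns and fail on duplicates (models duplicated='error' only)."""
    renamed = df.rename(columns=dictionary)
    if renamed.columns.duplicated().any():
        clashes = renamed.columns[renamed.columns.duplicated()].tolist()
        raise ValueError(f"duplicated columns after renaming: {clashes}")
    return renamed


df = pd.DataFrame({"country": ["swe"], "year": [2000], "pop": [9]})

renamed = rename_headers(df, {"country": "geo", "pop": "population"})
print(list(renamed.columns))

try:
    rename_headers(df, {"country": "year"})  # would create two `year` columns
    clash_detected = False
except ValueError:
    clash_detected = True
print(clash_detected)
```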

trend_bridge procedure for recipes

ddf_utils.chef.procedure.trend_bridge.trend_bridge(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], bridge_start, bridge_end, bridge_length, bridge_on, result, target_column=None) → ddf_utils.chef.model.ingredient.DataPointIngredient

run trend bridge on ingredients

Procedure format:

procedure: trend_bridge
ingredients:
  - data_ingredient                 # optional, if not set defaults to empty ingredient
result: data_bridged
options:
  bridge_start:
      ingredient: old_data_ingredient # optional, if not set then assume it's the input ingredient
      column:
        - concept_old_data
  bridge_end:
      ingredient: new_data_ingredient # optional, if not set then assume it's the input ingredient
      column:
        - concept_new_data
  bridge_length: 5                  # steps in time. If year, years, if days, days.
  bridge_on: time                   # the index column to build the bridge with
  target_column:
        - concept_in_result  # overwrites if exists. creates if not exists. default to bridge_end.column
Parameters:
  • chef (Chef) – A Chef instance
  • ingredients (list) – The input ingredient. The bridged result will be merged into this ingredient. If this is None, then only the bridged result will be returned
  • bridge_start (dict) – Describe the start of bridge
  • bridge_end (dict) – Describe the end of bridge
  • bridge_length (int) – The size of bridge
  • bridge_on (str) – The column to bridge
  • result (str) – The output ingredient id
Keyword Arguments:
 

target_column (list, optional) – The column name of the bridge result. default to bridge_end.column

See also

ddf_utils.transformer.trend_bridge() : related function in transformer module
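This section gives only the option layout, so the following is a hedged sketch of one plausible reading of a trend bridge: the gap between the old and the new series, measured where the new series begins, is faded in linearly over the bridge_length steps before that point. The function bridge and the sample series are hypothetical, and ddf_utils.transformer.trend_bridge() may differ in detail.

```python
import pandas as pd


def bridge(old, new, bridge_length):
    """Linearly fade the old series towards the start of the new one (sketch)."""
    end = new.index[0]            # first point covered by the new series
    start = end - bridge_length   # where the adjustment begins
    gap = new.loc[end] - old.loc[end]

    adjusted = old.astype(float)
    in_bridge = (adjusted.index >= start) & (adjusted.index <= end)
    steps = (adjusted.index[in_bridge] - start).to_numpy()
    # Add a growing share of the gap: 0 at `start`, the full gap at `end`.
    adjusted.loc[in_bridge] = adjusted.loc[in_bridge] + gap * steps / bridge_length

    # Old (adjusted) values before `end`, new values from `end` onwards.
    return pd.concat([adjusted[adjusted.index < end], new.astype(float)])


old = pd.Series([10, 10, 10, 10, 10, 10], index=range(2000, 2006))
new = pd.Series([15, 16, 17], index=range(2005, 2008))

bridged = bridge(old, new, bridge_length=5)
print(bridged)
```

Here bridge_on would correspond to the index of the series (the time column), and bridge_length to the number of index steps over which the gap is spread.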

window procedure for recipes

ddf_utils.chef.procedure.window.window(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

apply functions on a rolling window

Procedure format:

procedure: window
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  window:
    column: str  # column which window is created from
    size: int or 'expanding'  # if int then rolling window, if expanding then expanding window
    min_periods: int  # as in pandas
    center: bool  # as in pandas
  aggregate: dict

Two styles of function block are supported, and they can be mixed in one procedure:

aggregate:
  col1: sum  # run rolling sum on col1
  col2: mean  # run rolling mean on col2
  col3:  # run foo on col3 with param1=baz
    function: foo
    param1: baz
Keyword Arguments:
 
  • window (dict) – window definition, see above for the dictionary format
  • aggregate (dict) – aggregation functions

Examples

An example of rolling windows:

procedure: window
ingredients:
    - ingredient_to_roll
result: new_ingredient_id
options:
  window:
    column: year
    size: 10
    min_periods: 1
    center: false
  aggregate:
    column_to_aggregate: sum

Notes

Any column not mentioned in the aggregate block will be dropped in the returned ingredient.
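Since min_periods and center are described as "as in pandas", the window options can be pictured with pandas' rolling and expanding windows. The sketch assumes the window is applied separately within each group of the remaining key columns (here geo); the helper windowed_sum and the data are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "geo": ["swe", "swe", "swe", "swe", "nor", "nor"],
    "year": [2000, 2001, 2002, 2003, 2000, 2001],
    "population": [1, 2, 3, 4, 10, 20],
})


def windowed_sum(df, column, size, min_periods=1, center=False):
    """Sum `population` over a window along `column`, per geo (sketch)."""
    ordered = df.sort_values(["geo", column])

    def roll(series):
        if size == "expanding":
            return series.expanding(min_periods=min_periods).sum()
        return series.rolling(size, min_periods=min_periods, center=center).sum()

    out = ordered[["geo", column]].copy()
    out["population"] = ordered.groupby("geo")["population"].transform(roll)
    return out.reset_index(drop=True)


rolling = windowed_sum(df, "year", size=2)
expanding = windowed_sum(df, "year", size="expanding")
print(rolling)
print(expanding)
```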