ddf_utils.chef.procedure package

Available Procedures

extract_concepts procedure for recipes

ddf_utils.chef.procedure.extract_concepts.extract_concepts(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, join=None, overwrite=None, include_keys=False) → ddf_utils.chef.model.ingredient.ConceptIngredient

extract concepts from other ingredients.

Procedure format:

procedure: extract_concepts
ingredients:  # list of ingredient id
  - ingredient_id_1
  - ingredient_id_2
result: str  # new ingredient id
options:
  join:  # optional
    base: str  # base concept ingredient id
    type: {'full_outer', 'ingredients_outer'}  # default is full_outer
  overwrite:  # overwrite some concept types
    country: entity_set
    year: time
  include_keys: true  # if we should include the primaryKeys concepts

Parameters:

ingredients – any number of ingredients to extract concepts from

Keyword Arguments:

join (dict, optional) – the base ingredient to join
overwrite (dict, optional) – overwrite concept types for some concepts
include_keys (bool, optional) – whether to include the primaryKeys of the ingredients, defaults to false

See also

ddf_utils.transformer.extract_concepts() : related function in transformer module

Note

all concepts in ingredients in the ingredients parameter will be extracted to a new concept ingredient
join option is optional; if present then the base will merge with concepts from ingredients
full_outer join means get the union of concepts; ingredients_outer means only keep concepts from ingredients
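
The two join types can be illustrated with plain dictionaries. This is only a sketch of the set logic described in the note, not the code ddf_utils runs: join_concepts and the sample concepts are hypothetical, and letting the base properties take precedence on conflicts is an assumption.

```python
from typing import Dict

Concepts = Dict[str, dict]  # concept id -> properties such as concept_type


def join_concepts(base: Concepts, extracted: Concepts, how: str = "full_outer") -> Concepts:
    """Combine base concepts with concepts extracted from ingredients."""
    if how == "full_outer":
        # union of both sides
        keys = list(base) + [k for k in extracted if k not in base]
    elif how == "ingredients_outer":
        # only concepts that appear in the ingredients
        keys = list(extracted)
    else:
        raise ValueError(f"unknown join type: {how}")
    # Assumption: properties from the base win over extracted ones.
    return {k: {**extracted.get(k, {}), **base.get(k, {})} for k in keys}


base = {"geo": {"concept_type": "entity_domain"}, "name": {"concept_type": "string"}}
extracted = {"geo": {"concept_type": "string"}, "population": {"concept_type": "measure"}}

full = join_concepts(base, extracted, "full_outer")
only_ingredients = join_concepts(base, extracted, "ingredients_outer")
print(sorted(full))
print(sorted(only_ingredients))
```

With full_outer the concept name from the base survives even though no ingredient uses it; with ingredients_outer it is left out.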

filter procedure for recipes

ddf_utils.chef.procedure.filter.filter(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, **options) → ddf_utils.chef.model.ingredient.Ingredient

Filter items and rows, just as value and filter do in an ingredient definition.

Procedure format:

- procedure: filter
  ingredients:
      - ingredient_id
  options:
      item:  # just as `value` in ingredient def
          $in:
              - concept_1
              - concept_2
      row:  # just as `filter` in ingredient def
          $and:
              geo:
                  $ne: usa
              year:
                  $gt: 2010

  result: output_ingredient

for more information, see the ddf_utils.chef.ingredient.Ingredient class.
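
To make the recipe above concrete, here is a rough pure-Python sketch of what the item and row filters select. It is not the library's implementation: apply_filter, row_matches and the OPERATORS table are hypothetical helpers, only the operators used in the example are covered, and keeping the key columns (geo, year) alongside the selected items is an assumption.

```python
import operator

# Hypothetical subset of comparison operators, just enough for the example.
OPERATORS = {
    "$ne": operator.ne,
    "$gt": operator.gt,
}


def row_matches(record, conditions):
    """True when the record satisfies every column condition ($and semantics)."""
    return all(
        OPERATORS[name](record[column], operand)
        for column, checks in conditions.items()
        for name, operand in checks.items()
    )


def apply_filter(records, keys, item, row):
    """Keep matching rows, then keep only key columns and the selected items."""
    kept = []
    for record in records:
        if row_matches(record, row["$and"]):
            kept.append({
                column: value
                for column, value in record.items()
                if column in keys or column in item["$in"]
            })
    return kept


records = [
    {"geo": "usa", "year": 2015, "concept_1": 1, "concept_2": 2, "concept_3": 3},
    {"geo": "swe", "year": 2015, "concept_1": 4, "concept_2": 5, "concept_3": 6},
    {"geo": "swe", "year": 2005, "concept_1": 7, "concept_2": 8, "concept_3": 9},
]
filtered = apply_filter(
    records,
    keys=["geo", "year"],
    item={"$in": ["concept_1", "concept_2"]},
    row={"$and": {"geo": {"$ne": "usa"}, "year": {"$gt": 2010}}},
)
print(filtered)
```

Only the Swedish row after 2010 remains, and concept_3 is gone because the item filter does not list it.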

Parameters:

chef (Chef) – the Chef instance
ingredients – list of ingredient id in the DAG
result (str) – id of result ingredient

Keyword Arguments:

item (list or dict, optional) – The item filter
row (dict, optional) – The row filter

flatten procedure for recipes

ddf_utils.chef.procedure.flatten.flatten(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

Flatten some dimensions to create new indicators.

procedure format:

procedure: flatten
ingredients:
    - ingredient_to_run
options:
    flatten_dimensions:
        - entity_1
        - entity_2
    dictionary:
        "concept_name_wildcard": "new_concept_name_template"
    skip_totals_among_entities:
        - entity_1
        - entity_2

The dictionary can have multiple entries. For each entry, the concepts that match the key under wildcard matching are flattened to the value, which should be a template string. The template variables are supplied by a dictionary that contains concept and all columns from flatten_dimensions as keys.
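
The wildcard-plus-template naming can be sketched with the standard library. The helper flatten_name, the sample pattern and the dimension names gender and age_group are hypothetical; using fnmatch-style wildcards and str.format templates is an assumption about how the matching and templating behave.

```python
from fnmatch import fnmatchcase


def flatten_name(concept, dimension_values, dictionary):
    """Return the new indicator name for one concept and one combination
    of flattened dimension values, or None if no pattern matches."""
    for pattern, template in dictionary.items():
        if fnmatchcase(concept, pattern):
            # The template sees `concept` plus every flattened dimension.
            return template.format(concept=concept, **dimension_values)
    return None


dictionary = {"population_*": "{concept}_{gender}_{age_group}"}
new_name = flatten_name(
    "population_total",
    {"gender": "female", "age_group": "0_4"},
    dictionary,
)
print(new_name)
```

A concept that matches no key, such as gdp here, yields None in this sketch and would simply not be flattened.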

Parameters:

chef (Chef) – the Chef instance
ingredients (list) – a list of ingredients
result (str) – id of result ingredient
skip_totals_among_entities (list) – a list of total among entities, which we don’t add to new indicator names

Keyword Arguments:

flatten_dimensions (list) – a list of dimension to be flattened
dictionary (dict) – the dictionary for old name -> new name mapping

groupby procedure for recipes

ddf_utils.chef.procedure.groupby.groupby(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

group ingredient data by column(s) and run aggregate function

Procedure format:

procedure: groupby
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  groupby: str or list  # column(s) to group
  aggregate: dict  # function block
  transform: dict  # function block
  filter: dict  # function block

The function block should have below format:

aggregate:
  column1: func_name1
  column2: func_name2

or

aggregate:
  column1:
    function: func_name
    param1: foo
    param2: baz

Wildcards are supported in the column names, so aggregate: {"*": "sum"} will run on every indicator in the ingredient.
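
As an illustration of the two function block styles and the wildcard, the sketch below resolves a block into one (function name, parameters) pair per indicator. expand_function_block is a hypothetical helper and fnmatch-style matching is an assumption; the library may resolve blocks differently.

```python
from fnmatch import fnmatchcase


def expand_function_block(block, indicators):
    """Map every matching indicator to a (function name, parameters) pair."""
    resolved = {}
    for pattern, spec in block.items():
        if isinstance(spec, str):
            # short style: `column: func_name`
            func, params = spec, {}
        else:
            # long style: `function` plus extra keyword parameters
            params = dict(spec)
            func = params.pop("function")
        for name in indicators:
            if fnmatchcase(name, pattern):
                resolved[name] = (func, params)
    return resolved


short_style = expand_function_block({"*": "sum"}, ["population", "gdp"])
long_style = expand_function_block(
    {"gdp_*": {"function": "func_name", "param1": "foo"}},
    ["gdp_total", "population"],
)
print(short_style)
print(long_style)
```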

Keyword Arguments:

groupby (str or list) – the column(s) to group, can be a list or a string
insert_key (dict) – manually insert keys into the result. This is useful when we want to add back the aggregated column and set it to one value. For example, geo: global inserts the geo column with every value set to “global”
aggregate –
transform –
filter (dict, optional) – the function to run. Only one of aggregate, transform and filter should be supplied.

Note

Only one of aggregate, transform or filter can be used in one procedure.
Any columns not mentioned in groupby or functions are dropped.

merge procedure for recipes

ddf_utils.chef.procedure.merge.merge(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, deep=False) → ddf_utils.chef.model.ingredient.Ingredient

merge a list of ingredients

The ingredients will be merged one by one in the order of how they are provided to this function. Later ones will overwrite the previous merged results.

Procedure format:

procedure: merge
ingredients:  # list of ingredient id
  - ingredient_id_1
  - ingredient_id_2
  - ingredient_id_3
  # ...
result: str  # new ingredient id
options:
  deep: bool  # use deep merge if true

Parameters:

chef (Chef) – a Chef instance
ingredients – Any numbers of ingredients to be merged

Keyword Arguments:

deep (bool, optional) – if True, then do deep merging. Default is False

Notes

deep merge is when we check every datapoint for existence
if false, overwrite is on the file level. If key-value (e.g. geo,year-population_total) exists, whole file gets overwritten
if true, overwrite is on the row level. If values (e.g. afr,2015-population_total) exists, it gets overwritten, if it doesn’t it stays
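
The difference between the two modes can be shown with a toy model in which each "file" is a dictionary of rows keyed by (geo, year). This is a sketch under that assumption only; the merge function below is hypothetical and does not reflect how ingredients are stored internally.

```python
def merge(left, right, deep=False):
    """Merge two {file name: {row key: value}} mappings; `right` wins."""
    merged = {name: dict(rows) for name, rows in left.items()}
    for name, rows in right.items():
        if deep and name in merged:
            # row level: overwrite matching rows, keep the others
            merged[name].update(rows)
        else:
            # file level: the later file replaces the earlier one entirely
            merged[name] = dict(rows)
    return merged


old = {"population_total": {("afr", 2014): 1, ("afr", 2015): 2}}
new = {"population_total": {("afr", 2015): 3}}

shallow = merge(old, new, deep=False)
deep = merge(old, new, deep=True)
print(shallow)
print(deep)
```

In the shallow result the 2014 row disappears along with the old file, while the deep result keeps it and only replaces the 2015 value.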

merge_entity procedure for recipes

ddf_utils.chef.procedure.merge_entity.merge_entity(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], dictionary, target_column, result, merged='drop') → ddf_utils.chef.model.ingredient.DataPointIngredient

merge entities

run_op procedure for recipes

ddf_utils.chef.procedure.run_op.run_op(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, op) → ddf_utils.chef.model.ingredient.DataPointIngredient

run math operation on each row of ingredient data.

Procedure format:

procedure: run_op
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  op: dict  # a dictionary describing calculation for each columns.

Keyword Arguments:

op (dict) – a dictionary of concept_name -> function mapping

Examples

For example, if we want to add 2 columns, col_a and col_b, to create a new column, we can write

procedure: run_op
ingredients:
  - ingredient_to_run
result: new_ingredient_id
options:
  op:
    new_col_name: "col_a + col_b"
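
Conceptually, each expression in op is evaluated once per row, with the column names bound to that row's values. The following stand-in shows the idea on a list of dictionaries. It is not how the library evaluates expressions; the local run_op helper is hypothetical, eval is used purely for illustration, and keeping the original columns in the output is an assumption.

```python
def run_op(rows, op):
    """Evaluate each expression in `op` against every row."""
    result = []
    for row in rows:
        new_row = dict(row)
        for new_col, expression in op.items():
            # Column names in the expression resolve to this row's values.
            # Builtins are disabled; never eval untrusted recipe text like this.
            new_row[new_col] = eval(expression, {"__builtins__": {}}, row)
        result.append(new_row)
    return result


rows = [
    {"geo": "swe", "year": 2015, "col_a": 1, "col_b": 2},
    {"geo": "usa", "year": 2015, "col_a": 10, "col_b": 20},
]
computed = run_op(rows, {"new_col_name": "col_a + col_b"})
print(computed)
```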

split_entity procedure for recipes

ddf_utils.chef.procedure.split_entity.split_entity(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], dictionary, target_column, result, splitted='drop') → ddf_utils.chef.model.ingredient.DataPointIngredient

split entities

translate_column procedures for recipes

ddf_utils.chef.procedure.translate_column.translate_column(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, dictionary, column, *, target_column=None, not_found='drop', ambiguity='prompt', ignore_case=False, value_modifier=None) → ddf_utils.chef.model.ingredient.Ingredient

Translate column values.

Procedure format:

procedure: translate_column
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  column: str  # the column to be translated
  target_column: str  # optional, the target column to store the translated data
  not_found: {'drop', 'include', 'error'}  # optional, the behavior when there are values not
                                           # found in the mapping dictionary, default is 'drop'
  ambiguity: {'prompt', 'skip', 'error'}  # optional, the behavior when there is ambiguity
                                          # in the dictionary
  dictionary: str or dict  # file name or mappings dictionary

If base is provided in dictionary, key and value should also be in dictionary. In this case chef will generate a mapping dictionary using the base ingredient. The dictionary format will be:

dictionary:
  base: str  # ingredient name
  key: str or list  # the columns to be the keys of the dictionary, can accept a list
  value: str  # the column to be the values of the dictionary, must be one column

Parameters:

chef (Chef) – The Chef the procedure will run on
ingredients (list) – A list of ingredient id in the dag to translate

Keyword Arguments:

dictionary (dict) – A dictionary of oldname -> newname mappings. If ‘base’ is provided in the dictionary, ‘key’ and ‘value’ should also be in the dictionary. See ddf_utils.transformer.translate_column() for more on how this is handled.
column (str) – the column to be translated
target_column (str, optional) – the target column to store the translated data. If this is not set then the column named by column will be replaced
not_found ({'drop', 'include', 'error'}, optional) – the behavior when there are values not found in the mapping dictionary, default is ‘drop’
ambiguity ({'prompt', 'skip', 'error'}, optional) – the behavior when there is ambiguity in the dictionary, default is ‘prompt’
value_modifier (str, optional) – a function to modify new column values, default is None
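
The three not_found settings can be pictured on a plain list of values. This is a simplified sketch, not the library's code: the translate helper and the sample mapping are hypothetical, and treating include as "keep the original value" and error as "raise an exception" is an assumption based on the option names.

```python
def translate(values, dictionary, not_found="drop"):
    """Translate values through `dictionary`, handling misses per `not_found`."""
    if not_found not in ("drop", "include", "error"):
        raise ValueError(f"unknown not_found option: {not_found}")
    translated = []
    for value in values:
        if value in dictionary:
            translated.append(dictionary[value])
        elif not_found == "include":
            translated.append(value)  # keep the untranslated value
        elif not_found == "error":
            raise KeyError(f"no translation for {value!r}")
        # 'drop': the value is skipped
    return translated


mapping = {"USA": "usa", "Sweden": "swe"}
names = ["USA", "Sweden", "Atlantis"]

dropped = translate(names, mapping)
included = translate(names, mapping, not_found="include")
print(dropped)
print(included)
```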

See also

ddf_utils.transformer.translate_column() : related function in transformer module

translate_header procedures for recipes

ddf_utils.chef.procedure.translate_header.translate_header(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, dictionary, duplicated='error') → ddf_utils.chef.model.ingredient.Ingredient

Translate column headers

Procedure format:

procedure: translate_header
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  dictionary: str or dict  # file name or mappings dictionary

Parameters:

chef (Chef) – The Chef the procedure will run on
ingredients (list) – A list of ingredient id in the dag to translate
dictionary (dict or str) – A dictionary for name mapping, or filepath to the dictionary
duplicated (str) – What to do when there are duplicated columns after renaming. Available options are error, replace
result (str) – The result ingredient id

See also

ddf_utils.transformer.translate_header() : Related function in transformer module

trend_bridge procedure for recipes

ddf_utils.chef.procedure.trend_bridge.trend_bridge(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], bridge_start, bridge_end, bridge_length, bridge_on, result, target_column=None) → ddf_utils.chef.model.ingredient.DataPointIngredient

run trend bridge on ingredients

Procedure format:

procedure: trend_bridge
ingredients:
  - data_ingredient                 # optional, if not set defaults to empty ingredient
result: data_bridged
options:
  bridge_start:
      ingredient: old_data_ingredient # optional, if not set then assume it's the input ingredient
      column:
        - concept_old_data
  bridge_end:
      ingredient: new_data_ingredient # optional, if not set then assume it's the input ingredient
      column:
        - concept_new_data
  bridge_length: 5                  # steps in time. If year, years, if days, days.
  bridge_on: time                   # the index column to build the bridge with
  target_column:
        - concept_in_result  # overwrites if exists. creates if not exists. default to bridge_end.column

Parameters:

chef (Chef) – A Chef instance
ingredients (list) – The input ingredient. The bridged result will be merged into this ingredient. If this is None, then only the bridged result will be returned
bridge_start (dict) – Describe the start of bridge
bridge_end (dict) – Describe the end of bridge
bridge_length (int) – The size of bridge
bridge_on (str) – The column to bridge
result (str) – The output ingredient id

Keyword Arguments:

target_column (list, optional) – The column name of the bridge result. default to bridge_end.column
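
This page does not spell out the bridging arithmetic, so the sketch below shows one plausible reading rather than the library's algorithm: the gap between the old and new series at the first time point of the new series is spread linearly over the preceding bridge_length steps of the old series. The local trend_bridge helper, the sample data, and the requirement that the old series has a value at the bridge end are all assumptions.

```python
def trend_bridge(old, new, bridge_length):
    """Linearly ramp `old` towards `new` over `bridge_length` time steps."""
    bridge_end = min(new)                     # first time point of the new data
    bridge_start = bridge_end - bridge_length
    gap = new[bridge_end] - old[bridge_end]   # assumes old covers bridge_end
    result = {}
    for time, value in old.items():
        if time < bridge_end:
            # 0 before the bridge, then 1, 2, ... steps into the bridge
            step = max(0, time - bridge_start)
            result[time] = value + gap * step / bridge_length
    result.update(new)                        # new data is used as-is
    return dict(sorted(result.items()))


old = {year: 10.0 for year in range(1999, 2006)}   # flat old series, 1999-2005
new = {2005: 20.0, 2006: 21.0}
bridged = trend_bridge(old, new, bridge_length=5)
print(bridged)
```

In this toy case the old series stays at 10 up to 2000, climbs by 2 per year through 2004, and meets the new series at 20 in 2005.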

See also

ddf_utils.transformer.trend_bridge() : related function in transformer module

window procedure for recipes

ddf_utils.chef.procedure.window.window(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

apply functions on a rolling window

Procedure format:

procedure: window
ingredients:  # list of ingredient id
  - ingredient_id
result: str  # new ingredient id
options:
  window:
    column: str  # column which window is created from
    size: int or 'expanding'  # if int then rolling window, if expanding then expanding window
    min_periods: int  # as in pandas
    center: bool  # as in pandas
  aggregate: dict

Two styles of function block are supported, and they can mix in one procedure:

aggregate:
  col1: sum  # run rolling sum to col1
  col2: mean  # run rolling mean to col2
  col3:  # run foo to col3 with param1=baz
    function: foo
    param1: baz

Keyword Arguments:

window (dict) – window definition, see above for the dictionary format
aggregate (dict) – aggregation functions

Examples

An example of rolling windows:

procedure: window
ingredients:
    - ingredient_to_roll
result: new_ingredient_id
options:
  window:
    column: year
    size: 10
    min_periods: 1
    center: false
  aggregate:
    column_to_aggregate: sum

Notes

Any column not mentioned in the aggregate block will be dropped in the returned ingredient.
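
For readers unfamiliar with the pandas terms, the following sketch shows what size, 'expanding' and min_periods mean for a single series that is already sorted by the window column. It covers only trailing windows (center: false), the rolling helper is hypothetical, and the procedure itself relies on pandas rather than code like this.

```python
def rolling(values, size, min_periods=1, func=sum):
    """Apply `func` over a trailing window ending at each position."""
    out = []
    for i in range(len(values)):
        # an expanding window always starts at the first value
        start = 0 if size == "expanding" else max(0, i - size + 1)
        window = values[start:i + 1]
        # too few observations gives a missing value
        out.append(func(window) if len(window) >= min_periods else None)
    return out


series = [1, 2, 3, 4]
rolling_sum = rolling(series, size=2)
expanding_sum = rolling(series, size="expanding")
strict_sum = rolling(series, size=3, min_periods=3)
print(rolling_sum, expanding_sum, strict_sum)
```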