ddf_utils.chef.procedure package
Available Procedures
extract_concepts procedure for recipes
ddf_utils.chef.procedure.extract_concepts.extract_concepts(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, join=None, overwrite=None, include_keys=False) → ddf_utils.chef.model.ingredient.ConceptIngredient

Extract concepts from other ingredients.
Procedure format:
    procedure: extract_concepts
    ingredients:  # list of ingredient id
      - ingredient_id_1
      - ingredient_id_2
    result: str  # new ingredient id
    options:
      join:  # optional
        base: str  # base concept ingredient id
        type: {'full_outer', 'ingredients_outer'}  # default is full_outer
      overwrite:  # overwrite some concept types
        country: entity_set
        year: time
      include_keys: true  # if we should include the primaryKeys concepts
Parameters:
- ingredients – any number of ingredients to extract concepts from
Keyword Arguments:
- join (dict, optional) – the base ingredient to join
- overwrite (dict, optional) – overwrite concept types for some concepts
- include_keys (bool, optional) – whether to include the primaryKeys of the ingredients; defaults to False
See also
ddf_utils.transformer.extract_concepts(): related function in transformer module

Note
- all concepts in the ingredients listed in the ingredients parameter will be extracted to a new concept ingredient
- the join option is optional; if present, the base will be merged with the concepts from ingredients
- a full_outer join means taking the union of concepts; ingredients_outer means keeping only the concepts from ingredients
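
The difference between the two join types can be illustrated with plain dictionaries. This is a minimal sketch, not the ddf_utils implementation: the helper name `join_concepts` is hypothetical, concepts are reduced to a `{concept: concept_type}` mapping, and letting the base definition win when both sides define a concept is an assumption.

```python
def join_concepts(base, extracted, how="full_outer"):
    """Combine two {concept: concept_type} mappings (illustrative only)."""
    if how == "full_outer":
        # union of concepts from the base and from the ingredients
        keys = list(base) + [k for k in extracted if k not in base]
    elif how == "ingredients_outer":
        # only concepts that come from the ingredients
        keys = list(extracted)
    else:
        raise ValueError(f"unknown join type: {how}")
    # assumption: where both sides define a concept, the base definition wins
    return {k: base.get(k, extracted.get(k)) for k in keys}


base = {"geo": "entity_domain", "population": "measure"}
extracted = {"population": "measure", "gdp": "measure", "year": "string"}

full = join_concepts(base, extracted)
only_ingredients = join_concepts(base, extracted, how="ingredients_outer")
```

Here `full` contains `geo`, `population`, `gdp` and `year`, while `only_ingredients` leaves out `geo` because it appears only in the base.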
filter procedure for recipes
ddf_utils.chef.procedure.filter.filter(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, **options) → ddf_utils.chef.model.ingredient.Ingredient

Filter items and rows, just as value and filter do in an ingredient definition.
Procedure format:
    - procedure: filter
      ingredients:
        - ingredient_id
      options:
        item:  # just as `value` in ingredient def
          $in:
            - concept_1
            - concept_2
        row:  # just as `filter` in ingredient def
          $and:
            geo:
              $ne: usa
            year:
              $gt: 2010
      result: output_ingredient
For more information, see the ddf_utils.chef.ingredient.Ingredient class.

Parameters:
- chef (Chef) – the Chef instance
- ingredients – list of ingredient id in the DAG
- result (str) – id of the result ingredient
Keyword Arguments:
- item (list or dict, optional) – the item filter
- row (dict, optional) – the row filter
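
To make the row filter in the procedure format above more concrete, here is a minimal sketch of how such a condition could be evaluated against rows held as dictionaries. It is not the ddf_utils implementation: `row_matches` and `OPS` are hypothetical names, and only the three operators that appear in the format above are handled.

```python
import operator

# operators that appear in the procedure format above
OPS = {
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$in": lambda value, options: value in options,
}


def row_matches(row, condition):
    """Return True if `row` satisfies a {'$and': {column: {op: value}}} filter."""
    clauses = condition.get("$and", condition)
    return all(
        OPS[op](row[column], expected)
        for column, tests in clauses.items()
        for op, expected in tests.items()
    )


rows = [
    {"geo": "usa", "year": 2015},
    {"geo": "swe", "year": 2005},
    {"geo": "swe", "year": 2015},
]
row_filter = {"$and": {"geo": {"$ne": "usa"}, "year": {"$gt": 2010}}}
kept = [r for r in rows if row_matches(r, row_filter)]
```

Only the last row survives, because it is the only one that is both outside `usa` and later than 2010.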
flatten procedure for recipes
ddf_utils.chef.procedure.flatten.flatten(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

Flatten some dimensions to create new indicators.
Procedure format:

    procedure: flatten
    ingredients:
      - ingredient_to_run
    options:
      flatten_dimensions:
        - entity_1
        - entity_2
      dictionary:
        "concept_name_wildcard": "new_concept_name_template"
      skip_totals_among_entities:
        - entity_1
        - entity_2
The dictionary can have multiple entries. For each entry, the concepts that match the key (using wildcard matching) will be flattened to the value, which should be a template string. The template variables are provided by a dictionary that contains concept and all columns from flatten_dimensions as keys.

Parameters:
- chef (Chef) – the Chef instance
- ingredients (list) – a list of ingredients
- result (str) – id of result ingredient
- skip_totals_among_entities (list) – a list of totals among entities, which are not added to the new indicator names
Keyword Arguments:
- flatten_dimensions (list) – a list of dimensions to be flattened
- dictionary (dict) – the dictionary for old name -> new name mapping
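
The wildcard-plus-template mechanism can be sketched with the standard library. This is an illustration under assumptions, not the ddf_utils code: `flattened_name` is a hypothetical helper, shell-style wildcards via `fnmatch` and `str.format`-style templates are assumed, and the `gender` dimension is made up for the example.

```python
from fnmatch import fnmatchcase


def flattened_name(concept, dimension_values, dictionary):
    """Build the new indicator name for one concept and one combination
    of values of the flattened dimensions (illustrative only)."""
    for pattern, template in dictionary.items():
        if fnmatchcase(concept, pattern):
            # template variables: `concept` plus every flattened dimension
            return template.format(concept=concept, **dimension_values)
    return None  # concept is not covered by any dictionary entry


dictionary = {"population_*": "{concept}_{gender}"}
name = flattened_name("population_total", {"gender": "female"}, dictionary)
```

With these inputs `name` is `population_total_female`; a concept such as `gdp` matches no key and yields `None`.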
groupby procedure for recipes
ddf_utils.chef.procedure.groupby.groupby(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

Group ingredient data by column(s) and run an aggregate function.
Procedure format:
    procedure: groupby
    ingredients:  # list of ingredient id
      - ingredient_id
    result: str  # new ingredient id
    options:
      groupby: str or list  # column(s) to group
      aggregate: dict  # function block
      transform: dict  # function block
      filter: dict  # function block
The function block should have the following format:

    aggregate:
      column1: func_name1
      column2: func_name2

or

    aggregate:
      column1:
        function: func_name
        param1: foo
        param2: baz

Wildcards are supported in the column names, so

    aggregate: {"*": "sum"}

will run on every indicator in the ingredient.

Keyword Arguments:
- groupby (str or list) – the column(s) to group, can be a list or a string
- insert_key (dict) – manually insert keys into the result. This is useful when we want to add back the aggregated columns and set them to one value. For example, geo: global inserts the geo column with all values set to "global"
- aggregate, transform, filter (dict, optional) – the function block to run. Only one of aggregate, transform and filter should be supplied.

Note
- Only one of aggregate, transform or filter can be used in one procedure.
- Any columns not mentioned in groupby or functions are dropped.
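
The following sketch shows the aggregate case on rows held as plain dictionaries: group by the key columns, match indicator columns against the wildcard keys of the function block, and drop everything else. It is illustrative only; `groupby_aggregate` and `FUNCS` are hypothetical names, only the short `column: func_name` form is handled, and the real procedure works on ingredients rather than lists of dicts.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# a few aggregation functions, looked up by name
FUNCS = {"sum": sum, "max": max, "min": min}


def groupby_aggregate(rows, by, aggregate):
    """Group `rows` by the `by` columns and aggregate matching columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in by)].append(row)

    result = []
    for key, members in groups.items():
        out = dict(zip(by, key))
        for column in members[0]:
            if column in by:
                continue
            for pattern, func_name in aggregate.items():
                if fnmatchcase(column, pattern):  # wildcard match
                    out[column] = FUNCS[func_name](m[column] for m in members)
                    break
            # columns matching no pattern are dropped
        result.append(out)
    return result


rows = [
    {"geo": "swe", "year": 2000, "gender": "f", "population": 5},
    {"geo": "swe", "year": 2000, "gender": "m", "population": 4},
    {"geo": "nor", "year": 2000, "gender": "f", "population": 2},
]
totals = groupby_aggregate(rows, by=["geo", "year"], aggregate={"pop*": "sum"})
```

The `gender` column is neither a grouping column nor matched by the function block, so it does not appear in `totals`.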
merge procedure for recipes
ddf_utils.chef.procedure.merge.merge(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, deep=False) → ddf_utils.chef.model.ingredient.Ingredient

Merge a list of ingredients.
The ingredients will be merged one by one, in the order in which they are provided to this function. Later ones will overwrite the previously merged results.
Procedure format:
    procedure: merge
    ingredients:  # list of ingredient id
      - ingredient_id_1
      - ingredient_id_2
      - ingredient_id_3
      # ...
    result: str  # new ingredient id
    options:
      deep: bool  # use deep merge if true
Parameters:
- chef (Chef) – a Chef instance
- ingredients – any number of ingredients to be merged
Keyword Arguments:
- deep (bool, optional) – if True, then do deep merging. Default is False

Notes
A deep merge checks every datapoint for existence.
- If deep is false, overwriting happens at the file level: if a key-value combination (e.g. geo,year-population_total) exists, the whole file gets overwritten.
- If deep is true, overwriting happens at the row level: if a value (e.g. afr,2015-population_total) exists, it gets overwritten; if it doesn't, it stays.
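
The file-level versus row-level distinction can be sketched with nested dictionaries. This is a simplified model, not the ddf_utils implementation: each ingredient is represented as `{(primary_key, indicator): {row_key: value}}`, and the function name `merge_ingredients` is hypothetical.

```python
def merge_ingredients(ingredients, deep=False):
    """Merge ingredients in order; later ones overwrite earlier ones."""
    merged = {}
    for ingredient in ingredients:
        for file_key, rows in ingredient.items():
            if deep and file_key in merged:
                # row level: only rows present in the later ingredient change
                merged[file_key] = {**merged[file_key], **rows}
            else:
                # file level: the whole "file" is replaced
                merged[file_key] = dict(rows)
    return merged


key = ("geo,year", "population_total")
old = {key: {("afr", 2014): 1, ("afr", 2015): 2}}
new = {key: {("afr", 2015): 20}}

shallow = merge_ingredients([old, new])
deep = merge_ingredients([old, new], deep=True)
```

In `shallow` the 2014 row is gone because the whole file was replaced, whereas `deep` keeps the 2014 row and only updates 2015.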
merge_entity procedure for recipes
ddf_utils.chef.procedure.merge_entity.merge_entity(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], dictionary, target_column, result, merged='drop') → ddf_utils.chef.model.ingredient.DataPointIngredient

Merge entities.
run_op procedure for recipes
ddf_utils.chef.procedure.run_op.run_op(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, op) → ddf_utils.chef.model.ingredient.DataPointIngredient

Run a math operation on each row of ingredient data.
Procedure format:
    procedure: run_op
    ingredients:  # list of ingredient id
      - ingredient_id
    result: str  # new ingredient id
    options:
      op: dict  # a dictionary describing the calculation for each column
Keyword Arguments:
- op (dict) – a dictionary of concept_name -> function mapping

Examples
For example, if we want to add 2 columns, col_a and col_b, to create a new column, we can write

    procedure: run_op
    ingredients:
      - ingredient_to_run
    result: new_ingredient_id
    options:
      op:
        new_col_name: "col_a + col_b"
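
As a rough picture of what evaluating such an op mapping row by row could look like, here is a small sketch on rows held as dictionaries. It is not how ddf_utils implements run_op: the function `apply_ops` is hypothetical, keeping the original columns in the output is an assumption, and Python's `eval` is used only for illustration and should not be fed untrusted expressions.

```python
def apply_ops(rows, op):
    """Evaluate each expression in `op` against every row."""
    result = []
    for row in rows:
        new_row = dict(row)  # assumption: original columns are kept
        for new_col, expression in op.items():
            # column names in the expression resolve to the row's values
            new_row[new_col] = eval(expression, {"__builtins__": {}}, dict(row))
        result.append(new_row)
    return result


rows = [{"col_a": 1, "col_b": 2}, {"col_a": 10, "col_b": 5}]
computed = apply_ops(rows, {"new_col_name": "col_a + col_b"})
```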
split_entity procedure for recipes
ddf_utils.chef.procedure.split_entity.split_entity(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], dictionary, target_column, result, splitted='drop') → ddf_utils.chef.model.ingredient.DataPointIngredient

Split entities.
translate_column procedures for recipes
ddf_utils.chef.procedure.translate_column.translate_column(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, dictionary, column, *, target_column=None, not_found='drop', ambiguity='prompt', ignore_case=False, value_modifier=None) → ddf_utils.chef.model.ingredient.Ingredient

Translate column values.
Procedure format:
    procedure: translate_column
    ingredients:  # list of ingredient id
      - ingredient_id
    result: str  # new ingredient id
    options:
      column: str  # the column to be translated
      target_column: str  # optional, the target column to store the translated data
      not_found: {'drop', 'include', 'error'}  # optional, the behavior when there are values not
                                               # found in the mapping dictionary, default is 'drop'
      ambiguity: {'prompt', 'skip', 'error'}  # optional, the behavior when there is ambiguity
                                              # in the dictionary
      dictionary: str or dict  # file name or mappings dictionary
If base is provided in dictionary, key and value should also be in dictionary. In this case chef will generate a mapping dictionary using the base ingredient. The dictionary format will be:

    dictionary:
      base: str  # ingredient name
      key: str or list  # the columns to be the keys of the dictionary, can accept a list
      value: str  # the column to be the values of the dictionary, must be one column
Parameters:
- chef (Chef) – the Chef the procedure will run on
- ingredients (list) – a list of ingredient id in the DAG to translate
Keyword Arguments:
- dictionary (dict) – a dictionary of oldname -> newname mappings. If 'base' is provided in the dictionary, 'key' and 'value' should also be in the dictionary. See ddf_utils.transformer.translate_column() for more on how this is handled.
- column (str) – the column to be translated
- target_column (str, optional) – the target column to store the translated data. If this is not set, the column named by column will be replaced
- not_found ({'drop', 'include', 'error'}, optional) – the behavior when there are values not found in the mapping dictionary, default is 'drop'
- ambiguity ({'prompt', 'skip', 'error'}, optional) – the behavior when there is ambiguity in the dictionary, default is 'prompt'
- value_modifier (str, optional) – a function to modify new column values, default is None

See also
ddf_utils.transformer.translate_column(): related function in transformer module
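
The interaction of column, target_column and not_found can be sketched on rows held as dictionaries. This is an illustration, not the ddf_utils code: `translate_rows` is a hypothetical name, the ambiguity, ignore_case and value_modifier options are left out, and treating 'include' as "keep the original value" is an assumption.

```python
def translate_rows(rows, column, dictionary, target_column=None, not_found="drop"):
    """Translate the values of `column` using `dictionary`."""
    if not_found not in ("drop", "include", "error"):
        raise ValueError(f"unknown not_found option: {not_found}")
    target = target_column or column  # replace `column` when no target is given
    result = []
    for row in rows:
        value = row[column]
        if value in dictionary:
            result.append({**row, target: dictionary[value]})
        elif not_found == "include":
            # assumption: untranslated values are kept as they are
            result.append({**row, target: value})
        elif not_found == "error":
            raise ValueError(f"no mapping found for {value!r}")
        # not_found == "drop": the row is skipped
    return result


rows = [{"geo": "Sweden"}, {"geo": "Atlantis"}]
mapping = {"Sweden": "swe"}

dropped = translate_rows(rows, "geo", mapping)
included = translate_rows(rows, "geo", mapping, not_found="include")
with_target = translate_rows(rows, "geo", mapping, target_column="geo_id")
```

With the default 'drop', the `Atlantis` row disappears; with a target_column, the original `geo` value is kept next to the translated `geo_id`.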
translate_header procedures for recipes
ddf_utils.chef.procedure.translate_header.translate_header(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.Ingredient], result, dictionary, duplicated='error') → ddf_utils.chef.model.ingredient.Ingredient

Translate column headers.
Procedure format:
    procedure: translate_header
    ingredients:  # list of ingredient id
      - ingredient_id
    result: str  # new ingredient id
    options:
      dictionary: str or dict  # file name or mappings dictionary
Parameters:
- chef (Chef) – the Chef the procedure will run on
- ingredients (list) – a list of ingredient id in the DAG to translate
- dictionary (dict or str) – a dictionary for name mapping, or the file path to the dictionary
- duplicated (str) – what to do when there are duplicated columns after renaming. Available options are error and replace
- result (str) – the result ingredient id

See also
ddf_utils.transformer.translate_header(): related function in transformer module
trend_bridge procedure for recipes
ddf_utils.chef.procedure.trend_bridge.trend_bridge(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], bridge_start, bridge_end, bridge_length, bridge_on, result, target_column=None) → ddf_utils.chef.model.ingredient.DataPointIngredient

Run a trend bridge on ingredients.
Procedure format:
    procedure: trend_bridge
    ingredients:
      - data_ingredient  # optional, if not set defaults to empty ingredient
    result: data_bridged
    options:
      bridge_start:
        ingredient: old_data_ingredient  # optional, if not set then assume it's the input ingredient
        column:
          - concept_old_data
      bridge_end:
        ingredient: new_data_ingredient  # optional, if not set then assume it's the input ingredient
        column:
          - concept_new_data
      bridge_length: 5  # steps in time. If year, years, if days, days.
      bridge_on: time  # the index column to build the bridge with
      target_column:
        - concept_in_result  # overwrites if exists. creates if not exists. default to bridge_end.column
Parameters:
- chef (Chef) – a Chef instance
- ingredients (list) – the input ingredient. The bridged result will be merged into this ingredient. If this is None, then only the bridged result will be returned
- bridge_start (dict) – describes the start of the bridge
- bridge_end (dict) – describes the end of the bridge
- bridge_length (int) – the size of the bridge
- bridge_on (str) – the column to bridge on
- result (str) – the output ingredient id
Keyword Arguments:
- target_column (list, optional) – the column name of the bridge result. Defaults to bridge_end.column

See also
ddf_utils.transformer.trend_bridge(): related function in transformer module
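
The page does not spell out how the bridge is computed, so the following is only one plausible reading, written as a sketch under stated assumptions: the gap between the old and new series at the first point of the new series is spread linearly over the last bridge_length steps of the old series. The function `linear_bridge` and the dict-based series are hypothetical, and the actual algorithm in ddf_utils.transformer.trend_bridge() may differ.

```python
def linear_bridge(old, new, bridge_length):
    """Join two {time: value} series with a linear ramp (illustrative only).

    Assumes integer time steps and that `old` has a value at the first
    time point of `new`.
    """
    bridge_end = min(new)
    bridge_start = bridge_end - bridge_length
    gap = new[bridge_end] - old[bridge_end]

    bridged = {}
    for t, value in sorted(old.items()):
        if t >= bridge_end:
            continue  # from here on the new series takes over
        steps = max(0, t - bridge_start)
        # shift the old value by a growing share of the gap
        bridged[t] = value + gap * steps / bridge_length
    bridged.update(new)
    return dict(sorted(bridged.items()))


old = {year: 100.0 for year in range(2000, 2011)}
new = {2010: 110.0, 2011: 112.0}
bridged = linear_bridge(old, new, bridge_length=5)
```

In this example the old series is untouched up to 2005 and then rises in equal steps until it meets the new series in 2010.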
window procedure for recipes
ddf_utils.chef.procedure.window.window(chef: ddf_utils.chef.model.chef.Chef, ingredients: List[ddf_utils.chef.model.ingredient.DataPointIngredient], result, **options) → ddf_utils.chef.model.ingredient.DataPointIngredient

Apply functions on a rolling window.
Procedure format:
    procedure: window
    ingredients:  # list of ingredient id
      - ingredient_id
    result: str  # new ingredient id
    options:
      window:
        column: str  # column which window is created from
        size: int or 'expanding'  # if int then rolling window, if expanding then expanding window
        min_periods: int  # as in pandas
        center: bool  # as in pandas
      aggregate: dict
Two styles of function block are supported, and they can be mixed in one procedure:

    aggregate:
      col1: sum  # run rolling sum to col1
      col2: mean  # run rolling mean to col2
      col3:  # run foo to col3 with param1=baz
        function: foo
        param1: baz
Keyword Arguments:
- window (dict) – window definition, see above for the dictionary format
- aggregate (dict) – aggregation functions

Examples
An example of rolling windows:

    procedure: window
    ingredients:
      - ingredient_to_roll
    result: new_ingredient_id
    options:
      window:
        column: year
        size: 10
        min_periods: 1
        center: false
      aggregate:
        column_to_aggregate: sum

Notes
Any column not mentioned in the aggregate block will be dropped in the returned ingredient.
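
To show what the size and min_periods options mean, here is a pure-Python sketch of a trailing (non-centered) window over a list of values that are already sorted by the window column. It mirrors the pandas conventions the options refer to, but `windowed` is a hypothetical helper, not part of ddf_utils.

```python
def windowed(values, size, min_periods=None, func=sum):
    """Apply `func` over a rolling or expanding trailing window."""
    if min_periods is None:
        # pandas defaults: window size for rolling, 1 for expanding
        min_periods = 1 if size == "expanding" else size
    out = []
    for i in range(len(values)):
        start = 0 if size == "expanding" else max(0, i - size + 1)
        chunk = values[start : i + 1]
        # too few observations in the window: no result for this position
        out.append(func(chunk) if len(chunk) >= min_periods else None)
    return out


values = [1, 2, 3, 4]
rolling_sum = windowed(values, size=2, min_periods=1)
strict_rolling_sum = windowed(values, size=2)
expanding_sum = windowed(values, size="expanding")
```

With min_periods=1 the first position already produces a value; without it, the first position is empty because the window of size 2 is not yet full.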