ddf_utils.chef.model package
ddf_utils.chef.model.chef
The Chef object
-
class ddf_utils.chef.model.chef.Chef(dag: ddf_utils.chef.model.dag.DAG = None, metadata=None, config=None, cooking=None, serving=None, recipe=None)
Bases: object
The Chef API.
-
add_config(**config)
Add configs. All keyword arguments are added to the config dictionary, replacing existing entries with the same key.
-
add_dish(ingredients, options=None)
-
add_ingredient(**kwargs)
Add a new ingredient to the DAG.
The keyword arguments are sent as a dictionary to the dictionary parameter of ddf_utils.chef.model.ingredient.ingredient_from_dict().
-
add_metadata(**metadata)
Add metadata. All keyword arguments are added to the metadata dictionary, replacing existing entries with the same key.
-
add_procedure(collection, procedure, ingredients, result=None, options=None)
-
config
-
copy()
-
classmethod from_recipe(recipe_file, **config)
-
ingredients
-
static register_procedure(func)
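register_procedure takes a function, so it most likely fills a registry that maps procedure names used in a recipe to callables. The sketch below shows that registry pattern with a hypothetical ProcedureRegistry class and a plain dict. It is not the ddf_utils implementation, and keying procedures by func.__name__ is an assumption.

```python
from typing import Callable, Dict


class ProcedureRegistry:
    """Hypothetical stand-in for the procedure table a Chef might keep."""

    # class-level mapping shared by all instances: name -> function
    procedures: Dict[str, Callable] = {}

    @staticmethod
    def register_procedure(func: Callable) -> Callable:
        # assumption: a procedure is looked up by its function name
        ProcedureRegistry.procedures[func.__name__] = func
        return func  # returning func lets this double as a decorator

    @classmethod
    def get_procedure(cls, name: str) -> Callable:
        if name not in cls.procedures:
            raise KeyError(f"procedure not registered: {name}")
        return cls.procedures[name]


@ProcedureRegistry.register_procedure
def multiply_1000(values):
    # toy procedure: scale every value by 1000
    return [v * 1000 for v in values]


print(ProcedureRegistry.get_procedure("multiply_1000")([1, 2]))  # [1000, 2000]
```

Because the function is returned unchanged, it can be registered either as a decorator or through a direct call.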
-
run(serve=False, outpath=None)
-
serving
-
to_graph(node=None)
-
to_recipe(fp=None)
Write the chef in YAML recipe format.
-
validate()
Validate whether the chef is ready to run.
The following checks are performed:
- check that the datasets required by ingredients are available
- check that the procedures are available
- check that the DAG is valid, i.e. it has no dependency cycles and no missing dependencies.
-
ddf_utils.chef.model.dag
The DAG model of chef.
The DAG consists of two types of nodes: IngredientNode and ProcedureNode. Each node has an evaluate() function, which returns an ingredient when called.
-
class ddf_utils.chef.model.dag.BaseNode(node_id, chef)
Bases: object
The base node which IngredientNode and ProcedureNode inherit from.
Parameters:
- node_id (str) – the name of the node
- chef (Chef) – the Chef object the node belongs to
-
add_downstream(node)
-
add_upstream(node)
-
detect_missing_dependency()
Check that every upstream node is available in the DAG; raise an error if any is missing.
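The sketch below shows the missing-dependency check in its simplest form. It assumes that each node records only the ids of its upstream nodes and that the DAG is a dict keyed by node_id. The Node dataclass and this detect_missing_dependency function are hypothetical simplifications, not the ddf_utils classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    upstream_ids: List[str] = field(default_factory=list)


def detect_missing_dependency(node: Node, dag: Dict[str, Node]) -> None:
    """Raise ValueError if any upstream id of `node` is not in `dag`."""
    missing = [uid for uid in node.upstream_ids if uid not in dag]
    if missing:
        raise ValueError(f"{node.node_id}: missing upstream node(s) {missing}")


dag = {
    "population": Node("population"),
    "pop-per-capita": Node("pop-per-capita", ["population", "gdp"]),
}
detect_missing_dependency(dag["population"], dag)  # passes: no upstreams
try:
    detect_missing_dependency(dag["pop-per-capita"], dag)
except ValueError as err:
    print(err)  # pop-per-capita: missing upstream node(s) ['gdp']
```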
-
downstream_list
-
evaluate()
-
get_direct_relatives(upstream=False)
Get the direct relatives of the current node, upstream or downstream.
-
upstream_list
-
class ddf_utils.chef.model.dag.DAG(node_dict=None)
Bases: object
The DAG model.
A DAG (directed acyclic graph) is a collection of tasks with directional dependencies. DAGs essentially act as namespaces for their nodes. A node_id can only be added once to a DAG.
-
add_dependency(upstream_node_id, downstream_node_id)
Simple utility method to set a dependency between two nodes that have already been added to the DAG using add_node().
-
add_node(node)
Add a node to the DAG.
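The two methods above can be illustrated with a minimal adjacency-list sketch. MiniDAG is hypothetical. It follows the documented rules that a node_id can be added only once and that add_dependency only links nodes that already exist, but it is not the ddf_utils DAG class, which stores node objects rather than bare ids.

```python
from typing import Dict, Set


class MiniDAG:
    def __init__(self) -> None:
        # node_id -> set of downstream node_ids
        self.downstream: Dict[str, Set[str]] = {}

    def add_node(self, node_id: str) -> None:
        # the DAG acts as a namespace: each node_id may be added once
        if node_id in self.downstream:
            raise ValueError(f"node already exists: {node_id}")
        self.downstream[node_id] = set()

    def add_dependency(self, upstream_id: str, downstream_id: str) -> None:
        # both ends must already have been added with add_node()
        for nid in (upstream_id, downstream_id):
            if nid not in self.downstream:
                raise KeyError(f"unknown node: {nid}")
        self.downstream[upstream_id].add(downstream_id)


dag = MiniDAG()
dag.add_node("gdp")
dag.add_node("gdp-per-capita")
dag.add_dependency("gdp", "gdp-per-capita")
print(dag.downstream)  # {'gdp': {'gdp-per-capita'}, 'gdp-per-capita': set()}
```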
-
copy()
-
detect_cycles()
Detect cycles in the DAG, following Tarjan’s algorithm.
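Tarjan's algorithm finds the strongly connected components (SCCs) of a directed graph. The graph contains a cycle exactly when one of its SCCs has more than one node, or when a single node has an edge to itself. The sketch below applies this to a plain dict adjacency list, which is an assumption made for illustration; ddf_utils works on its own node objects. The recursion means a very deep graph could hit Python's recursion limit.

```python
from typing import Dict, List, Set


def tarjan_scc(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return the strongly connected components of `graph`."""
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    on_stack: Set[str] = set()
    stack: List[str] = []
    sccs: List[List[str]] = []
    counter = 0

    def strongconnect(v: str) -> None:
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:          # unvisited: recurse
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:         # back edge into the current SCC
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:      # v is the root of an SCC
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs


def detect_cycles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return every SCC that forms a cycle (size > 1 or a self-loop)."""
    return [c for c in tarjan_scc(graph)
            if len(c) > 1 or c[0] in graph.get(c[0], [])]


print(detect_cycles({"a": ["b"], "b": ["c"], "c": []}))  # []
print(detect_cycles({"a": ["b"], "b": ["a"], "c": ["c"]}))  # [['b', 'a'], ['c']]
```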
-
get_node(node_id)
-
has_node(node_id)
-
node_dict
-
nodes
Return all nodes.
-
roots
Return the roots of the DAG.
-
tree_view()
Show an ASCII tree representation of the DAG.
-
class ddf_utils.chef.model.dag.IngredientNode(node_id, ingredient, chef)
Bases: ddf_utils.chef.model.dag.BaseNode
Node for storing dataset ingredients.
Parameters: ingredient (Ingredient) – the ingredient in this node
-
evaluate() → ddf_utils.chef.model.ingredient.Ingredient
Return the ingredient as is.
-
class ddf_utils.chef.model.dag.ProcedureNode(node_id, procedure, chef)
Bases: ddf_utils.chef.model.dag.BaseNode
The node for storing procedure results.
The evaluate() function runs a procedure according to self.procedure, using other nodes’ data. Other nodes are evaluated when necessary.
Parameters: procedure (dict) – the procedure dictionary
-
evaluate() → ddf_utils.chef.model.ingredient.Ingredient
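The evaluation model described for the two node types can be sketched as follows. An ingredient node returns its stored data as is. A procedure node first evaluates the upstream nodes it needs and then applies its procedure. To keep the sketch self-contained, the procedure is a plain callable rather than the procedure dictionary the real ProcedureNode stores, and plain dicts stand in for Ingredient objects. Both substitutions are assumptions, not ddf_utils behaviour.

```python
from typing import Callable, List


class IngredientNode:
    """Holds an ingredient; evaluating it returns the stored data as is."""

    def __init__(self, node_id: str, data: dict) -> None:
        self.node_id = node_id
        self.data = data

    def evaluate(self) -> dict:
        return self.data


class ProcedureNode:
    """Runs a procedure on the evaluated results of its upstream nodes."""

    def __init__(self, node_id: str, procedure: Callable[..., dict],
                 upstream: List) -> None:
        self.node_id = node_id
        self.procedure = procedure  # assumption: a callable, not a dict
        self.upstream = upstream

    def evaluate(self) -> dict:
        # upstream nodes are evaluated only now, when this node needs them;
        # an upstream ProcedureNode recursively evaluates its own inputs
        inputs = [node.evaluate() for node in self.upstream]
        return self.procedure(*inputs)


def merge(*tables: dict) -> dict:
    # toy procedure: combine several dicts into one
    out: dict = {}
    for table in tables:
        out.update(table)
    return out


pop = IngredientNode("pop", {"swe": 10})
gdp = IngredientNode("gdp", {"usa": 20})
merged = ProcedureNode("merged", merge, [pop, gdp])
print(merged.evaluate())  # {'swe': 10, 'usa': 20}
```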
-
ddf_utils.chef.model.ingredient
The main ingredient class.
-
class ddf_utils.chef.model.ingredient.Ingredient(id: str, key: Union[list, str], value: Union[list, dict, str] = '*', dataset: str = None, data: dict = None, row_filter: dict = None, base_dir: str = './')
Bases: abc.ABC
Protocol class for all ingredients.
All ingredients should have the following format:

    id: example-ingredient
    dataset: ddf--example--dataset
    key: "geo,time"  # key columns of ingredient
    value:  # only include concepts listed here
      - concept_1
      - concept_2
    filter:  # select rows by column values
      geo:  # only keep datapoints where `geo` is in [swe, usa, chn]
        - swe
        - usa
        - chn

The other way to define the ingredient data is to use the data keyword, either to include an external CSV file or to inline the data in the ingredient definition. Example:

    id: example-ingredient
    key: concept
    data: external_concepts.csv

On-the-fly ingredient:

    id: example-ingredient
    key: concept
    data:
      - concept: concept_1
        name: concept_name_1
        concept_type: string
        description: concept_description_1
      - concept: concept_2
        name: concept_name_2
        concept_type: measure
        description: concept_description_2
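An ingredient definition can get its data in three ways: from a DDF dataset, from an external CSV file named in data, or from inline records in data. The hypothetical data_source function below shows how a definition could be classified into one of these cases. Its return values only echo the names of the get_data_from_* static methods documented for the ingredient classes. The rule that dataset and data are mutually exclusive is an assumption.

```python
def data_source(definition: dict) -> str:
    """Classify a (parsed) ingredient definition by where its data comes from."""
    has_dataset = definition.get("dataset") is not None
    data = definition.get("data")
    if has_dataset and data is not None:
        # assumption: a definition uses either `dataset` or `data`, not both
        raise ValueError("use either 'dataset' or 'data', not both")
    if has_dataset:
        return "ddf_dataset"   # e.g. dataset: ddf--example--dataset
    if isinstance(data, str):
        return "external_csv"  # e.g. data: external_concepts.csv
    if isinstance(data, list):
        return "inline_data"   # records written directly in the recipe
    raise ValueError("ingredient needs either 'dataset' or 'data'")


print(data_source({"id": "example-ingredient", "key": "concept",
                   "data": "external_concepts.csv"}))  # external_csv
```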
-
dataset_path
Return the full path to the ingredient’s dataset if the ingredient comes from a local DDF dataset.
-
ddf
-
ddf_id
-
dtype = 'abc'
-
static filter_row(data: dict, row_filter)
Return the rows selected by row_filter.
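The recipe example above filters with `geo: [swe, usa, chn]`. Following that example, this sketch assumes each row_filter entry maps a column name to a list of allowed values. The real method works on DataFrames; here plain lists of row dicts stand in for them, and the values are toy numbers. Ignoring a filter on a column that a row does not have is also an assumption.

```python
from typing import Dict, List


def filter_row(data: Dict[str, List[dict]],
               row_filter: Dict[str, list]) -> Dict[str, List[dict]]:
    """Keep only rows whose values fall in the allowed lists."""
    result = {}
    for name, rows in data.items():
        result[name] = [
            row for row in rows
            # assumption: a filter on a column the row lacks is ignored
            if all(row[col] in allowed
                   for col, allowed in row_filter.items() if col in row)
        ]
    return result


data = {
    "population": [
        {"geo": "swe", "time": 2000, "population": 1},
        {"geo": "nor", "time": 2000, "population": 2},
    ]
}
print(filter_row(data, {"geo": ["swe", "usa", "chn"]}))
# {'population': [{'geo': 'swe', 'time': 2000, 'population': 1}]}
```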
-
classmethod from_procedure_result(id, key, data_computed: dict)
-
get_data()
-
ingredient_type
-
serve(*args, **kwargs)
Serve data to disk.
-
class ddf_utils.chef.model.ingredient.ConceptIngredient(id: str, key: Union[list, str], value: Union[list, dict, str] = '*', dataset: str = None, data: dict = None, row_filter: dict = None, base_dir: str = './')
Bases: ddf_utils.chef.model.ingredient.Ingredient
-
dtype = 'concepts'
-
get_data() → Dict[str, pandas.DataFrame]
-
static get_data_from_ddf_dataset(dataset_path, value, row_filter)
-
static get_data_from_external_csv(file_path, key, row_filter)
-
static get_data_from_inline_data(data, key, row_filter)
-
serve(outpath, **options)
Serve data to disk.
-
class ddf_utils.chef.model.ingredient.EntityIngredient(id: str, key: Union[list, str], value: Union[list, dict, str] = '*', dataset: str = None, data: dict = None, row_filter: dict = None, base_dir: str = './')
Bases: ddf_utils.chef.model.ingredient.Ingredient
-
dtype = 'entities'
-
get_data() → Dict[str, pandas.DataFrame]
-
static get_data_from_ddf_dataset(dataset_path, key, value, row_filter)
-
static get_data_from_external_csv(file_path, key, row_filter)
-
static get_data_from_inline_data(data, key, row_filter)
-
serve(outpath, **options)
Serve data to disk.
-
class ddf_utils.chef.model.ingredient.DataPointIngredient(id: str, key: Union[list, str], value: Union[list, dict, str] = '*', dataset: str = None, data: dict = None, row_filter: dict = None, base_dir: str = './')
Bases: ddf_utils.chef.model.ingredient.Ingredient
-
compute() → Dict[str, pandas.DataFrame]
Return a pandas DataFrame version of self.data.
-
dtype = 'datapoints'
-
classmethod from_procedure_result(id, key, data_computed: dict)
-
get_data() → Dict[str, dask.dataframe.DataFrame]
-
static get_data_from_ddf_dataset(id, dataset_path, key, value, row_filter)
-
static get_data_from_external_csv(file_path, key, row_filter)
-
static get_data_from_inline_data(data, key, row_filter)
-
serve(outpath, **options)
Serve data to disk.
-
ddf_utils.chef.model.ingredient.ingredient_from_dict(dictionary: dict, **chef_options) → ddf_utils.chef.model.ingredient.Ingredient
Create an ingredient from a recipe definition and options. Parameters for the ingredient should be passed in a dictionary. See "3. Define Ingredients" in the documentation, or ddf_utils.chef.model.ingredient.Ingredient, for available parameters.
-
ddf_utils.chef.model.ingredient.key_to_list(key)
Make a list that contains the primary key columns of this ingredient.
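This sketch covers key_to_list and the class selection in ingredient_from_dict together. It assumes a key string such as "geo,time" is split on commas. It also assumes the ingredient type is inferred from the key: a key of concept gives concepts, any other single key column gives entities, and several key columns give datapoints. That inference fits the dtype values documented above, but this page does not confirm it. infer_dtype is a hypothetical helper that returns the dtype string instead of constructing an Ingredient.

```python
from typing import List, Union


def key_to_list(key: Union[str, List[str]]) -> List[str]:
    """Turn a key such as "geo,time" into ["geo", "time"]."""
    if isinstance(key, str):
        return [part.strip() for part in key.split(",")]
    return list(key)  # already a list of key columns


def infer_dtype(dictionary: dict) -> str:
    """Hypothetical: pick an ingredient dtype from the definition's key."""
    key = key_to_list(dictionary["key"])
    if key == ["concept"]:
        return "concepts"     # would map to ConceptIngredient
    if len(key) == 1:
        return "entities"     # would map to EntityIngredient
    return "datapoints"       # would map to DataPointIngredient


print(key_to_list("geo,time"))                # ['geo', 'time']
print(infer_dtype({"key": "geo"}))            # entities
print(infer_dtype({"key": ["geo", "time"]}))  # datapoints
```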
-
ddf_utils.chef.model.ingredient.get_ingredient_class(cls)