ddf_utils.chef.model package¶
ddf_utils.chef.model.chef¶
The Chef object
-
class
ddf_utils.chef.model.chef.Chef(dag: ddf_utils.chef.model.dag.DAG = None, metadata=None, config=None, cooking=None, serving=None, recipe=None)¶ Bases:
object
The chef API.
-
add_config(**config)¶ Add configs. All keyword arguments are added to the config dictionary, replacing any existing entries with the same key.
-
add_dish(ingredients, options=None)¶
-
add_ingredient(**kwargs)¶ Add a new ingredient to the DAG.
The keyword arguments are sent as a dictionary to the dictionary keyword of the ddf_utils.chef.model.ingredient.ingredient_from_dict() function.
-
add_metadata(**metadata)¶ Add metadata. All keyword arguments are added to the metadata dictionary, replacing any existing entries with the same key.
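The add/replace behaviour described for add_config() and add_metadata() matches a plain dictionary update. The following is a minimal sketch, not the library's implementation: ChefSketch is a hypothetical stand-in class, and the keys example_dir and example_flag are made up for illustration.

```python
class ChefSketch:
    """Hypothetical stand-in for Chef that only holds config and metadata."""

    def __init__(self):
        self.config = {}
        self.metadata = {}

    def add_config(self, **config):
        # New keys are added; keys that already exist are overwritten.
        self.config.update(config)

    def add_metadata(self, **metadata):
        # Same add/replace semantics, applied to the metadata dictionary.
        self.metadata.update(metadata)


chef = ChefSketch()
chef.add_config(example_dir="/path/to/datasets")
# The second call replaces example_dir and adds example_flag.
chef.add_config(example_dir="/other/path", example_flag=True)
chef.add_metadata(id="example-dataset")
```

After these calls, chef.config holds the second value of example_dir, because the later keyword argument replaced the earlier one.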
-
add_procedure(collection, procedure, ingredients, result=None, options=None)¶
-
config¶
-
copy()¶
-
classmethod
from_recipe(recipe_file, **config)¶
-
ingredients¶
-
static
register_procedure(func)¶
-
run(serve=False, outpath=None)¶
-
serving¶
-
to_graph(node=None)¶
-
to_recipe(fp=None)¶ Write the chef in YAML recipe format.
-
validate()¶ validate if the chef is good to run.
The following will be tested:
- check if datasets required by ingredients are available
- check if procedures are available
- check if the DAG is valid, i.e. no dependency cycle and no missing dependency.
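The three checks listed above can be sketched on plain Python data. This is an illustration under stated assumptions, not how validate() is implemented: validate_sketch, its parameters and the message strings are all hypothetical, and the cycle check uses the standard library's graphlib (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def validate_sketch(required_datasets, available_datasets,
                    used_procedures, known_procedures, upstreams):
    """Return a list of problems; an empty list means "good to run".

    `upstreams` maps each node id to the ids of the nodes it depends on.
    """
    problems = []
    # 1. Datasets required by ingredients must be available.
    for name in sorted(set(required_datasets) - set(available_datasets)):
        problems.append(f"dataset not found: {name}")
    # 2. Every procedure that is used must be known.
    for name in sorted(set(used_procedures) - set(known_procedures)):
        problems.append(f"procedure not found: {name}")
    # 3a. Every dependency must itself be a node in the graph.
    for node, deps in upstreams.items():
        for dep in deps:
            if dep not in upstreams:
                problems.append(f"missing dependency: {node} needs {dep}")
    # 3b. The graph must not contain a dependency cycle.
    try:
        TopologicalSorter(upstreams).prepare()
    except CycleError:
        problems.append("dependency cycle detected")
    return problems


problems = validate_sketch(
    required_datasets=["ddf--example--dataset"],
    available_datasets=["ddf--example--dataset"],
    used_procedures=["example_procedure"],
    known_procedures=["example_procedure"],
    upstreams={"raw": [], "clean": ["raw"], "final": ["clean", "extra"]},
)
cyclic = validate_sketch([], [], [], [], {"a": ["b"], "b": ["a"]})
```

Collecting problems in a list, rather than raising on the first one, is a design choice of this sketch; the real method may report errors differently.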
-
ddf_utils.chef.model.dag¶
The DAG model of chef.
The DAG consists of two types of nodes: IngredientNode and ProcedureNode. Each node has an evaluate() function, which returns an ingredient when evaluated.
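The two node types and their evaluate() contract can be illustrated with a small stand-alone model. This is a hedged sketch only: the classes below are hypothetical, plain lists stand in for ingredients, and a Python callable stands in for the procedure dictionary that the real ProcedureNode receives.

```python
class IngredientNodeSketch:
    """Hypothetical node that stores data and returns it unchanged."""

    def __init__(self, node_id, ingredient):
        self.node_id = node_id
        self.ingredient = ingredient

    def evaluate(self):
        return self.ingredient


class ProcedureNodeSketch:
    """Hypothetical node that computes its result from upstream nodes."""

    def __init__(self, node_id, func, upstream):
        self.node_id = node_id
        self.func = func          # stand-in for a chef procedure
        self.upstream = upstream  # nodes this node depends on

    def evaluate(self):
        # Evaluate the upstream nodes first, then apply the procedure.
        inputs = [node.evaluate() for node in self.upstream]
        return self.func(*inputs)


population = IngredientNodeSketch("population", [1, 2, 3])
doubled = ProcedureNodeSketch("doubled", lambda rows: [2 * x for x in rows], [population])
total = ProcedureNodeSketch("total", lambda rows: sum(rows), [doubled])
result = total.evaluate()  # evaluates "doubled", which evaluates "population"
```

Calling evaluate() on the last node is enough: each node pulls what it needs from its upstream nodes.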
-
class
ddf_utils.chef.model.dag.BaseNode(node_id, chef)¶ Bases:
object
The base node which IngredientNode and ProcedureNode inherit from.
Parameters: - node_id (str) – the name of the node
- dag (DAG) – the DAG object the node is in
-
add_downstream(node)¶
-
add_upstream(node)¶
-
detect_missing_dependency()¶ Check if every upstream node is available in the DAG. Raise an error if something is missing.
-
downstream_list¶
-
evaluate()¶
-
get_direct_relatives(upstream=False)¶ Get the direct relatives to the current node, upstream or downstream.
-
upstream_list¶
-
class
ddf_utils.chef.model.dag.DAG(node_dict=None)¶ Bases:
object
The DAG model.
A DAG (directed acyclic graph) is a collection of tasks with directional dependencies. A DAG essentially acts as a namespace for its nodes: a node_id can only be added once to a DAG.
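The "namespace" behaviour, where a node_id can be added only once, can be sketched with a dictionary keyed by node id. The class below is hypothetical and the choice of ValueError is an assumption; the real DAG class may raise a different error.

```python
class DagSketch:
    """Hypothetical DAG that only models the node namespace."""

    def __init__(self):
        self.node_dict = {}

    def add_node(self, node_id, node):
        # A node id may be registered only once.
        if node_id in self.node_dict:
            raise ValueError(f"node already exists: {node_id}")
        self.node_dict[node_id] = node

    def has_node(self, node_id):
        return node_id in self.node_dict

    def get_node(self, node_id):
        return self.node_dict[node_id]


dag = DagSketch()
dag.add_node("population", object())
try:
    dag.add_node("population", object())
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```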
-
add_dependency(upstream_node_id, downstream_node_id)¶ Simple utility method to set dependency between two nodes that already have been added to the DAG using add_node()
-
add_node(node)¶ add a node to DAG
-
copy()¶
-
detect_cycles()¶ Detect cycles in DAG, following Tarjan’s algorithm.
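For readers unfamiliar with it, Tarjan's algorithm finds strongly connected components in one depth-first pass; any component with more than one node, or a node with an edge to itself, is a cycle. The sketch below shows the technique on a plain adjacency dictionary. The function name find_cycles and the input format are assumptions, and this is not the code used by detect_cycles().

```python
def find_cycles(graph):
    """Return the strongly connected components that contain a cycle.

    `graph` maps each node id to a list of downstream node ids.
    """
    index_of = {}     # discovery order of each visited node
    lowlink = {}      # smallest index reachable from each node
    stack = []
    on_stack = set()
    cycles = []
    counter = 0

    def visit(node):
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ not in index_of:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index_of[succ])
        if lowlink[node] == index_of[node]:
            # `node` is the root of a component: pop it off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            if len(component) > 1 or node in graph.get(node, []):
                cycles.append(sorted(component))

    for node in graph:
        if node not in index_of:
            visit(node)
    return cycles


acyclic = find_cycles({"a": ["b"], "b": ["c"], "c": []})
cyclic = find_cycles({"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]})
```

The recursive form is kept for readability; very deep graphs would need an iterative variant to stay within Python's recursion limit.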
-
get_node(node_id)¶
-
has_node(node_id)¶
-
node_dict¶
-
nodes¶ return all nodes
-
roots¶ return the roots of the DAG
-
tree_view()¶ Shows an ascii tree representation of the DAG
-
-
class
ddf_utils.chef.model.dag.IngredientNode(node_id, ingredient, chef)¶ Bases:
ddf_utils.chef.model.dag.BaseNode
Node for storing dataset ingredients.
Parameters: ingredient (Ingredient) – the ingredient in this node
-
evaluate() → ddf_utils.chef.model.ingredient.Ingredient¶ return the ingredient as is
-
-
class
ddf_utils.chef.model.dag.ProcedureNode(node_id, procedure, chef)¶ Bases:
ddf_utils.chef.model.dag.BaseNode
The node for storing procedure results.
The evaluate() function will run a procedure according to self.procedure, using other nodes’ data. Other nodes will be evaluated when necessary.
Parameters: procedure (dict) – the procedure dictionary
-
evaluate() → ddf_utils.chef.model.ingredient.Ingredient¶
-
ddf_utils.chef.model.ingredient¶
main ingredient class
-
class
ddf_utils.chef.model.ingredient.Ingredient(id: str, key: Union[list, str], value: Union[list, dict, str] = '*', dataset: str = None, data: dict = None, row_filter: dict = None, base_dir: str = './')¶ Bases:
abc.ABC
Protocol class for all ingredients.
All ingredients should have the following format:
    id: example-ingredient
    dataset: ddf--example--dataset
    key: "geo,time"  # key columns of ingredient
    value:  # only include concepts listed here
        - concept_1
        - concept_2
    filter:  # select rows by column values
        geo:  # only keep datapoint where `geo` is in [swe, usa, chn]
            - swe
            - usa
            - chn
The other way to define the ingredient data is to use the data keyword, either to include an external CSV file or to inline the data in the ingredient definition. Example:
    id: example-ingredient
    key: concept
    data: external_concepts.csv
On-the-fly ingredient:
    id: example-ingredient
    key: concept
    data:
        - concept: concept_1
          name: concept_name_1
          concept_type: string
          description: concept_description_1
        - concept: concept_2
          name: concept_name_2
          concept_type: measure
          description: concept_description_2
-
dataset_path¶ Return the full path to the ingredient’s dataset if the ingredient comes from a local DDF dataset.
-
ddf¶
-
ddf_id¶
-
dtype= 'abc'¶
-
static
filter_row(data: dict, row_filter)¶ return the rows selected by row_filter.
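Row filtering by column values, as in the filter section of an ingredient definition, can be illustrated without any dependencies. This is a rough sketch under assumptions: the real filter_row() works on the ingredient's data dictionary and may support a richer filter syntax, while the hypothetical filter_rows_sketch below works on a list of dictionaries and only handles "value in list" and "value equals" conditions.

```python
def filter_rows_sketch(rows, row_filter):
    """Keep the rows whose columns satisfy every condition in `row_filter`.

    `rows` is a list of dicts; `row_filter` maps a column name to either a
    single allowed value or a list of allowed values.
    """
    def allowed(value, condition):
        if isinstance(condition, (list, tuple, set)):
            return value in condition
        return value == condition

    return [
        row for row in rows
        if all(allowed(row.get(column), condition)
               for column, condition in row_filter.items())
    ]


rows = [
    {"geo": "swe", "time": 2000, "population": 1},
    {"geo": "nor", "time": 2000, "population": 2},
    {"geo": "usa", "time": 2001, "population": 3},
]
kept = filter_rows_sketch(rows, {"geo": ["swe", "usa", "chn"]})
```

Here only the swe and usa rows survive, mirroring the comment in the ingredient format that keeps datapoints where geo is in [swe, usa, chn].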
-
classmethod
from_procedure_result(id, key, data_computed: dict)¶
-
get_data()¶
-
ingredient_type¶
-
serve(*args, **kwargs)¶ serving data to disk
-
-
class
ddf_utils.chef.model.ingredient.ConceptIngredient(id: str, key: Union[list, str], value: Union[list, dict, str] = '*', dataset: str = None, data: dict = None, row_filter: dict = None, base_dir: str = './')¶ Bases:
ddf_utils.chef.model.ingredient.Ingredient
-
dtype= 'concepts'¶
-
get_data() → Dict[str, pandas.DataFrame]¶
-
static
get_data_from_ddf_dataset(dataset_path, value, row_filter)¶
-
static
get_data_from_external_csv(file_path, key, row_filter)¶
-
static
get_data_from_inline_data(data, key, row_filter)¶
-
serve(outpath, **options)¶ serving data to disk
-
-
class
ddf_utils.chef.model.ingredient.EntityIngredient(id: str, key: Union[list, str], value: Union[list, dict, str] = '*', dataset: str = None, data: dict = None, row_filter: dict = None, base_dir: str = './')¶ Bases:
ddf_utils.chef.model.ingredient.Ingredient
-
dtype= 'entities'¶
-
get_data() → Dict[str, pandas.DataFrame]¶
-
static
get_data_from_ddf_dataset(dataset_path, key, value, row_filter)¶
-
static
get_data_from_external_csv(file_path, key, row_filter)¶
-
static
get_data_from_inline_data(data, key, row_filter)¶
-
serve(outpath, **options)¶ serving data to disk
-
-
class
ddf_utils.chef.model.ingredient.DataPointIngredient(id: str, key: Union[list, str], value: Union[list, dict, str] = '*', dataset: str = None, data: dict = None, row_filter: dict = None, base_dir: str = './')¶ Bases:
ddf_utils.chef.model.ingredient.Ingredient
-
compute() → Dict[str, pandas.DataFrame]¶ Return a pandas DataFrame version of self.data.
-
dtype= 'datapoints'¶
-
classmethod
from_procedure_result(id, key, data_computed: dict)¶
-
get_data() → Dict[str, dask.dataframe.DataFrame]¶
-
static
get_data_from_ddf_dataset(id, dataset_path, key, value, row_filter)¶
-
static
get_data_from_external_csv(file_path, key, row_filter)¶
-
static
get_data_from_inline_data(data, key, row_filter)¶
-
serve(outpath, **options)¶ serving data to disk
-
-
ddf_utils.chef.model.ingredient.ingredient_from_dict(dictionary: dict, **chef_options) → ddf_utils.chef.model.ingredient.Ingredient¶ Create an ingredient from a recipe definition and options. Parameters for the ingredient should be passed in a dictionary. See the "3. Define Ingredients" section of the docs or ddf_utils.chef.model.ingredient.Ingredient for available parameters.
-
ddf_utils.chef.model.ingredient.key_to_list(key)¶ Make a list that contains the primaryKey of this ingredient.
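Since an ingredient's key may be given either as a comma-separated string such as "geo,time" or as a list, turning it into a list presumably amounts to something like the sketch below. The function key_to_list_sketch is a hypothetical re-implementation for illustration, and stripping whitespace around each part is an assumption rather than documented behaviour.

```python
def key_to_list_sketch(key):
    """Normalise an ingredient key to a list of column names."""
    if isinstance(key, str):
        # "geo,time" -> ["geo", "time"]; a single name gives a one-item list.
        return [part.strip() for part in key.split(",")]
    # Already a sequence of column names: return a copy as a list.
    return list(key)


datapoint_key = key_to_list_sketch("geo,time")
concept_key = key_to_list_sketch("concept")
```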
-
ddf_utils.chef.model.ingredient.get_ingredient_class(cls)¶