ddf_utils.factory package

Submodules

ddf_utils.factory.common module

class ddf_utils.factory.common.DataFactory

Bases: abc.ABC

bulk_download(*args, **kwargs)
has_newer_source(*args, **kwargs)
load_metadata(*args, **kwargs)
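
The three abstract methods above form the interface that every loader in this package implements. A minimal self-contained sketch of the pattern (the `DataFactory` shape mirrors the documented interface; `FakeLoader` is a hypothetical subclass for illustration only):

```python
from abc import ABC, abstractmethod

class DataFactory(ABC):
    """Sketch of the common loader interface documented above."""

    @abstractmethod
    def load_metadata(self):
        """Load metadata for the data source."""

    @abstractmethod
    def has_newer_source(self, ver):
        """Return True if the source is newer than `ver`."""

    @abstractmethod
    def bulk_download(self, *args, **kwargs):
        """Download the source data in bulk."""

class FakeLoader(DataFactory):
    """Hypothetical loader, used only to illustrate the interface."""

    def load_metadata(self):
        return {'version': '20200101'}

    def has_newer_source(self, ver):
        # compare version strings lexicographically
        return self.load_metadata()['version'] > ver

    def bulk_download(self, out_dir):
        return out_dir  # a real loader would write files here
```
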
ddf_utils.factory.common.download(url, out_file, session=None, resume=True, method='GET', post_data=None, retry_times=5, backoff=0.5, progress_bar=True, timeout=30)

Download a URL to a local file, optionally resuming a previous partial download.

Parameters:
  • url (str) – URL to be downloaded
  • out_file (filepath) – output file path
  • session (requests session object) – note that a session created by requests_retry_session must not be combined with resume=True
  • resume (bool) – whether to resume the download
  • method (str) – either “GET” or “POST”. When posting, pass a dictionary of data via post_data
  • post_data (dict) – data to send in the request body when method is “POST”
  • retry_times (int) – number of times to retry the download
  • backoff (float) – backoff factor applied between retries
  • progress_bar (bool) – whether to display a progress bar
  • timeout (int) – maximum time in seconds to wait for the server to connect or to send data (note: this is not a limit on the total download time)
ddf_utils.factory.common.requests_retry_session(retries=5, backoff_factor=0.3, status_forcelist=(500, 502, 504), session=None)
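
requests_retry_session returns a requests.Session whose HTTP adapters retry on the given status codes. A sketch of the standard urllib3 Retry pattern it presumably wraps (details of the actual implementation may differ):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def requests_retry_session(retries=5, backoff_factor=0.3,
                           status_forcelist=(500, 502, 504),
                           session=None):
    """Create (or augment) a Session that retries failed requests."""
    session = session or requests.Session()
    retry = Retry(total=retries, read=retries, connect=retries,
                  backoff_factor=backoff_factor,
                  status_forcelist=status_forcelist)
    adapter = HTTPAdapter(max_retries=retry)
    # mount the retrying adapter for both schemes
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
```
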
ddf_utils.factory.common.retry(times=5, backoff=0.5, exceptions=<class 'Exception'>)

General decorator that retries a function call when it raises one of the given exceptions.
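
A self-contained sketch of such a retry decorator (the exponential backoff schedule below is an assumption; the actual schedule in ddf_utils may differ):

```python
import functools
import time

def retry(times=5, backoff=0.5, exceptions=Exception):
    """Retry the wrapped callable up to `times` times, sleeping
    backoff * 2**attempt seconds between failed attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times - 1:
                        raise  # retries exhausted, re-raise
                    time.sleep(backoff * (2 ** attempt))
        return wrapper
    return decorator
```
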

ddf_utils.factory.clio_infra module

Functions for scraping data from the Clio Infra website.

Source link: Clio-infra website

class ddf_utils.factory.clio_infra.ClioInfraLoader

Bases: ddf_utils.factory.common.DataFactory

bulk_download(out_dir, data_type=None)
has_newer_source(ver)
load_metadata()
url = 'https://clio-infra.eu/index.html'

ddf_utils.factory.ihme module

Functions for IHME

The GBD result tool at IHME contains all data for GBD results, but IHME does not provide an open API to query the data. However, the website uses a JSON endpoint that does not require authorization, so we make use of that endpoint.

class ddf_utils.factory.ihme.IHMELoader

Bases: ddf_utils.factory.common.DataFactory

bulk_download(out_dir, version, context, **kwargs)

Download the selected contexts/queries from the GBD result tool.

context can be a string or a list of strings. The complete query will be generated by the _make_query method together with all keyword arguments. When context is a list, multiple queries will be run.
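
The string-or-list behaviour of context can be sketched as follows (the helper below is a hypothetical illustration; the real _make_query signature is not documented here):

```python
def make_queries(context, **kwargs):
    """Expand a string-or-list `context` into one query dict per
    context value, carrying the keyword arguments along."""
    if isinstance(context, str):
        context = [context]  # a single context becomes a one-item list
    return [dict(kwargs, context=c) for c in context]
```
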

has_newer_source(ver)
load_metadata()

Load all codes used in GBD into a dictionary.

url_data = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/download.php'
url_hir = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/hierarchy/'
url_metadata = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/metadata/'
url_task = 'https://s3.healthdata.org/gbd-api-2019-public/{hash}'
url_version = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/version/'

ddf_utils.factory.ilo module

Functions for scraping ILO datasets

We use the ILO bulk downloader; see its documentation for details.

class ddf_utils.factory.ilo.ILOLoader

Bases: ddf_utils.factory.common.DataFactory

bulk_download(out_dir, indicators: list, pool_size=5)

Download a list of indicators in parallel, using at most pool_size simultaneous downloads.
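
The parallel download can be sketched with a thread pool (a generic illustration of the pool_size behaviour, not the actual ILOLoader implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def bulk_download(indicators, download_one, pool_size=5):
    """Run `download_one` over `indicators` with at most
    `pool_size` concurrent workers and return the results."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(download_one, indicators))
```
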

download(i, out_dir)

Download an indicator to out_dir.

has_newer_source(indicator, date)

Check whether an indicator’s last-modified date is newer than the given date.

indicator_meta_url_tmpl = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/indicator/table_of_contents_{lang}.csv'
load_metadata(table='indicator', lang='en')

Get the code list for a specified table and language.

Check the ILO documentation for all available tables and languages.

main_url = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/'
other_meta_url_tmpl = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/dic/{table}_{lang}.csv'

ddf_utils.factory.oecd module

Functions for scraping the OECD website using its SDMX API

Source link: OECD website

class ddf_utils.factory.oecd.OECDLoader

Bases: ddf_utils.factory.common.DataFactory

bulk_download(out_dir, dataset)

Download the full JSON, including the observation and dimension lists.

data_url_tmpl = 'http://stats.oecd.org/SDMX-JSON/data/{dataset}/all/all'
datastructure_url_tmpl = 'http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/{dataset}'
has_newer_source(dataset, version)
load_metadata()
metadata_url = 'http://stats.oecd.org/RestSDMX/sdmx.ashx/GetKeyFamily/all'
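
The URL templates above are plain Python format strings; building the data URL for one dataset is just a matter of filling in the documented data_url_tmpl (the helper function name is an illustration):

```python
data_url_tmpl = 'http://stats.oecd.org/SDMX-JSON/data/{dataset}/all/all'

def data_url(dataset):
    """Fill the documented SDMX-JSON template for one dataset code."""
    return data_url_tmpl.format(dataset=dataset)
```
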

ddf_utils.factory.worldbank module

Functions to load data from the World Bank API.

We use its bulk download utilities.

Source link: WorldBank website

class ddf_utils.factory.worldbank.WorldBankLoader

Bases: ddf_utils.factory.common.DataFactory

T.B.D

bulk_download(dataset, out_dir, **kwargs)
has_newer_source(dataset, date)
load_metadata()
url = 'http://api.worldbank.org/v2/datacatalog?format=json'