ddf_utils.factory package

Submodules

ddf_utils.factory.common module

class ddf_utils.factory.common.DataFactory

Bases: abc.ABC

bulk_download(*args, **kwargs)
has_newer_source(*args, **kwargs)
load_metadata(*args, **kwargs)
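
The three abstract methods above form the interface that every loader in this package implements. A minimal self-contained sketch of the pattern (the `DataFactory` shape mirrors the documented interface; `FakeLoader` is a hypothetical subclass for illustration only):

```python
from abc import ABC, abstractmethod

class DataFactory(ABC):
    """Sketch of the common loader interface documented above."""

    @abstractmethod
    def load_metadata(self):
        """Load metadata for the data source."""

    @abstractmethod
    def has_newer_source(self, ver):
        """Return True if the source is newer than `ver`."""

    @abstractmethod
    def bulk_download(self, *args, **kwargs):
        """Download the source data in bulk."""

class FakeLoader(DataFactory):
    """Hypothetical loader, used only to illustrate the interface."""

    def load_metadata(self):
        return {'version': '20200101'}

    def has_newer_source(self, ver):
        # compare version strings lexicographically
        return self.load_metadata()['version'] > ver

    def bulk_download(self, out_dir):
        return out_dir  # a real loader would write files here
```
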
ddf_utils.factory.common.download(url, out_file, session=None, resume=True, method='GET', post_data=None, retry_times=5, backoff=0.5, progress_bar=True, timeout=30)

Download a URL to a local file, optionally resuming a previous partial download.

Parameters:
  • url (str) – URL to be downloaded
  • out_file (filepath) – output file path
  • session (requests session object) – note that a session created by requests_retry_session must not be combined with resume=True
  • resume (bool) – whether to resume the download
  • method (str) – either “GET” or “POST”. When posting, pass a dictionary of data via post_data
  • post_data (dict) – data to send in the request body when method is “POST”
  • retry_times (int) – number of times to retry the download
  • backoff (float) – backoff factor applied between retries
  • progress_bar (bool) – whether to display a progress bar
  • timeout (int) – maximum time in seconds to wait for the server to connect or to send data (note: this is not a limit on the total download time)
ddf_utils.factory.common.requests_retry_session(retries=5, backoff_factor=0.3, status_forcelist=(500, 502, 504), session=None)
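
requests_retry_session returns a requests.Session whose HTTP adapters retry on the given status codes. A sketch of the standard urllib3 Retry pattern it presumably wraps (details of the actual implementation may differ):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def requests_retry_session(retries=5, backoff_factor=0.3,
                           status_forcelist=(500, 502, 504),
                           session=None):
    """Create (or augment) a Session that retries failed requests."""
    session = session or requests.Session()
    retry = Retry(total=retries, read=retries, connect=retries,
                  backoff_factor=backoff_factor,
                  status_forcelist=status_forcelist)
    adapter = HTTPAdapter(max_retries=retry)
    # mount the retrying adapter for both schemes
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
```
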
ddf_utils.factory.common.retry(times=5, backoff=0.5, exceptions=<class 'Exception'>)

General decorator that retries a function call when it raises one of the given exceptions.
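
A self-contained sketch of such a retry decorator (the exponential backoff schedule below is an assumption; the actual schedule in ddf_utils may differ):

```python
import functools
import time

def retry(times=5, backoff=0.5, exceptions=Exception):
    """Retry the wrapped callable up to `times` times, sleeping
    backoff * 2**attempt seconds between failed attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times - 1:
                        raise  # retries exhausted, re-raise
                    time.sleep(backoff * (2 ** attempt))
        return wrapper
    return decorator
```
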

ddf_utils.factory.clio_infra module

Functions for scraping data from the Clio Infra website.

Source link: Clio-infra website

class ddf_utils.factory.clio_infra.ClioInfraLoader

Bases: ddf_utils.factory.common.DataFactory

bulk_download(out_dir, data_type=None)
has_newer_source(ver)
load_metadata()
url = 'https://clio-infra.eu/index.html'

ddf_utils.factory.ihme module

Functions for IHME

The GBD result tool at IHME contains all data for GBD results, but IHME does not provide an open API to query the data. However, the website uses a JSON endpoint that does not require authorization, so we make use of that endpoint.

class ddf_utils.factory.ihme.IHMELoader

Bases: ddf_utils.factory.common.DataFactory

bulk_download(out_dir, version, context, **kwargs)

Download the selected contexts/queries from the GBD result tool.

context can be a string or a list of strings. The complete query will be generated by the _make_query method together with all keyword arguments. When context is a list, multiple queries will be run.
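
The string-or-list behaviour of context can be sketched as follows (the helper below is a hypothetical illustration; the real _make_query signature is not documented here):

```python
def make_queries(context, **kwargs):
    """Expand a string-or-list `context` into one query dict per
    context value, carrying the keyword arguments along."""
    if isinstance(context, str):
        context = [context]  # a single context becomes a one-item list
    return [dict(kwargs, context=c) for c in context]
```
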

has_newer_source(ver)
load_metadata()

Load all codes used in GBD into a dictionary.

url_data = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/download.php'
url_hir = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/hierarchy/'
url_metadata = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/metadata/'
url_task = 'https://s3.healthdata.org/gbd-api-2019-public/{hash}'
url_version = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/version/'

ddf_utils.factory.ilo module

Functions for scraping ILO datasets

We use the ILO bulk downloader; see its documentation for details.

class ddf_utils.factory.ilo.ILOLoader

Bases: ddf_utils.factory.common.DataFactory

bulk_download(out_dir, indicators: list, pool_size=5)

Download a list of indicators in parallel, using at most pool_size simultaneous downloads.
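
The parallel download can be sketched with a thread pool (a generic illustration of the pool_size behaviour, not the actual ILOLoader implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def bulk_download(indicators, download_one, pool_size=5):
    """Run `download_one` over `indicators` with at most
    `pool_size` concurrent workers and return the results."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(download_one, indicators))
```
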

download(i, out_dir)

Download an indicator to out_dir.

has_newer_source(indicator, date)

Check whether an indicator’s last-modified date is newer than the given date.

indicator_meta_url_tmpl = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/indicator/table_of_contents_{lang}.csv'
load_metadata(table='indicator', lang='en')

Get the code list for a specified table and language.

Check the ILO documentation for all available tables and languages.

main_url = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/'
other_meta_url_tmpl = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/dic/{table}_{lang}.csv'

ddf_utils.factory.oecd module

Functions for scraping the OECD website using its SDMX API

Source link: OECD website

class ddf_utils.factory.oecd.OECDLoader

Bases: ddf_utils.factory.common.DataFactory

bulk_download(out_dir, dataset)

Download the full JSON, including the observation and dimension lists.

data_url_tmpl = 'http://stats.oecd.org/SDMX-JSON/data/{dataset}/all/all'
datastructure_url_tmpl = 'http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/{dataset}'
has_newer_source(dataset, version)
load_metadata()
metadata_url = 'http://stats.oecd.org/RestSDMX/sdmx.ashx/GetKeyFamily/all'
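
The URL templates above are plain Python format strings; building the data URL for one dataset is just a matter of filling in the documented data_url_tmpl (the helper function name is an illustration):

```python
data_url_tmpl = 'http://stats.oecd.org/SDMX-JSON/data/{dataset}/all/all'

def data_url(dataset):
    """Fill the documented SDMX-JSON template for one dataset code."""
    return data_url_tmpl.format(dataset=dataset)
```
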

ddf_utils.factory.worldbank module

Functions to load data from the World Bank API.

We use its bulk download utilities.

Source link: WorldBank website

class ddf_utils.factory.worldbank.WorldBankLoader

Bases: ddf_utils.factory.common.DataFactory

T.B.D

bulk_download(dataset, out_dir, **kwargs)
has_newer_source(dataset, date)
load_metadata()
url = 'http://api.worldbank.org/v2/datacatalog?format=json'