ddf_utils.factory package
Submodules
ddf_utils.factory.common module
- class ddf_utils.factory.common.DataFactory
Bases: abc.ABC
- bulk_download(*args, **kwargs)
- has_newer_source(*args, **kwargs)
- load_metadata(*args, **kwargs)
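The three methods above form the interface that each loader in this package provides. A minimal sketch of that pattern follows. It assumes the three methods are abstract, which this page does not confirm, and it uses a local stand-in for DataFactory so that it runs without ddf_utils installed. InMemoryLoader and its dictionary-based "source" are hypothetical and exist only for illustration.

```python
from abc import ABC, abstractmethod


class DataFactory(ABC):
    """Local stand-in for ddf_utils.factory.common.DataFactory (assumed shape)."""

    @abstractmethod
    def load_metadata(self, *args, **kwargs):
        ...

    @abstractmethod
    def has_newer_source(self, *args, **kwargs):
        ...

    @abstractmethod
    def bulk_download(self, *args, **kwargs):
        ...


class InMemoryLoader(DataFactory):
    """Hypothetical loader that reads from a dict instead of a website."""

    def __init__(self, source):
        # Expected shape: {"version": int, "tables": {name: list_of_rows}}
        self.source = source

    def load_metadata(self):
        # List the table names that can be downloaded.
        return sorted(self.source["tables"])

    def has_newer_source(self, version):
        # True when the source version is newer than the caller's version.
        return self.source["version"] > version

    def bulk_download(self, out, names):
        # Copy the requested tables into the `out` dict.
        for name in names:
            out[name] = list(self.source["tables"][name])
        return out


loader = InMemoryLoader({"version": 3, "tables": {"b": [1], "a": [2, 3]}})
print(loader.load_metadata())        # ['a', 'b']
print(loader.has_newer_source(2))    # True
```

Each real loader takes its own arguments for these methods, as the signatures in the following sections show.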
- ddf_utils.factory.common.download(url, out_file, session=None, resume=True, method='GET', post_data=None, retry_times=5, backoff=0.5, progress_bar=True, timeout=30)
Download a URL and optionally try to resume a partial download.
Parameters:
- url (str) – URL to be downloaded
- out_file (filepath) – output file path
- session (requests session object) – session used for the request. Note: a session created with requests_retry_session must not be combined with resume=True
- resume (bool) – whether to resume the download
- method (str) – either “GET” or “POST”. When posting, pass a dictionary to post_data
- post_data (dict) – data to send when method is “POST”
- retry_times (int) – number of times to retry the download
- backoff (float) – backoff applied between retries
- progress_bar (bool) – whether to display a progress bar
- timeout (int) – maximum time to wait for the server to connect or to respond to a read (not a limit on the total download time)
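Resuming a download is commonly done with an HTTP Range request that asks the server only for the bytes that are not yet on disk. The sketch below illustrates that idea. Whether download builds its request exactly this way is an assumption, and resume_headers is a hypothetical helper that is not part of ddf_utils.

```python
import os
import tempfile


def resume_headers(out_file, resume=True):
    """Return request headers that ask only for the bytes still missing.

    Hypothetical helper: if a partial file exists, request the remainder
    with a Range header; otherwise request the whole resource.
    """
    if resume and os.path.exists(out_file):
        offset = os.path.getsize(out_file)
        if offset > 0:
            return {"Range": f"bytes={offset}-"}
    return {}


with tempfile.TemporaryDirectory() as tmp:
    partial = os.path.join(tmp, "data.csv")
    with open(partial, "wb") as f:
        f.write(b"x" * 1024)  # pretend 1024 bytes were already downloaded

    print(resume_headers(partial))                # {'Range': 'bytes=1024-'}
    print(resume_headers(partial, resume=False))  # {}
```

A server that honors the header answers with status 206 and the remaining bytes, which are then appended to the existing file.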
- ddf_utils.factory.common.requests_retry_session(retries=5, backoff_factor=0.3, status_forcelist=(500, 502, 504), session=None)
- ddf_utils.factory.common.retry(times=5, backoff=0.5, exceptions=<class 'Exception'>)
General-purpose wrapper for retrying a call.
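To make the three parameters concrete, here is a minimal sketch of a retry decorator with the same parameter names. It is not the ddf_utils implementation: the name retry_sketch, the decorator form, and the exponential delay policy are all assumptions made for illustration.

```python
import time
from functools import wraps


def retry_sketch(times=5, backoff=0.5, exceptions=Exception):
    """Retry the wrapped function, making at most `times` attempts in total."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # out of attempts: surface the last error
                    # Assumed policy: the delay doubles after each failure.
                    time.sleep(backoff * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = []


@retry_sketch(times=3, backoff=0, exceptions=ValueError)
def flaky():
    """Fail twice with a retryable error, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient failure")
    return "ok"


result = flaky()
print(result, len(calls))  # ok 3
```

Exceptions that do not match the exceptions argument are not retried and propagate on the first attempt.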
ddf_utils.factory.clio_infra module
Functions for scraping data from the Clio Infra website.
Source link: Clio-infra website
ddf_utils.factory.ihme module
Functions for IHME
The GBD result tool at IHME contains all data for GBD results, but IHME does not provide an open API for querying it. The website itself, however, reads from a JSON endpoint that requires no authorization, so this module uses that endpoint.
- class ddf_utils.factory.ihme.IHMELoader
Bases: ddf_utils.factory.common.DataFactory
- bulk_download(out_dir, version, context, **kwargs)
Download the selected contexts/queries from the GBD result tool.
context can be a string or a list of strings. The complete query is generated with the _make_query method and all keyword args. When context is a list, multiple queries are run.
- download_links(url)
- has_newer_source(ver)
- load_metadata()
Load all codes used in GBD into a dictionary.
- url_data = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/download.php'
- url_hir = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/hierarchy/'
- url_metadata = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/metadata/'
- url_task = 'https://s3.healthdata.org/gbd-api-2019-public/{hash}'
- url_version = 'http://ghdx.healthdata.org/sites/all/modules/custom/ihme_query_tool/gbd-search/php/version/'
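The description of bulk_download says that context may be a string or a list of strings and that a list produces several queries. The sketch below shows one way such an expansion could work. The query format built by _make_query is not documented on this page, so expand_contexts, the dictionary layout, and the keyword names year and location are all hypothetical.

```python
def expand_contexts(context, **kwargs):
    """Return one query dict per context; keyword args are shared by all.

    Hypothetical illustration only: the real query structure used by
    IHMELoader is not shown in this documentation.
    """
    # A single string is treated as a one-element list.
    contexts = [context] if isinstance(context, str) else list(context)
    return [{"context": c, **kwargs} for c in contexts]


single = expand_contexts("cause", year=[2019])
several = expand_contexts(["cause", "risk"], year=[2019], location=[1])

print(len(single))   # 1
print(len(several))  # 2
```

Checking for str first matters because a string is itself iterable, and iterating over it would produce one query per character.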
ddf_utils.factory.ilo module
Functions for scraping ILO datasets using the bulk downloader; see its documentation.
- class ddf_utils.factory.ilo.ILOLoader
Bases: ddf_utils.factory.common.DataFactory
- bulk_download(out_dir, indicators: list, pool_size=5)
Download a list of indicators simultaneously.
- download(i, out_dir)
Download an indicator to out_dir.
- has_newer_source(indicator, date)
Check whether an indicator’s last modified date is newer than the given date.
- indicator_meta_url_tmpl = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/indicator/table_of_contents_{lang}.csv'
- load_metadata(table='indicator', lang='en')
Get the code list for a specified table and language.
Check the ILO documentation for all available tables and languages.
- main_url = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/'
- other_meta_url_tmpl = 'http://www.ilo.org/ilostat-files/WEB_bulk_download/dic/{table}_{lang}.csv'
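ILOLoader.bulk_download takes a pool_size argument and downloads several indicators at once. The sketch below shows the general pattern with a thread pool from the standard library. Which kind of pool ILOLoader actually uses is not stated on this page, so the thread pool is an assumption. bulk_download_sketch, fake_download, and the indicator codes IND_A, IND_B, IND_C are hypothetical stand-ins that write placeholder files instead of contacting the ILO server.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def bulk_download_sketch(out_dir, indicators, download_one, pool_size=5):
    """Call download_one(indicator, out_dir) for each indicator,
    running at most pool_size calls at the same time."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in the same order as `indicators`.
        return list(pool.map(lambda ind: download_one(ind, out_dir), indicators))


def fake_download(indicator, out_dir):
    """Hypothetical stand-in for a real download: writes a placeholder file."""
    path = os.path.join(out_dir, f"{indicator}.csv")
    with open(path, "w") as f:
        f.write("placeholder\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    paths = bulk_download_sketch(
        tmp, ["IND_A", "IND_B", "IND_C"], fake_download, pool_size=2
    )
    names = [os.path.basename(p) for p in paths]

print(names)  # ['IND_A.csv', 'IND_B.csv', 'IND_C.csv']
```

Threads suit this kind of work because each task spends most of its time waiting on the network rather than computing.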
ddf_utils.factory.oecd module
Functions for scraping the OECD website using its SDMX API.
Source link: OECD website
- class ddf_utils.factory.oecd.OECDLoader
Bases: ddf_utils.factory.common.DataFactory
- bulk_download(out_dir, dataset)
Download the full JSON, including observation/dimension lists.
- data_url_tmpl = 'http://stats.oecd.org/SDMX-JSON/data/{dataset}/all/all'
- datastructure_url_tmpl = 'http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/{dataset}'
- has_newer_source(dataset, version)
- load_metadata()
- metadata_url = 'http://stats.oecd.org/RestSDMX/sdmx.ashx/GetKeyFamily/all'
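The two URL templates above contain a {dataset} placeholder. The sketch below shows how such templates expand for a given dataset code. The template strings are copied from the attributes listed above. The helper dataset_urls is hypothetical, and "EXAMPLE_DATASET" is a placeholder rather than a real OECD dataset code.

```python
# Template strings copied from the OECDLoader attributes documented above.
DATA_URL_TMPL = "http://stats.oecd.org/SDMX-JSON/data/{dataset}/all/all"
DATASTRUCTURE_URL_TMPL = (
    "http://stats.oecd.org/restsdmx/sdmx.ashx/GetDataStructure/{dataset}"
)


def dataset_urls(dataset):
    """Hypothetical helper: fill both templates for one dataset code."""
    return {
        "data": DATA_URL_TMPL.format(dataset=dataset),
        "datastructure": DATASTRUCTURE_URL_TMPL.format(dataset=dataset),
    }


urls = dataset_urls("EXAMPLE_DATASET")
print(urls["data"])
# http://stats.oecd.org/SDMX-JSON/data/EXAMPLE_DATASET/all/all
```

The trailing /all/all in the data template matches the description of bulk_download, which fetches the full JSON for a dataset.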
ddf_utils.factory.worldbank module
Functions to load data from the World Bank API.
We use its bulk download utilities.
Source link: WorldBank website