API Reference

This page provides an auto-generated summary of ecgtools’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Builder

class ecgtools.Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Generates a catalog from a list of netCDF files or zarr stores.

Parameters

paths (list of str) – List of paths to crawl for assets/files.
storage_options (dict, optional) – Parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.
depth (int, optional) – Maximum depth to crawl for assets. Default is 0.
exclude_patterns (list of str, optional) – List of glob patterns to exclude from crawling.
include_patterns (list of str, optional) – List of glob patterns to include when crawling.
joblib_parallel_kwargs (dict, optional) – Parameters passed to joblib.Parallel. Default is {}.
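
The constructor arguments above might be assembled as in the following minimal sketch. The helper names builder_kwargs and make_builder, the exclude pattern, and the temporary directory are illustrative assumptions, not part of ecgtools; only the parameter names come from the signature above.

```python
import tempfile
from pathlib import Path


def builder_kwargs(root):
    """Collect keyword arguments for ecgtools.Builder (all values are illustrative)."""
    return {
        "paths": [str(root)],                     # directories to crawl
        "depth": 1,                               # descend one level below each path
        "exclude_patterns": ["*/scratch/*"],      # hypothetical glob pattern to skip
        "joblib_parallel_kwargs": {"n_jobs": 1},  # forwarded to joblib.Parallel
    }


def make_builder(root):
    """Create a Builder for `root`; requires ecgtools to be installed."""
    # Imported lazily so the rest of this sketch runs without ecgtools.
    from ecgtools import Builder

    return Builder(**builder_kwargs(root))


# A temporary directory stands in for a real data directory.
example_root = Path(tempfile.mkdtemp())
example_kwargs = builder_kwargs(example_root)
print(example_kwargs["paths"], example_kwargs["depth"])
```

Keeping the arguments in a plain dictionary makes it easy to inspect or reuse them before any crawling starts.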

build(*, parsing_func, parsing_func_kwargs=None, postprocess_func=None, postprocess_func_kwargs=None)

Builds a catalog from a list of netCDF files or zarr stores.

Parameters

parsing_func (callable) – Function that parses the asset and returns a dictionary of metadata.
parsing_func_kwargs (dict, optional) – Parameters passed to the parsing function. Default is {}.
postprocess_func (callable, optional) – Function that post-processes the built dataframe and returns a pandas dataframe. Default is None.
postprocess_func_kwargs (dict, optional) – Parameters passed to the post-processing function. Default is {}.

Returns

Builder – The builder object.
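
As a rough illustration of what a parsing function could look like, the sketch below extracts metadata from a file name. The naming convention "<model>.<variable>.<year>.nc", the function name parse_asset, and the sep keyword are assumptions made for this example; the only requirement stated above is that the function returns a dictionary of metadata.

```python
from pathlib import Path


def parse_asset(path, sep="."):
    """Hypothetical parser for file names like '<model>.<variable>.<year>.nc'."""
    parts = Path(path).name.split(sep)
    if len(parts) != 4:
        raise ValueError(f"unexpected file name: {path}")
    model, variable, year, _extension = parts
    # Each key is intended to become a column in the catalog table.
    return {"path": str(path), "model": model, "variable": variable, "year": year}


record = parse_asset("/data/cesm.tas.2001.nc")
print(record)
```

Such a function would then be supplied by keyword, for example builder.build(parsing_func=parse_asset, parsing_func_kwargs={"sep": "."}), on the assumption that parsing_func_kwargs is forwarded to the parser as keyword arguments.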

clean_dataframe()

Cleans the dataframe by excluding invalid assets and removing duplicate entries.

save(*, name, path_column_name, variable_column_name, data_format, groupby_attrs=None, aggregations=None, esmcat_version='0.0.1', description=None, directory=None, catalog_type='file', to_csv_kwargs=None, json_dump_kwargs=None)

Persist catalog contents to files.

Parameters

name (str) – The name of the file to save the catalog to.
path_column_name (str) – The name of the column containing the path to the asset. Must be in the header of the CSV file.
variable_column_name (str) – The name of the column in the CSV file that contains the variable name.
data_format (str) – The data format. Valid values are netcdf and zarr.
groupby_attrs (list of str, optional) – Column names (attributes) that define which datasets can be aggregated. Default is None.
aggregations (list of dict, optional) – List of aggregations to apply to query results. Default is None.
esmcat_version (str, optional) – The ESM Catalog version the collection implements. Default is '0.0.1'.
description (str, optional) – Detailed multi-line description that fully explains the collection. Default is None.
directory (str, optional) – The directory to save the catalog to. If None, the current directory is used.
catalog_type (str, optional) – How to save the catalog table: as a dictionary inside the JSON file (‘dict’) or as a separate CSV file (‘file’). Default is ‘file’.
to_csv_kwargs (dict, optional) – Additional keyword arguments passed through to the to_csv() method.
json_dump_kwargs (dict, optional) – Additional keyword arguments passed through to the dump() function.

Returns

Builder – The builder object.

Notes

See https://github.com/NCAR/esm-collection-spec/blob/master/collection-spec/collection-spec.md for more details.
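
The arguments to save() might be put together as in this sketch. The catalog name, the column names, the directory, and the helper names save_kwargs and save_catalog are assumptions for illustration. The aggregation entry uses the type and attribute_name fields described in the collection specification linked above.

```python
def save_kwargs(directory):
    """Collect keyword arguments for Builder.save (all values are illustrative)."""
    return {
        "name": "example-catalog",
        "path_column_name": "path",          # must match a column produced by the parser
        "variable_column_name": "variable",  # column holding the variable name
        "data_format": "netcdf",             # documented values: netcdf or zarr
        "groupby_attrs": ["model"],
        "aggregations": [{"type": "union", "attribute_name": "variable"}],
        "directory": str(directory),
        "catalog_type": "file",              # write the table as a separate CSV file
    }


def save_catalog(builder, directory):
    """Persist a catalog; `builder` is assumed to be a Builder after build() has run."""
    return builder.save(**save_kwargs(directory))


example_save_kwargs = save_kwargs("catalogs")
print(sorted(example_save_kwargs))
```

Because build() and save() are both documented as returning the builder object, the calls can presumably be chained, for example builder.build(...).save(...).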