API Reference

This page provides an auto-generated summary of ecgtools’s API. For more details and examples, refer to the relevant chapters in the main part of the documentation.

Builder

ecgtools.Builder(paths[, storage_options, ...])

Generates a catalog from a list of netCDF files or zarr stores

class ecgtools.Builder(paths, storage_options=None, depth=0, exclude_patterns=None, include_patterns=None, joblib_parallel_kwargs=None)

Generates a catalog from a list of netCDF files or zarr stores

Parameters:
  • paths (list of str) – List of paths to crawl for assets/files.

  • storage_options (dict, optional) – Parameters passed to the backend file system, such as Google Cloud Storage or Amazon Web Services S3.

  • depth (int, optional) – Maximum depth to crawl for assets. Default is 0.

  • exclude_patterns (list of str, optional) – List of glob patterns to exclude from crawling.

  • include_patterns (list of str, optional) – List of glob patterns to include in crawling.

  • joblib_parallel_kwargs (dict, optional) – Parameters passed to joblib.Parallel. Default is {}.
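The interplay between include_patterns and exclude_patterns can be sketched with the standard library alone. This is only an illustration of glob-style filtering, not ecgtools's implementation: the helper filter_paths and the sample paths are hypothetical, and the matching rules of the real crawler (for example how "*" treats path separators) may differ.

```python
from fnmatch import fnmatch


def filter_paths(paths, include_patterns=None, exclude_patterns=None):
    """Hypothetical helper: keep paths that match at least one include
    pattern and no exclude pattern."""
    include_patterns = include_patterns or ["*"]  # no include list: keep everything
    exclude_patterns = exclude_patterns or []
    kept = []
    for path in paths:
        included = any(fnmatch(path, pattern) for pattern in include_patterns)
        excluded = any(fnmatch(path, pattern) for pattern in exclude_patterns)
        if included and not excluded:
            kept.append(path)
    return kept


# Made-up asset paths, for illustration only.
candidates = [
    "data/atm/tas_2000.nc",
    "data/atm/tas_2001.nc",
    "data/ocn/sst_2000.nc",
    "data/atm/README.txt",
]
selected = filter_paths(
    candidates,
    include_patterns=["*.nc"],
    exclude_patterns=["*/ocn/*"],
)
print(selected)
```

In this sketch an exclude match always wins over an include match; whether ecgtools resolves conflicts the same way is an assumption.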

build(*, parsing_func, parsing_func_kwargs=None, postprocess_func=None, postprocess_func_kwargs=None)

Builds a catalog from a list of netCDF files or zarr stores.

Parameters:
  • parsing_func (callable) – Function that parses the asset and returns a dictionary of metadata.

  • parsing_func_kwargs (dict, optional) – Parameters passed to the parsing function. Default is {}.

  • postprocess_func (callable, optional) – Function that post-processes the built dataframe and returns a pandas dataframe. Default is None.

  • postprocess_func_kwargs (dict, optional) – Parameters passed to the post-processing function. Default is {}.
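As a rough sketch of what a parsing function might look like, the example below derives metadata from a file name and returns it as a dictionary. The function name parse_filename, the separator keyword, and the "variable_frequency_year" naming scheme are all hypothetical; a real parser would more likely open the asset and read its attributes.

```python
from pathlib import Path


def parse_filename(file, separator="_"):
    """Hypothetical parser: turn a name such as 'tas_Amon_2000.nc'
    into a dictionary of metadata for one catalog row."""
    stem = Path(file).stem  # file name without directory or extension
    variable, frequency, year = stem.split(separator)
    return {
        "path": str(file),
        "variable": variable,
        "frequency": frequency,
        "year": int(year),
    }


record = parse_filename("data/tas_Amon_2000.nc")
print(record)
```

Under these assumptions, such a function would be passed as parsing_func=parse_filename, with any extra keyword arguments (here separator) supplied through parsing_func_kwargs.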

Returns:

Builder – The builder object.

clean_dataframe()

Clean the dataframe by excluding invalid assets and removing duplicate entries.
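The two cleaning steps, dropping invalid assets and dropping duplicates, can be illustrated on a plain list of records. This is a conceptual sketch only: it does not use a dataframe, the helper clean_records is hypothetical, and the marker key used to flag invalid assets is an assumption rather than something this page documents.

```python
# Assumed marker key for assets that failed to parse (not confirmed by this page).
INVALID_KEY = "INVALID_ASSET"


def clean_records(records):
    """Hypothetical helper: drop records flagged as invalid, then drop
    exact duplicates while preserving the original order."""
    seen = set()
    cleaned = []
    for record in records:
        if record.get(INVALID_KEY):
            continue  # parsing failed for this asset
        fingerprint = tuple(sorted(record.items()))  # hashable view of the row
        if fingerprint in seen:
            continue  # identical row already kept
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


# Made-up parser output, for illustration only.
rows = [
    {"path": "a.nc", "variable": "tas"},
    {"path": "a.nc", "variable": "tas"},
    {INVALID_KEY: "broken.nc"},
    {"path": "b.nc", "variable": "pr"},
]
cleaned = clean_records(rows)
print(cleaned)
```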

save(*, name, path_column_name, variable_column_name, data_format, groupby_attrs=None, aggregations=None, esmcat_version='0.0.1', description=None, directory=None, catalog_type='file', to_csv_kwargs=None, json_dump_kwargs=None)

Persist catalog contents to files.

Parameters:
  • name (str) – The name of the file to save the catalog to.

  • path_column_name (str) – The name of the column containing the path to the asset. Must be in the header of the CSV file.

  • variable_column_name (str) – The name of the column in the CSV file that contains the variable name.

  • data_format (str) – The data format. Valid values are netcdf and zarr.

  • aggregations (List[dict]) – List of aggregations to apply to query results. Default is None.

  • esmcat_version (str) – The ESM Catalog version the collection implements. Default is '0.0.1'.

  • description (str) – Detailed multi-line description that fully explains the collection. Default is None.

  • directory (str) – The directory to save the catalog to. If None, the current directory is used.

  • catalog_type (str) – The type of catalog to save. Valid options are ‘dict’, which stores the catalog table as a dictionary inside the JSON file, and ‘file’, which writes it to a separate CSV file.

  • to_csv_kwargs (dict, optional) – Additional keyword arguments passed through to the pandas.DataFrame.to_csv() method.

  • json_dump_kwargs (dict, optional) – Additional keyword arguments passed through to the json.dump() function.

Returns:

Builder – The builder object.

Notes

See https://github.com/NCAR/esm-collection-spec/blob/master/collection-spec/collection-spec.md for more details.
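To make the role of the save() arguments more concrete, the sketch below assembles a JSON descriptor of the kind the collection specification describes for catalog_type='file'. The helper make_descriptor is hypothetical, and the field names reflect one reading of the specification; the files that save() actually writes may be laid out differently.

```python
import json


def make_descriptor(name, path_column_name, variable_column_name, data_format,
                    groupby_attrs=None, aggregations=None,
                    esmcat_version="0.0.1", description=None):
    """Hypothetical helper: map save()-style arguments onto a collection
    descriptor that points at a separate CSV catalog table."""
    return {
        "esmcat_version": esmcat_version,
        "id": name,
        "description": description,
        "catalog_file": f"{name}.csv",  # assumed file naming, for illustration
        "assets": {
            "column_name": path_column_name,
            "format": data_format,
        },
        "aggregation_control": {
            "variable_column_name": variable_column_name,
            "groupby_attrs": groupby_attrs or [],
            "aggregations": aggregations or [],
        },
    }


descriptor = make_descriptor(
    name="my-catalog",
    path_column_name="path",
    variable_column_name="variable",
    data_format="netcdf",
    groupby_attrs=["frequency"],
    aggregations=[{"type": "union", "attribute_name": "variable"}],
    description="Example catalog built from made-up values.",
)
text = json.dumps(descriptor, indent=2)
print(text)
```

With catalog_type='dict', the catalog rows would be embedded in the JSON document itself instead of being referenced through a CSV file, as described for that parameter above.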