Data

The data module provides simple data loading utilities for the eye-tracking collection. Collection data lives in the repo at data/collection as Parquet files (tracked with Git LFS). Column conventions: primary-key columns start with group_, label columns end with _label, and meta columns start with meta_.

eyefeatures.data.list_datasets(collection_dir=None, *, include_extensive_collection=True, extensive_collection_only=False, include_extracted_fixations=True, extracted_fixations_only=False, dataset_type=None)

List available dataset names in the collection directory.

Parameters:
  • collection_dir (path, optional) – Root directory containing collection Parquet files. Defaults to data/collection (repo data tracked with Git LFS).

  • include_extensive_collection (bool, default True) – If True, also search in extensive_collection subfolder. Ignored when extensive_collection_only or extracted_fixations_only is True.

  • extensive_collection_only (bool, default False) – If True, list only datasets from extensive_collection subfolder (main directory is not scanned).

  • include_extracted_fixations (bool, default True) – If True, also search in extracted_fixations subfolder. Ignored when extensive_collection_only or extracted_fixations_only is True.

  • extracted_fixations_only (bool, default False) – If True, list only datasets from extracted_fixations subfolder (main directory is not scanned).

  • dataset_type (str, optional) – If "gaze", return only gaze datasets (names ending with _gaze/_gazes). If "fixation", return only fixation datasets (names ending with _fixations/_fixation or default). If None, return all.

Returns:

Sorted list of dataset names (without .parquet extension).

Return type:

list of str
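
The dataset_type suffix rules above can be illustrated with a small standalone sketch. It is not the library's implementation: filter_by_type and the dataset names are hypothetical, only the explicit suffixes are modelled (the "or default" case for fixation datasets is left out), and raising ValueError for an unknown type is an assumption.

```python
from typing import Iterable, List, Optional

# Suffixes taken from the parameter description above.
GAZE_SUFFIXES = ("_gaze", "_gazes")
FIXATION_SUFFIXES = ("_fixations", "_fixation")


def filter_by_type(names: Iterable[str], dataset_type: Optional[str] = None) -> List[str]:
    """Hypothetical helper: keep names whose suffix matches dataset_type."""
    if dataset_type is None:
        selected = list(names)
    elif dataset_type == "gaze":
        selected = [n for n in names if n.endswith(GAZE_SUFFIXES)]
    elif dataset_type == "fixation":
        # Only explicit suffixes are checked; the "or default" rule is not modelled.
        selected = [n for n in names if n.endswith(FIXATION_SUFFIXES)]
    else:
        # Assumption: the real function may handle unknown values differently.
        raise ValueError(f"unknown dataset_type: {dataset_type!r}")
    return sorted(selected)


# Made-up dataset names, used only for illustration.
names = ["demo_gaze", "demo_fixations", "other_gazes"]
gaze_only = filter_by_type(names, "gaze")
fixation_only = filter_by_type(names, "fixation")
```

With the library installed and the collection checked out, the equivalent call would be list_datasets(dataset_type="gaze").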

eyefeatures.data.load_dataset(dataset_name, collection_dir=None, *, normalize=True)

Load a collection dataset by name.

Parameters:
  • dataset_name (str) – Name of the dataset (e.g. "ASD_ready_data_fixations"). Will search for {dataset_name}.parquet in collection_dir.

  • collection_dir (path, optional) – Root directory containing collection Parquet files. Defaults to data/collection (repo data tracked with Git LFS).

  • normalize (bool, default True) – If True and dataset has unnormalized x/y columns, normalize them and rename to norm_pos_x/norm_pos_y.

Returns:

  • DataFrame: loaded and optionally normalized data

  • meta_info: dict with 'pk', 'labels', 'meta' column lists and 'info' (from collection_dir/meta.json under key dataset_name, if present).

Return type:

tuple (DataFrame, meta_info)
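
The 'info' entry of meta_info is described as coming from collection_dir/meta.json under the key dataset_name. A minimal sketch of such a lookup follows. read_dataset_info is a hypothetical helper, the meta.json content is made up, and returning None when the file or key is missing is an assumption, since the reference only says "if present".

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def read_dataset_info(collection_dir: Path, dataset_name: str) -> Optional[Any]:
    """Hypothetical helper: return the meta.json entry for dataset_name, if any."""
    meta_path = Path(collection_dir) / "meta.json"
    if not meta_path.is_file():
        return None  # assumption: a missing meta.json yields no info
    with meta_path.open(encoding="utf-8") as fh:
        # Assumes meta.json is a JSON object keyed by dataset name.
        return json.load(fh).get(dataset_name)


# Build a throwaway collection directory with a made-up meta.json.
with tempfile.TemporaryDirectory() as tmp:
    meta = {"demo_fixations": {"description": "made-up entry"}}
    (Path(tmp) / "meta.json").write_text(json.dumps(meta), encoding="utf-8")
    found = read_dataset_info(Path(tmp), "demo_fixations")
    missing = read_dataset_info(Path(tmp), "absent_dataset")
```

In real use, the same information arrives as the second element of the returned tuple, for example df, meta_info = load_dataset("ASD_ready_data_fixations").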

eyefeatures.data.get_pk(df)

Get primary key column names (columns starting with group_).

Parameters:

df (DataFrame) – Benchmark dataset DataFrame.

Returns:

Primary key column names.

Return type:

list of str

eyefeatures.data.get_labels(df)

Get label column names (columns ending with _label).

Parameters:

df (DataFrame) – Benchmark dataset DataFrame.

Returns:

Label column names.

Return type:

list of str

eyefeatures.data.get_meta(df)

Get meta column names (columns starting with meta_).

Parameters:

df (DataFrame) – Benchmark dataset DataFrame.

Returns:

Meta column names.

Return type:

list of str
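
The naming conventions behind get_pk, get_labels and get_meta can be illustrated on a plain list of column names. This is a sketch rather than the library's code: the three helper names and the example columns are hypothetical (only norm_pos_x and norm_pos_y appear in this reference).

```python
from typing import Iterable, List


def pk_columns(columns: Iterable[str]) -> List[str]:
    """Primary-key columns: names starting with group_."""
    return [c for c in columns if c.startswith("group_")]


def label_columns(columns: Iterable[str]) -> List[str]:
    """Label columns: names ending with _label."""
    return [c for c in columns if c.endswith("_label")]


def meta_columns(columns: Iterable[str]) -> List[str]:
    """Meta columns: names starting with meta_."""
    return [c for c in columns if c.startswith("meta_")]


# Made-up column names following the documented conventions.
columns = [
    "group_participant",
    "group_trial",
    "norm_pos_x",
    "norm_pos_y",
    "meta_age",
    "diagnosis_label",
]

pk = pk_columns(columns)
labels = label_columns(columns)
meta = meta_columns(columns)
```

The library functions take the DataFrame itself; the sketch works on names only, so with a pandas DataFrame one would pass df.columns.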