Data
The data module provides simple data loading utilities for the eye-tracking
collection. Collection data lives in the repo at data/collection as Parquet
files (tracked with Git LFS). Column conventions: primary key (columns
starting with group_), labels (columns ending with _label), meta
(columns starting with meta_).
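The column conventions above can be illustrated with a small sketch. This is not the library's implementation: the helper name `split_columns`, the extra "other" bucket, and the order in which the rules are checked (relevant only for a name that matches more than one rule) are assumptions made for illustration, as are the example column names.

```python
def split_columns(columns):
    """Group column names by the collection's naming conventions.

    Hypothetical helper: the "other" bucket and the rule order are
    assumptions, not documented behaviour of eyefeatures.data.
    """
    roles = {"pk": [], "labels": [], "meta": [], "other": []}
    for name in columns:
        if name.startswith("group_"):      # primary-key columns
            roles["pk"].append(name)
        elif name.endswith("_label"):      # label columns
            roles["labels"].append(name)
        elif name.startswith("meta_"):     # meta columns
            roles["meta"].append(name)
        else:                              # everything else, e.g. coordinates
            roles["other"].append(name)
    return roles


# Illustrative column names only; real datasets may differ.
example = split_columns(
    ["group_subject", "group_trial", "norm_pos_x", "norm_pos_y", "meta_age", "asd_label"]
)
print(example)
```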
- eyefeatures.data.list_datasets(collection_dir=None, *, include_extensive_collection=True, extensive_collection_only=False, include_extracted_fixations=True, extracted_fixations_only=False, dataset_type=None)[source]
List available dataset names in the collection directory.
- Parameters:
collection_dir (path, optional) – Root directory containing collection Parquet files. Defaults to data/collection (repo data tracked with Git LFS).
include_extensive_collection (bool, default True) – If True, also search in extensive_collection subfolder. Ignored when extensive_collection_only or extracted_fixations_only is True.
extensive_collection_only (bool, default False) – If True, list only datasets from extensive_collection subfolder (main directory is not scanned).
include_extracted_fixations (bool, default True) – If True, also search in extracted_fixations subfolder. Ignored when extensive_collection_only or extracted_fixations_only is True.
extracted_fixations_only (bool, default False) – If True, list only datasets from extracted_fixations subfolder (main directory is not scanned).
dataset_type (str, optional) – If “gaze”, return only gaze datasets (names ending with _gaze/_gazes). If “fixation”, return only fixation datasets (names ending with _fixations/_fixation or default). If None, return all.
- Returns:
Sorted list of dataset names (without .parquet extension).
- Return type:
list of str
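The dataset_type filter described for list_datasets can be sketched on plain name lists. This is only an illustration under stated assumptions: `filter_by_type` is a hypothetical helper, "or default" is read here as "any name without a gaze suffix counts as a fixation dataset", and raising ValueError for other values is a guess rather than documented behaviour.

```python
GAZE_SUFFIXES = ("_gaze", "_gazes")


def filter_by_type(names, dataset_type=None):
    """Return a sorted list of names matching dataset_type.

    Hypothetical helper mirroring the documented rule; not the library's code.
    """
    if dataset_type is None:
        return sorted(names)
    if dataset_type == "gaze":
        return sorted(n for n in names if n.endswith(GAZE_SUFFIXES))
    if dataset_type == "fixation":
        # Assumption: anything that is not a gaze dataset is treated as fixation.
        return sorted(n for n in names if not n.endswith(GAZE_SUFFIXES))
    # Assumption: the real function may handle unknown values differently.
    raise ValueError(f"unknown dataset_type: {dataset_type!r}")


# Only "ASD_ready_data_fixations" comes from the documentation; the rest are made up.
names = ["demo_gaze", "ASD_ready_data_fixations", "plain", "demo_gazes"]
gaze_only = filter_by_type(names, "gaze")
fixation_only = filter_by_type(names, "fixation")
```

With the package installed, the corresponding call would be `eyefeatures.data.list_datasets(dataset_type="gaze")`.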
- eyefeatures.data.load_dataset(dataset_name, collection_dir=None, *, normalize=True)[source]
Load a collection dataset by name.
- Parameters:
dataset_name (str) – Name of the dataset (e.g. “ASD_ready_data_fixations”). Will search for {dataset_name}.parquet in collection_dir.
collection_dir (path, optional) – Root directory containing collection Parquet files. Defaults to data/collection (repo data tracked with Git LFS).
normalize (bool, default True) – If True and dataset has unnormalized x/y columns, normalize them and rename to norm_pos_x/norm_pos_y.
- Returns:
DataFrame: loaded and optionally normalized data
meta_info: dict with ‘pk’, ‘labels’, ‘meta’ column lists and ‘info’ (from collection_dir/meta.json under key dataset_name, if present).
- Return type:
tuple (DataFrame, meta_info)
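A minimal usage sketch of the returned tuple follows, assuming the eyefeatures package is installed and the collection files are available locally. The helpers `feature_columns` and `load_features` are hypothetical names introduced here; only load_dataset, its parameters, and the 'pk', 'labels', 'meta' and 'info' keys come from the documentation above.

```python
def feature_columns(columns, meta_info):
    """Columns that are neither primary key, label, nor meta (hypothetical helper)."""
    reserved = set(meta_info["pk"]) | set(meta_info["labels"]) | set(meta_info["meta"])
    return [c for c in columns if c not in reserved]


def load_features(dataset_name, collection_dir=None):
    """Load a dataset and keep only its non-reserved columns.

    The import is deferred so that this sketch can be read and tested
    without eyefeatures installed.
    """
    from eyefeatures.data import load_dataset

    df, meta_info = load_dataset(dataset_name, collection_dir, normalize=True)
    return df[feature_columns(df.columns, meta_info)], meta_info


# Stand-in for a real meta_info dict; the column names are illustrative only.
demo_meta = {"pk": ["group_id"], "labels": ["asd_label"], "meta": ["meta_age"], "info": None}
demo_columns = ["group_id", "norm_pos_x", "norm_pos_y", "asd_label", "meta_age"]
demo_features = feature_columns(demo_columns, demo_meta)
```

With the data in place, `load_features("ASD_ready_data_fixations")` would return the filtered DataFrame together with meta_info.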