Data

The data module provides simple loading utilities for the eye-tracking collection. Collection data lives in the repo at data/collection as Parquet files (tracked with Git LFS). Column names follow three conventions: primary-key columns start with group_, label columns end with _label, and meta columns start with meta_.

eyefeatures.data.list_datasets(collection_dir=None, *, include_extensive_collection=True, extensive_collection_only=False, include_extracted_fixations=True, extracted_fixations_only=False, dataset_type=None)

List available dataset names in the collection directory.

Parameters:

collection_dir (path, optional) – Root directory containing collection Parquet files. Defaults to data/collection (repo data tracked with Git LFS).
include_extensive_collection (bool, default True) – If True, also search in extensive_collection subfolder. Ignored when extensive_collection_only or extracted_fixations_only is True.
extensive_collection_only (bool, default False) – If True, list only datasets from extensive_collection subfolder (main directory is not scanned).
include_extracted_fixations (bool, default True) – If True, also search in extracted_fixations subfolder. Ignored when extensive_collection_only or extracted_fixations_only is True.
extracted_fixations_only (bool, default False) – If True, list only datasets from extracted_fixations subfolder (main directory is not scanned).
dataset_type (str, optional) – If "gaze", return only gaze datasets (names ending with _gaze or _gazes). If "fixation", return only fixation datasets (names ending with _fixation or _fixations, plus names without a type suffix, which are treated as fixation datasets by default). If None, return all datasets.

Returns:

Sorted list of dataset names (without .parquet extension).

Return type:

list of str
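
The dataset_type rule above can be illustrated on plain name strings. The following is a minimal sketch, not the library's implementation: classify_dataset_name and filter_by_type are hypothetical helpers, every sample name except ASD_ready_data_fixations is made up, and two behaviours are assumptions rather than documented facts: that names without a recognized suffix count as fixation data, and that an unknown dataset_type raises ValueError.

```python
def classify_dataset_name(name):
    """Return "gaze" or "fixation" for a dataset name (hypothetical helper)."""
    if name.endswith(("_gaze", "_gazes")):
        return "gaze"
    # Assumption: _fixation/_fixations names and any other name
    # fall back to the "fixation" type.
    return "fixation"


def filter_by_type(names, dataset_type=None):
    """Filter names by type and return them sorted (hypothetical helper)."""
    if dataset_type is None:
        return sorted(names)
    if dataset_type not in ("gaze", "fixation"):
        # Assumption: the real function may handle bad values differently.
        raise ValueError(f"unknown dataset_type: {dataset_type!r}")
    return sorted(n for n in names if classify_dataset_name(n) == dataset_type)


# Sample names; only the first one appears in the documentation.
names = ["ASD_ready_data_fixations", "demo_gaze", "reading_gazes", "legacy_set"]

gaze_names = filter_by_type(names, "gaze")
fixation_names = filter_by_type(names, "fixation")
all_names = filter_by_type(names)
```

With the package installed, the corresponding call would be list_datasets(dataset_type="gaze"), which scans the collection directory instead of a hard-coded list.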

eyefeatures.data.load_dataset(dataset_name, collection_dir=None, *, normalize=True)

Load a collection dataset by name.

Parameters:

dataset_name (str) – Name of the dataset (e.g. “ASD_ready_data_fixations”). Will search for {dataset_name}.parquet in collection_dir.
collection_dir (path, optional) – Root directory containing collection Parquet files. Defaults to data/collection (repo data tracked with Git LFS).
normalize (bool, default True) – If True and the dataset has unnormalized x/y columns, normalize them and rename them to norm_pos_x/norm_pos_y.

Returns:

DataFrame: the loaded and optionally normalized data.
meta_info: dict with keys 'pk', 'labels' and 'meta' (lists of column names) and 'info' (the entry stored under the key dataset_name in collection_dir/meta.json, if present).

Return type:

tuple (DataFrame, meta_info)
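
A call such as df, meta_info = load_dataset("ASD_ready_data_fixations") unpacks the returned tuple. To make the 'info' entry more concrete, here is a minimal sketch of the meta.json lookup described above. It is not the library's code: read_dataset_info is a hypothetical helper, the meta.json content and the column names are placeholders, and returning None for a missing file or key is an assumption.

```python
import json
import tempfile
from pathlib import Path


def read_dataset_info(collection_dir, dataset_name):
    """Return the meta.json entry for dataset_name (hypothetical helper).

    Assumption: None is returned when meta.json or the key is missing.
    """
    meta_path = Path(collection_dir) / "meta.json"
    if not meta_path.is_file():
        return None
    with meta_path.open(encoding="utf-8") as fh:
        entries = json.load(fh)
    return entries.get(dataset_name)


with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway meta.json with one placeholder entry.
    payload = {"ASD_ready_data_fixations": {"description": "placeholder entry"}}
    (Path(tmp) / "meta.json").write_text(json.dumps(payload), encoding="utf-8")

    found = read_dataset_info(tmp, "ASD_ready_data_fixations")
    missing = read_dataset_info(tmp, "not_in_meta_json")

# Shape of meta_info as documented; the column names are made up.
meta_info = {
    "pk": ["group_participant"],
    "labels": ["diagnosis_label"],
    "meta": ["meta_age"],
    "info": found,
}
```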

eyefeatures.data.get_pk(df)

Get primary key column names (columns starting with group_).

Parameters: df (DataFrame) – Benchmark dataset DataFrame.
Returns: Primary key column names.
Return type: list of str

eyefeatures.data.get_labels(df)

Get label column names (columns ending with _label).

Parameters: df (DataFrame) – Benchmark dataset DataFrame.
Returns: Label column names.
Return type: list of str

eyefeatures.data.get_meta(df)

Get meta column names (columns starting with meta_).

Parameters: df (DataFrame) – Benchmark dataset DataFrame.
Returns: Meta column names.
Return type: list of str
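
The three helpers apply the prefix and suffix conventions to a DataFrame's columns. The sketch below mirrors those rules on a plain list of column names so that it runs without the package. split_columns is a hypothetical helper, not the library's implementation, and all column names except norm_pos_x and norm_pos_y are made up for illustration.

```python
def split_columns(columns):
    """Split column names into (pk, labels, meta) lists (hypothetical helper)."""
    pk = [c for c in columns if c.startswith("group_")]
    labels = [c for c in columns if c.endswith("_label")]
    meta = [c for c in columns if c.startswith("meta_")]
    return pk, labels, meta


# Illustrative column names for a fixation dataset.
columns = [
    "group_participant",
    "group_trial",
    "norm_pos_x",
    "norm_pos_y",
    "duration",
    "diagnosis_label",
    "meta_age",
]

pk, labels, meta = split_columns(columns)

# Columns matching no convention are treated as feature columns here.
features = [c for c in columns if c not in pk + labels + meta]
```

With the package installed, get_pk(df), get_labels(df) and get_meta(df) return the corresponding lists for a loaded DataFrame. A name such as meta_site_label would match two rules in this sketch; how the library treats that case is not specified here.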