emocodes.analysis

Submodules

Package Contents

Classes

InterraterReliability

This class can be used to compute metrics of interrater reliability from a list of dataframes/codes (with identical column names).

Consensus

This class can be used to compute the consensus (percent overlap) between two or more sets of codes.

SummarizeVideoFeatures

This class produces a summary report of video features to help users judge the suitability of each feature for regression analysis.

Functions

compile_ratings(list_dfs, list_raters=None)

This function takes a list of dataframes (one per rater) and stacks them, preserving the time index.

interrater_iccs(ratings, rater_col_name='rater', index_label='onset_ms', column_labels=None)

This function computes the interrater ICCs using the Pingouin library. By default it computes the absolute agreement between raters, assuming a random sample of raters at each target.

compute_exact_match(ratings_list, raters_list, reference)

This function computes the percent overlap between ratings. It can be run with a reference file that all code files are compared against, or without a reference, in which case all codes are compared pair-wise.

mismatch_segments_list(df1, df2, time_column=0)

This function compares two columns of the same name across two input dataframes and returns a dataframe of segments that are nonmatching.

plot_heatmap(data)

This function plots a heatmap.

plot_vif(vif_scores)

This function plots variance inflation factor scores, with horizontal lines denoting the standard cutoffs.

pairwise_ips(features, column_names='all')

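This function computes the pair-wise instantaneous phase synchrony (IPS) between columns in a dataframe. It returns both the mean IPS as an NxN matrix and an NxNxT numpy array containing the pair-wise IPS at each time point.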

pairwise_corr(features, column_names='all')

Computes the pair-wise Spearman correlation coefficient for a set of features.

vif_collinear(features, column_names='all')

Wraps the pliers variance inflation factor command. Computes the variance inflation factor for the specified columns in a set of features.

hrf(time, time_to_peak=5, undershoot_dur=12)

This function creates a hemodynamic response function timeseries.

hrf_convolve_features(features, column_names='all', time_col='index', units='s', time_to_peak=5, undershoot_dur=12)

This function convolves a hemodynamic response function with each column in a timeseries dataframe.

class emocodes.analysis.InterraterReliability

This class can be used to compute metrics of interrater reliability from a list of dataframes/codes (with identical column names).

df_list_to_long_df(self, list_of_codes, list_of_raters=None)

This method combines the input dataframes into one long, stacked dataframe, differentiating each by “rater”. The index is presumed to be time and is preserved.

Parameters
  • list_of_codes (list) – list of DataFrame objects OR filepaths to CSVs containing DataFrame objects to stack.

  • list_of_raters (list) – Optional. Custom list of rater names/identifiers. If none are entered, defaults to naming as “rater01, rater02…” and so on.

compute_iccs(self, column_labels=None)

This method computes the intraclass correlation across raters in a dataset.

Parameters

column_labels (list) – List of string column labels to compute ICCs for. Default is None, which computes ICCs for all columns except the index/time column and the “rater” column.

save_iccs(self, out_file_name)

This function saves the ICC results table to a CSV.

Parameters

out_file_name (str) – File path and file name to save the ICC results table.

compute_compile_iccs(self, list_of_codes, list_of_raters=None, column_labels=None, out_file_name='interrater_iccs')

This method takes a list of dataframes and computes the interrater reliability for each column.

Parameters
  • list_of_codes (list) – A list of codes to compute the ICCs of. List can be of DataFrame objects or of filepaths to CSVs containing DataFrame objects.

  • list_of_raters (list) – A list of strings to label the individual raters for the codes in list_of_codes. If None, this function creates a list of [‘rater01’, ‘rater02’, ...] and so on.

  • column_labels (list OR None) – The columns to compute ICCs for. If None, will compute ICCs for all columns except for the ‘time’ and ‘rater’ columns.

  • out_file_name (str) – The filepath and file name to save the ICC results table as.

class emocodes.analysis.Consensus

This class can be used to compute the consensus (percent overlap) between two or more sets of codes.

Use Case 1: compute overlap between trainee codes and exemplar codes

>>> con = Consensus()
>>> con.training_consensus([trainee1_codes_df, trainee2_codes_df], original_codes_df, ['Lizzi','Cat'])
>>> con.consensus_scores.to_csv('consensus_scores.csv') #save scores table as a csv
>>> con.mismatch_segments.to_csv('mismatched_segments.csv') #save the list of mismatched time segments as a csv

Use Case 2: compute overlap pairwise between 2 or more raters

>>> con = Consensus()
>>> con.interrater_consensus([Lizzi_codes_df, Cat_codes_df], ['Lizzi','Cat'])
>>> con.consensus_scores.to_csv('consensus_scores.csv') #save scores table as a csv
>>> con.mismatch_segments.to_csv('mismatched_segments.csv') #save the list of mismatched time segments as a csv

training_consensus(self, trainee_codes_list, exemplar_code_file, trainee_list=None)

This method computes consensus ratings for each set of trainee codes against an exemplar/master set. It produces a report of the percent overlap between the codes as well as a list of nonmatching segments.

Parameters
  • trainee_codes_list (list) – A list of codes. List can be of DataFrame objects or of filepaths to CSVs containing DataFrame objects.

  • exemplar_code_file (filepath OR DataFrame) – The DataFrame to compare each of the trainee codes to. Can be the string filename to a CSV or a DataFrame object.

  • trainee_list (list) – Optional. A list of strings with rater names to use. If None, will automatically assign “rater01”, “rater02”, etc.

interrater_consensus(self, codes_list, rater_list=None)

This method compares a list of codes pairwise and produces 1) a measure of overlap for each code and 2) a list of timestamps for the mismatched segments.

Parameters
  • codes_list (list) – List of dataframe objects OR list of file paths to CSVs containing dataframe objects

  • rater_list (list) – Optional. List of identifiers for the list of codes.

emocodes.analysis.compile_ratings(list_dfs, list_raters=None)

This function takes a list of dataframes (one per rater) and stacks them, preserving the time index.

Parameters
  • list_dfs (list) – A list of DataFrames or CSV files containing dataframes.

  • list_raters (list) – Default is None. A list of preferred rater names. If none are passed, the default is to use “raterXX” (e.g., “rater01” for the first dataframe).

Returns

single_df – A single dataframe of the input dataframes stacked, preserving the index.

Return type

DataFrame
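The stacking step described above can be sketched without any dependencies. This is a minimal illustration, not the library's implementation: it represents each rater's table as a list of row dictionaries instead of a DataFrame, and the names `stack_ratings`, `codes_a`, `codes_b`, and the `valence` column are hypothetical. Only the default `rater01`, `rater02`, ... naming comes from the documentation.

```python
def stack_ratings(list_rows, list_raters=None):
    """Stack per-rater tables (lists of row dicts) into one long table."""
    if list_raters is None:
        # Default labels follow the documented pattern: rater01, rater02, ...
        list_raters = [f"rater{i:02d}" for i in range(1, len(list_rows) + 1)]
    if len(list_raters) != len(list_rows):
        raise ValueError("need exactly one rater label per table")
    stacked = []
    for rater, rows in zip(list_raters, list_rows):
        for row in rows:
            # Copy each row so the inputs stay untouched, then tag it with its rater.
            stacked.append({**row, "rater": rater})
    return stacked


# Two hypothetical raters coding the same two time points.
codes_a = [{"onset_ms": 0, "valence": 1}, {"onset_ms": 100, "valence": 0}]
codes_b = [{"onset_ms": 0, "valence": 1}, {"onset_ms": 100, "valence": 1}]
long_table = stack_ratings([codes_a, codes_b])
```

Each output row keeps its original time value (`onset_ms` here), which is the property the function description calls preserving the time index.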

emocodes.analysis.interrater_iccs(ratings, rater_col_name='rater', index_label='onset_ms', column_labels=None)

This function computes the interrater ICCs using the Pingouin library. By default it computes the absolute agreement between raters assuming a random sample of raters at each target (each rating at each instance). Read more on ICC2 at https://pingouin-stats.org/generated/pingouin.intraclass_corr.html#pingouin.intraclass_corr

Parameters
  • index_label (str) – The label denoting each measurement. This must be consistent across all raters. Default is “onset_ms”.

  • ratings (DataFrame) – DataFrame with the ratings information stored in a long format.

  • rater_col_name (str) – The name of the column containing rater information. Default is “rater”

  • column_labels (list) – The list of variables to compute inter-rater ICCs for. Default is None, which means it will compute ICCs for every column in the DataFrame not equal to the rater_col_name or the index_label.

Returns

icc_df – The dataframe object containing instance-level and overall intraclass correlation values.

Return type

DataFrame
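For readers who want to see what the ICC2 statistic measures, the following is a from-scratch sketch of the single-rater, absolute-agreement ICC(2,1) computed from two-way ANOVA mean squares. It is not Pingouin's code and does not call emocodes; the function name `icc2_single` is hypothetical. The example data are the six-target, four-judge ratings from Shrout and Fleiss (1979), for which the published ICC(2,1) is 0.29.

```python
def icc2_single(table):
    """ICC(2,1) for a targets x raters table given as a list of rows."""
    n = len(table)       # number of targets (rated instances)
    k = len(table[0])    # number of raters
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Two-way ANOVA sums of squares: targets (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: rater (column) variance counts against reliability.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc2_single(ratings)
```

Because the rater mean square appears in the denominator, raters who are consistently higher or lower than each other reduce this ICC even when their ratings rise and fall together.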

emocodes.analysis.compute_exact_match(ratings_list, raters_list, reference)

This function computes the percent overlap between ratings. It can be run with a reference file that all code files are compared against, or it can be run without a reference in which case all codes will be compared pair-wise.

Parameters
  • ratings_list (list) – List of dataframe objects or CSV filenames of saved dataframes.

  • raters_list (list) – List of raters corresponding to each ratings DataFrame in the ratings_list.

  • reference (DataFrame or filepath or None) – The DataFrame object or CSV filename of the DataFrame object to compare each DataFrame in ratings_list to. If None, this function performs a pair-wise comparison instead.

Returns

exact_match_stats – A DataFrame with the match statistic for each pair of raters and for each column in the codes.

Return type

DataFrame
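The two comparison modes described above (against a reference, or pair-wise) can be illustrated with plain lists. This is a sketch under assumptions: the documentation does not say whether the match statistic is reported as a percentage or a fraction, so the 0 to 100 scale here is a guess, and `percent_overlap` and `overlap_scores` are hypothetical names.

```python
from itertools import combinations


def percent_overlap(codes_a, codes_b):
    """Percentage of time points at which two equally long code lists agree."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("code lists must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def overlap_scores(ratings, reference=None):
    """Compare every rater to a reference, or all raters pair-wise if None."""
    if reference is not None:
        return {
            (name, "reference"): percent_overlap(codes, reference)
            for name, codes in ratings.items()
        }
    return {
        (a, b): percent_overlap(ratings[a], ratings[b])
        for a, b in combinations(ratings, 2)
    }


ratings = {"Lizzi": [1, 0, 1, 1], "Cat": [1, 1, 1, 0]}
pairwise = overlap_scores(ratings)
against_reference = overlap_scores(ratings, reference=[1, 0, 1, 1])
```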

emocodes.analysis.mismatch_segments_list(df1, df2, time_column=0)

This function compares two columns of the same name across two input dataframes and returns a dataframe of segments that are nonmatching. Units are of whatever the index or time variable is. Note that this function only checks columns that exist in BOTH dataframes.

Parameters
  • df1 (DataFrame object OR filepath) – The dataframe to compare to df2. Index must be the time or count variable.

  • df2 (DataFrame object OR filepath) – The dataframe to compare to df1. Index must be the time or count variable.

  • time_column (str OR int) – name or index of column to use as the time variable. Default is 0 (first column)

Returns

nonmatching_segments – A table listing all the segments during which the code in question is not in agreement between the two sets of ratings. Time is in the same units/notation as the index.

Return type

DataFrame
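Finding nonmatching segments amounts to collapsing consecutive mismatching samples into runs. The sketch below shows that idea for a single code column using plain lists. It is an illustration only: the name `mismatch_segments` is hypothetical, and reporting each segment as (first mismatching time, last mismatching time) is an assumption about the output layout.

```python
def mismatch_segments(times, codes_a, codes_b):
    """Return (start, end) time pairs for each run of disagreeing samples."""
    segments = []
    start = last = None
    for t, a, b in zip(times, codes_a, codes_b):
        if a != b:
            if start is None:
                start = t   # a new mismatching run begins here
            last = t        # extend the current run to this time point
        elif start is not None:
            segments.append((start, last))  # agreement closes the open run
            start = None
    if start is not None:
        segments.append((start, last))      # run continued to the final sample
    return segments


times = [0, 100, 200, 300, 400, 500]
rater_a = [0, 1, 1, 0, 0, 1]
rater_b = [0, 0, 0, 0, 1, 1]
segments = mismatch_segments(times, rater_a, rater_b)
```

The segment boundaries are in whatever units `times` uses, matching the note above that time is reported in the same units as the index.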

emocodes.analysis.plot_heatmap(data)

This function plots a heatmap.

Parameters

data (DataFrame) – NxN dataframe to plot.

Returns

fig – matplotlib figure object of the plot

Return type

object

emocodes.analysis.plot_vif(vif_scores)

This function plots variance inflation factor scores, with horizontal lines denoting the standard cutoffs:

  • <2 = not collinear

  • 2-5 = weakly collinear and likely okay to include together in a model

  • 5-10 = moderately collinear, proceed with caution

  • >10 = highly collinear, do not include together in a multiple linear regression model

Parameters

vif_scores (Series) – VIF scores to plot.

Returns

fig – matplotlib figure object of the plot

Return type

object
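The cutoffs listed for plot_vif can also be applied programmatically. The helper below is a hypothetical convenience, not part of emocodes. The listed ranges share their endpoints, so the boundary handling (a score of exactly 2 counts as weak, exactly 5 and exactly 10 count as moderate) is an assumption.

```python
def classify_vif(score):
    """Map a VIF score onto the four bands listed above."""
    if score < 2:
        return "not collinear"
    if score < 5:
        return "weakly collinear"
    if score <= 10:
        return "moderately collinear"
    return "highly collinear"


# Hypothetical scores for three features.
vif_scores = {"brightness": 1.4, "loudness": 6.2, "faces": 12.9}
labels = {name: classify_vif(score) for name, score in vif_scores.items()}
```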

class emocodes.analysis.SummarizeVideoFeatures

This class produces a summary report of video features to help users judge the suitability of each feature for regression analysis. After running the class, a PDF, markdown, and HTML version of the report are saved in the output folder along with a folder of figures.

>>> import emocodes as ec
>>> codes = 'video_features.csv' # DataFrame saved as CSV with feature timeseries
>>> output = './report' # directory to save the report in
>>> report = ec.SummarizeVideoFeatures()
>>> report.compile(codes, output)

compile(self, features, out_dir, convolve_hrf=True, column_names='all', sampling_rate=10, units='s', time_col='index')

This function runs the methods to create a features report.

Parameters
  • features (filepath) – A CSV containing a dataframe object with timeseries data for each feature you want to include in the report

  • out_dir (filepath) – The full or relative path to the folder where you want the report saved to.

  • convolve_hrf (bool) – Setting to convolve each feature with a double-gamma hemodynamic response function (HRF) before reporting

  • column_names (list) – The columns to include in the feature analysis

  • sampling_rate (float) – Sampling rate in Hz (samples per second) of the input data

  • units (str) – The units of the time variable (index). Must be ‘s’, ‘ms’, ‘m’, or ‘h’, indicating seconds, milliseconds, minutes, or hours respectively.

  • time_col (str) – The name of the column to use as time if not the index.

compute_plot_corr(self)

compute_plot_ips(self)

compute_plot_vif(self)

plot_features(self)

emocodes.analysis.pairwise_ips(features, column_names='all')

This function computes the pair-wise instantaneous phase synchrony (IPS) between columns in a dataframe. It returns both the mean IPS as an NxN matrix and a numpy array of size NxNxT containing the pair-wise IPS at each time point.

Parameters
  • features (DataFrame) – The dataframe with signals to be analyzed.

  • column_names (list) – List of columns to compare pairwise in the features DataFrame. Default is ‘all’.

Returns

  • mean_ips_df (DataFrame) – NxN DataFrame with pairwise feature mean phase synchrony

  • ips_series (numpy array) – NxNxT (feature x feature x time) array with the instantaneous phase synchrony at each timepoint, pairwise
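
The documentation does not state which formula is used for IPS. One common definition, assumed here, is 1 - sin(|Δφ| / 2), where Δφ is the difference between the two signals' instantaneous phases (typically obtained from a Hilbert transform, which this sketch skips by taking phases as input). The function name `phase_synchrony` and the example phases are hypothetical.

```python
import math


def phase_synchrony(phase_a, phase_b):
    """Per-time-point synchrony in [0, 1] from two phase series (radians)."""
    return [
        1.0 - math.sin(abs(a - b) / 2.0)
        for a, b in zip(phase_a, phase_b)
    ]


# In phase, in anti-phase, and 60 degrees apart.
phases_a = [0.0, 0.0, 0.0]
phases_b = [0.0, math.pi, math.pi / 3]
ips_over_time = phase_synchrony(phases_a, phases_b)
mean_ips = sum(ips_over_time) / len(ips_over_time)
```

Under this definition a value of 1 means the two signals are perfectly in phase at that time point and 0 means they are in anti-phase. Averaging over time gives one entry of the mean matrix, and the per-time-point values correspond to one slice along T of the NxNxT array.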

emocodes.analysis.pairwise_corr(features, column_names='all')

Computes the pair-wise Spearman correlation coefficient for a set of features.

Parameters
  • features (DataFrame) – DataFrame with signals to be analyzed.

  • column_names (list) – List of columns to compare pairwise in the features DataFrame. Default is ‘all’.

Returns

corr_mat_df – Pairwise Spearman correlations organized into a Pandas DataFrame.

Return type

DataFrame
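Spearman correlation is the Pearson correlation of the ranked values, which makes it sensitive to any monotonic relationship and not only linear ones. The dependency-free sketch below shows that computation pairwise over a dictionary of feature columns. All names (`average_ranks`, `pearson`, `spearman`, `pairwise_spearman`, and the feature labels) are hypothetical, and a constant column would cause a division by zero here.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(features):
    """Nested dict playing the role of an NxN correlation matrix."""
    names = list(features)
    return {a: {b: spearman(features[a], features[b]) for b in names} for a in names}


features = {
    "brightness": [1, 2, 3, 4, 5],
    "loudness": [2, 4, 6, 8, 30],   # monotonic but not linear in brightness
    "faces": [5, 4, 3, 2, 1],
}
corr = pairwise_spearman(features)
```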

emocodes.analysis.vif_collinear(features, column_names='all')

Wraps the pliers variance inflation factor command. Computes the variance inflation factor for the specified columns in a set of features.

Parameters
  • features (DataFrame) – DataFrame with signals to be analyzed.

  • column_names (list) – List of columns in the features DataFrame to compute VIF scores for. Default is ‘all’.

Returns

vif_scores – Pandas Series object containing the VIF scores for each column in column_names.

Return type

Series
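The variance inflation factor for a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the others. With exactly two features, R² reduces to the squared Pearson correlation between them, which allows a short dependency-free illustration. This sketch does not call pliers or emocodes, and the name `vif_two_features` is hypothetical.

```python
def vif_two_features(x, y):
    """VIF for either of two features: 1 / (1 - r^2), r = Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    r_squared = cov * cov / (var_x * var_y)
    return 1.0 / (1.0 - r_squared)


# Two hypothetical feature columns with a Pearson correlation of 0.8.
feature_a = [1, 2, 3, 4, 5]
feature_b = [2, 1, 4, 3, 5]
vif = vif_two_features(feature_a, feature_b)
```

A correlation of 0.8 gives a VIF of roughly 2.78. With more than two features the regression on all remaining columns is needed, which is what the wrapped command provides.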

emocodes.analysis.hrf(time, time_to_peak=5, undershoot_dur=12)

This function creates a hemodynamic response function timeseries.

Parameters
  • time (numpy array) – a 1D numpy array that makes up the x-axis (time) of our HRF in seconds

  • time_to_peak (int) – Time to HRF peak in seconds. Default is 5 seconds.

  • undershoot_dur (int) – Duration of the post-peak undershoot. Default is 12 seconds.

Returns

hrf_timeseries – The y-values for the HRF at each time point

Return type

numpy array
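The documentation does not give the formula behind hrf. A widely used double-gamma form, assumed here, subtracts a scaled late gamma density (the undershoot) from an early one (the peak). In this sketch the mapping of `time_to_peak` and `undershoot_dur` to gamma shape parameters, the 0.35 undershoot weight, and the normalization to a peak of 1 are all assumptions, and `gamma_pdf` and `hrf_sketch` are hypothetical names.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; its mode is at t = shape - 1."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def hrf_sketch(times, time_to_peak=5, undershoot_dur=12, undershoot_weight=0.35):
    """Double-gamma HRF sampled at `times` (seconds), scaled to a peak of 1."""
    values = [
        gamma_pdf(t, time_to_peak + 1)
        - undershoot_weight * gamma_pdf(t, undershoot_dur + 1)
        for t in times
    ]
    peak = max(values)
    return [v / peak for v in values]


times = list(range(0, 31))      # 0 to 30 s at one sample per second
hrf_values = hrf_sketch(times)
```

With these assumed defaults the curve starts at zero, peaks at 5 s on this one-second grid, and dips below zero afterwards before returning to baseline.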

emocodes.analysis.hrf_convolve_features(features, column_names='all', time_col='index', units='s', time_to_peak=5, undershoot_dur=12)

This function convolves a hemodynamic response function with each column in a timeseries dataframe.

Parameters
  • features (DataFrame) – A Pandas dataframe with the feature signals to convolve.

  • column_names (list) – List of column names to use. Default is “all”.

  • time_col (str) – The name of the time column to use if not the index. Default is “index”.

  • units (str) – Must be ‘ms’, ‘s’, ‘m’, or ‘h’ to denote milliseconds, seconds, minutes, or hours respectively.

  • time_to_peak (int) – Time to peak for HRF model. Default is 5 seconds.

  • undershoot_dur (int) – Undershoot duration for HRF model. Default is 12 seconds.

Returns

convolved_features – The HRF-convolved feature timeseries

Return type

DataFrame
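Convolving a feature with an HRF means replacing each sample with a weighted sum of the current and preceding samples, with the weights given by the HRF. The sketch below shows this for a dictionary of columns using a generic kernel. It is illustrative only: truncating the result to the input length is an assumption about how the output is aligned, the kernel must be sampled at the same rate as the features, and `convolve_truncated` and `convolve_columns` are hypothetical names.

```python
def convolve_truncated(signal, kernel):
    """Causal discrete convolution, keeping only the first len(signal) samples."""
    out = []
    for n in range(len(signal)):
        total = 0.0
        # Sum over the kernel taps that overlap samples at or before index n.
        for k in range(min(n, len(kernel) - 1) + 1):
            total += signal[n - k] * kernel[k]
        out.append(total)
    return out


def convolve_columns(features, kernel, column_names="all"):
    """Apply the same kernel to each selected column of a dict of lists."""
    names = list(features) if column_names == "all" else list(column_names)
    return {name: convolve_truncated(features[name], kernel) for name in names}


# A single event at the second sample simply reproduces the kernel shape.
features = {"faces": [0, 1, 0, 0, 0], "loudness": [1, 1, 0, 0, 0]}
toy_kernel = [1, 2, 3]  # stand-in for a sampled HRF
convolved = convolve_columns(features, toy_kernel)
```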