`emocodes.analysis`

Submodules

Package Contents

Classes

`InterraterReliability`	This class can be used to compute metrics of interrater reliability from a list of dataframes/codes (with identical column names).
`Consensus`	This class can be used to compute the consensus (percent overlap) between two or more sets of codes.
`SummarizeVideoFeatures`	This class produces a summary report of video features to help users judge the suitability of each feature for regression analysis.

Functions

`compile_ratings`(list_dfs, list_raters=None)	This function takes a list of dataframes (one per rater) and stacks them, preserving the time index.
`interrater_iccs`(ratings, rater_col_name='rater', index_label='onset_ms', column_labels=None)	This function computes the interrater ICCs using the Pingouin library. By default it computes the absolute agreement between raters, assuming a random sample of raters at each target.
`compute_exact_match`(ratings_list, raters_list, reference)	This function computes the percent overlap between ratings. It can be run with a reference file that all code files are compared against, or without a reference, in which case all codes are compared pair-wise.
`mismatch_segments_list`(df1, df2, time_column=0)	This function compares two columns of the same name across two input dataframes and returns a dataframe of segments that are nonmatching.
`plot_heatmap`(data)	This function plots a heatmap.
`plot_vif`(vif_scores)	This function plots variance inflation factor scores with horizontal lines marking the standard cutoffs (2, 5, and 10).
`pairwise_ips`(features, column_names='all')	This function computes the pair-wise instantaneous phase synchrony (IPS) between columns in a dataframe. It returns both the mean IPS as an NxN matrix and an NxNxT numpy array of the pair-wise IPS at each time point.
`pairwise_corr`(features, column_names='all')	Computes the pair-wise Spearman correlation coefficient for a set of features.
`vif_collinear`(features, column_names='all')	Wraps the pliers variance inflation factor function. Computes the variance inflation factor for the specified columns in a set of features.
`hrf`(time, time_to_peak=5, undershoot_dur=12)	This function creates a hemodynamic response function timeseries.
`hrf_convolve_features`(features, column_names='all', time_col='index', units='s', time_to_peak=5, undershoot_dur=12)	This function convolves a hemodynamic response function with each column in a timeseries dataframe.

class emocodes.analysis.InterraterReliability

This class can be used to compute metrics of interrater reliability from a list of dataframes/codes (with identical column names).

df_list_to_long_df(self, list_of_codes, list_of_raters=None)

This method combines the input dataframes into one long, stacked dataframe, differentiating each by “rater”. The index is presumed to be time and is preserved.

Parameters

list_of_codes (list) – list of DataFrame objects OR filepaths to CSVs containing DataFrame objects to stack.
list_of_raters (list) – Optional. Custom list of rater names/identifiers. If None, raters are named “rater01”, “rater02”, and so on.

compute_iccs(self, column_labels=None)

This method computes the intraclass correlation across raters in a dataset.

Parameters: column_labels (list) – List of string column labels to compute ICCs for. Default is None, which computes ICCs for all columns except the index/time column and “rater”.

save_iccs(self, out_file_name)

This method saves the ICC results table to a CSV.

Parameters: out_file_name (str) – File path and file name to save the ICC results table.

compute_compile_iccs(self, list_of_codes, list_of_raters=None, column_labels=None, out_file_name='interrater_iccs')

This method takes a list of dataframes and computes the interrater reliability for each column.

Parameters

list_of_codes (list) – A list of codes to compute the ICCs of. List can be of DataFrame objects or of filepaths to CSVs containing DataFrame objects.
list_of_raters (list) – A list of strings to label the individual raters for the codes in list_of_codes. If None, this function creates a list of [‘rater01’, ‘rater02’, …] and so on.
column_labels (list OR None) – The columns to compute ICCs for. If None, will compute ICCs for all columns except for the ‘time’ and ‘rater’ columns.
out_file_name (str) – The filepath and file name to save the ICC results table as.

class emocodes.analysis.Consensus

This class can be used to compute the consensus (percent overlap) between two or more sets of codes.

Use Case 1: compute overlap between trainee codes and exemplar codes

>>> con = Consensus()
>>> con.training_consensus([trainee1_codes_df, trainee2_codes_df], original_codes_df, ['Lizzi','Cat'])
>>> con.consensus_scores.to_csv('consensus_scores.csv')  # save the scores table as a CSV
>>> con.mismatch_segments.to_csv('mismatched_segments.csv')  # save the list of mismatched time segments as a CSV

Use Case 2: compute overlap pairwise between 2 or more raters

>>> con = Consensus()
>>> con.interrater_consensus([Lizzi_codes_df, Cat_codes_df], ['Lizzi','Cat'])
>>> con.consensus_scores.to_csv('consensus_scores.csv')  # save the scores table as a CSV
>>> con.mismatch_segments.to_csv('mismatched_segments.csv')  # save the list of mismatched time segments as a CSV

training_consensus(self, trainee_codes_list, exemplar_code_file, trainee_list=None)

This method computes consensus ratings for each set of trainee codes against an exemplar/master set. It produces a report of the percent overlap between the codes as well as a list of nonmatching segments.

Parameters

trainee_codes_list (list) – A list of codes. List can be of DataFrame objects or of filepaths to CSVs containing DataFrame objects.
exemplar_code_file (filepath OR DataFrame) – The DataFrame to compare each of the trainee codes to. Can be the string filename to a CSV or a DataFrame object.
trainee_list (list) – Optional. A list of strings with rater names to use. If None, will automatically assign “rater01”, “rater02”, etc.

interrater_consensus(self, codes_list, rater_list=None)

This method compares a list of codes pairwise and produces 1) a measure of overlap for each code and 2) a list of timestamps for the mismatched segments.

Parameters

codes_list (list) – List of dataframe objects OR list of file paths to CSVs containing dataframe objects
rater_list (list) – Optional. List of identifiers for the list of codes.

emocodes.analysis.compile_ratings(list_dfs, list_raters=None)

This function takes a list of dataframes (one per rater) and stacks them, preserving the time index.

Parameters

list_dfs (list) – A list of DataFrames or CSV files containing dataframes.
list_raters (list) – Optional. A list of preferred rater names. Default is None, in which case raters are named “raterXX” (e.g., “rater01” for the first dataframe).

Returns

single_df – A single dataframe of the input dataframes stacked, preserving the index.

Return type

DataFrame
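The stacking described above can be sketched in a few lines of pandas. This is a minimal illustration, not the emocodes implementation: the helper name `stack_ratings` is hypothetical, and reading CSV paths with the first column as the time index is an assumption.

```python
import pandas as pd

def stack_ratings(list_dfs, list_raters=None, rater_col="rater"):
    """Hypothetical helper: stack one DataFrame (or CSV path) per rater into one long DataFrame."""
    # Assumption: CSVs store the time variable in their first column.
    frames = [pd.read_csv(d, index_col=0) if isinstance(d, str) else d for d in list_dfs]
    if list_raters is None:
        # Zero-padded default names: rater01, rater02, ...
        list_raters = [f"rater{i:02d}" for i in range(1, len(frames) + 1)]
    if len(list_raters) != len(frames):
        raise ValueError("list_raters must contain one name per DataFrame")
    # Tag each frame with its rater, then concatenate; the time index is kept unchanged.
    tagged = [df.assign(**{rater_col: name}) for df, name in zip(frames, list_raters)]
    return pd.concat(tagged)
```

Because the index is preserved rather than reset, each time point appears once per rater, which is the long format the ICC functions expect.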

emocodes.analysis.interrater_iccs(ratings, rater_col_name='rater', index_label='onset_ms', column_labels=None)

This function computes the interrater ICCs using the Pingouin library. By default it computes the absolute agreement between raters assuming a random sample of raters at each target (each rating at each instance). Read more on ICC2 at https://pingouin-stats.org/generated/pingouin.intraclass_corr.html#pingouin.intraclass_corr

Parameters

ratings (DataFrame) – DataFrame with the ratings information stored in a long format.
rater_col_name (str) – The name of the column containing rater information. Default is “rater”.
index_label (str) – The label denoting each measurement. This must be consistent across all raters. Default is “onset_ms”.
column_labels (list) – The list of variables to compute inter-rater ICCs for. Default is None, which computes ICCs for every column in the DataFrame other than rater_col_name and index_label.

Returns

icc_df – The dataframe object containing instance-level and overall intraclass correlation values.

Return type

DataFrame
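To make the ICC2 definition concrete, the following sketch computes ICC(2,1) (two-way random effects, absolute agreement, single rater) directly from the two-way ANOVA mean squares. It is a swapped-in manual computation for illustration, not the Pingouin call this function makes; the helper name `icc2_absolute` is hypothetical, and it assumes `index_label` and the rater labels are ordinary columns of the long DataFrame.

```python
import numpy as np
import pandas as pd

def icc2_absolute(ratings, code, rater_col="rater", index_label="onset_ms"):
    """Hypothetical helper: ICC(2,1), absolute agreement, for one code column."""
    # Wide matrix: one row per target (time point), one column per rater.
    y = ratings.pivot(index=index_label, columns=rater_col, values=code).to_numpy(dtype=float)
    n, k = y.shape
    grand = y.mean()
    # Two-way ANOVA sums of squares.
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()   # between targets
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((y - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Rater (column) variance counts against agreement, which makes this "absolute".
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For example, if one rater's scores are always exactly one point higher than the other's at three time points (1, 2, 3 versus 2, 3, 4), this absolute-agreement ICC is 2/3, whereas a consistency-type ICC would be 1, because the constant offset is penalized only under absolute agreement.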

emocodes.analysis.compute_exact_match(ratings_list, raters_list, reference)

This function computes the percent overlap between ratings. It can be run with a reference file that all code files are compared against, or without a reference, in which case all codes are compared pair-wise.

Parameters

ratings_list (list) – List of dataframe objects or CSV filenames of saved dataframes.
raters_list (list) – List of raters corresponding to each ratings DataFrame in the ratings_list.
reference (DataFrame or filepath or None) – The DataFrame object or CSV filename of the DataFrame object to compare each DataFrame in ratings_list to. If None, this function performs a pair-wise comparison instead.

Returns

exact_match_stats – A DataFrame with the match statistic for each pair of raters and for each column in the codes.

Return type

DataFrame
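The match statistic can be illustrated as the percentage of time points at which two code sets have identical values, computed per column. The sketch below assumes that definition; the helper name `percent_exact_match`, the output column names, and accepting only in-memory DataFrames (no CSV paths) are all hypothetical simplifications.

```python
from itertools import combinations
import pandas as pd

def percent_exact_match(ratings, raters, reference=None):
    """Hypothetical helper: percent of rows on which two code sets agree exactly, per column."""
    if reference is not None:
        # Compare every rater to the reference codes.
        pairs = [("reference", reference, name, df) for name, df in zip(raters, ratings)]
    else:
        # No reference: compare every pair of raters.
        named = list(zip(raters, ratings))
        pairs = [(n1, d1, n2, d2) for (n1, d1), (n2, d2) in combinations(named, 2)]
    rows = []
    for name1, df1, name2, df2 in pairs:
        shared = df1.columns.intersection(df2.columns)  # compare only codes present in both
        # Align on the time index so values are compared at the same timestamps.
        a, b = df1[shared].align(df2[shared], join="inner")
        match = (a == b).mean() * 100
        rows.append({"rater1": name1, "rater2": name2, **match.to_dict()})
    return pd.DataFrame(rows)
```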

emocodes.analysis.mismatch_segments_list(df1, df2, time_column=0)

This function compares columns of the same name across two input dataframes and returns a dataframe of the segments in which they do not match. Times are reported in the same units as the index or time variable. Note that this function only checks columns that exist in BOTH dataframes.

Parameters

df1 (DataFrame object OR filepath) – The dataframe to compare to df2. Index must be the time or count variable.
df2 (DataFrame object OR filepath) – The dataframe to compare to df1. Index must be the time or count variable.
time_column (str OR int) – Name or index of the column to use as the time variable. Default is 0 (the first column).

Returns

nonmatching_segments – A table listing all the segments during which the code in question is not in agreement between the two sets of ratings. Time is in the same units/notation as the index.

Return type

DataFrame
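One way to derive nonmatching segments is to flag rows where a shared column differs and then collapse consecutive flagged rows into runs. The sketch below assumes time is already the index (it does not implement the `time_column` option) and reports each run's start and end as the index values of its first and last mismatched rows; the helper name `mismatch_segments` and the `code`/`start`/`end` column names are hypothetical.

```python
import pandas as pd

def mismatch_segments(df1, df2):
    """Hypothetical helper: contiguous runs of rows where a shared column differs."""
    a, b = df1.align(df2, join="inner", axis=0)  # compare at matching index values only
    segments = []
    for col in a.columns.intersection(b.columns):  # only columns present in both
        differs = (a[col] != b[col]).to_numpy()
        start = None
        for pos, flag in enumerate(differs):
            if flag and start is None:
                start = pos  # a mismatched run begins
            if start is not None and (not flag or pos == len(differs) - 1):
                end = pos if flag else pos - 1  # last mismatched row of the run
                segments.append({"code": col, "start": a.index[start], "end": a.index[end]})
                start = None
    return pd.DataFrame(segments, columns=["code", "start", "end"])
```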

emocodes.analysis.plot_heatmap(data)

This function plots a heatmap.

Parameters: data (DataFrame) – NxN dataframe to plot.
Returns: fig – matplotlib figure object of the plot
Return type: object
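A heatmap of an NxN matrix, such as a correlation or mean-IPS table, can be sketched with matplotlib's `imshow`. Which plotting library emocodes uses internally is not stated here, so this is an assumption, and the helper name `heatmap_sketch` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def heatmap_sketch(data):
    """Hypothetical helper: plot an NxN DataFrame as a labeled heatmap and return the Figure."""
    fig, ax = plt.subplots(figsize=(5, 4))
    im = ax.imshow(data.to_numpy(dtype=float), cmap="viridis")
    # Label rows and columns with the DataFrame's index and column names.
    ax.set_xticks(range(len(data.columns)))
    ax.set_xticklabels(data.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(data.index)))
    ax.set_yticklabels(data.index)
    fig.colorbar(im, ax=ax)
    fig.tight_layout()
    return fig
```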

emocodes.analysis.plot_vif(vif_scores)

This function plots variance inflation factor scores with horizontal lines marking the standard cutoffs:

<2 = not collinear

2-5 = weakly collinear and likely okay to include together in a model

5-10 = moderately collinear, proceed with caution

>10 = highly collinear, do not include together in a multiple linear regression model

Parameters: vif_scores (Series) – VIF scores to plot.
Returns: fig – matplotlib figure object of the plot
Return type: object
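The cutoff lines listed above can be drawn as horizontal reference lines at 2, 5, and 10 over a bar chart of the scores. This is a minimal matplotlib sketch; the bar-chart layout, line styles, and the helper name `plot_vif_sketch` are assumptions rather than the emocodes figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_vif_sketch(vif_scores):
    """Hypothetical helper: bar plot of VIF scores with lines at the 2, 5 and 10 cutoffs."""
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(vif_scores.index.astype(str), vif_scores.to_numpy())
    # Reference lines: <2 not collinear, 2-5 weak, 5-10 moderate, >10 high.
    for cutoff, style in [(2, ":"), (5, "--"), (10, "-")]:
        ax.axhline(cutoff, color="gray", linestyle=style, linewidth=1)
    ax.set_ylabel("VIF")
    return fig
```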

class emocodes.analysis.SummarizeVideoFeatures

This class produces a summary report of video features to help users judge the suitability of each feature for regression analysis. After compile() runs, PDF, Markdown, and HTML versions of the report are saved in the output folder along with a folder of figures.

>>> import emocodes as ec
>>> codes = 'video_features.csv' # DataFrame saved as CSV with feature timeseries
>>> output = './report' # directory to save the report in
>>> report = ec.SummarizeVideoFeatures()
>>> report.compile(codes, output)

compile(self, features, out_dir, convolve_hrf=True, column_names='all', sampling_rate=10, units='s', time_col='index')

This method runs the steps needed to create a features report.

Parameters

features (filepath) – A CSV containing a dataframe object with timeseries data for each feature you want to include in the report
out_dir (filepath) – The full or relative path to the folder where the report should be saved.
convolve_hrf (bool) – Whether to convolve each feature with a double-gamma hemodynamic response function (HRF) before reporting. Default is True.
column_names (list) – The columns to include in the feature analysis. Default is ‘all’.
sampling_rate (float) – Sampling rate of the input data in Hz (samples per second). Default is 10.
units (str) – The units of the time variable (index). Must be ‘s’, ‘ms’, ‘m’, or ‘h’, indicating seconds, milliseconds, minutes, or hours respectively. Default is ‘s’.
time_col (str) – The name of the column to use as time if not the index. Default is ‘index’.

compute_plot_corr(self)

compute_plot_ips(self)

compute_plot_vif(self)

plot_features(self)

emocodes.analysis.pairwise_ips(features, column_names='all')

This function computes the pair-wise instantaneous phase synchrony (IPS) between columns in a dataframe. It returns both the mean IPS as an NxN matrix and an NxNxT numpy array containing the pair-wise IPS at each time point.

Parameters

features (DataFrame) – The dataframe with signals to be analyzed.
column_names (list) – List of columns in the features DataFrame to compare pairwise. Default is ‘all’.

Returns

mean_ips_df (DataFrame) – NxN DataFrame with pairwise feature mean phase synchrony
ips_series (numpy array) – NxNxT (feature x feature x time) array with the instantaneous phase synchrony at each timepoint, pairwise
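A common way to compute instantaneous phase synchrony is to take each signal's phase from its analytic signal (Hilbert transform) and define IPS = 1 - sin(|φ1 - φ2| / 2), which is 1 for identical phases and 0 for opposite phases. The sketch below assumes that definition; whether emocodes applies the same formula (or any band-pass filtering first) is an assumption, and the helper name `pairwise_ips_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.signal import hilbert

def pairwise_ips_sketch(features, column_names="all"):
    """Hypothetical helper: mean NxN IPS DataFrame and NxNxT IPS array."""
    cols = list(features.columns) if column_names == "all" else list(column_names)
    # Instantaneous phase of each column from its analytic signal; shape (T, N).
    phases = np.angle(hilbert(features[cols].to_numpy(dtype=float), axis=0))
    n, t = len(cols), phases.shape[0]
    ips_series = np.empty((n, n, t))
    for i in range(n):
        for j in range(n):
            ips_series[i, j] = 1 - np.sin(np.abs(phases[:, i] - phases[:, j]) / 2)
    # Average over time to get one synchrony value per feature pair.
    mean_ips_df = pd.DataFrame(ips_series.mean(axis=2), index=cols, columns=cols)
    return mean_ips_df, ips_series
```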

emocodes.analysis.pairwise_corr(features, column_names='all')

Computes the pair-wise Spearman correlation coefficient for a set of features.

Parameters

features (DataFrame) – DataFrame with signals to be analyzed.
column_names (list) – List of columns in the features DataFrame to compare pairwise. Default is ‘all’.

Returns

corr_mat_df – Pairwise Spearman correlations organized into a Pandas DataFrame.

Return type

DataFrame
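Spearman's rho is Pearson's correlation computed on the ranks of each column, so it captures any monotonic relationship between features. A minimal pandas sketch, with the hypothetical helper name `pairwise_spearman`:

```python
import pandas as pd

def pairwise_spearman(features, column_names="all"):
    """Hypothetical helper: pairwise Spearman rank correlations between feature columns."""
    cols = list(features.columns) if column_names == "all" else list(column_names)
    # Equivalent to features[cols].rank().corr() (Pearson on ranks).
    return features[cols].corr(method="spearman")
```

For instance, a feature and its cube correlate at exactly 1 under Spearman even though the relationship is nonlinear.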

emocodes.analysis.vif_collinear(features, column_names='all')

Wraps the pliers variance inflation factor function. Computes the variance inflation factor for the specified columns in a set of features.

Parameters

features (DataFrame) – DataFrame with signals to be analyzed.
column_names (list) – List of columns in the features DataFrame to compute VIF scores for. Default is ‘all’.

Returns

vif_scores – Pandas Series object containing the VIF scores for each column in column_names.

Return type

Series
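The VIF for column j is 1 / (1 - R_j²), where R_j² comes from regressing column j on all the other columns; equivalently, it is the j-th diagonal entry of the inverse correlation matrix. The sketch below uses that textbook identity; whether pliers computes it the same way is an assumption, and the helper name `vif_from_corr` is hypothetical.

```python
import numpy as np
import pandas as pd

def vif_from_corr(features, column_names="all"):
    """Hypothetical helper: VIF per column as the diagonal of the inverse correlation matrix."""
    cols = list(features.columns) if column_names == "all" else list(column_names)
    corr = features[cols].corr().to_numpy()
    # diag(inv(R))_j == 1 / (1 - R_j^2) for standardized predictors.
    return pd.Series(np.diag(np.linalg.inv(corr)), index=cols, name="VIF")
```

Uncorrelated columns give a VIF of exactly 1, while a column that is nearly a sum of two others lands far above the >10 cutoff.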

emocodes.analysis.hrf(time, time_to_peak=5, undershoot_dur=12)

This function creates a hemodynamic response function timeseries.

Parameters

time (numpy array) – A 1D numpy array of time points, in seconds, that forms the x-axis of the HRF.
time_to_peak (int) – Time to HRF peak in seconds. Default is 5 seconds.
undershoot_dur (int) – Duration of the post-peak undershoot in seconds. Default is 12 seconds.

Returns

hrf_timeseries – The y-values for the HRF at each time point

Return type

numpy array
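A double-gamma HRF subtracts a scaled, later gamma density (the undershoot) from an earlier one (the main response). The sketch below is one common parameterization, not necessarily the emocodes one: it assumes scale-1 gamma densities whose modes sit at `time_to_peak` and `undershoot_dur` seconds, an undershoot ratio of 0.35, and normalization to a peak of 1. The helper name `double_gamma_hrf` is hypothetical.

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(time, time_to_peak=5, undershoot_dur=12):
    """Hypothetical helper: double-gamma HRF at `time` (seconds), scaled so the peak is 1."""
    # A scale-1 gamma density with shape a has its mode at a - 1,
    # so shape time_to_peak + 1 puts the main response peak near time_to_peak.
    peak = gamma.pdf(time, time_to_peak + 1)
    # Assumption: the undershoot is a second gamma (mode at undershoot_dur) scaled by 0.35.
    undershoot = gamma.pdf(time, undershoot_dur + 1)
    hrf_timeseries = peak - 0.35 * undershoot
    return hrf_timeseries / hrf_timeseries.max()
```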

emocodes.analysis.hrf_convolve_features(features, column_names='all', time_col='index', units='s', time_to_peak=5, undershoot_dur=12)

This function convolves a hemodynamic response function with each column in a timeseries dataframe.

Parameters

features (DataFrame) – A Pandas dataframe with the feature signals to convolve.
column_names (list) – List of column names to use. Default is “all”.
time_col (str) – The name of the time column to use if not the index. Default is “index”.
units (str) – Must be ‘ms’, ‘s’, ‘m’, or ‘h’ to denote milliseconds, seconds, minutes, or hours respectively. Default is ‘s’.
time_to_peak (int) – Time to peak for HRF model. Default is 5 seconds.
undershoot_dur (int) – Undershoot duration for HRF model. Default is 12 seconds.

Returns

convolved_features – The HRF-convolved feature timeseries

Return type

DataFrame
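Convolution with an HRF needs the kernel sampled at the same interval as the features, so the time variable is first converted to seconds using `units`. The sketch below assumes roughly uniform sampling, a double-gamma kernel (modes at `time_to_peak` and `undershoot_dur`, undershoot ratio 0.35, peak scaled to 1), and truncation of the full convolution to the input length; the helper name `convolve_with_hrf`, the `SECONDS_PER_UNIT` table, and the `kernel_len_s` parameter are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import gamma

SECONDS_PER_UNIT = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}

def convolve_with_hrf(features, column_names="all", time_col="index", units="s",
                      time_to_peak=5, undershoot_dur=12, kernel_len_s=32.0):
    """Hypothetical helper: convolve each feature column with an HRF sampled at the data rate."""
    if column_names == "all":
        cols = [c for c in features.columns if c != time_col]  # never convolve the time column
    else:
        cols = list(column_names)
    times = features.index if time_col == "index" else features[time_col]
    seconds = np.asarray(times, dtype=float) * SECONDS_PER_UNIT[units]
    dt = np.median(np.diff(seconds))  # assumes roughly uniform sampling
    t = np.arange(int(round(kernel_len_s / dt))) * dt
    # Assumed double-gamma kernel: main peak near time_to_peak minus a scaled undershoot.
    kernel = gamma.pdf(t, time_to_peak + 1) - 0.35 * gamma.pdf(t, undershoot_dur + 1)
    kernel /= kernel.max()
    out = features.copy()
    for col in cols:
        # Full convolution truncated to the input length so rows stay aligned in time.
        out[col] = np.convolve(features[col].to_numpy(dtype=float), kernel)[: len(features)]
    return out
```

Truncating rather than using a centered ("same") convolution keeps the response causal: a feature change can only affect the convolved signal at or after the time it occurs.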
