emocodes.analysis
Submodules
Package Contents
Classes
InterraterReliability | This class can be used to compute metrics of interrater reliability from a list of dataframes/codes (with identical column names).
Consensus | This class can be used to compute the consensus (percent overlap) between two or more sets of codes.
SummarizeVideoFeatures | This class produces a summary report of video features to help users judge the suitability of each feature for regression analysis.
Functions
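compile_ratings(list_dfs, list_raters=None) | This function takes a list of dataframes (one per rater) and stacks them, preserving the time index.
interrater_iccs(ratings, rater_col_name='rater', index_label='onset_ms', column_labels=None) | This function computes the interrater ICCs using the Pingouin library. By default it computes the absolute agreement between raters assuming a random sample of raters at each target.
compute_exact_match(ratings_list, raters_list, reference) | This function computes the percent overlap between ratings. It can be run with a reference file that all code files are compared against, or without a reference, in which case all codes are compared pair-wise.
mismatch_segments_list(df1, df2, time_column=0) | This function compares two columns of the same name across two input dataframes and returns a dataframe of segments that are nonmatching.
plot_heatmap(data) | This function plots a heatmap.
plot_vif(vif_scores) | This function plots variance inflation factor scores with horizontal lines denoting the standard cutoffs.
pairwise_ips(features, column_names='all') | This function computes the pair-wise instantaneous phase synchrony (IPS) between columns in a dataframe. It returns both the mean IPS in an NxN matrix and an NxNxT numpy array of the pair-wise IPS at each time point.
pairwise_corr(features, column_names='all') | Computes the pair-wise Spearman correlation coefficient for a set of features.
vif_collinear(features, column_names='all') | Wraps the pliers variance inflation factor command. Computes the variance inflation factor for the specified columns in a set of features.
hrf(time, time_to_peak=5, undershoot_dur=12) | This function creates a hemodynamic response function timeseries.
hrf_convolve_features(features, column_names='all', time_col='index', units='s', time_to_peak=5, undershoot_dur=12) | This function convolves a hemodynamic response function with each column in a timeseries dataframe.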
- class emocodes.analysis.InterraterReliability
This class can be used to compute metrics of interrater reliability from a list of dataframes/codes (with identical column names).
- df_list_to_long_df(self, list_of_codes, list_of_raters=None)
This method combines input dataframes into one long, stacked dataframe, differentiating each by “rater”. The index is presumed to be time and is preserved.
- Parameters
list_of_codes (list) – list of DataFrame objects OR filepaths to CSVs containing DataFrame objects to stack.
list_of_raters (list) – Optional. Custom list of rater names/identifiers. If none are entered, raters are named “rater01”, “rater02”, and so on.
- compute_iccs(self, column_labels=None)
This method computes the intraclass correlation across raters in a dataset.
- Parameters
column_labels (list) – List of string column labels to compute ICCs for. Default is None, which computes ICCs for all columns except the index/time column and the “rater” column.
- save_iccs(self, out_file_name)
This method saves the ICC results table to a CSV.
- Parameters
out_file_name (str) – File path and file name to save the ICC results table.
- compute_compile_iccs(self, list_of_codes, list_of_raters=None, column_labels=None, out_file_name='interrater_iccs')
This method takes a list of dataframes and computes the interrater reliability for each column.
- Parameters
list_of_codes (list) – A list of codes to compute the ICCs of. List can be of DataFrame objects or of filepaths to CSVs containing DataFrame objects.
list_of_raters (list) – A list of strings to label the individual raters for the codes in the list_of_codes. If None, this function creates a list of [‘rater01’, ‘rater02’, ...] and so on.
column_labels (list OR None) – The columns to compute ICCs for. If None, will compute ICCs for all columns except for the ‘time’ and ‘rater’ columns.
out_file_name (str) – The filepath and file name to save the ICC results table as.
- class emocodes.analysis.Consensus
This class can be used to compute the consensus (percent overlap) between two or more sets of codes.
Use Case 1: compute overlap between trainee codes and exemplar codes
>>> con = Consensus()
>>> con.training_consensus([trainee1_codes_df, trainee2_codes_df], original_codes_df, ['Lizzi','Cat'])
>>> con.consensus_scores.to_csv('consensus_scores.csv')  # save scores table as a csv
>>> con.mismatch_segments.to_csv('mismatched_segments.csv')  # save the list of mismatched time segments as a csv
Use Case 2: compute overlap pairwise between 2 or more raters
>>> con = Consensus()
>>> con.interrater_consensus([Lizzi_codes_df, Cat_codes_df], ['Lizzi','Cat'])
>>> con.consensus_scores.to_csv('consensus_scores.csv')  # save scores table as a csv
>>> con.mismatch_segments.to_csv('mismatched_segments.csv')  # save the list of mismatched time segments as a csv
- training_consensus(self, trainee_codes_list, exemplar_code_file, trainee_list=None)
This method computes consensus ratings for each set of trainee codes against an exemplar/master set. It produces a report of the percent overlap between the codes as well as a list of nonmatching segments.
- Parameters
trainee_codes_list (list) – A list of codes. List can be of DataFrame objects or of filepaths to CSVs containing DataFrame objects.
exemplar_code_file (filepath OR DataFrame) – The DataFrame to compare each of the trainee codes to. Can be the string filename to a CSV or a DataFrame object.
trainee_list (list) – Optional. A list of strings with rater names to use. If None, will automatically assign “rater01”, “rater02”, etc.
- interrater_consensus(self, codes_list, rater_list=None)
This method compares a list of codes pairwise and produces 1) a measure of overlap for each code and 2) a list of timestamps for the mismatched segments.
- Parameters
codes_list (list) – List of dataframe objects OR list of file paths to CSVs containing dataframe objects
rater_list (list) – Optional. List of identifiers for the list of codes.
- emocodes.analysis.compile_ratings(list_dfs, list_raters=None)
This function takes a list of dataframes (one per rater) and stacks them, preserving the time index.
- Parameters
list_dfs (list) – A list of DataFrames or CSV files containing dataframes.
list_raters (list) – Default is None. A list of preferred rater names. If none are passed, default is to use “raterXX” (e.g., “rater01” for the first dataframe).
- Returns
single_df – A single dataframe of the input dataframes stacked, preserving the index.
- Return type
DataFrame
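The stacking step described above can be sketched with plain pandas. This is a minimal illustration, not the emocodes implementation: the helper name `stack_ratings`, the example column `positive`, and the sample values are all hypothetical.

```python
import pandas as pd

def stack_ratings(list_dfs, list_raters=None):
    """Stack one DataFrame per rater into a single long DataFrame (illustrative only)."""
    if list_raters is None:
        # Mirror the documented default naming: rater01, rater02, ...
        list_raters = [f"rater{i + 1:02d}" for i in range(len(list_dfs))]
    # Tag every row with the rater it came from
    labelled = [df.assign(rater=name) for df, name in zip(list_dfs, list_raters)]
    # pd.concat keeps each frame's own index, so the time index is preserved
    return pd.concat(labelled)

# Hypothetical ratings indexed by onset time in milliseconds
index = pd.Index([0, 100, 200], name="onset_ms")
rater_a = pd.DataFrame({"positive": [0, 1, 1]}, index=index)
rater_b = pd.DataFrame({"positive": [0, 0, 1]}, index=index)

single_df = stack_ratings([rater_a, rater_b])
print(single_df)
```

The result is in long format: each time point appears once per rater, which is the layout the ICC functions in this module expect.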
- emocodes.analysis.interrater_iccs(ratings, rater_col_name='rater', index_label='onset_ms', column_labels=None)
This function computes the interrater ICCs using the Pingouin library. By default it computes the absolute agreement between raters assuming a random sample of raters at each target (each rating at each instance). Read more on ICC2 at https://pingouin-stats.org/generated/pingouin.intraclass_corr.html#pingouin.intraclass_corr
- Parameters
index_label (str) – The label denoting each measurement. This must be consistent across all raters. Default is “onset_ms”.
ratings (DataFrame) – DataFrame with the ratings information stored in a long format.
rater_col_name (str) – The name of the column containing rater information. Default is “rater”
column_labels (list) – The list of variables to compute inter-rater ICCs for. Default is None, which means it will compute ICCs for every column in the DataFrame not equal to the rater_col_name or the index_label.
- Returns
icc_df – The dataframe object containing instance-level and overall intraclass correlation values.
- Return type
DataFrame
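To make the ICC2 statistic concrete, the sketch below computes the single-rater, two-way random-effects, absolute-agreement ICC directly from the ANOVA mean squares with NumPy. It is an illustration of the formula only and does not call Pingouin or emocodes; the function name `icc2_single`, the column `arousal`, and the scores are hypothetical.

```python
import numpy as np
import pandas as pd

def icc2_single(ratings, column, rater_col_name="rater", index_label="onset_ms"):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    # Pivot long-format ratings into a targets x raters table
    wide = ratings.pivot(index=index_label, columns=rater_col_name, values=column)
    x = wide.to_numpy(dtype=float)
    n, k = x.shape  # n targets (time points), k raters
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between targets
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical scores: 6 time points (rows) rated by 4 raters (columns)
scores = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
          [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
rows = [
    {"onset_ms": 100 * i, "rater": f"rater{j + 1:02d}", "arousal": score}
    for i, row in enumerate(scores)
    for j, score in enumerate(row)
]
long_df = pd.DataFrame(rows)

icc_value = icc2_single(long_df, "arousal")
print(round(icc_value, 3))
```

Because absolute agreement penalizes systematic offsets between raters, raters who rank the time points similarly but use different parts of the scale still yield a low ICC2.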
- emocodes.analysis.compute_exact_match(ratings_list, raters_list, reference)
This function computes the percent overlap between ratings. It can be run with a reference file that all code files are compared against, or it can be run without a reference in which case all codes will be compared pair-wise.
- Parameters
ratings_list (list) – List of dataframe objects or CSV filenames of saved dataframes.
raters_list (list) – List of raters corresponding to each ratings DataFrame in the ratings_list.
reference (DataFrame or filepath or None) – The DataFrame object or CSV filename of the DataFrame object to compare each DataFrame in ratings_list to. If None, this function performs a pair-wise comparison instead.
- Returns
exact_match_stats – A DataFrame with the match statistic for each pair of raters and for each column in the codes.
- Return type
DataFrame
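As a rough illustration of percent overlap between two raters, the sketch below counts the share of time points at which the codes are identical, per shared column. The helper `percent_overlap` and the sample codes are hypothetical, and the exact statistic emocodes reports may differ.

```python
import pandas as pd

def percent_overlap(df1, df2):
    """Percent of time points with identical codes, for each shared column."""
    shared = [c for c in df1.columns if c in df2.columns]
    # Align on the index so only time points present in both frames are compared
    left, right = df1[shared].align(df2[shared], join="inner")
    return (left == right).mean() * 100

# Hypothetical binary codes from two raters, indexed by time
index = pd.Index([0, 100, 200, 300], name="onset_ms")
rater_a = pd.DataFrame({"positive": [0, 1, 1, 0], "negative": [1, 1, 0, 0]}, index=index)
rater_b = pd.DataFrame({"positive": [0, 1, 0, 0], "negative": [1, 1, 0, 0]}, index=index)

overlap = percent_overlap(rater_a, rater_b)
print(overlap)
```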
- emocodes.analysis.mismatch_segments_list(df1, df2, time_column=0)
This function compares two columns of the same name across two input dataframes and returns a dataframe of segments that are nonmatching. Segment times are reported in the same units as the index or time variable. Note that this function only checks columns that exist in BOTH dataframes.
- Parameters
df1 (DataFrame object OR filepath) – The dataframe to compare to df2. Index must be the time or count variable.
df2 (DataFrame object OR filepath) – The dataframe to compare to df1. Index must be the time or count variable.
time_column (str OR int) – name or index of column to use as the time variable. Default is 0 (first column)
- Returns
nonmatching_segments – A table listing all the segments during which the code in question is not in agreement between the two sets of ratings. Time is in the same units/notation as the index.
- Return type
DataFrame
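One way to find nonmatching segments is to flag each time point where the two codes differ and then group consecutive flagged points into runs. The sketch below shows that idea for a single column; the function name and the `onset`/`offset` output columns are assumptions made for illustration, not the layout emocodes returns.

```python
import pandas as pd

def mismatch_segments(s1, s2):
    """List contiguous runs where two identically indexed Series disagree."""
    mismatch = s1 != s2
    # A new run starts wherever the mismatch flag changes value
    run_id = mismatch.astype(int).diff().ne(0).cumsum()
    segments = []
    for _, run in mismatch.groupby(run_id):
        if run.iloc[0]:  # keep only runs of disagreement
            segments.append({"onset": run.index[0], "offset": run.index[-1]})
    return pd.DataFrame(segments, columns=["onset", "offset"])

# Hypothetical codes from two raters on a shared time index
index = pd.Index([0, 100, 200, 300, 400, 500], name="onset_ms")
codes_a = pd.Series([0, 1, 1, 0, 0, 1], index=index)
codes_b = pd.Series([0, 0, 0, 0, 1, 1], index=index)

nonmatching = mismatch_segments(codes_a, codes_b)
print(nonmatching)
```

Here `offset` is the last time point of the disagreement, so a single-sample mismatch has equal onset and offset.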
- emocodes.analysis.plot_heatmap(data)
This function plots a heatmap.
- Parameters
data (DataFrame) – NxN dataframe to plot.
- Returns
fig – matplotlib figure object of the plot
- Return type
object
- emocodes.analysis.plot_vif(vif_scores)
This function plots variance inflation factor scores with the horizontal lines denoting the standard cut offs:
<2 = not collinear
2-5 = weakly collinear and likely okay to include together in a model
5-10 = moderately collinear, proceed with caution
>10 = highly collinear, do not include together in a multiple linear regression model
- Parameters
vif_scores (Series) – VIF scores to plot.
- Returns
fig – matplotlib figure object of the plot
- Return type
object
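The cutoffs listed above can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, and how exact boundary values (2, 5, 10) are assigned is an assumption, since the ranges above overlap at their edges.

```python
def vif_category(score):
    """Map a VIF score to the collinearity bands described above."""
    # Assumption: a boundary value falls into the higher band
    if score < 2:
        return "not collinear"
    if score < 5:
        return "weakly collinear"
    if score <= 10:
        return "moderately collinear"
    return "highly collinear"

for value in (1.5, 3.0, 7.0, 12.0):
    print(value, vif_category(value))
```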
- class emocodes.analysis.SummarizeVideoFeatures
This class produces a summary report of video features to help users judge the suitability of each feature for regression analysis. After running the class, a PDF, markdown, and HTML version of the report are saved in the output folder along with a folder of figures.
>>> import emocodes as ec >>> codes = 'video_features.csv' # DataFrame saved as CSV with feature timeseries >>> output = './report' # directory to save the report in >>> report = ec.SummarizeVideoFeatures() >>> report.compile(codes, output)
- compile(self, features, out_dir, convolve_hrf=True, column_names='all', sampling_rate=10, units='s', time_col='index')
This function runs the methods to create a features report.
- Parameters
features (filepath) – A CSV containing a dataframe object with timeseries data for each feature you want to include in the report
out_dir (filepath) – The full or relative path to the folder where you want the report saved to.
convolve_hrf (bool) – Setting to convolve each feature with a double-gamma hemodynamic response function (HRF) before reporting
column_names (list) – The columns to include in the feature analysis
sampling_rate (float) – Sampling rate in Hz (samples per second) of the input data
units (str) – The units of the time variable (index). Must be ‘s’, ‘ms’, ‘m’, or ‘h’, indicating seconds, milliseconds, minutes, or hours respectively.
time_col (str) – The name of the column to use as time if not the index.
- compute_plot_corr(self)
- compute_plot_ips(self)
- compute_plot_vif(self)
- plot_features(self)
- emocodes.analysis.pairwise_ips(features, column_names='all')
This function computes the pair-wise instantaneous phase synchrony (IPS) between columns in a dataframe. It returns both the mean IPS in an NxN matrix and a numpy array of size NxNxT containing the pair-wise IPS at each time point.
- Parameters
features (DataFrame) – The dataframe with signals to be analyzed.
column_names (list) – List of columns to compare pairwise in the features DataFrame. Default is ‘all’.
- Returns
mean_ips_df (DataFrame) – NxN DataFrame with pairwise feature mean phase synchrony
ips_series (numpy array) – NxNxT (feature x feature x time) array with the instantaneous phase synchrony at each timepoint, pairwise
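A common way to define instantaneous phase synchrony is 1 - sin(|phase difference| / 2), with phases taken from the analytic signal. The sketch below follows that definition using NumPy only and returns outputs shaped like those described above. Whether emocodes uses exactly this formula or any filtering beforehand is an assumption; the function names and test signals are hypothetical.

```python
import numpy as np
import pandas as pd

def analytic_phase(x):
    """Instantaneous phase from the FFT-based analytic signal (Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    # Zeroing negative frequencies and doubling positive ones gives the analytic signal
    return np.angle(np.fft.ifft(spectrum * weights))

def pairwise_phase_synchrony(features):
    """Return (mean N x N DataFrame, N x N x T array) of pairwise phase synchrony."""
    cols = list(features.columns)
    phases = np.array([analytic_phase(features[c].to_numpy(dtype=float)) for c in cols])
    diff = np.abs(phases[:, None, :] - phases[None, :, :])
    ips_series = 1.0 - np.sin(diff / 2.0)  # 1 = in phase, 0 = anti-phase
    mean_ips_df = pd.DataFrame(ips_series.mean(axis=2), index=cols, columns=cols)
    return mean_ips_df, ips_series

# Hypothetical signals: a 1 Hz sine sampled at 100 Hz, a copy, and its inverse
t = np.arange(1000) / 100.0
wave = np.sin(2 * np.pi * t)
features = pd.DataFrame({"a": wave, "a_copy": wave, "a_flipped": -wave})

mean_ips_df, ips_series = pairwise_phase_synchrony(features)
print(mean_ips_df.round(3))
```

Signals are usually mean-centered, and often band-pass filtered, before the phase is extracted; that step is omitted here to keep the sketch short.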
- emocodes.analysis.pairwise_corr(features, column_names='all')
Computes the pair-wise Spearman correlation coefficient for a set of features.
- Parameters
features (DataFrame) – DataFrame with signals to be analyzed.
column_names (list) – List of columns to compare pairwise in the features DataFrame. Default is ‘all’.
- Returns
corr_mat_df – Pairwise Spearman correlations organized into a Pandas DataFrame.
- Return type
DataFrame
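For reference, pandas can produce a pairwise Spearman correlation matrix directly. The example below is a sketch with hypothetical feature names and values; it does not call emocodes.

```python
import pandas as pd

# Hypothetical feature timeseries
features = pd.DataFrame({
    "brightness": [1, 2, 3, 4, 5],
    "loudness": [1, 8, 27, 64, 125],  # monotonic in brightness, but not linear
    "faces": [5, 4, 3, 2, 1],         # decreases as brightness increases
})

# Spearman correlates ranks, so any strictly monotonic relationship gives +1 or -1
corr_mat_df = features.corr(method="spearman")
print(corr_mat_df)
```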
- emocodes.analysis.vif_collinear(features, column_names='all')
Wraps the pliers variance inflation factor command. Computes the variance inflation factor for the specified columns in a set of features.
- Parameters
features (DataFrame) – DataFrame with signals to be analyzed.
column_names (list) – List of columns in the features DataFrame to compute VIF scores for. Default is ‘all’.
- Returns
vif_scores – Pandas Series object containing the VIF scores for each column in column_names.
- Return type
Series
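The variance inflation factor for a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the others. Equivalently, the VIFs are the diagonal of the inverse correlation matrix, which the sketch below uses. It does not call pliers or emocodes; the function name and the simulated features are hypothetical.

```python
import numpy as np
import pandas as pd

def vif_from_correlation(features):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(features.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=features.columns)

# Hypothetical features: "c" is nearly a copy of "a", "b" is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
features = pd.DataFrame({"a": a, "b": b, "c": c})

vif_scores = vif_from_correlation(features)
print(vif_scores.round(2))
```

In this example the near-duplicate pair receives very large scores while the independent feature stays close to 1.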
- emocodes.analysis.hrf(time, time_to_peak=5, undershoot_dur=12)
This function creates a hemodynamic response function timeseries.
- Parameters
time (numpy array) – a 1D numpy array that makes up the x-axis (time) of our HRF in seconds
time_to_peak (int) – Time to HRF peak in seconds. Default is 5 seconds.
undershoot_dur (int) – Duration of the post-peak undershoot. Default is 12 seconds.
- Returns
hrf_timeseries – The y-values for the HRF at each time point
- Return type
numpy array
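A double-gamma HRF is typically built as a positive gamma-shaped peak minus a smaller, later gamma-shaped undershoot. The sketch below shows one such parameterization using NumPy only. How `time_to_peak` and `undershoot_dur` map onto gamma shapes, the undershoot weight of 0.35, and the normalization to a peak of 1 are all assumptions made for illustration and may not match the emocodes implementation.

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated with NumPy only."""
    t = np.asarray(t, dtype=float)
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

def double_gamma_hrf(time, time_to_peak=5, undershoot_dur=12, undershoot_weight=0.35):
    """Illustrative double-gamma HRF (parameter mapping is an assumption)."""
    # A gamma density with shape s peaks at s - 1, so shape = time_to_peak + 1
    peak = gamma_pdf(time, time_to_peak + 1)
    # Assumption: undershoot_dur is used directly as the undershoot's gamma shape
    undershoot = gamma_pdf(time, undershoot_dur)
    y = peak - undershoot_weight * undershoot
    return y / y.max()  # scale so the peak equals 1

time = np.arange(300) / 10.0  # 0 to 29.9 s in 0.1 s steps
hrf_timeseries = double_gamma_hrf(time)
print(time[np.argmax(hrf_timeseries)], hrf_timeseries.min())
```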
- emocodes.analysis.hrf_convolve_features(features, column_names='all', time_col='index', units='s', time_to_peak=5, undershoot_dur=12)
This function convolves a hemodynamic response function with each column in a timeseries dataframe.
- Parameters
features (DataFrame) – A Pandas dataframe with the feature signals to convolve.
column_names (list) – List of column names to use. Default is “all”.
time_col (str) – The name of the time column to use if not the index. Default is “index”.
units (str) – Must be ‘ms’, ‘s’, ‘m’, or ‘h’ to denote milliseconds, seconds, minutes, or hours respectively.
time_to_peak (int) – Time to peak for HRF model. Default is 5 seconds.
undershoot_dur (int) – Undershoot duration for HRF model. Default is 12 seconds.
- Returns
convolved_features – The HRF-convolved feature timeseries
- Return type
DataFrame
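Convolving each feature with an HRF kernel can be sketched with `np.convolve`, trimming the result back to the original length so the time index still lines up. The helper name, the placeholder kernel, and the example features below are hypothetical; a real kernel would be an HRF sampled at the same rate as the features.

```python
import numpy as np
import pandas as pd

def convolve_columns(features, kernel):
    """Convolve every column with the kernel, keeping the original length and index."""
    n = len(features)
    out = {
        # Full convolution is len(x) + len(kernel) - 1 long; keep the first n samples
        col: np.convolve(features[col].to_numpy(dtype=float), kernel)[:n]
        for col in features.columns
    }
    return pd.DataFrame(out, index=features.index)

# Placeholder kernel standing in for a sampled HRF (values are made up)
kernel = np.array([0.0, 0.5, 1.0, 0.5, -0.1])

features = pd.DataFrame({
    "impulse": [1, 0, 0, 0, 0, 0, 0, 0],
    "boxcar": [0, 1, 1, 1, 0, 0, 0, 0],
})

convolved_features = convolve_columns(features, kernel)
print(convolved_features)
```

An impulse at the first sample reproduces the kernel itself, which is a quick sanity check for any convolution routine.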