emocodes.analysis.codes

Module Contents

Classes

InterraterReliability

This class can be used to compute metrics of interrater reliability from a list of dataframes/codes (with identical column names).

Consensus

This class can be used to compute the consensus (percent overlap) between two or more sets of codes.

Functions

compile_ratings(list_dfs, list_raters=None)

This function takes a list of dataframes (one per rater) and stacks them, preserving the time index.

interrater_iccs(ratings, rater_col_name='rater', index_label='onset_ms', column_labels=None)

This function computes the interrater ICCs using the Pingouin library. By default it computes the absolute agreement between raters.

compute_exact_match(ratings_list, raters_list, reference)

This function computes the percent overlap between ratings. It can be run with a reference file that all code files are compared against, or without a reference, in which case all codes are compared pairwise.

mismatch_segments_list(df1, df2, time_column=0)

This function compares two columns of the same name across two input dataframes and returns a dataframe of segments that are nonmatching.

class emocodes.analysis.codes.InterraterReliability

This class can be used to compute metrics of interrater reliability from a list of dataframes/codes (with identical column names).

df_list_to_long_df(self, list_of_codes, list_of_raters=None)

This method combines the input dataframes into one long, stacked dataframe, differentiating each by “rater”. The index is presumed to be time and is preserved.

Parameters
  • list_of_codes (list) – list of DataFrame objects OR filepaths to CSVs containing DataFrame objects to stack.

  • list_of_raters (list) – Optional. Custom list of rater names/identifiers. If None, the raters are named “rater01”, “rater02”, and so on.
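The stacking described above can be illustrated with a minimal plain-Python sketch. It is not the emocodes implementation: it assumes each rater's codes are a list of row dicts (one per time point, with “onset_ms” standing in for the time index) rather than a DataFrame, and the function name stack_codes is hypothetical. It shows the two documented behaviors: every row keeps its time value and gains a “rater” label, and default labels follow the “rater01”, “rater02” pattern.

```python
def stack_codes(list_of_codes, list_of_raters=None):
    """Stack per-rater code tables into one long table with a "rater" field.

    Hypothetical stand-in for df_list_to_long_df: each element of
    list_of_codes is a list of row dicts (one dict per time point).
    """
    if list_of_raters is None:
        # Default labels: rater01, rater02, ...
        list_of_raters = [f"rater{i:02d}" for i in range(1, len(list_of_codes) + 1)]
    if len(list_of_raters) != len(list_of_codes):
        raise ValueError("expected one rater name per set of codes")
    long_rows = []
    for rater, rows in zip(list_of_raters, list_of_codes):
        for row in rows:
            # Keep the time value and every code, then tag the row with its rater.
            long_rows.append({**row, "rater": rater})
    return long_rows


a = [{"onset_ms": 0, "valence": 1}, {"onset_ms": 1000, "valence": 2}]
b = [{"onset_ms": 0, "valence": 1}, {"onset_ms": 1000, "valence": 3}]
stacked = stack_codes([a, b])
print(stacked[2])  # {'onset_ms': 0, 'valence': 1, 'rater': 'rater02'}
```

With pandas DataFrames, the same result is typically reached by adding a rater column to each frame and combining them with pandas.concat.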

compute_iccs(self, column_labels=None)

This method computes the intraclass correlation across raters in a dataset.

Parameters

column_labels (list) – List of string column labels to compute ICCs for. Default is None, which computes ICCs for all columns except the index/time column and “rater”.

save_iccs(self, out_file_name)

This method saves the ICC results table to a CSV.

Parameters

out_file_name (str) – File path and file name to save the ICC results table.

compute_compile_iccs(self, list_of_codes, list_of_raters=None, column_labels=None, out_file_name='interrater_iccs')

This method takes a list of dataframes, computes the interrater reliability (ICC) for each column, and saves the ICC results table.

Parameters
  • list_of_codes (list) – A list of codes to compute the ICCs of. List can be of DataFrame objects or of filepaths to CSVs containing DataFrame objects.

  • list_of_raters (list) – A list of strings to label the individual raters for the codes in list_of_codes. If None, this method labels the raters ‘rater01’, ‘rater02’, and so on.

  • column_labels (list OR None) – The columns to compute ICCs for. If None, will compute ICCs for all columns except for the ‘time’ and ‘rater’ columns.

  • out_file_name (str) – The filepath and file name to save the ICC results table as.

class emocodes.analysis.codes.Consensus

This class can be used to compute the consensus (percent overlap) between two or more sets of codes.

Use Case 1: compute overlap between trainee codes and exemplar codes

>>> from emocodes.analysis.codes import Consensus
>>> con = Consensus()
>>> con.training_consensus([trainee1_codes_df, trainee2_codes_df], original_codes_df, ['Lizzi', 'Cat'])
>>> con.consensus_scores.to_csv('consensus_scores.csv')  # save the scores table as a CSV
>>> con.mismatch_segments.to_csv('mismatched_segments.csv')  # save the list of mismatched time segments as a CSV

Use Case 2: compute overlap pairwise between 2 or more raters

>>> from emocodes.analysis.codes import Consensus
>>> con = Consensus()
>>> con.interrater_consensus([Lizzi_codes_df, Cat_codes_df], ['Lizzi', 'Cat'])
>>> con.consensus_scores.to_csv('consensus_scores.csv')  # save the scores table as a CSV
>>> con.mismatch_segments.to_csv('mismatched_segments.csv')  # save the list of mismatched time segments as a CSV

training_consensus(self, trainee_codes_list, exemplar_code_file, trainee_list=None)

This method computes consensus ratings for each set of trainee codes against an exemplar/master set. It produces a report of the percent overlap between the codes as well as a list of nonmatching segments.

Parameters
  • trainee_codes_list (list) – A list of codes. List can be of DataFrame objects or of filepaths to CSVs containing DataFrame objects.

  • exemplar_code_file (filepath OR DataFrame) – The DataFrame to compare each of the trainee codes to. Can be the string filename to a CSV or a DataFrame object.

  • trainee_list (list) – Optional. A list of strings with rater names to use. If None, will automatically assign “rater01”, “rater02”, etc.
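The percent-overlap score that training_consensus reports can be sketched as “the share of time points at which a trainee's code equals the exemplar's code, per column”. This is a plain-Python illustration under stated assumptions, not the emocodes code: each code set is a dict of column to list of values already aligned on the same time points (time column excluded), and the helper names percent_overlap and training_scores are hypothetical.

```python
def percent_overlap(codes, exemplar):
    """Percent of aligned time points at which each shared column matches.

    codes and exemplar are dicts of column -> list of values, already
    aligned on the same time points (the time column itself is excluded).
    """
    scores = {}
    for col in sorted(codes.keys() & exemplar.keys()):
        pairs = list(zip(codes[col], exemplar[col]))
        if not pairs:
            continue  # nothing to compare for this column
        matches = sum(x == y for x, y in pairs)
        scores[col] = 100.0 * matches / len(pairs)
    return scores


def training_scores(trainee_codes_list, exemplar, trainee_list=None):
    """Score every trainee against the exemplar (hypothetical helper)."""
    if trainee_list is None:
        trainee_list = [f"rater{i:02d}" for i in range(1, len(trainee_codes_list) + 1)]
    return {name: percent_overlap(codes, exemplar)
            for name, codes in zip(trainee_list, trainee_codes_list)}


exemplar = {"valence": [1, 1, 2, 3], "arousal": [0, 1, 1, 1]}
trainees = [
    {"valence": [1, 1, 2, 2], "arousal": [0, 1, 1, 1]},
    {"valence": [1, 2, 2, 2], "arousal": [0, 0, 1, 1]},
]
scores = training_scores(trainees, exemplar, ["Lizzi", "Cat"])
print(scores["Cat"])  # {'arousal': 75.0, 'valence': 50.0}
```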

interrater_consensus(self, codes_list, rater_list=None)

This method compares a list of codes pairwise and produces 1) a measure of overlap for each code and 2) a list of timestamps for the mismatched segments.

Parameters
  • codes_list (list) – List of dataframe objects OR list of file paths to CSVs containing dataframe objects

  • rater_list (list) – Optional. List of identifiers for the list of codes.

emocodes.analysis.codes.compile_ratings(list_dfs, list_raters=None)

This function takes a list of dataframes (one per rater) and stacks them, preserving the time index.

Parameters
  • list_dfs (list) – A list of DataFrames or CSV files containing dataframes.

  • list_raters (list) – Default is None. A list of preferred rater names. If none are passed, the default is to use “raterXX” (e.g., “rater01” for the first dataframe).

Returns

single_df – A single dataframe of the input dataframes stacked, preserving the index.

Return type

DataFrame

emocodes.analysis.codes.interrater_iccs(ratings, rater_col_name='rater', index_label='onset_ms', column_labels=None)

This function computes the interrater ICCs using the Pingouin library. By default it computes the absolute agreement between raters assuming a random sample of raters at each target (each rating at each instance). Read more on ICC2 at https://pingouin-stats.org/generated/pingouin.intraclass_corr.html#pingouin.intraclass_corr

Parameters
  • ratings (DataFrame) – DataFrame with the ratings information stored in a long format.

  • rater_col_name (str) – The name of the column containing rater information. Default is “rater”.

  • index_label (str) – The label denoting each measurement. This must be consistent across all raters. Default is “onset_ms”.

  • column_labels (list) – The list of variables to compute interrater ICCs for. Default is None, which means it will compute ICCs for every column in the DataFrame not equal to the rater_col_name or the index_label.

Returns

icc_df – The dataframe object containing instance-level and overall intraclass correlation values.

Return type

DataFrame
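The ICC2 statistic referenced above is the Shrout and Fleiss ICC(2,1): two-way random effects, absolute agreement, single rater. The sketch below computes it in plain Python from two-way ANOVA mean squares so the arithmetic is visible. It is not the emocodes or Pingouin source. The helper names long_to_matrix and icc2 are hypothetical, and the data are assumed complete (every rater codes every time point). For real analyses, pingouin.intraclass_corr(data=..., targets=..., raters=..., ratings=...) reports ICC2 alongside the other ICC forms.

```python
def long_to_matrix(rows, column, index_label="onset_ms", rater_col_name="rater"):
    """Pivot long-format rows into matrix[target][rater] for one code column.

    Assumes every rater has a rating at every time point (complete data).
    """
    raters = sorted({r[rater_col_name] for r in rows})
    targets = sorted({r[index_label] for r in rows})
    lookup = {(r[index_label], r[rater_col_name]): r[column] for r in rows}
    return [[lookup[(t, rater)] for rater in raters] for t in targets]


def icc2(matrix):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    n = len(matrix)     # targets (time points)
    k = len(matrix[0])  # raters
    grand = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)            # between-target mean square
    ms_cols = ss_cols / (k - 1)            # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# rater02 always codes one point higher than rater01.
ratings = [[1, 2], [2, 3], [3, 4]]
rows = [
    {"onset_ms": t * 1000, "rater": rater, "valence": value}
    for t, pair in enumerate(ratings)
    for rater, value in zip(["rater01", "rater02"], pair)
]
print(round(icc2(long_to_matrix(rows, "valence")), 3))  # 0.667
```

Because ICC2 measures absolute agreement, the constant one-point offset pulls the score down to about 0.667 even though the two raters move in perfect lockstep; identical ratings would give 1.0.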

emocodes.analysis.codes.compute_exact_match(ratings_list, raters_list, reference)

This function computes the percent overlap between ratings. It can be run with a reference file that all code files are compared against, or it can be run without a reference in which case all codes will be compared pair-wise.

Parameters
  • ratings_list (list) – List of dataframe objects or CSV filenames of saved dataframes.

  • raters_list (list) – List of raters corresponding to each ratings DataFrame in the ratings_list.

  • reference (DataFrame or filepath or None) – The DataFrame object or CSV filename of the DataFrame object to compare each DataFrame in ratings_list to. If None, this function performs a pair-wise comparison instead.

Returns

exact_match_stats – A DataFrame with the match statistic for each pair of raters and for each column in the codes.

Return type

DataFrame
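The two modes of compute_exact_match (compare everyone to a reference, or compare every pair of raters when reference is None) can be sketched in plain Python. This is an illustration, not the emocodes implementation: code sets are assumed to be dicts of column to aligned value lists, results come back as a dict keyed by rater pair instead of a DataFrame, and the names column_overlap and exact_match_sketch are hypothetical.

```python
from itertools import combinations


def column_overlap(x, y):
    """Percent agreement for every column present in both code sets."""
    result = {}
    for col in sorted(x.keys() & y.keys()):
        pairs = list(zip(x[col], y[col]))
        result[col] = 100.0 * sum(a == b for a, b in pairs) / len(pairs) if pairs else float("nan")
    return result


def exact_match_sketch(ratings_list, raters_list, reference=None):
    if reference is not None:
        # Compare every rater against the single reference.
        return {(rater, "reference"): column_overlap(codes, reference)
                for rater, codes in zip(raters_list, ratings_list)}
    # No reference: compare every unordered pair of raters exactly once.
    return {(r1, r2): column_overlap(c1, c2)
            for (r1, c1), (r2, c2) in combinations(zip(raters_list, ratings_list), 2)}


codes = [{"valence": [1, 2, 2, 0]}, {"valence": [1, 2, 3, 0]}, {"valence": [1, 1, 3, 0]}]
names = ["Lizzi", "Cat", "rater03"]
pairwise = exact_match_sketch(codes, names)
print(pairwise[("Lizzi", "Cat")])  # {'valence': 75.0}
vs_ref = exact_match_sketch(codes[1:], names[1:], reference=codes[0])
print(vs_ref[("rater03", "reference")])  # {'valence': 50.0}
```

With three raters, the pairwise mode produces three comparisons (n × (n − 1) / 2 in general), while the reference mode produces one comparison per rater.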

emocodes.analysis.codes.mismatch_segments_list(df1, df2, time_column=0)

This function compares columns of the same name across two input dataframes and returns a dataframe of segments that are nonmatching. Times are reported in the units of the index or time variable. Note that this function only checks columns that exist in BOTH dataframes.

Parameters
  • df1 (DataFrame object OR filepath) – The dataframe to compare to df2. Index must be the time or count variable.

  • df2 (DataFrame object OR filepath) – The dataframe to compare to df1. Index must be the time or count variable.

  • time_column (str OR int) – Name or position of the column to use as the time variable. Default is 0 (the first column).

Returns

nonmatching_segments – A table listing all the segments during which the code in question is not in agreement between the two sets of ratings. Time is in the same units/notation as the index.

Return type

DataFrame
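One way to find the nonmatching segments described above is to scan the aligned samples of each shared column and record every contiguous run of disagreement as a (start, end) pair. The plain-Python sketch below does this under stated assumptions: it is not the emocodes implementation, the function name mismatch_segments is hypothetical, it returns a dict instead of a DataFrame, and ending a segment at its last disagreeing time stamp is an assumed convention (emocodes may report boundaries differently).

```python
def mismatch_segments(times, codes1, codes2):
    """Return {column: [(start, end), ...]} for runs where two code sets disagree.

    codes1 and codes2 are dicts of column -> values aligned on `times`;
    only columns present in both are compared. `end` is the time stamp of
    the last disagreeing sample in each run (an assumed convention).
    """
    segments = {}
    for col in sorted(codes1.keys() & codes2.keys()):
        runs = []
        start = last = None
        for t, x, y in zip(times, codes1[col], codes2[col]):
            if x != y:
                if start is None:
                    start = t  # a new run of disagreement begins
                last = t
            elif start is not None:
                runs.append((start, last))  # agreement resumes: close the run
                start = None
        if start is not None:
            runs.append((start, last))  # run continues to the final sample
        segments[col] = runs
    return segments


times = [0, 1000, 2000, 3000, 4000]
rater_a = {"valence": [1, 1, 2, 2, 0], "arousal": [0, 0, 1, 1, 1]}
rater_b = {"valence": [1, 2, 3, 2, 1]}  # no "arousal" column, so it is skipped
print(mismatch_segments(times, rater_a, rater_b))
# {'valence': [(1000, 2000), (4000, 4000)]}
```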