emocodes.processing.codes

Module Contents

Classes

CodeTimeSeries

This class processes a Datavyu CSV, converting the codes to a time series for bio-behavioral analysis.

ValidateTimeSeries

This class takes a Datavyu-exported CSV and produces a report of common coding problems.

Functions

timestamps_report(codes_df, video_length, labels)

This function takes a dataframe of Datavyu-exported codes and produces a report on the timestamps of each code label.

values_report(codes_df, labels)

This function takes a dataframe of Datavyu-exported codes and produces a report on the values of each code label.

get_code_labels(codes_df)

Pull the unique labels from the Datavyu codes.

convert_timestamps(labels, codes_df, video_duration, sampling_rate, interpolate_gaps=True, do_log=False)

This function performs two steps: it checks the codes for human errors, then converts the timestamps to a time series.

save_timeseries(timeseries_df, outfile_type, outfile_name, do_log=False)

Saves the resampled time series to a file of the requested type.

class emocodes.processing.codes.CodeTimeSeries(interpolate_gaps=True, sampling_rate=5, logging_dir='./logs')

This class processes a Datavyu CSV, converting the codes to a time series for bio-behavioral analysis. In the EmoCodes system, this class should be run after the codes are validated and any reported errors are corrected.

Example: Use CodeTimeSeries to convert “datavyu_export.csv” (codes of “myvideo.mp4” completed in Datavyu) to a timeseries file with a sampling rate of 1.2 Hz. This will save a file called “video_codes_time_series.csv” (the default saved file name).

>>> import emocodes as ec
>>> datavyu_file = 'datavyu_export.csv'
>>> video_file = 'myvideo.mp4'
>>> ec.CodeTimeSeries(sampling_rate=1.2).proc_codes_file(datavyu_file, video_file)
Parameters
interpolate_gaps: bool

Default is True, which interpolates across gaps in the codes. To leave gaps blank (NaNs), set to False.

sampling_rate: float

Default is 5 Hz. Desired output sampling rate in Hz (samples per second).

logging_dir: str

Path to the folder where processing logs are saved. By default, a folder named “logs” is created in the current directory.

proc_codes_file(self, codes_file, video_file, save_file_name='video_codes_time_series', file_type='csv')

This method fully processes a Datavyu CSV-exported file to a time series output for further analysis.

Parameters
  • codes_file (str) – File path to the Datavyu outputs to convert.

  • video_file (str) – File path to the MP4 video file that was coded in Datavyu.

  • save_file_name (str) – Default is “video_codes_time_series”. The file path and name to save the outputs as.

  • file_type (str ('csv','excel','tab','space')) – Default is “csv”. The type of file to save the data as.

get_labels(self)

Method to make a list of the code labels in a Datavyu CSV-exported file.

find_video_length(self, video_file)

Method to extract the length of a video MP4 file in milliseconds.

Parameters

  • video_file (str) – File path to a video MP4 file

convert_codes(self)

This method converts Datavyu-style output to a long-form time series that can be used for further analysis.

save(self, save_file_name='video_codes_time_series', file_type='csv')

This method saves the processed codes to a file (CSV by default) for further analysis.

Parameters
  • save_file_name (str) – Default is “video_codes_time_series”. The file path and name to save the processed codes as.

  • file_type (str ('csv','excel','tab','space')) – Default is “csv”. The type of file to save the data as.

class emocodes.processing.codes.ValidateTimeSeries

This class takes a Datavyu-exported CSV and produces a report of the following common problems:

  • missing values

  • offsets before onsets

  • offsets of zero

  • not starting with the video

  • not ending with the video file

  • segment durations of zero

The report also gives the following descriptive information:

  • list of unique values per code

  • number of segments per code

  • list of code segments (cells) with problematic data (offsets, onsets, or values)

This report can then be used to go back and clean the coding data in Datavyu before further processing.

Example

>>> import emocodes as ec
>>> codes_file = 'datavyu_export_codes.csv'
>>> video_file = 'myvideo.mp4'
>>> ec.ValidateTimeSeries().run(codes_file, video_file)
run(self, file_name, video_file, report_filename=None)
check_timestamps(self)
check_values(self)
emocodes.processing.codes.timestamps_report(codes_df, video_length, labels)

This function takes a dataframe of Datavyu-exported codes and produces a report that includes the following for each code label's timestamps:

  • missing offsets

  • offsets of zero

  • offsets labeled as before their corresponding onsets

  • whether or not the code starts at zero

  • whether or not the code ends with the video

  • overlapping onsets or offsets

  • a list of segments with potentially bad timestamps

Parameters
  • codes_df (DataFrame) – A pandas DataFrame object that includes the Datavyu-exported values.

  • video_length (int) – video length in milliseconds (output of get_video_length function)

  • labels (list) – List of unique code labels included in codes_df (output of get_code_labels function)

Returns

summary_report – A pandas dataframe with the report for each code in codes_df

Return type

DataFrame
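
The checks listed above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `sketch_timestamp_checks` is hypothetical, and the `<label>.onset` / `<label>.offset` column names are an assumption about the export layout. It covers only some of the listed checks, for a single label, with times in milliseconds.

```python
import pandas as pd


def sketch_timestamp_checks(codes_df, video_length, label):
    """Flag common timestamp problems for one code label (times in ms)."""
    # Assumed column naming: '<label>.onset' and '<label>.offset'.
    onset = codes_df[f'{label}.onset']
    offset = codes_df[f'{label}.offset']
    coded = onset.notna()  # rows that hold a segment for this label

    missing_offset = coded & offset.isna()
    offset_before_onset = coded & (offset < onset)
    zero_duration = coded & (offset == onset)
    flagged = missing_offset | offset_before_onset | zero_duration

    return {
        'missing_offsets': int(missing_offset.sum()),
        'zero_offsets': int((coded & (offset == 0)).sum()),
        'offsets_before_onsets': int(offset_before_onset.sum()),
        'zero_durations': int(zero_duration.sum()),
        'starts_at_zero': bool(onset[coded].min() == 0),
        'ends_with_video': bool(offset[coded].max() == video_length),
        'flagged_rows': codes_df.index[flagged.to_numpy()].tolist(),
    }


# Toy data: row 1 ends before it starts, row 2 has zero duration.
example_df = pd.DataFrame({
    'valence.onset': [0, 1000, 3000, 4000],
    'valence.offset': [1000, 500, 3000, 5000],
})
report = sketch_timestamp_checks(example_df, video_length=5000, label='valence')
print(report)
```

Collecting the flagged row positions alongside the counts mirrors the idea of pointing the coder back to specific cells in Datavyu.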

emocodes.processing.codes.values_report(codes_df, labels)

This function takes a dataframe of Datavyu-exported codes and produces a report that includes the following for each code label:

  • number of values coded

  • number of blank values found

  • list of code segments with blank values (correspond to cells in Datavyu)

  • list of unique values found

Parameters
  • codes_df (DataFrame) – Pandas dataframe object with Datavyu CSV data.

  • labels (list) – List of unique code labels included in codes_df (output of get_code_labels function)

Returns

summary_report – A dataframe with the report for each code in codes_df

Return type

DataFrame
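
As a rough illustration of such a per-label summary, the sketch below builds a small report with pandas. The helper `sketch_values_report` is hypothetical, and the `<label>.code01` value column name is an assumption about the export layout; the library's actual report columns may differ.

```python
import pandas as pd


def sketch_values_report(codes_df, labels):
    """Summarize coded, blank, and unique values for each label."""
    rows = []
    for label in labels:
        # Assumed value column name: '<label>.code01'.
        values = codes_df[f'{label}.code01']
        # Treat missing cells and empty/whitespace-only strings as blank.
        blank = values.fillna('').astype(str).str.strip() == ''
        rows.append({
            'label': label,
            'n_values': int((~blank).sum()),
            'n_blank': int(blank.sum()),
            'blank_rows': values.index[blank.to_numpy()].tolist(),
            'unique_values': sorted(values[~blank].unique().tolist()),
        })
    return pd.DataFrame(rows).set_index('label')


example_df = pd.DataFrame({
    'valence.code01': ['pos', 'neg', None, 'pos'],
    'arousal.code01': ['1', '', '2', '2'],
})
report = sketch_values_report(example_df, ['valence', 'arousal'])
print(report)
```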

emocodes.processing.codes.get_code_labels(codes_df)

Pull the unique labels from the Datavyu codes.

Parameters

codes_df (DataFrame) – The dataframe of codes created by importing the Datavyu codes using pandas.

Returns

labels – Variable names of the codes from Datavyu

Return type

list
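
One plausible way to derive the labels is to take the prefix of each column name. The sketch below assumes columns are named `<label>.<field>` (for example `valence.onset`); both that layout and the helper name `sketch_code_labels` are assumptions for illustration, not the library's code.

```python
import pandas as pd


def sketch_code_labels(codes_df):
    """Return unique label prefixes in order of first appearance."""
    labels = []
    for column in codes_df.columns:
        # Assumed layout: '<label>.<field>', e.g. 'valence.onset'.
        label = str(column).split('.')[0]
        if label not in labels:
            labels.append(label)
    return labels


example_df = pd.DataFrame(columns=[
    'valence.onset', 'valence.offset', 'valence.code01',
    'arousal.onset', 'arousal.offset', 'arousal.code01',
])
labels = sketch_code_labels(example_df)
print(labels)
```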

emocodes.processing.codes.convert_timestamps(labels, codes_df, video_duration, sampling_rate, interpolate_gaps=True, do_log=False)

This function performs two steps:

  1. Checks for human errors in coding, such as incorrect end times or gaps in coding.

  2. Converts the timestamps to a time series and optionally interpolates across gaps.

Parameters
  • labels (list) – a list of strings which are the unique column variable labels in the dataframe. The output of the get_code_labels function.

  • codes_df (DataFrame) – the dataframe of Datavyu codes

  • video_duration (int) – The length of video in milliseconds, the output of get_video_length

  • sampling_rate (int) – The output sampling rate of the time series, in Hz.

  • interpolate_gaps (bool) – Default is set to True. If you wish for gaps to be preserved as NaNs, set to False.

  • do_log (bool) – Default is set to False. If logging is being used, set to True.

Returns

timeseries_df – The resampled code time series

Return type

DataFrame
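
The conversion step can be pictured as laying a regular sample grid over the video and giving each sample the value of the segment that covers it. The sketch below is a simplified illustration with several assumptions: the helper `sketch_convert` is hypothetical, code values are numeric, segments arrive as `(onset_ms, offset_ms, value)` tuples, and gaps are filled by forward fill as a stand-in, since this section does not state which interpolation method the library uses.

```python
import numpy as np
import pandas as pd


def sketch_convert(segments, video_duration, sampling_rate, fill_gaps=True):
    """Resample (onset_ms, offset_ms, value) segments onto a regular grid."""
    step_ms = 1000.0 / sampling_rate  # time between samples in milliseconds
    times = np.arange(0, video_duration, step_ms)
    values = np.full(times.shape, np.nan)
    for onset, offset, value in segments:
        # A sample belongs to a segment if onset <= time < offset.
        covered = (times >= onset) & (times < offset)
        values[covered] = value
    series = pd.Series(values, index=pd.Index(times, name='time_ms'))
    if fill_gaps:
        # Stand-in gap handling (assumption): carry the last value forward.
        series = series.ffill()
    return series


# Two coded segments with an uncoded gap between 1000 ms and 2000 ms.
segments = [(0, 1000, 1.0), (2000, 3000, 3.0)]
filled = sketch_convert(segments, video_duration=3000, sampling_rate=2)
gappy = sketch_convert(segments, video_duration=3000, sampling_rate=2,
                       fill_gaps=False)
print(filled)
print(gappy)
```

With `fill_gaps=False` the uncoded samples stay as NaN, which corresponds to the documented behavior of setting `interpolate_gaps` to False.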

emocodes.processing.codes.save_timeseries(timeseries_df, outfile_type, outfile_name, do_log=False)
Parameters
  • timeseries_df (DataFrame) – The resampled time series from validate_convert_timestamps

  • outfile_type (str ('csv','excel','tab','space')) – the file type to save the time series as.

  • outfile_name (str) – The file prefix for the output file.

  • do_log (bool) – Default is set to False. If logging is being used, set to True.
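
To show how the four file-type names might map onto output formats, here is a minimal sketch built on pandas. The helper `sketch_save` and the delimiter mapping are assumptions for illustration; the library may name files, choose separators, or handle Excel output differently.

```python
import io

import pandas as pd

# Assumed mapping from the documented file-type names to delimiters.
SEPARATORS = {'csv': ',', 'tab': '\t', 'space': ' '}


def sketch_save(timeseries_df, outfile_type, target):
    """Write a time series DataFrame to a path or file-like target."""
    if outfile_type == 'excel':
        # Requires an Excel writer engine such as openpyxl to be installed.
        timeseries_df.to_excel(target)
    elif outfile_type in SEPARATORS:
        timeseries_df.to_csv(target, sep=SEPARATORS[outfile_type])
    else:
        raise ValueError(f'Unsupported file type: {outfile_type!r}')


example_df = pd.DataFrame(
    {'valence': [1.0, 3.0]},
    index=pd.Index([0, 500], name='time_ms'),
)
buffer = io.StringIO()
sketch_save(example_df, 'tab', buffer)
text = buffer.getvalue()
print(text)
```

Writing to an in-memory buffer keeps the example free of side effects; passing a file path as `target` would write to disk instead.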