emocodes.processing

Submodules

Package Contents

Classes

CodeTimeSeries

This class processes a Datavyu CSV, converting the codes to a time series for bio-behavioral analysis.

ValidateTimeSeries

This class takes a Datavyu-exported CSV and produces a report of common coding problems.

ExtractVideoFeatures

This class can be used to extract low-level visual and audio features from an MP4 file.

Functions

get_code_labels(codes_df)

Pull the unique labels from the Datavyu codes.

convert_timestamps(labels, codes_df, video_duration, sampling_rate, interpolate_gaps=True, do_log=False)

This function checks the codes for human errors and converts the timestamps to a time series.

save_timeseries(timeseries_df, outfile_type, outfile_name, do_log=False)

Save a resampled code time series to a file.

values_report(codes_df, labels)

This function takes a dataframe of Datavyu-exported codes and produces a report on the values of each code label.

timestamps_report(codes_df, video_length, labels)

This function takes a dataframe of Datavyu-exported codes and produces a report on the timestamps of each code label.

get_video_length(video_file)

This function checks the length of a video file and returns that value in milliseconds.

resample_video(video_file, sampling_rate)

This function resamples a video to the desired sampling rate, which can make videos with high sampling rates more tractable for analysis.

extract_visual_features(video_file)

This function extracts luminance, vibrance, saliency, and sharpness from the frames of a video using the pliers library.

extract_audio_features(in_file)

This function extracts audio intensity, tempo, and beats from the audio of a video file using the pliers library.

class emocodes.processing.CodeTimeSeries(interpolate_gaps=True, sampling_rate=5, logging_dir='./logs')

This class processes a Datavyu CSV, converting the codes to a time series for bio-behavioral analysis. In the EmoCodes system, this class should be run after the codes are validated and any reported errors are corrected.

Example: Use CodeTimeSeries to convert “datavyu_export.csv” (codes of “myvideo.mp4” completed in Datavyu) to a timeseries file with a sampling rate of 1.2 Hz. This will save a file called “video_codes_time_series.csv” (the default saved file name).

>>> import emocodes as ec
>>> datavyu_file = 'datavyu_export.csv'
>>> video_file = 'myvideo.mp4'
>>> ec.CodeTimeSeries(sampling_rate=1.2).proc_codes_file(datavyu_file, video_file)

Parameters
interpolate_gaps: bool

Default is True. To leave gaps blank (NaNs), set to False.

sampling_rate: float

Default is 5 Hz. Desired output sampling rate in Hz (samples per second).

logging_dir: str

Path to the folder where the processing logs are saved. Default is './logs', a folder named “logs” that is created within the current directory.

proc_codes_file(self, codes_file, video_file, save_file_name='video_codes_time_series', file_type='csv')

This method fully processes a Datavyu CSV-exported file to a time series output for further analysis.

Parameters
  • codes_file (str) – File path to the Datavyu outputs to convert.

  • video_file (str) – Filepath to the video MP4 file that was coded in Datavyu

  • save_file_name (str) – Default is “video_codes_time_series”. The file path and name to save the outputs as.

  • file_type (str ('csv','excel','tab','space')) – Default is “csv”. The type of file to save the data as.

get_labels(self)

Method to make a list of the code labels in a Datavyu CSV-exported file.

find_video_length(self, video_file)

Method to extract the length of a video MP4 file in milliseconds.

Parameters

video_file (str) – File path to a video MP4 file.

convert_codes(self)

This method converts a Datavyu style output to a long-form, timeseries style output which can be used for further analysis.
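To illustrate what a long-form time series means here, the following is a minimal sketch, not the emocodes implementation. It assumes each code is a list of (onset_ms, offset_ms, value) segments and samples it on a regular grid of `sampling_rate` Hz. The helper name `segments_to_timeseries` and the half-open [onset, offset) convention are hypothetical.

```python
import math


def segments_to_timeseries(segments, video_duration_ms, sampling_rate):
    """Sample coded segments onto a regular time grid (hypothetical helper).

    segments: list of (onset_ms, offset_ms, value) tuples for one code.
    Returns a list of (time_ms, value) pairs; samples not covered by any
    segment get None, i.e. a gap.
    """
    step_ms = 1000.0 / sampling_rate  # time between consecutive samples
    n_samples = math.floor(video_duration_ms / step_ms)
    series = []
    for i in range(n_samples):
        t = i * step_ms
        value = None
        for onset, offset, code in segments:
            # Assumed half-open interval: a segment covers onset <= t < offset.
            if onset <= t < offset:
                value = code
                break
        series.append((t, value))
    return series
```

For example, with the segments (0, 1000, 'happy') and (1500, 3000, 'sad'), a 3000 ms video, and a 2 Hz rate, the sample at 1000 ms falls between the two segments and is returned as None.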

save(self, save_file_name='video_codes_time_series', file_type='csv')

This method saves the processed codes to a file (CSV by default) for further analysis.

Parameters
  • save_file_name (str) – Default is “video_codes_time_series”. The file path and name to save the processed codes as.

  • file_type (str ('csv','excel','tab','space')) – Default is “csv”. The type of file to save the data as.

class emocodes.processing.ValidateTimeSeries

This class takes a Datavyu-exported CSV and produces a report of the following common problems:

  • missing values

  • offsets before onsets

  • offsets of zero

  • codes not starting at the beginning of the video

  • codes not ending at the end of the video

  • segment durations of zero

The report also gives the following descriptive information:

  • list of unique values per code

  • number of segments per code

  • list of code segments (cells) with problematic data (offsets, onsets, or values)

This report can then be used to go back and clean the coding data in Datavyu before further processing.

Example

>>> import emocodes as ec
>>> codes_file = 'datavyu_export_codes.csv'
>>> video_file = 'myvideo.mp4'
>>> ec.ValidateTimeSeries().run(codes_file, video_file)

run(self, file_name, video_file, report_filename=None)

check_timestamps(self)

check_values(self)
emocodes.processing.get_code_labels(codes_df)

Pull the unique labels from the Datavyu codes.

Parameters

codes_df (DataFrame) – The dataframe of codes created by importing the Datavyu codes using pandas.

Returns

labels – Variable names of the codes from Datavyu

Return type

list
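A minimal sketch of label extraction, assuming (hypothetically) that Datavyu export columns are named `<label>.<field>`, such as `valence.onset` or `valence.offset`. The helper `labels_from_columns` is illustrative and works on any list of column names, for example `list(codes_df.columns)`.

```python
def labels_from_columns(columns):
    """Return unique code labels in first-seen order (hypothetical helper).

    Assumes each column is named '<label>.<field>', e.g. 'valence.onset'.
    """
    # Keep the text before the first '.', then drop duplicates while
    # preserving insertion order (dicts keep insertion order).
    return list(dict.fromkeys(column.split(".", 1)[0] for column in columns))
```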

emocodes.processing.convert_timestamps(labels, codes_df, video_duration, sampling_rate, interpolate_gaps=True, do_log=False)

This function performs two steps:

  1. Checks for human errors in coding, such as incorrect end times or gaps in coding.

  2. Converts the timestamps to a time series and optionally interpolates across gaps.

Parameters
  • labels (list) – A list of strings giving the unique column variable labels in the dataframe (the output of the get_code_labels function).

  • codes_df (DataFrame) – The dataframe of Datavyu codes.

  • video_duration (int) – The length of video in milliseconds, the output of get_video_length

  • sampling_rate (int) – The output sampling rate of the time series, in Hz.

  • interpolate_gaps (bool) – Default is set to True. If you wish for gaps to be preserved as NaNs, set to False.

  • do_log (bool) – Default is set to False. If logging is being used, set to True.

Returns

timeseries_df – The resampled code time series

Return type

DataFrame
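The optional gap-filling step can be sketched as follows. The interpolation method emocodes uses is not documented here, so this sketch assumes forward fill (each gap takes the most recent coded value), which keeps categorical codes categorical. `fill_gaps` is a hypothetical helper, and None stands in for NaN.

```python
def fill_gaps(values, interpolate_gaps=True):
    """Forward-fill None gaps in a sampled code series (hypothetical helper).

    Forward fill is an assumed strategy. With interpolate_gaps=False the gaps
    are preserved, mirroring the documented NaN behavior.
    """
    if not interpolate_gaps:
        return list(values)
    filled = []
    last_value = None  # leading gaps stay None: nothing to carry forward yet
    for value in values:
        if value is None:
            filled.append(last_value)
        else:
            filled.append(value)
            last_value = value
    return filled
```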

emocodes.processing.save_timeseries(timeseries_df, outfile_type, outfile_name, do_log=False)

Save a resampled code time series to a file.

Parameters
  • timeseries_df (DataFrame) – The resampled time series from convert_timestamps.

  • outfile_type (str ('csv','excel','tab','space')) – the file type to save the time series as.

  • outfile_name (str) – The file prefix for the output file.

  • do_log (bool) – Default is set to False. If logging is being used, set to True.
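A minimal sketch of how the documented `outfile_type` options might map to output files. The extensions (`.tsv` for 'tab', `.txt` for 'space', `.xlsx` for 'excel') and the helper names are assumptions. Writing 'excel' output would need a spreadsheet writer such as pandas `DataFrame.to_excel`, so only the delimited types are rendered here.

```python
import csv
import io

# Hypothetical mapping from the documented outfile_type options to an
# assumed file extension and field delimiter.
DELIMITED_TYPES = {
    "csv": (".csv", ","),
    "tab": (".tsv", "\t"),
    "space": (".txt", " "),
}


def output_filename(outfile_name, outfile_type):
    """Append an (assumed) extension to the output prefix."""
    if outfile_type == "excel":
        return outfile_name + ".xlsx"
    if outfile_type not in DELIMITED_TYPES:
        raise ValueError(f"unknown outfile_type: {outfile_type!r}")
    return outfile_name + DELIMITED_TYPES[outfile_type][0]


def format_rows(header, rows, outfile_type):
    """Render a header and rows as delimited text ('csv', 'tab', or 'space')."""
    delimiter = DELIMITED_TYPES[outfile_type][1]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```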

emocodes.processing.values_report(codes_df, labels)

This function takes a dataframe of Datavyu-exported codes and produces a report that includes the following for each code label:

  • number of values coded

  • number of blank values found

  • list of code segments with blank values (corresponding to cells in Datavyu)

  • list of unique values found

Parameters
  • codes_df (DataFrame) – Pandas dataframe object with Datavyu CSV data.

  • labels (list) – List of unique code labels included in codes_df (output of get_code_labels function)

Returns

summary_report – A dataframe with the report for each code in codes_df

Return type

DataFrame
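A sketch of the per-label value summary, assuming the values of one code label arrive as a list with one entry per Datavyu cell and that blank cells are None or an empty string. The helper `values_summary` and its key names are hypothetical, segment positions are 0-based, and "number of values coded" is taken to mean non-blank cells.

```python
def values_summary(values):
    """Summarize the values of one code label (hypothetical helper).

    values: one entry per Datavyu cell; None or "" marks a blank cell.
    """
    blank_segments = [i for i, value in enumerate(values) if value in (None, "")]
    coded = [value for value in values if value not in (None, "")]
    return {
        "n_values": len(coded),
        "n_blank": len(blank_segments),
        "blank_segments": blank_segments,  # 0-based cell positions
        # key=str lets mixed numeric/text codes be sorted together
        "unique_values": sorted(set(coded), key=str),
    }
```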

emocodes.processing.timestamps_report(codes_df, video_length, labels)

This function takes a dataframe of Datavyu-exported codes and produces a report that includes the following for the timestamps of each code label:

  • missing offsets

  • offsets of zero

  • offsets labeled as before their corresponding onsets

  • whether or not the code starts at zero

  • whether or not the code ends with the video

  • overlapping onsets or offsets

  • a list of segments with potentially bad timestamps

Parameters
  • codes_df (DataFrame) – A pandas DataFrame object that includes the Datavyu-exported values.

  • video_length (int) – video length in milliseconds (output of get_video_length function)

  • labels (list) – List of unique code labels included in codes_df (output of get_code_labels function)

Returns

summary_report – A pandas dataframe with the report for each code in codes_df

Return type

DataFrame
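A sketch of the timestamp checks for one code label, assuming the segments are (onset_ms, offset_ms) pairs in coding order, a missing offset is None, and onsets are always present. `timestamp_problems`, its key names, and the `tolerance_ms` parameter (slack allowed when checking whether the last offset reaches the end of the video) are hypothetical.

```python
def timestamp_problems(segments, video_length, tolerance_ms=0):
    """Flag suspicious timestamps for one code label (hypothetical helper).

    segments: list of (onset_ms, offset_ms) pairs; offset_ms may be None.
    video_length: video duration in milliseconds.
    """
    missing_offsets, zero_offsets, reversed_segments, overlaps = [], [], [], []
    for i, (onset, offset) in enumerate(segments):
        if offset is None:
            missing_offsets.append(i)  # offset was never entered
            continue
        if offset == 0:
            zero_offsets.append(i)  # offset left at zero
        if offset < onset:
            reversed_segments.append(i)  # offset labeled before its onset
        previous_offset = segments[i - 1][1] if i > 0 else None
        if previous_offset is not None and onset < previous_offset:
            overlaps.append(i)  # starts before the previous segment ends
    last_offset = segments[-1][1] if segments else None
    return {
        "missing_offsets": missing_offsets,
        "zero_offsets": zero_offsets,
        "offsets_before_onsets": reversed_segments,
        "overlaps": overlaps,
        "starts_at_zero": bool(segments) and segments[0][0] == 0,
        "ends_with_video": last_offset is not None
        and last_offset >= video_length - tolerance_ms,
        "bad_segments": sorted(
            set(missing_offsets + zero_offsets + reversed_segments + overlaps)
        ),
    }
```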

class emocodes.processing.ExtractVideoFeatures

This class can be used to extract the following low-level features from an MP4 file:

  • Luminance: The frame-by-frame brightness level

  • Vibrance: The variance of color channels of each frame

  • Saliency: Fraction of highly salient visual information for each frame according to the Itti & Koch algorithm: https://doi.org/10.1109/34.730558

  • Sharpness: Degree of blur or sharpness of each frame

  • Dynamic Tempo: The rolling tempo of the audio track

  • Loudness: Operationalized as the root-mean-square of the audio amplitude

  • Beats: Whether a musical beat falls on that timestamp. For sampling rates below 30 Hz, this variable is likely not useful.
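Loudness as root-mean-square (RMS) amplitude can be sketched on raw audio samples. This illustrates the definition above rather than the pliers-based implementation, which may window the signal differently. The helper `rms_loudness` is hypothetical. To match an output sampling rate, `window_size` would be the audio sample rate divided by the output rate (for example, 44100 / 5 = 8820 samples).

```python
import math


def rms_loudness(samples, window_size):
    """RMS amplitude of consecutive, non-overlapping windows (hypothetical helper).

    Trailing samples that do not fill a whole window are dropped.
    """
    values = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        values.append(math.sqrt(sum(x * x for x in window) / window_size))
    return values
```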

Example Usage:
>>> import emocodes as ec
>>> video_file = 'video.mp4'
>>> sampling_rate = 5 # in Hz
>>> outfile = 'outputs/video_features'
>>> features_df = ec.ExtractVideoFeatures().extract_features(video_file, sampling_rate, outfile)

extract_features(self, video_file, sampling_rate=1, outfile=None)

This method extracts the frame-by-frame visual and aural features from an MP4 file.

Parameters
  • video_file (str) – The filepath to the video file to be processed. Must be MP4.

  • sampling_rate (float) – The desired output sampling rate in Hz.

  • outfile (str) – Optional. The desired output filename for the features CSV. If None, defaults to the path and name of the MP4 video file with ‘.mp4’ replaced with ‘_features.csv’
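The default output name described for `outfile` can be sketched as below. `default_features_outfile` is a hypothetical helper; it replaces only the trailing `.mp4`, so a directory name that happens to contain `.mp4` is left unchanged.

```python
def default_features_outfile(video_file, outfile=None):
    """Return outfile, or derive '<video name>_features.csv' (hypothetical helper)."""
    if outfile is not None:
        return outfile
    if not video_file.endswith(".mp4"):
        raise ValueError("video_file must be an MP4 file")
    # Replace only the final '.mp4' suffix with '_features.csv'.
    return video_file[: -len(".mp4")] + "_features.csv"
```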

extract_audio_features(self, video_file)

This method extracts the frame-by-frame audio features from a video input.

Parameters

video_file (str) – The filepath to the video file to be processed. Must be MP4.

extract_visual_features(self, video_file)

This method extracts the visual features from the video input.

Parameters

video_file (str) – The filepath to the video file to be processed. Must be MP4.

resample_features(self, sampling_rate)

This method resamples the available feature dataframes to the given sampling_rate (in Hz).
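One way to resample a feature series is to average the values within consecutive bins of 1 / `sampling_rate` seconds. This is an assumed strategy, not necessarily what `resample_features` does. Averaging turns a binary beat indicator into a fraction of beat samples per bin. `resample_by_mean` is hypothetical.

```python
from collections import defaultdict


def resample_by_mean(times_ms, values, sampling_rate):
    """Average values within bins of 1/sampling_rate seconds (hypothetical helper).

    Returns (bin_start_ms, mean_value) pairs for every bin that contains data.
    """
    bin_ms = 1000.0 / sampling_rate
    bins = defaultdict(list)
    for t, v in zip(times_ms, values):
        bins[int(t // bin_ms)].append(v)  # index of the bin containing t
    return [(index * bin_ms, sum(vs) / len(vs)) for index, vs in sorted(bins.items())]
```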

emocodes.processing.get_video_length(video_file)

This function checks the length of a video file and returns that value in milliseconds.

Parameters

video_file (str) – The path to the video file that was coded

Returns

file_duration – The duration of the file in milliseconds

Return type

float

emocodes.processing.resample_video(video_file, sampling_rate)

This function resamples a video to the desired sampling rate. This can make videos with high sampling rates more tractable for analysis.

Parameters
  • video_file (str) – file path to video to be resampled.

  • sampling_rate (float) – Desired sampling rate in Hz

Returns

resampled_video

Return type

pliers video object with resampled video frames
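Downsampling a video amounts to choosing which source frames to keep. This sketch assumes a constant native frame rate and keeps the frame nearest to each output sample time. `frames_to_keep` is a hypothetical helper, not the pliers API, and Python's `round` breaks exact ties toward the even frame index.

```python
def frames_to_keep(n_frames, native_fps, sampling_rate):
    """Indices of source frames nearest to each output sample time (hypothetical helper)."""
    duration_s = n_frames / native_fps
    n_samples = int(duration_s * sampling_rate)
    indices = []
    for i in range(n_samples):
        # Frame position of output sample i, clamped to the last frame.
        nearest = round(i * native_fps / sampling_rate)
        indices.append(min(nearest, n_frames - 1))
    return indices
```

For example, resampling 60 frames recorded at 30 fps to 5 Hz keeps every sixth frame.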

emocodes.processing.extract_visual_features(video_file)

This function extracts luminance, vibrance, saliency, and sharpness from the frames of a video using the pliers library. If you use this function, please cite the pliers library directly: https://github.com/PsychoinformaticsLab/pliers#how-to-cite

Parameters

video_file (str) – Path to video file to analyze.

Returns

low_level_video_df – Pandas dataframe with a column per low-level feature (index is time).

Return type

DataFrame
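Illustrative definitions of two of these features on a single RGB frame. The formulas are assumptions: luminance as the mean Rec. 601 luma (0.299 R + 0.587 G + 0.114 B) scaled to [0, 1], and vibrance as the mean, over pixels, of the variance across the three color channels. The pliers extractors may define or scale them differently, and `luminance` and `vibrance` are hypothetical helpers.

```python
def luminance(frame):
    """Mean Rec. 601 luma of an RGB frame, scaled to [0, 1] (assumed definition).

    frame: list of rows, each a list of (r, g, b) tuples with 0-255 values.
    """
    pixels = [pixel for row in frame for pixel in row]
    luma = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    return sum(luma) / (255.0 * len(pixels))


def vibrance(frame):
    """Mean over pixels of the variance across R, G, B (assumed definition)."""
    pixels = [pixel for row in frame for pixel in row]
    total = 0.0
    for pixel in pixels:
        mean = sum(pixel) / 3.0
        total += sum((c - mean) ** 2 for c in pixel) / 3.0  # population variance
    return total / len(pixels)
```

A gray frame has zero vibrance, because all three channels are equal at every pixel.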

emocodes.processing.extract_audio_features(in_file)

This function extracts audio intensity, tempo, and beats from the audio of a video file using the pliers library. If you use this function, please cite the pliers library directly: https://github.com/PsychoinformaticsLab/pliers#how-to-cite

Parameters

in_file (str) – file path to video or audio file to be processed

Returns

low_level_audio_df – Pandas dataframe with a column per low-level feature (index is time).

Return type

DataFrame