emocodes.processing
Submodules
Package Contents
Classes
CodeTimeSeries | Processes a Datavyu CSV, converting the codes to a time series for bio-behavioral analysis.
ValidateTimeSeries | Takes a Datavyu-exported CSV and produces a report of common problems.
ExtractVideoFeatures | Extracts low-level features from an MP4 file.
Functions
get_code_labels | Pull the unique labels from the Datavyu codes.
convert_timestamps | Checks for human errors in coding, then converts the timestamps to a time series.
save_timeseries | Saves the resampled time series in the requested file type.
values_report | Takes a dataframe of Datavyu-exported codes and produces a report on the values of each code label.
timestamps_report | Takes a dataframe of Datavyu-exported codes and produces a report on the timestamps of each code label.
get_video_length | Checks the length of a video file and returns that value in milliseconds.
resample_video | Resamples a video to the desired sampling rate.
extract_visual_features | Extracts luminance, vibrance, saliency, and sharpness from the frames of a video.
extract_audio_features | Extracts audio intensity, tempo, and beats from the audio of a video file using the pliers library.
- class emocodes.processing.CodeTimeSeries(interpolate_gaps=True, sampling_rate=5, logging_dir='./logs')
This class processes a Datavyu CSV, converting the codes to a time series for bio-behavioral analysis. In the EmoCodes system, this class should be run after the codes are validated and any reported errors are corrected.
Example: Use CodeTimeSeries to convert “datavyu_export.csv” (codes of “myvideo.mp4” completed in Datavyu) to a timeseries file with a sampling rate of 1.2 Hz. This will save a file called “video_codes_time_series.csv” (the default saved file name).
>>> import emocodes as ec
>>> datavyu_file = 'datavyu_export.csv'
>>> video_file = 'myvideo.mp4'
>>> ec.CodeTimeSeries(sampling_rate=1.2).proc_codes_file(datavyu_file, video_file)
- Parameters
- interpolate_gaps: bool
Default is True, which interpolates across gaps in the codes. To leave gaps blank (NaNs), set to False.
- sampling_rate: float
Default is 5 Hz. Desired output sampling rate in Hz (samples per second).
- logging_dir: str
Path to the folder where the processing logs are saved. By default, a new folder named “logs” is created in the current directory.
- proc_codes_file(self, codes_file, video_file, save_file_name='video_codes_time_series', file_type='csv')
This method fully processes a Datavyu CSV-exported file to a time series output for further analysis.
- Parameters
codes_file (str) – File path to the Datavyu outputs to convert.
video_file (str) – File path to the MP4 video file that was coded in Datavyu.
save_file_name (str) – Default is “video_codes_time_series”. The file path and name to save the outputs as.
file_type (str ('csv','excel','tab','space')) – Default is “csv”. The type of file to save the data as.
- get_labels(self)
Method to make a list of the code labels in a Datavyu CSV-exported file.
- find_video_length(self, video_file)
Method to extract the length of a video MP4 file in milliseconds.
- Parameters
video_file (str) – File path to an MP4 video file.
- convert_codes(self)
This method converts a Datavyu style output to a long-form, timeseries style output which can be used for further analysis.
- save(self, save_file_name='video_codes_time_series', file_type='csv')
This method saves the processed codes to a file (CSV by default) for further analysis.
- Parameters
save_file_name (str) – Default is “video_codes_time_series”. The file path and name to save the processed codes as.
file_type (str ('csv','excel','tab','space')) – Default is “csv”. The type of file to save the data as.
- class emocodes.processing.ValidateTimeSeries
This class takes a Datavyu-exported CSV and produces a report of the following common problems:
missing values
offsets before onsets
offsets of zero
not starting with the video
not ending with the video file
segment durations of zero
The report also gives the following descriptive information:
list of unique values per code
number of segments per code
list of code segments (cells) with problematic data (offsets, onsets, or values)
This report can then be used to go back and clean the coding data in Datavyu before further processing.
Example
>>> import emocodes as ec
>>> codes_file = 'datavyu_export_codes.csv'
>>> video_file = 'myvideo.mp4'
>>> ec.ValidateTimeSeries().run(codes_file, video_file)
- run(self, file_name, video_file, report_filename=None)
- check_timestamps(self)
- check_values(self)
- emocodes.processing.get_code_labels(codes_df)
Pull the unique labels from the Datavyu codes.
- Parameters
codes_df (DataFrame) – The dataframe of codes created by importing the Datavyu codes using pandas.
- Returns
labels – Variable names of the codes from Datavyu
- Return type
list
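As a rough illustration of what "unique labels" means here, the sketch below assumes that Datavyu export columns are named `<label>.<field>` (for example `valence.onset`); that layout is an assumption, not something stated in this reference. The helper `labels_from_columns` is hypothetical and is not part of emocodes.

```python
def labels_from_columns(columns):
    """Return unique code labels, in first-seen order, from column names.

    Assumes each code column is named '<label>.<field>' (assumed layout).
    Columns without a dot are ignored.
    """
    labels = []
    for name in columns:
        if "." not in name:
            continue
        label = name.split(".", 1)[0]
        if label not in labels:
            labels.append(label)
    return labels


# Illustrative column names only; real exports may differ.
columns = [
    "valence.ordinal", "valence.onset", "valence.offset", "valence.code01",
    "has_faces.ordinal", "has_faces.onset", "has_faces.offset", "has_faces.code01",
    "Unnamed: 8",
]
print(labels_from_columns(columns))
```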
- emocodes.processing.convert_timestamps(labels, codes_df, video_duration, sampling_rate, interpolate_gaps=True, do_log=False)
This function performs two steps:
1. Checks for human errors in coding, such as incorrect end times or gaps in coding.
2. Converts the timestamps to a time series and optionally interpolates across gaps.
- Parameters
labels (list) – a list of strings which are the unique column variable labels in the dataframe. The output of the get_code_labels function.
codes_df (DataFrame) – the dataframe of Datavyu codes
video_duration (int) – The length of the video in milliseconds (the output of get_video_length).
sampling_rate (int) – The sampling rate in Hz that the file should be saved as.
interpolate_gaps (bool) – Default is set to True. If you wish for gaps to be preserved as NaNs, set to False.
do_log (bool) – Default is set to False. If logging is being used, set to True.
- Returns
timeseries_df – The resampled code time series
- Return type
DataFrame
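The conversion from onset/offset segments to a regularly sampled series can be pictured with the minimal sketch below. It is not the emocodes implementation: the function name `segments_to_series` is hypothetical, segments are plain tuples rather than a DataFrame, and gaps are filled by carrying the last value forward, which stands in for whatever interpolation the library actually applies.

```python
def segments_to_series(segments, video_duration_ms, sampling_rate_hz, fill_gaps=True):
    """Sample (onset_ms, offset_ms, value) segments on a regular time grid.

    A sample at time t takes the value of the segment with onset <= t < offset.
    Samples outside every segment are gaps: None, or the previous value
    when fill_gaps is True (forward fill, an assumed stand-in).
    """
    step_ms = 1000.0 / sampling_rate_hz
    n_samples = int(video_duration_ms // step_ms) + 1
    series = []
    last_value = None
    for i in range(n_samples):
        t = i * step_ms
        value = None
        for onset, offset, code in segments:
            if onset <= t < offset:
                value = code
                break
        if value is None and fill_gaps:
            value = last_value
        if value is not None:
            last_value = value
        series.append((t, value))
    return series


segments = [(0, 400, 1), (600, 1000, 0)]  # gap between 400 ms and 600 ms
filled = segments_to_series(segments, 1000, 5)
raw = segments_to_series(segments, 1000, 5, fill_gaps=False)
print([v for _, v in filled])
print([v for _, v in raw])
```

With `fill_gaps=False` the gap samples stay as `None`, mirroring the NaN behavior described for `interpolate_gaps=False`.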
- emocodes.processing.save_timeseries(timeseries_df, outfile_type, outfile_name, do_log=False)
- Parameters
timeseries_df (DataFrame) – The resampled time series from convert_timestamps
outfile_type (str ('csv','excel','tab','space')) – the file type to save the time series as.
outfile_name (str) – The file prefix for the output file.
do_log (bool) – Default is set to False. If logging is being used, set to True.
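One way to think about the text-based `outfile_type` options is as a choice of delimiter. The sketch below uses only the standard library; the mapping `DELIMITERS` and the helper `write_timeseries` are hypothetical and say nothing about how emocodes writes its files. The 'excel' option is left out because it needs a spreadsheet writer.

```python
import csv
import io

# Assumed mapping from file type to delimiter (illustrative only).
DELIMITERS = {"csv": ",", "tab": "\t", "space": " "}


def write_timeseries(header, rows, outfile_type, stream):
    """Write a header and rows to a text stream using the type's delimiter."""
    if outfile_type not in DELIMITERS:
        raise ValueError(f"unsupported text format: {outfile_type!r}")
    writer = csv.writer(stream, delimiter=DELIMITERS[outfile_type], lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


buffer = io.StringIO()
write_timeseries(["time_ms", "valence"], [[0, 1], [200, 1]], "tab", buffer)
print(buffer.getvalue())
```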
- emocodes.processing.values_report(codes_df, labels)
This function takes a dataframe of Datavyu-exported codes and produces a report that includes the following for each code label:
number of values coded
number of blank values found
list of code segments with blank values (correspond to cells in Datavyu)
list of unique values found
- Parameters
codes_df (DataFrame) – Pandas dataframe object with Datavyu CSV data.
labels (list) – List of unique code labels included in codes_df (output of get_code_labels function)
- Returns
summary_report – A dataframe with the report for each code in codes_df
- Return type
DataFrame
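The per-label summary described above can be sketched for a single label as follows. This is an illustration under stated assumptions, not the library's code: `summarize_values` is a hypothetical name, values arrive as a plain list in segment order, and both `None` and empty strings count as blank.

```python
def summarize_values(values):
    """Summarize the coded values of one label.

    Segment positions are reported as 0-based indices into `values`.
    """
    blank_segments = [
        i for i, v in enumerate(values) if v is None or str(v).strip() == ""
    ]
    coded = [v for i, v in enumerate(values) if i not in blank_segments]
    return {
        "n_values": len(coded),
        "n_blank": len(blank_segments),
        "blank_segments": blank_segments,
        "unique_values": sorted(set(coded)),
    }


summary = summarize_values(["1", "0", "", "1", None])
print(summary)
```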
- emocodes.processing.timestamps_report(codes_df, video_length, labels)
This function takes a dataframe of Datavyu-exported codes and produces a report that includes the following for each code label's timestamps:
missing offsets
offsets of zero
offsets labeled as before their corresponding onsets
whether or not the code starts at zero
whether or not the code ends with the video
overlapping onsets or offsets
a list of segments with potentially bad timestamps
- Parameters
codes_df (DataFrame) – A pandas DataFrame object that includes the Datavyu-exported values.
video_length (int) – video length in milliseconds (output of get_video_length function)
labels (list) – List of unique code labels included in codes_df (output of get_code_labels function)
- Returns
summary_report – A pandas dataframe with the report for each code in codes_df
- Return type
DataFrame
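The timestamp checks listed above can be expressed compactly for one label. The sketch below is a simplified stand-in, not the emocodes implementation: `timestamp_problems` is a hypothetical helper, segments are `(onset_ms, offset_ms)` tuples in coding order, and a missing offset is represented as `None`.

```python
def timestamp_problems(segments, video_length_ms):
    """Flag common timestamp problems; segment positions are 0-based."""
    report = {
        "missing_offsets": [],
        "zero_offsets": [],
        "offset_before_onset": [],
        "overlaps": [],
    }
    for i, (onset, offset) in enumerate(segments):
        if offset is None:
            report["missing_offsets"].append(i)
        elif offset == 0:
            report["zero_offsets"].append(i)
        elif offset < onset:
            report["offset_before_onset"].append(i)
    # A segment overlaps when it starts before the previous segment ends.
    for i in range(1, len(segments)):
        prev_offset = segments[i - 1][1]
        if prev_offset is not None and segments[i][0] < prev_offset:
            report["overlaps"].append(i)
    report["starts_at_zero"] = bool(segments) and segments[0][0] == 0
    last_offset = segments[-1][1] if segments else None
    report["ends_with_video"] = last_offset == video_length_ms
    return report


segments = [(0, 500), (400, 900), (900, None), (1200, 1000)]
problems = timestamp_problems(segments, video_length_ms=2000)
print(problems)
```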
- class emocodes.processing.ExtractVideoFeatures
This class can be used to extract the following low-level features from an MP4 file:
Luminance: The frame-by-frame brightness level
Vibrance: The variance of color channels of each frame
Saliency: Fraction of highly salient visual information for each frame according to the Itti & Koch algorithm: https://doi.org/10.1109/34.730558
Sharpness: Degree of blur or sharpness of each frame
Dynamic Tempo: the rolling tempo of the audio track
Loudness: Operationalized as the root-mean-square of the audio amplitude
Beats: Whether a musical beat falls on each timestamp. For sampling rates below 30 Hz, this variable is likely not useful.
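The root-mean-square definition of loudness given above can be written out directly. The sketch below computes it over fixed-size windows of raw amplitude samples; the helper names and the windowing scheme are assumptions for illustration, and the class itself obtains its audio features through the pliers library.

```python
import math


def rms(samples):
    """Root-mean-square of a non-empty sequence of amplitude samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def rms_per_window(samples, window_size):
    """RMS of consecutive, non-overlapping windows (the last may be shorter)."""
    return [
        rms(samples[i:i + window_size])
        for i in range(0, len(samples), window_size)
    ]


signal = [3.0, -4.0, 0.0, 0.0, 1.0, -1.0]
loudness = rms_per_window(signal, 2)
print(loudness)
```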
- Example Usage:
>>> import emocodes as ec
>>> video_file = 'video.mp4'
>>> sampling_rate = 5 # in Hz
>>> outfile = 'outputs/video_features'
>>> features_df = ec.ExtractVideoFeatures().extract_features(video_file, sampling_rate, outfile)
- extract_features(self, video_file, sampling_rate=1, outfile=None)
This method extracts the frame-by-frame visual and aural features from an MP4 file.
- Parameters
video_file (str) – The filepath to the video file to be processed. Must be MP4.
sampling_rate (float) – The desired output sampling rate in Hz.
outfile (str) – Optional. The desired output filename for the features CSV. If None, defaults to the path and name of the MP4 video file with ‘.mp4’ replaced with ‘_features.csv’
- extract_audio_features(self, video_file)
This method extracts the frame-by-frame audio features from a video input.
- Parameters
video_file (str) – The filepath to the video file to be processed. Must be MP4.
- extract_visual_features(self, video_file)
This method extracts the visual features from the video input.
- Parameters
video_file (str) – The filepath to the video file to be processed. Must be MP4.
- resample_features(self, sampling_rate)
This method resamples the available feature dataframes.
- emocodes.processing.get_video_length(video_file)
This function checks the length of a video file and returns that value in milliseconds.
- Parameters
video_file (str) – The path to the video file that was coded
- Returns
file_duration – The duration of the file in milliseconds
- Return type
float
- emocodes.processing.resample_video(video_file, sampling_rate)
This function resamples a video to the desired sampling rate. This can be useful for making videos with high sampling rates more tractable for analysis.
- Parameters
video_file (str) – file path to video to be resampled.
sampling_rate (float) – Desired sampling rate in Hz
- Returns
resampled_video – The video with resampled frames
- Return type
pliers video object
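Conceptually, resampling a video to a lower rate means keeping only the frames closest to an evenly spaced set of target times. The sketch below shows that idea with plain arithmetic; `resampled_frame_indices` is a hypothetical helper, and the actual function delegates the work to pliers rather than to code like this.

```python
def resampled_frame_indices(n_frames, source_fps, target_hz):
    """Indices of the source frames nearest to each target sample time."""
    if source_fps <= 0 or target_hz <= 0:
        raise ValueError("frame rates must be positive")
    duration_s = n_frames / source_fps
    n_out = int(duration_s * target_hz)
    indices = []
    for k in range(n_out):
        # Target time k / target_hz, expressed in source frame units.
        idx = round(k * source_fps / target_hz)
        indices.append(min(idx, n_frames - 1))
    return indices


# A 2-second clip at 30 frames per second, resampled to 5 Hz.
kept = resampled_frame_indices(n_frames=60, source_fps=30, target_hz=5)
print(kept)
```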
- emocodes.processing.extract_visual_features(video_file)
This function extracts luminance, vibrance, saliency, and sharpness from the frames of a video using the pliers library. If you use this function, please cite the pliers library directly: https://github.com/PsychoinformaticsLab/pliers#how-to-cite
- Parameters
video_file (str) – Path to video file to analyze.
- Returns
low_level_video_df – Pandas dataframe with a column per low-level feature (index is time).
- Return type
DataFrame
- emocodes.processing.extract_audio_features(in_file)
This function extracts audio intensity, tempo, and beats from the audio of a video file using the pliers library. If you use this function, please cite the pliers library directly: https://github.com/PsychoinformaticsLab/pliers#how-to-cite
- Parameters
in_file (str) – file path to video or audio file to be processed
- Returns
low_level_audio_df – Pandas dataframe with a column per low-level feature (index is time).
- Return type
DataFrame