emocodes.processing.codes
Module Contents
Classes
CodeTimeSeries – Processes a Datavyu CSV, converting the codes to a time series for bio-behavioral analysis.
ValidateTimeSeries – Takes a Datavyu-exported CSV and produces a report of common problems.
Functions
timestamps_report – Takes a dataframe of Datavyu-exported codes and produces a report on the timestamps of each code label.
values_report – Takes a dataframe of Datavyu-exported codes and produces a report on the values of each code label.
get_code_labels – Pull the unique labels from the Datavyu codes.
convert_timestamps – Checks for human errors in coding, then converts the timestamps to a time series.
save_timeseries – Saves the resampled time series to a file of the given type.
- class emocodes.processing.codes.CodeTimeSeries(interpolate_gaps=True, sampling_rate=5, logging_dir='./logs')
This class processes a Datavyu CSV, converting the codes to a time series for bio-behavioral analysis. In the EmoCodes system, this class should be run after the codes are validated and any reported errors are corrected.
Example: Use CodeTimeSeries to convert “datavyu_export.csv” (codes of “myvideo.mp4” completed in Datavyu) to a timeseries file with a sampling rate of 1.2 Hz. This will save a file called “video_codes_time_series.csv” (the default saved file name).
>>> import emocodes as ec
>>> datavyu_file = 'datavyu_export.csv'
>>> video_file = 'myvideo.mp4'
>>> ec.CodeTimeSeries(sampling_rate=1.2).proc_codes_file(datavyu_file, video_file)
- Parameters
- interpolate_gaps: bool
Default is True, which interpolates across gaps in the coding. To leave gaps blank (NaNs), set to False.
- sampling_rate: float
Default is 5 Hz. Desired output sampling rate in Hz (samples per second).
- logging_dir: str
Path to the folder where the processing logs are saved. By default, a new folder named “logs” is created in the current directory.
- proc_codes_file(self, codes_file, video_file, save_file_name='video_codes_time_series', file_type='csv')
This method fully processes a Datavyu CSV-exported file to a time series output for further analysis.
- Parameters
codes_file (str) – File path to the Datavyu outputs to convert.
video_file (str) – File path to the MP4 video file that was coded in Datavyu.
save_file_name (str) – Default is “video_codes_time_series”. The file path and name to save the outputs as.
file_type (str ('csv','excel','tab','space')) – Default is “csv”. The type of file to save the data as.
- get_labels(self)
Method to make a list of the code labels in a Datavyu CSV-exported file.
- find_video_length(self, video_file)
Method to extract the length of a video MP4 file in milliseconds.
- Parameters
video_file (str) – File path to a video MP4 file
- convert_codes(self)
This method converts a Datavyu style output to a long-form, timeseries style output which can be used for further analysis.
- save(self, save_file_name='video_codes_time_series', file_type='csv')
This method saves the processed codes to a file (CSV by default) for further analysis.
- Parameters
save_file_name (str) – Default is “video_codes_time_series”. The file path and name to save the processed codes as.
file_type (str ('csv','excel','tab','space')) – Default is “csv”. The type of file to save the data as.
- class emocodes.processing.codes.ValidateTimeSeries
This class takes a Datavyu-exported CSV and produces a report of the following common problems:
missing values
offsets before onsets
offsets of zero
codes that do not start at the beginning of the video
codes that do not end at the end of the video file
segment durations of zero
The report also gives the following descriptive information:
list of unique values per code
number of segments per code
list of code segments (cells) with problematic data (offsets, onsets, or values)
This report can then be used to go back and clean the coding data in Datavyu before further processing.
Example
>>> import emocodes as ec
>>> codes_file = 'datavyu_export_codes.csv'
>>> video_file = 'myvideo.mp4'
>>> ec.ValidateTimeSeries().run(codes_file, video_file)
- run(self, file_name, video_file, report_filename=None)
- check_timestamps(self)
- check_values(self)
- emocodes.processing.codes.timestamps_report(codes_df, video_length, labels)
This function takes a dataframe of Datavyu-exported codes and produces a report that includes the following for the timestamps of each code label:
missing offsets
offsets of zero
offsets labeled as before their corresponding onsets
whether or not the code starts at zero
whether or not the code ends with the video
overlapping onsets or offsets
a list of segments with potentially bad timestamps
- Parameters
codes_df (DataFrame) – A pandas DataFrame object that includes the Datavyu-exported values.
video_length (int) – video length in milliseconds (output of get_video_length function)
labels (list) – List of unique code labels included in codes_df (output of get_code_labels function)
- Returns
summary_report – A pandas dataframe with the report for each code in codes_df
- Return type
DataFrame
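The checks listed for timestamps_report can be illustrated with a minimal, standard-library sketch. It is not the emocodes implementation: the helper name `check_segments`, the tuple-based input, and the report keys are hypothetical, and overlap detection is left out for brevity.

```python
def check_segments(segments, video_length_ms):
    """Flag common timestamp problems in one code's segments.

    segments: list of (onset_ms, offset_ms) tuples in coding order;
    an offset of None stands in for a missing value (an assumption).
    """
    report = {
        "missing_offsets": 0,
        "zero_offsets": 0,
        "offset_before_onset": 0,
        "starts_at_zero": False,
        "ends_with_video": False,
        "bad_segments": [],  # 0-based positions of problematic segments
    }
    for i, (onset, offset) in enumerate(segments):
        if offset is None:
            report["missing_offsets"] += 1
            report["bad_segments"].append(i)
        elif offset == 0:
            report["zero_offsets"] += 1
            report["bad_segments"].append(i)
        elif offset < onset:
            report["offset_before_onset"] += 1
            report["bad_segments"].append(i)
    if segments:
        report["starts_at_zero"] = segments[0][0] == 0
        report["ends_with_video"] = segments[-1][1] == video_length_ms
    return report


# Hypothetical segments for a 5000 ms video.
segments = [(0, 1000), (1000, 0), (2500, 2000), (3000, None), (4000, 5000)]
report = check_segments(segments, video_length_ms=5000)
print(report)
```

Each segment is counted under at most one problem, so an offset of zero is reported as a zero offset rather than also as an offset before its onset.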
- emocodes.processing.codes.values_report(codes_df, labels)
This function takes a dataframe of Datavyu-exported codes and produces a report that includes the following for each code label:
number of values coded
number of blank values found
list of code segments with blank values (correspond to cells in Datavyu)
list of unique values found
- Parameters
codes_df (DataFrame) – Pandas dataframe object with Datavyu CSV data.
labels (list) – List of unique code labels included in codes_df (output of get_code_labels function)
- Returns
summary_report – A dataframe with the report for each code in codes_df
- Return type
DataFrame
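A rough sketch of the kind of per-code summary described for values_report, using only the standard library. The helper name `summarize_values` and the dictionary keys are hypothetical; the real function works on a pandas DataFrame and may define blanks differently.

```python
def summarize_values(values):
    """Summarize the coded values of a single code label.

    values: list of cell values in segment order. None and empty or
    whitespace-only strings are treated as blank (an assumption).
    """
    blanks = [i for i, v in enumerate(values)
              if v is None or str(v).strip() == ""]
    coded = [v for i, v in enumerate(values) if i not in blanks]
    return {
        "n_values": len(coded),
        "n_blanks": len(blanks),
        "blank_segments": blanks,          # 0-based segment positions
        "unique_values": sorted(set(coded)),
    }


summary = summarize_values(["1", "0", "", "1", None, "2"])
print(summary)
```

Positions here are 0-based list indices; mapping them back to Datavyu cell ordinals is left to the caller.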
- emocodes.processing.codes.get_code_labels(codes_df)
Pull the unique labels from the Datavyu codes.
- Parameters
codes_df (DataFrame) – The dataframe of codes created by importing the Datavyu codes using pandas.
- Returns
labels – Variable names of the codes from Datavyu
- Return type
list
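To make “pulling the unique labels” concrete, the sketch below assumes the exported columns are named `<label>.ordinal`, `<label>.onset`, `<label>.offset`, and `<label>.<code>`. That naming pattern, the helper name `unique_labels`, and the sample column names are assumptions for illustration, not taken from the emocodes source.

```python
def unique_labels(column_names):
    """Return the unique prefixes before the first '.', in order of appearance."""
    labels = []
    for name in column_names:
        if "." not in name:
            continue  # skip columns that do not follow the label.field pattern
        label = name.split(".", 1)[0]
        if label not in labels:
            labels.append(label)
    return labels


# Hypothetical column names from a Datavyu export with two codes.
columns = [
    "valence.ordinal", "valence.onset", "valence.offset", "valence.code01",
    "faces.ordinal", "faces.onset", "faces.offset", "faces.code01",
]
labels = unique_labels(columns)
print(labels)
```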
- emocodes.processing.codes.convert_timestamps(labels, codes_df, video_duration, sampling_rate, interpolate_gaps=True, do_log=False)
This function performs two steps:
1. Checks for human errors in coding, such as incorrect end times or gaps in coding.
2. Converts the timestamps to a time series and optionally interpolates across gaps.
- Parameters
labels (list) – a list of strings which are the unique column variable labels in the dataframe. The output of the get_code_labels function.
codes_df (DataFrame) – the dataframe of Datavyu codes
video_duration (int) – The length of video in milliseconds, the output of get_video_length
sampling_rate (int) – The sampling rate, in Hz, of the output time series.
interpolate_gaps (bool) – Default is set to True. If you wish for gaps to be preserved as NaNs, set to False.
do_log (bool) – Default is set to False. If logging is being used, set to True.
- Returns
timeseries_df – The resampled code time series
- Return type
DataFrame
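The second step, turning onset/offset segments into a regularly sampled series, can be sketched as follows. This is a simplified stand-in, not the emocodes algorithm: the helper name `segments_to_series` is hypothetical, coded values are assumed to be numeric, and gaps are filled by carrying the previous value forward, which may differ from the interpolation emocodes actually applies.

```python
import math


def segments_to_series(segments, video_duration_ms, sampling_rate_hz,
                       interpolate_gaps=True):
    """Resample (onset_ms, offset_ms, value) segments onto a regular grid.

    Returns a list of (time_ms, value) pairs, one per sample.
    """
    step_ms = 1000.0 / sampling_rate_hz            # e.g. 5 Hz -> 200 ms
    n_samples = int(video_duration_ms // step_ms)
    series = []
    for k in range(n_samples):
        t = k * step_ms
        value = math.nan                           # NaN marks an uncoded gap
        for onset, offset, seg_value in segments:
            if onset <= t < offset:
                value = seg_value
                break
        series.append((t, value))
    if interpolate_gaps:
        # Forward fill: reuse the last coded value for gap samples.
        filled, last = [], math.nan
        for t, v in series:
            if math.isnan(v):
                v = last
            else:
                last = v
            filled.append((t, v))
        series = filled
    return series


# Hypothetical code with a gap between 400 ms and 800 ms in a 1000 ms video.
segments = [(0, 400, 1), (800, 1000, 0)]
filled = segments_to_series(segments, 1000, 5)
gappy = segments_to_series(segments, 1000, 5, interpolate_gaps=False)
print(filled)
print(gappy)
```

With `interpolate_gaps=False` the two samples that fall inside the gap stay as NaN, mirroring the behavior described for the parameter above.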
- emocodes.processing.codes.save_timeseries(timeseries_df, outfile_type, outfile_name, do_log=False)
- Parameters
timeseries_df (DataFrame) – The resampled time series returned by convert_timestamps
outfile_type (str ('csv','excel','tab','space')) – the file type to save the time series as.
outfile_name (str) – The file prefix for the output file.
do_log (bool) – Default is set to False. If logging is being used, set to True.
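As an illustration of how the 'csv', 'tab', and 'space' options might map to output delimiters, here is a small standard-library sketch. The helper name `write_rows` and the delimiter table are assumptions; the 'excel' option is omitted because the standard library cannot write spreadsheet files.

```python
import csv
import io

# Assumed mapping from outfile_type to a field delimiter.
DELIMITERS = {"csv": ",", "tab": "\t", "space": " "}


def write_rows(rows, header, file_type, handle):
    """Write a header and rows to an open text handle using the chosen delimiter."""
    if file_type not in DELIMITERS:
        raise ValueError(f"unsupported file type: {file_type!r}")
    writer = csv.writer(handle, delimiter=DELIMITERS[file_type],
                        lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Write two hypothetical samples as tab-separated text into memory.
buffer = io.StringIO()
write_rows([(0, 1), (200, 1)], ["time_ms", "valence"], "tab", buffer)
text = buffer.getvalue()
print(text)
```

Writing to a handle rather than a path keeps the sketch independent of how emocodes builds the output file name from outfile_name.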