1. Instagram photos reveal predictive markers of depression (2017)
1) Data Collection
The data collection process was crowdsourced using Amazon Mechanical Turk (MTurk).
Separate surveys were created for individuals with depression and healthy controls.
Participants were asked to provide their Instagram usernames and consent to access their posting history,
resulting in a dataset of 43,950 photos from 166 users, of which 71 had a history of depression.
2) Feature Extraction
Several types of features were extracted from the Instagram photos (a minimal extraction sketch follows this list):
(1) Color Analysis
a. Hue, Saturation, and Brightness (HSV):
- Hue: Indicates the color type, ranging from red to blue/purple. Lower hue values indicate more red, and higher values indicate more blue.
- Saturation: Refers to the vividness of the color. Lower saturation makes an image appear gray and faded.
- Brightness: Indicates the lightness or darkness of an image. Lower brightness scores indicate darker images.
(2) Metadata Components:
a. Comments and Likes: The number of comments and likes each photo received was counted to gauge community engagement.
b. Posting Frequency: The number of posts per day was used as a measure of user activity.
c. Filter Usage: The use of Instagram filters was tracked to determine any differences in how depressed and healthy users edited their photos.
(3) Algorithmic Face Detection:
a. Presence and Number of Faces: A face detection algorithm was used to identify and count the number of human faces in each photograph, as an indicator of social activity.
* Additional contextual features can also be used, e.g., 1) whether the setting is in nature or indoors, and 2) whether the photo was taken at night or during the day.
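A minimal sketch of how the color and face features above can be computed with OpenCV (the paper's exact implementation is not specified, so the Haar-cascade detector and its parameters here are illustrative):

```python
import cv2
import numpy as np

# Haar cascade bundled with OpenCV; an assumption, since the paper
# does not name its face detection algorithm.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def photo_features(path: str) -> dict:
    """Mean hue/saturation/brightness plus a face count for one photo."""
    bgr = cv2.imread(path)
    h, s, v = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV))
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return {
        "hue": float(np.mean(h)),         # higher = bluer, lower = redder
        "saturation": float(np.mean(s)),  # lower = grayer, more faded
        "brightness": float(np.mean(v)),  # lower = darker
        "n_faces": len(faces),
    }
```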
3) Analysis and Modeling
Two models were trained using Bayesian logistic regression and Random Forest classifiers:
(1) All-data Model: Included all collected data to test if depression markers are observable in Instagram posts.
(2) Pre-diagnosis Model: Used data from healthy participants and pre-diagnosis data from depressed participants to test if depression markers are detectable before a clinical diagnosis.
Both models significantly outperformed a null model, with the All-data model showing higher recall and precision compared to general practitioners' unassisted diagnostic accuracy. The features indicative of depression included increased hue (bluer photos), decreased brightness and saturation (darker and grayer photos), and fewer likes on posts.
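A minimal sketch of the classification setup, using scikit-learn's RandomForestClassifier in place of the paper's specific tooling (the feature matrix and labels below are placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# X: one row of aggregated photo features per user (mean hue, saturation,
# brightness, likes, comments, posts/day, face counts, ...).
# y: 1 = history of depression, 0 = healthy control.
rng = np.random.default_rng(0)
X = rng.random((166, 7))       # placeholder feature matrix
y = rng.integers(0, 2, 166)    # placeholder labels

clf = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print("5-fold F1:", scores.mean())
```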
4) Human Ratings
Participants were also asked to rate the photos on happiness, sadness, likability, and interestingness. These human ratings showed low correlation with computational features but were still able to distinguish between photos posted by depressed and healthy individuals.
2. Predicting Depression Tendency based on Image, Text and Behavior Data from Instagram (2019)
1) Data Collection
Data were gathered from Instagram by selecting users who posted with depression-related hashtags; these users were further filtered based on their Instagram bios and behavior.
2) Behavior Features
Behavior features capture how users interact on social media platforms, in this case Instagram.
(1) Social Behaviors
Social behaviors capture the user's activity and interaction patterns on Instagram. The following features were extracted:
a. Post Time: The time at which a user makes a post. This can be significant because people with depression often exhibit irregular sleep patterns and may post during late-night hours.
b. Number of Posts
c. Day of the Week: The day on which a user posts, to see if there are patterns linked to weekdays versus weekends.
d. Post Frequency
e. Number of Accounts Followed
f. Number of Followers
g. Number of Likes
h. Number of Comments
(2) Writing Behaviors
Writing behaviors refer to the textual content and style of a user's posts. The following features were extracted (a counting sketch follows this list):
a. Total Word Count: The total number of words in each post.
b. First-Person Pronoun Count: The frequency of first-person pronouns (e.g., "I," "me," "my") which may indicate a focus on self.
c. Depression-Related Hashtags: The number of hashtags related to depression (e.g., #depression, #sad).
d. Hashtag Count: The total number of hashtags used per post.
e. Emoji Count: The number of emojis used, as they can be indicators of mood.
f. Absolutist Word Count: Words that indicate absolute states (e.g., "always," "never," "completely"), which are more frequently used by people experiencing depression.
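A minimal sketch of counting these writing features (the word lists and emoji heuristic are illustrative, not the paper's lexicons):

```python
import re

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
ABSOLUTIST = {"always", "never", "completely", "nothing", "everything"}
DEPRESSION_TAGS = {"#depression", "#sad", "#depressed"}

def writing_features(caption: str) -> dict:
    """Count the textual features for a single post caption."""
    words = re.findall(r"[#\w']+", caption.lower())
    hashtags = [w for w in words if w.startswith("#")]
    return {
        "word_count": len(words),
        "first_person": sum(w in FIRST_PERSON for w in words),
        "absolutist": sum(w in ABSOLUTIST for w in words),
        "hashtag_count": len(hashtags),
        "depression_hashtags": sum(t in DEPRESSION_TAGS for t in hashtags),
        "emoji_count": sum(ord(ch) >= 0x1F300 for ch in caption),  # crude check
    }
```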
3) Image Feature Extraction
The image feature extraction process uses deep learning, specifically Convolutional Neural Networks (CNNs), to automatically extract features from images posted by users. The study used a pre-trained CNN model (VGG16) and applied transfer learning to adapt it to the task of detecting depression from Instagram images (a minimal sketch follows these steps).
a. Pre-training: The VGG16 model was pre-trained on the ImageNet dataset, which contains millions of labeled images across thousands of categories. This pre-training helps the model learn general visual features.
b. Transfer Learning: The pre-trained model was then fine-tuned using a smaller dataset specific to the task at hand—classifying images as depressive or non-depressive. Transfer learning leverages the knowledge gained during pre-training to improve performance on the new, related task.
c. Image Processing: Each Instagram image was resized to 224x224 pixels to match the input size expected by the VGG16 model.
d. Feature Extraction: The CNN processes each image to generate a feature vector. Specifically, the output of the first fully connected layer after the convolutional layers was used as the feature vector. This vector captures high-level image features learned by the model.
e. Vector Concatenation: For each user, the feature vectors from all their images were concatenated to form a comprehensive feature set. If a user had fewer images than the fixed maximum, zero vectors were appended as padding to keep the input size consistent across users.
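A minimal sketch of this pipeline in Keras (the paper does not name its deep learning framework; "fc1" is the name Keras gives VGG16's first fully connected layer):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

# VGG16 pre-trained on ImageNet, truncated at the first fully connected
# layer so that each image yields a 4096-d feature vector.
base = VGG16(weights="imagenet")
extractor = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

def image_vector(path: str) -> np.ndarray:
    """Resize an image to 224x224 and return its fc1 feature vector."""
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return extractor.predict(x, verbose=0)[0]

def user_features(paths: list, max_images: int) -> np.ndarray:
    """Concatenate per-image vectors; zero-pad users with fewer images."""
    vecs = [image_vector(p) for p in paths[:max_images]]
    vecs += [np.zeros(4096)] * (max_images - len(vecs))
    return np.concatenate(vecs)
```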
3. A comprehensive review of predictive analytics models for mental illness using machine learning algorithms (2024)
1) Feature Extraction from Images
- Basic Image Features:
- Pixel Values: The raw pixel values can be used as features, although this is often not very effective for complex tasks.
- Color Histograms: Histograms of pixel intensities in different color channels (e.g., RGB) can capture color distribution.
- Edge Detection: Using filters like Sobel, Canny, or Prewitt to detect edges, which are useful for identifying object boundaries.
- Texture Features:
- Local Binary Patterns (LBP): A texture descriptor that compares each pixel to its neighbors.
- Haralick Textures: Descriptors derived from the gray-level co-occurrence matrix (GLCM), capturing information about the texture of an image.
- Shape Features:
- Hough Transform: Detects shapes such as lines and circles.
- Fourier Descriptors: Used for shape analysis by transforming the shape boundary into the frequency domain.
- Feature Descriptors:
- SIFT (Scale-Invariant Feature Transform): Detects and describes local features in images; the extracted features are invariant to scale and rotation.
- SURF (Speeded-Up Robust Features): Similar to SIFT but faster and more efficient.
- ORB (Oriented FAST and Rotated BRIEF): A combination of the FAST keypoint detector and the BRIEF descriptor.
- Deep Learning Features:
- Convolutional Neural Networks (CNNs): Automatically learn hierarchical feature representations from raw images.
- Pre-trained Models: Using models like VGG, ResNet, or Inception, trained on large datasets, and extracting features from intermediate layers.
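A brief sketch of a few of the classical image features above, using OpenCV and scikit-image (the file name is a placeholder):

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

img = cv2.imread("photo.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Color histogram: 8 bins per BGR channel, flattened to a 24-d vector.
color_hist = np.concatenate(
    [cv2.calcHist([img], [c], None, [8], [0, 256]).ravel() for c in range(3)])

# Canny edge map: a binary image marking detected object boundaries.
edges = cv2.Canny(gray, threshold1=100, threshold2=200)

# Local Binary Patterns: per-pixel texture codes summarized as a histogram.
lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=np.arange(11), density=True)
```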
2) Feature Extraction from Audio
- Time-Domain Features:
- Zero-Crossing Rate (ZCR): The rate at which the signal changes sign.
- Energy: The sum of squares of the signal amplitude values.
- Frequency-Domain Features:
- Fourier Transform: Converts the time-domain signal into the frequency domain.
- Spectrogram: A visual representation of the spectrum of frequencies in a signal as it varies with time.
- Cepstral Features:
- Mel-Frequency Cepstral Coefficients (MFCCs): Captures the short-term power spectrum of a sound.
- Delta and Delta-Delta MFCCs: Capture the temporal dynamics of MFCCs.
- Pitch and Harmonic Features:
- Pitch: The perceived frequency of a sound.
- Harmonic-to-Noise Ratio (HNR): Measures the amount of harmonic components versus noise.
- Rhythm Features:
- Tempo: The speed of the beat of a piece of music.
- Beat Histograms: Distribution of beats over time.
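A brief sketch computing several of the audio features above with librosa (the file name is a placeholder):

```python
import librosa
import numpy as np

y, sr = librosa.load("clip.wav")

# Time-domain features.
zcr = librosa.feature.zero_crossing_rate(y).mean()
energy = float(np.sum(y ** 2))

# Cepstral features: 13 MFCCs plus their first- and second-order deltas.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
delta = librosa.feature.delta(mfcc)
delta2 = librosa.feature.delta(mfcc, order=2)

# Pitch: fundamental-frequency track from the YIN estimator.
f0 = librosa.yin(y, fmin=65, fmax=2093, sr=sr)

# Rhythm: global tempo estimate in beats per minute.
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
```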
3) Practical Application Examples
- Image Data:
- For object detection, features like edges (using Canny edge detector) or SIFT descriptors can be useful.
- For texture classification, Haralick features or LBP can be applied.
- For high-level tasks such as image classification, deep features from CNNs are typically used.
- Audio Data:
- For speech recognition, MFCCs are commonly used as they effectively capture the phonetic properties of the speech signal.
- For music genre classification, features like spectral contrast and chroma features might be used.
- For emotion recognition in speech, features such as pitch, energy, and formant frequencies are relevant.
4. Identifying substance use risk based on deep neural networks and Instagram social media data (2019)
The study aims to classify individuals' risk for alcohol, tobacco, and drug use using Instagram data.
1) Data Collection:
- Participants were recruited via the Clickworker crowdsourcing platform and other means.
- Participants completed a survey based on the National Institute on Drug Abuse's Modified ASSIST screener.
- Instagram profile data, including images, captions, and comments, were collected with participants' consent.
2) Handling Imbalanced Dataset:
- The data were skewed towards lower risk categories.
- The researchers converted labels into binary classes for simplicity: "low risk" and "high risk".
- Oversampling techniques were applied to balance the dataset (one common approach is sketched below).
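The paper's specific oversampling method is not detailed here; a common choice is random oversampling of the minority class, e.g. with imbalanced-learn:

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

# Placeholder data skewed toward the "low risk" class (0).
X = np.random.rand(100, 10)
y = np.array([0] * 85 + [1] * 15)

ros = RandomOverSampler(random_state=0)
X_bal, y_bal = ros.fit_resample(X, y)
print(Counter(y), "->", Counter(y_bal))  # {0: 85, 1: 15} -> {0: 85, 1: 85}
```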
3) Feature Extraction:
- Images: Features were extracted using ResNet18, a convolutional neural network pre-trained on ImageNet.
- Text: Features were extracted using a combination of Word2Vec for semantic word representation and LSTM for semantic text representation.
- These features were mapped into a 300-dimensional joint feature space (a sketch follows).
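A minimal sketch of the image pathway and the joint-space projection in PyTorch (the Word2Vec/LSTM text pathway and its 128-d output size are assumptions):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# ResNet18 pre-trained on ImageNet, with its classification head removed;
# the remaining trunk outputs a 512-d vector per image.
backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()

img_proj = nn.Linear(512, 300)  # image features -> 300-d joint space
txt_proj = nn.Linear(128, 300)  # LSTM text features -> 300-d joint space

imgs = torch.rand(4, 3, 224, 224)      # a batch of 4 placeholder images
img_feats = img_proj(backbone(imgs))   # shape: (4, 300)
```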
4) Aggregation and Predictive Analysis:
- A fixed number of images, captions, and comments were randomly sampled from each user’s data.
- The extracted features were averaged to estimate risk.
- A fully connected neural network layer with softmax normalization and a cross-entropy loss was used for risk estimation (a brief sketch follows).
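A brief sketch of the aggregation and prediction head in PyTorch:

```python
import torch
import torch.nn as nn

class RiskHead(nn.Module):
    """Average a user's per-item feature vectors, then classify."""
    def __init__(self, dim: int = 300, n_classes: int = 2):
        super().__init__()
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (n_sampled_items, dim) for a single user.
        return self.fc(feats.mean(dim=0))  # logits

head = RiskHead()
logits = head(torch.rand(20, 300))  # 20 sampled items for one user
# CrossEntropyLoss applies the softmax normalization internally.
loss = nn.CrossEntropyLoss()(logits.unsqueeze(0), torch.tensor([1]))
```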
5. Using computer vision techniques on Instagram to link users’ personalities and genders to the features of their photos: An exploratory study (2018)
1) Data Collection
An online survey of 179 university students was conducted to measure user characteristics, and 25,394 photos in total were downloaded and analyzed from the respondents’ Instagram accounts.
2) User characteristics
The Big Five personality traits and gender
3) Photo Characteristics
(1) Content category
(2) the number of faces
(3) the emotions on the faces -> affect dimensions based on the PAD (Pleasure-Arousal-Dominance) model
(4) pixel-derived features
6. Examining how the auditory and lyrical characteristics of songs to which people listen vary according to mental health traits (reviewing)
1) Data Collection
Data were collected from 68 college students who regularly listen to music on Melon.
2) Feature Extraction
Music can be categorized into auditory and lyrical characteristics
(1) First, to analyze the audio data, the original audio is converted into a mel-spectrogram format using the 'librosa' library provided in Python.
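A minimal conversion sketch using librosa (the file name and mel-band count are placeholders):

```python
import librosa
import numpy as np

# Load an audio clip and convert it to a log-scaled mel-spectrogram,
# the 2-D input used for the subsequent deep learning analysis.
y, sr = librosa.load("song.mp3")
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)  # shape: (128, n_frames)
```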
3) Analysis Method
(1) deep learning algorithms
-> a convolutional autoencoder algorithm and the Doc2Vec algorithm
(2) a series of regressions
(3) Spotify API -> extract the following features of each song:
-> energy, valence, danceability, instrumentalness, loudness, and tempo
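A minimal sketch of pulling those features with the spotipy client (the track ID is a placeholder; credentials are read from the standard SPOTIPY_* environment variables):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Client-credentials auth; spotipy reads SPOTIPY_CLIENT_ID and
# SPOTIPY_CLIENT_SECRET from the environment.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

features = sp.audio_features(["track_id_here"])[0]  # placeholder track ID
keys = ["energy", "valence", "danceability",
        "instrumentalness", "loudness", "tempo"]
song_vector = {k: features[k] for k in keys}
```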