Social Media Sentiment Analysis

Role: Data Analyst
Duration: 6 weeks
Tools: Python, NLP, VADER

Project Overview

Analyzed customer sentiment from 50,000+ social media posts to understand brand perception and identify product improvement opportunities. The project processed posts from Twitter, Reddit, and product review sites using natural language processing techniques.

The analysis delivered actionable insights that directly influenced product development and customer service strategy.

Business Context

The company launched a new product line but lacked systematic insight into customer sentiment. Traditional surveys provided limited feedback with low response rates, while thousands of organic customer comments remained unanalyzed.

The marketing and product teams needed to understand not just whether customers were satisfied, but specifically what they loved, what frustrated them, and what features they wanted most.

Methodology

  • Data collection via the Twitter and Reddit APIs, plus web scraping of product review sites
  • Text preprocessing: cleaning, tokenization, lemmatization
  • Sentiment scoring using VADER (Valence Aware Dictionary and sEntiment Reasoner)
  • Topic modeling with Latent Dirichlet Allocation (LDA)
  • Time series analysis of sentiment trends
  • Aspect-based sentiment analysis for specific product features
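The aspect-based step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the aspect keyword lists (`ASPECT_KEYWORDS`) are hypothetical, and `score` is assumed to be any callable that returns a compound score in [-1, 1], such as VADER's. Scoring each sentence separately keeps praise for one feature from cancelling out a complaint about another feature in the same post.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List

# Hypothetical aspect keyword lists; a real project would refine these
# using topic modeling output and manual review.
ASPECT_KEYWORDS: Dict[str, List[str]] = {
    "battery": ["battery", "charge", "charging"],
    "build_quality": ["build", "material", "sturdy", "flimsy"],
    "support": ["support", "customer service", "response time"],
    "shipping": ["shipping", "delivery", "arrived"],
}


def split_sentences(text: str) -> List[str]:
    # Naive split after '.', '!' or '?'; adequate for short social posts
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def aspect_sentiment(posts: Iterable[str],
                     score: Callable[[str], float]) -> Dict[str, float]:
    """Average compound score per aspect, computed sentence by sentence."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for post in posts:
        for sentence in split_sentences(post):
            lowered = sentence.lower()
            for aspect, keywords in ASPECT_KEYWORDS.items():
                # Only sentences that mention an aspect contribute to it
                if any(k in lowered for k in keywords):
                    scores[aspect].append(score(sentence))
    return {aspect: mean(vals) for aspect, vals in scores.items()}
```

With the analyzer from the implementation below, a call might look like `aspect_sentiment(df['clean_text'], lambda s: analyzer.polarity_scores(s)['compound'])`. Plain substring matching is deliberately simple and will over-match (for example, "build" inside "building").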

Key Findings

Overall Sentiment: 52% positive, 31% neutral, 17% negative

Peak Negativity: Spike in negative sentiment 2 weeks post-launch due to shipping delays

Product Quality: 78% positive mentions about build quality

Customer Service: 65% of negative comments related to support response time

Top Request: Battery life mentioned in 1,200+ posts as improvement area
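A finding like the post-launch spike comes from tracking sentiment over time. The sketch below assumes each post already has a date and a VADER label. It computes the weekly share of negative posts and flags weeks whose share exceeds the average of all earlier weeks by a margin. The trailing-baseline rule and the `margin` default of 0.10 are illustrative assumptions, not the project's actual settings.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Week = Tuple[int, int]  # (ISO year, ISO week number)


def weekly_negative_share(posts: Iterable[Tuple[date, str]]) -> Dict[Week, float]:
    """Share of 'negative' labels per ISO week, ordered by week."""
    counts: Dict[Week, Counter] = defaultdict(Counter)
    for day, label in posts:
        year, week, _ = day.isocalendar()
        counts[(year, week)][label] += 1
    return {wk: c["negative"] / sum(c.values()) for wk, c in sorted(counts.items())}


def flag_spikes(shares: Dict[Week, float], margin: float = 0.10) -> List[Week]:
    """Weeks whose negative share exceeds the mean of all earlier weeks by `margin`."""
    weeks = sorted(shares)
    spikes = []
    for i, wk in enumerate(weeks[1:], start=1):
        baseline = mean(shares[w] for w in weeks[:i])
        if shares[wk] - baseline > margin:
            spikes.append(wk)
    return spikes
```

A trailing baseline only uses data available at the time, so the same check can run unchanged as part of ongoing monitoring.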

Python Implementation

Core sentiment analysis code:

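import re

import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Initialize VADER once and reuse it for every post
analyzer = SentimentIntensityAnalyzer()

def preprocess_text(text):
    # Treat missing values (NaN) as empty strings
    if not isinstance(text, str):
        return ''
    # Remove URLs, @mentions and hashtags
    text = re.sub(r'http\S+|www\S+|@\w+|#\w+', '', text)
    # Collapse leftover whitespace. Punctuation, capitalization and emoticons
    # are kept on purpose: VADER uses them (e.g. "!!!", ALL CAPS, ":)") to
    # adjust intensity, so stripping them or lowercasing would distort scores.
    return re.sub(r'\s+', ' ', text).strip()

def label_sentiment(compound):
    # Standard VADER thresholds on the compound score
    if compound >= 0.05:
        return 'positive'
    elif compound <= -0.05:
        return 'negative'
    return 'neutral'

def score_posts(df):
    # Expects a DataFrame with a 'text' column; returns a scored copy
    df = df.copy()
    df['clean_text'] = df['text'].apply(preprocess_text)
    # Score each post once, then derive the label from the compound score
    df['sentiment_score'] = df['clean_text'].apply(
        lambda x: analyzer.polarity_scores(x)['compound']
    )
    df['sentiment'] = df['sentiment_score'].apply(label_sentiment)
    return df

# Apply to the dataset ('posts.csv' is a placeholder for the collected posts)
df = score_posts(pd.read_csv('posts.csv'))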

Actionable Recommendations

Based on the analysis, three priority actions were identified:

1. Improve Customer Support Response Time

Implement a chatbot for common queries and expand the support team, with the goal of cutting wait times from 48 hours to 4 hours. This addresses the largest source of negative sentiment: 65% of negative comments concerned support response time.

2. Enhance Battery Performance

Prioritize battery optimization in the next software update and communicate an improvement timeline to customers. This addresses the top improvement request, raised in 1,200+ posts.

3. Address Shipping Issues

Partner with alternative logistics providers and send customers proactive shipping updates. This aims to prevent a repeat of the post-launch sentiment spike in future product releases.

Business Impact

  • All recommendations implemented within 6 weeks of the presentation
  • Customer satisfaction score improved from 3.2 to 4.1 stars (28% increase)
  • Negative sentiment decreased by 40% after addressing key issues
  • Support ticket volume reduced by 35% after chatbot deployment
  • Analysis framework adopted for ongoing monthly monitoring
  • Product team integrated sentiment tracking into development roadmap
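For reference, the 28% figure is the relative change in the satisfaction score, not a change measured in stars:

```latex
\frac{4.1 - 3.2}{3.2} = \frac{0.9}{3.2} = 0.28125 \approx 28\%
```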

Technical Challenges

The biggest challenge was handling sarcasm and context-dependent sentiment. For example, "Great, now it won't turn on" was scored as positive because of the word "great", even though it is clearly negative.

This was addressed by:

  • Implementing custom rules for common sarcastic phrases
  • Manual review and labeling of 500+ ambiguous posts
  • Extending VADER's lexicon with domain-specific terms
  • Adding negation rules on top of VADER's built-in negation handling
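A rule layer for sarcasm can sit on top of VADER's output. The sketch below is illustrative only: the regex patterns are hypothetical examples modeled on the "Great, now it won't turn on" case, not the project's actual rule set. A post that opens with a positive interjection and then describes a failure has its label forced to negative; every other post keeps VADER's label.

```python
import re

# Hypothetical sarcasm patterns: a positive opener followed by a failure phrase,
# plus one fixed sarcastic idiom.
SARCASM_PATTERNS = [
    re.compile(
        r"^\s*(great|perfect|awesome|wonderful|just what i needed)\b[\s,.!]*"
        r".*\b(won'?t|doesn'?t|can'?t|stopped|broke|died|crashed)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\bthanks for nothing\b", re.IGNORECASE),
]


def apply_sarcasm_rules(text: str, vader_label: str) -> str:
    """Override VADER's label with 'negative' when any sarcasm rule matches."""
    if any(p.search(text) for p in SARCASM_PATTERNS):
        return "negative"
    return vader_label
```

Keeping the rules as a separate post-processing step makes them easy to audit against the manually labeled posts and to extend as new sarcastic phrasings show up.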

These improvements increased sentiment classification accuracy from 78% to 91%.
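Accuracy figures like these come from comparing predicted labels against manually labeled posts. A minimal sketch, assuming the gold labels and predictions are aligned lists; the per-class recall breakdown is an added illustration of how to see which class (for example, sarcastic negatives predicted as positive) drives the errors.

```python
from collections import Counter
from typing import Dict, List, Tuple


def evaluate(gold: List[str], predicted: List[str]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and per-class recall for aligned label lists."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and the same length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    # Recall per class: fraction of gold posts of that class labeled correctly
    recall = {label: hits[label] / n for label, n in totals.items()}
    return correct / len(gold), recall
```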

Lessons Learned

This project taught me that sentiment analysis is most powerful when combined with topic modeling. Understanding not just how people feel, but what they're talking about when they express those feelings, provides much richer insights.
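Combining the two outputs can be as simple as cross-tabulating each post's topic against its sentiment label. This sketch assumes every post already has a dominant topic (for example, its highest-probability LDA topic) and a VADER label; the topic names in the test are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

LABELS = ("positive", "neutral", "negative")


def sentiment_by_topic(rows: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each topic to the share of positive, neutral and negative posts in it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for topic, label in rows:
        counts[topic][label] += 1
    result = {}
    for topic, c in counts.items():
        total = sum(c.values())
        # Counter returns 0 for labels that never occur in this topic
        result[topic] = {label: c[label] / total for label in LABELS}
    return result
```

Sorting topics by their negative share points directly at where frustration is concentrated, which is the kind of "what" that sentiment scores alone do not reveal.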

It also reinforced the importance of continuous monitoring—sentiment isn't static, and tracking changes over time revealed patterns (like the shipping spike) that a one-time analysis would have missed.