With dating apps becoming a popular way to meet potential partners, make new friends, or just have fun, it is interesting to explore how their users perceive them.
This notebook presents an analysis focused on the qualitative side of that story. The approach has limitations that I’ll describe below; however, it gives us a general idea of how most users perceive these apps.
The data source is available here on Kaggle. For this analysis I’ve focused on the fields that relate to:
Our goal is to create a word cloud that gathers keywords representing the overall perception of the people using the apps.
NOTE: Feel free to unhide the first cell above to read the code if you want to take a closer look at the data processing steps.
The reviews we are about to explore come from three apps, and most of them come from Tinder. The chart below shows the percentage of reviews contributed by each app.
count_apps
## # A tibble: 3 × 3
## # Groups: App [3]
## App n percentage
## <chr> <int> <dbl>
## 1 Bumble 102384 15.0
## 2 Hinge 52994 7.77
## 3 Tinder 526616 77.2
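The code that builds count_apps lives in the hidden cell, so the following is only a minimal sketch of how such a table could be produced. It uses base R so it runs without extra packages, and the data frame `reviews` and its `App` column are assumptions made for illustration, filled with toy values.

```r
# Toy stand-in for the review data. The real data frame and its column
# names are not shown in the notebook, so `reviews` and `App` are assumptions.
reviews <- data.frame(
  App = c("Tinder", "Tinder", "Tinder", "Bumble",
          "Hinge", "Tinder", "Bumble", "Tinder")
)

# Count reviews per app (one row per app, counts stored in column `n`).
counts <- as.data.frame(table(App = reviews$App),
                        responseName = "n",
                        stringsAsFactors = FALSE)

# Express each count as a percentage of all reviews.
counts$percentage <- round(100 * counts$n / sum(counts$n), 1)
counts
```

Dividing by the overall total, rather than a per-app total, is what makes the three percentages add up to 100.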
The average rating of any app changes constantly as more users share their feedback and as each company adds or removes features. The chart below shows how the average rating per app changed from 2013 to 2022.
There is something tricky here. As you can see, Bumble has a perfect 5 in its first month and then suddenly falls to 2; something similar happens in Tinder’s first month, and Hinge starts very low in its first three months. This happens because the fewer observations you have (regardless of the research field), the more vulnerable your mean becomes to outliers and extreme values. To illustrate this, take a look at the chart below.
Over their first couple of months, each application had its lowest number of reviews, which causes the unusual mean values we can see above. However, this does not change the fact that all three applications show a negative trend.
Before moving to the next section, let’s explore one more thing. We know the monthly rating average, but what about the count of each individual score? How likely are users to rate 5 or 1 compared with the scores in the middle? Let’s see.
There is no question: users are more likely to rate an app with one of the extreme values, 1 or 5, than with something more neutral in the middle. Regardless of the difference in numbers, the pattern is quite similar among the three apps.
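The sensitivity of a small-sample mean can also be checked with a few lines of arithmetic. This is a sketch with made-up ratings, not the notebook's data: it adds a single 1-star review to a month with two reviews and to a month with two thousand, and compares how far the mean moves.

```r
# All numbers here are invented for illustration.
small_month <- c(5, 5)        # two reviews, both 5 stars
large_month <- rep(5, 2000)   # two thousand reviews, all 5 stars

# Change in the mean after one extra 1-star review arrives.
small_drop <- mean(c(small_month, 1)) - mean(small_month)
large_drop <- mean(c(large_month, 1)) - mean(large_month)

small_drop   # more than a full star lost
large_drop   # almost no change
```

The same single review costs the small month more than one star and the large month about two thousandths of a star.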
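To read the monthly average next to the number of reviews behind it, both can be computed side by side. The sketch below uses base R and a toy data frame; the column names `App`, `Date` and `Rating` are assumptions, and the values are invented so that the first month contains a single review.

```r
# Toy data; column names and values are assumptions for illustration only.
reviews <- data.frame(
  App    = c("Bumble", "Bumble", "Bumble", "Bumble", "Tinder", "Tinder"),
  Date   = as.Date(c("2015-11-03", "2015-12-01", "2015-12-15",
                     "2015-12-20", "2013-07-10", "2013-07-22")),
  Rating = c(5, 1, 2, 3, 4, 2)
)

# Bucket each review into its calendar month (e.g. "2015-11").
reviews$Month <- format(reviews$Date, "%Y-%m")

# Mean rating and number of reviews per app and month.
monthly_mean  <- aggregate(Rating ~ App + Month, data = reviews, FUN = mean)
monthly_count <- aggregate(Rating ~ App + Month, data = reviews, FUN = length)
names(monthly_mean)[3]  <- "avg_rating"
names(monthly_count)[3] <- "n_reviews"

monthly <- merge(monthly_mean, monthly_count, by = c("App", "Month"))
monthly
```

Keeping `n_reviews` in the same table makes it easy to spot months where the average rests on only a handful of reviews.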
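One way to compare the pattern across apps despite their very different review counts is to look at the share of each score within each app. This is a minimal base R sketch on invented data; `reviews`, `App` and `Rating` are assumed names.

```r
# Toy data; names and values are assumptions for illustration only.
reviews <- data.frame(
  App    = c(rep("Tinder", 6), rep("Hinge", 4)),
  Rating = c(1, 1, 5, 5, 5, 3,   1, 5, 5, 2)
)

# Cross-tabulate app against score, then convert each row to percentages
# so apps with different review volumes can be compared directly.
score_counts <- table(reviews$App, reviews$Rating)
score_share  <- round(100 * prop.table(score_counts, margin = 1), 1)
score_share
```

Using `margin = 1` normalizes within each app, so every row sums to roughly 100 regardless of how many reviews the app has.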
Working with reviews is quite challenging, I think, because every user has their own writing style. Some use only one or two words while others prefer to give more details about their experience; some users avoid rude language, while others don’t mind leaving something aggressive. These differences make it necessary to design a method to identify and extract the part (or parts) of each review that are useful for our business task.
To tackle this task, I’ve designed a function that will allow me to:
The approach has limitations. Some of them are:
Let’s demonstrate how the function performs. Let’s begin with a simple review like “its a good app”:
word_analysis("its a good app", 5, "Tinder")
## review_final rating app
## 1 good 5 Tinder
The result is exactly what we want: we remove the word “app”, since we expect every review to refer to the app, and we keep the word “good”. Now let’s complicate things a little and see what happens if our review looks like: “it was a good experience but its too expensive”
word_analysis("it was a good experience but its too expensive",3,"some app")
## review_final rating app
## 1 good 3 some app
## 2 experience 3 some app
## 3 too expensive 3 some app
Again, the result is as expected. Here we take the words “good” and “experience” separately; remember, a word cannot be a complement and an adjective at the same time, so we take “good” as an adjective. Finally, we take “too expensive” as a group. Let’s see a second example.
word_analysis("it's not good very expensive",3,"some app")
## review_final rating app
## 1 not good 3 some app
## 2 very expensive 3 some app
Here, we are changing the direction of good (not good) and the intensity of expensive (very expensive).
The function also allows multiple complementary words as long as they are adjacent. Let’s see two examples.
word_analysis("not very useful",3,"some app")
## review_final rating app
## 1 not very useful 3 some app
word_analysis("just a few real profiles",3,"some app")
## review_final rating app
## 1 few real profiles 3 some app
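The real word_analysis function is defined in the hidden cell, so the following is not the author's implementation. It is a simplified sketch of the general idea the examples suggest: drop filler words, then attach modifiers such as “not”, “very” or “too” to the next kept word. The function name `extract_phrases` and both word lists are hypothetical, and the sketch does not try to reproduce every case above (for instance, it would split “few real profiles” differently).

```r
# Hypothetical word lists; the notebook's actual lists are not shown.
stop_words <- c("it", "its", "it's", "a", "was", "but", "just", "app")
modifiers  <- c("not", "very", "too", "few")

extract_phrases <- function(review) {
  # Lowercase, split on anything that is not a letter or apostrophe,
  # and drop empty tokens and stop words.
  tokens <- strsplit(tolower(review), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != "" & !(tokens %in% stop_words)]

  phrases <- character(0)
  pending <- character(0)   # modifiers waiting for the word they qualify
  for (tok in tokens) {
    pending <- c(pending, tok)
    if (!(tok %in% modifiers)) {
      # A non-modifier closes the group: emit it with its modifiers.
      phrases <- c(phrases, paste(pending, collapse = " "))
      pending <- character(0)
    }
  }
  phrases   # modifiers left dangling at the end are discarded
}

extract_phrases("its a good app")
extract_phrases("it was a good experience but its too expensive")
extract_phrases("it's not good very expensive")
```

Collecting modifiers in a pending buffer is what lets chains such as “not very useful” stay together as one phrase.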
Now that I’ve shown how the function works, let’s see what happened after appending the real records.
NOTE: I’ve made the word cloud charts following the examples provided in this amazing article by Céline Van den Rul. In these charts, color is only used to make them easier to read; it does not relate to any variable. Due to the long processing times, I’ve decided to create the word cloud charts using a random sample of the data.
First, let’s filter the reviews by people who rated their app 5 or 4.
good_cloud <- wordcloud(words = cloud_table_good$review_final,
                        freq = cloud_table_good$n,
                        min.freq = 30,
                        max.words = 120,
                        random.order = FALSE,
                        rot.per = 0.35,
                        colors = brewer.pal(8, "Paired"),
                        scale = c(3, 1))
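The table passed to wordcloud() above is built in the hidden cell. As a rough sketch of the idea, a frequency table with `review_final` and `n` columns could be derived from a random sample like this. The data frame `phrases`, its values, the sample size and the seed are all assumptions made for illustration; base R is used so the block runs on its own.

```r
# Toy phrase-level data; names and values are assumptions for illustration.
phrases <- data.frame(
  review_final = c("good", "good", "great", "too expensive",
                   "good", "fake profiles", "great"),
  rating       = c(5, 4, 5, 2, 5, 1, 4)
)

# Draw a reproducible random sample of rows to keep processing time down.
set.seed(42)
sampled <- phrases[sample(nrow(phrases), size = 5), ]

# Keep the phrases from reviews rated 4 or 5 and count each one.
positive <- sampled[sampled$rating >= 4, ]
cloud_table_sketch <- as.data.frame(table(review_final = positive$review_final),
                                    responseName = "n",
                                    stringsAsFactors = FALSE)

# Most frequent phrases first.
cloud_table_sketch <- cloud_table_sketch[order(-cloud_table_sketch$n), ]
cloud_table_sketch
```

Swapping the filter to `rating <= 3` would give the matching table for the lower-rated reviews.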
Now, let’s show the reviews by people who rated from 1 to 3.
bad_cloud <- wordcloud(words = cloud_table_bad$review_final,
                       freq = cloud_table_bad$n,
                       min.freq = 50,
                       max.words = 95,
                       random.order = FALSE,
                       rot.per = 0.35,
                       colors = brewer.pal(8, "Paired"),
                       scale = c(3, 1))
Let’s finish with the combined reviews. This last chart summarizes the good and bad sides of dating apps. Please let me know how it makes you feel. Do you think these apps have a future in the long term, or are they destined to fail and be replaced? Do you use them? If so, what is your opinion?
total_cloud <- wordcloud(words = cloud_table_total$review_final,
                         freq = cloud_table_total$n,
                         min.freq = 50,
                         max.words = 95,
                         random.order = FALSE,
                         rot.per = 0.35,
                         colors = brewer.pal(8, "Paired"),
                         scale = c(3, 1))
RStudio. RStudio Team (2021). RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA. URL http://www.rstudio.com/.
tidyverse. Wickham et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686.
wordcloud. Ian Fellows (2018). wordcloud: Word Clouds. http://blog.fellstat.com/?cat=11, http://www.fellstat.com.
RColorBrewer. Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes. R package version 1.1-2.
stringr. Hadley Wickham (2019). stringr: Simple, Consistent Wrappers for Common String Operations. http://stringr.tidyverse.org, https://github.com/tidyverse/stringr.
How to Generate Word Clouds in R. Céline Van den Rul (2019). https://towardsdatascience.com/create-a-word-cloud-with-r-bde3e7422e8a