Exploratory Data Analysis

Key Takeaways

  • We studied suicides in Turkey from 2007 to 2022. We wanted to find out whether suicides have really increased, or whether rising stress and heavier news coverage of suicide only make it seem that way.

  • Our data set records the year, the stated reason, and the age group and sex of the person for each suicide. It lets us trace patterns and changes in suicide rates over these sixteen years, which can help explain why people take their own lives and how they can be better supported. The data come from TÜİK (Türkiye İstatistik Kurumu, the Turkish Statistical Institute).

  • Male suicides are about three times as common as female suicides, a striking gender disparity.

  • Illness is the most frequently reported cause of suicide, which highlights how closely mental health and physical well-being are linked.

  • Young people are the most affected: suicides peak in the 20-24 age group, which makes the mental health of the young a pressing concern.

  • Across all of these findings runs a troubling trend: suicide numbers rose steadily over the period studied.

  • Taken together, these findings do more than identify problems: they show that mental health issues in Turkey need to be addressed, and addressed quickly.

Loading Libraries and Datasets

Code
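# Data wrangling, Excel import and plotting packages
library(tidyverse)
library(dslabs)
library(readxl)
library(dplyr)
library(ggplot2)
library(viridis)
library(hrbrthemes)
# Suicide counts by year, age group, sex and reason ("intihar" = suicide)
data <- read_excel("C:/Users/kubil/Desktop/intihar.xls")
save(data, file = "semicolon1.RData")
# Population by year ("nüfus" = population)
nüfus <- read_excel("C:/Users/kubil/Desktop/nüfus.xlsx")
save(nüfus, file = "semicolon2.RData")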

Finding the Suicide Rate Using Suicide and Population Datasets

Our suicide data cover 2002 to 2022, but our population data cover only 2007 to 2022. Because suicide rates are more meaningful than raw counts, we restrict the analysis to 2007-2022.

Code
# Keep only the years covered by the population data
data <- filter(data,Year >= 2007)
colnames(data) <- c("Year","Age","Sex","Total","Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")
# National population per year (Yıl = year, Düzey = level, Toplam = total)
nüfus <- filter(nüfus,Yıl >= 2007,Düzey == "TÜRKİYE")
# Average population over 2007-2022, used as the reference population
popu <- mean(as.numeric(nüfus$Toplam))
# Columns that hold suicide counts
count_cols <- c("Total","Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")
# Population of the matching year for every row (matched by year, not by row position)
pop_year <- as.numeric(nüfus$Toplam)[match(data$Year, nüfus$Yıl)]
# Rescale every count: count / population of that year * average population
for (col in count_cols) {
  data[[col]] <- as.numeric(data[[col]]) / pop_year * popu
}

Separating Population from the Dataset

  • In our analysis, we aimed to examine suicide numbers independently of the population. To achieve this, we divided the number of suicides for each year by the respective population of that year. Afterward, we determined the average population over the 16-year period. Multiplying the calculated values by this average population, we obtained suicide numbers adjusted for population variations. This method allowed us to isolate and understand the suicide trends more accurately, considering the fluctuations in population over the years.
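The adjustment described above can be sketched on toy numbers. The counts and populations below are made up for illustration and are not TÜİK figures, and `adjust_counts` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: rescale yearly counts to a common reference population.
# adjusted = count / population of that year * mean population over all years
adjust_counts <- function(count, population) {
  count / population * mean(population)
}

# Made-up example values (not TÜİK data)
toy <- data.frame(
  Year       = c(2020, 2021, 2022),
  Suicides   = c(300, 330, 360),
  Population = c(1000000, 1100000, 1200000)
)
toy$Adjusted <- adjust_counts(toy$Suicides, toy$Population)
print(toy)
```

In this toy case the raw counts rise every year, but the adjusted values are all equal, because the increase is fully explained by population growth.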

Suicides by Gender

We used the bar plot method to compare suicides by gender regardless of age and reason. You can find the graph and the code we used below.

Code
# Sum the population-adjusted totals over all years, separately for each sex
filtered_data_male <- filter(data, Sex == "Male", Age=="Total")
male = sum(as.numeric(filtered_data_male$Total))
filtered_data_female <- filter(data, Sex == "Female", Age=="Total")
female = sum(as.numeric(filtered_data_female$Total))
data1 <- c(male,female)
# Bar plot comparing the two totals
barplot(data1, names.arg = c("Male","Female"), col = "skyblue", main = "Number of Suicides by Gender", xlab = "Sex", ylab = "Total")

Code
b1 <- filter(data,Sex=="Male",Age=="Total")
b2 <- filter(data,Sex=="Female",Age=="Total")

slices1 <- c(sum(as.numeric(b1$Total)),sum(as.numeric(b2$Total)))

lbls <- c("Male","Female")
pct <- round(slices1/sum(slices1)*100)
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls,"%",sep="") # add % to labels
Code
pie(slices1,labels = lbls, col=rainbow(length(lbls)),
   main="Pie Chart of Gender")

These graphs show that suicide is more common among men than among women.
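To put a number on the gap, the male-to-female ratio and each group's share can be computed directly from the two totals. The sketch below uses placeholder totals chosen for illustration, not results from the data set, and `gender_summary` is a hypothetical helper.

```r
# Hypothetical helper: ratio and percentage shares from two totals
gender_summary <- function(male, female) {
  total <- male + female
  list(
    ratio = male / female,
    share = round(c(Male = male, Female = female) / total * 100)
  )
}

# Placeholder totals (not TÜİK data)
res <- gender_summary(male = 750, female = 250)
print(res$ratio)
print(res$share)
```

With the variables computed for the bar plot, the call would be `gender_summary(male, female)`.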

Suicides by Age Groups

We used the bar plot method to compare suicides by age groups regardless of gender and reason. You can find the graph and the code we used below.

Code
# Age groups plotted (the list jumps from "65-69" to "75 +")
age_groups <- c("<15","15-19","20-24","25-29","30-34","35-39","40-44","45-49","50-54","55-59","60-64","65-69","75 +")
# Sum the adjusted totals of each age group over all years (both sexes combined)
data2 <- sapply(age_groups, function(g) {
  sum(as.numeric(filter(data, Age == g, Sex == "Total")$Total))
})
barplot(data2, names.arg = age_groups, col = "red", main = "Number of Suicides by Age Group", xlab = "Age", ylab = "Total")

This graph shows that suicides are most common among young adults aged 20-24, and that the numbers tend to fall with age from 25 onwards. The age groups with the fewest suicides are children and the elderly. Note that these are counts per age group, not rates relative to the size of each age group.
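The group with the highest total can also be read off programmatically instead of from the plot. A minimal sketch in base R, using made-up rows that only mimic the layout of the data set (columns `Age` and `Total` are taken from the document, the values are not TÜİK data):

```r
# Made-up rows: two years of counts for two age groups
toy <- data.frame(
  Age   = c("15-19", "20-24", "15-19", "20-24"),
  Total = c(10, 20, 12, 25)
)

# Sum the counts within each age group
by_age <- sapply(split(toy$Total, toy$Age), sum)
print(by_age)

# Label of the age group with the largest total
top_group <- names(which.max(by_age))
print(top_group)
```

Note that `split` orders the groups alphabetically, so for plotting in age order the labels would need to be turned into a factor with explicit levels.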

Suicides by Year

We used the bar plot method to compare suicides by year regardless of gender, reason and age group. You can find the graph and the code we used below.

Code
years <- 2007:2022
# Adjusted total for each year (all ages, both sexes combined)
data3 <- sapply(years, function(y) {
  sum(as.numeric(filter(data, Year == y, Age == "Total", Sex == "Total")$Total))
})
Code
barplot(data3, names.arg = years, col = "yellow", main = "Number of Suicides by Year", xlab = "Year", ylab = "Total")

This graph shows that suicides were most common in 2021 and 2022.

Factors linked to the pandemic, such as social isolation, job losses and financial difficulties, may have contributed to this increase.
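To quantify how much a given year stands out, the yearly totals can be ranked and compared with the previous year. The sketch below uses made-up totals (not TÜİK data); with the real values, `totals` would be the vector plotted above with the years attached as names.

```r
# Made-up adjusted totals per year (not TÜİK data)
totals <- c("2019" = 3000, "2020" = 3150, "2021" = 3600, "2022" = 3780)

# Percent change relative to the previous year, rounded to one decimal
pct_change <- round(diff(totals) / head(totals, -1) * 100, 1)
print(pct_change)

# Years ordered from the highest to the lowest total
ranked <- names(sort(totals, decreasing = TRUE))
print(ranked)
```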

Suicides by Reason

We used the bar plot method to compare suicides by reason regardless of gender and age group. You can find the graph and the code we used below.

Code
# Yearly totals for all ages and both sexes
totals_all <- filter(data,Age =="Total", Sex == "Total")
causes <- c("Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")
# Sum each cause over all years, treating missing counts as zero
data4 <- sapply(causes, function(cause_name) {
  sum(as.numeric(totals_all[[cause_name]]), na.rm = TRUE)
})
Code
barplot(data4, names.arg = causes, col = "brown", main = "Number of Suicides by Reason", xlab = "Suicide Reason", ylab = "Total",cex.names=0.72)

The following line chart shows how the causes of suicide changed over the years.

Code
# Build a long-format table: one row per year and cause
  a = filter(data,Age=="Total", Sex =="Total")
  a1 = data.frame(Year=a$Year,Total=a$Illness,Cause="Illness")
  a2 = data.frame(Year=a$Year,Total=a$Family,Cause="Family")
  a3 = data.frame(Year=a$Year,Total=a$Economy,Cause="Economy")
  a4 = data.frame(Year=a$Year,Total=a$Business,Cause="Business")
  a5 = data.frame(Year=a$Year,Total=a$Emotional,Cause="Emotional")
  a6 = data.frame(Year=a$Year,Total=a$Education,Cause="Education")
  a7 = data.frame(Year=a$Year,Total=a$Other,Cause="Other")
  a8 = data.frame(Year=a$Year,Total=a$Unknown,Cause="Unknown")
  cause = rbind(a1,a2,a3,a4,a5,a6,a7,a8,deparse.level = 0)
Code
# Convert the counts to numbers, treating missing values as zero
graph <- cause %>% 
    filter(Cause %in% c("Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")) %>%
    mutate(Total = coalesce(as.numeric(Total), 0))
  # One line per cause; y is read from the data frame itself
  graph %>%
    ggplot( aes(x=Year, y=Total, group=Cause, color=Cause)) +
      geom_line() +
      ylab("Number of Suicides") +
      ggtitle("Number of Suicides by Reasons")

Suicides attributed to business and education have decreased over time, while the numbers for the other causes have remained almost constant.
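A statement about a cause decreasing or staying constant can be checked with a simple least-squares slope per cause. This is a minimal sketch on made-up long-format data (not TÜİK data) with the same `Year`, `Cause` and `Total` columns as the table built for the line chart; a straight-line fit is an assumption here, not the method used in the original analysis.

```r
# Made-up long-format data: one row per year and cause
toy <- data.frame(
  Year  = rep(2007:2010, times = 2),
  Cause = rep(c("Business", "Illness"), each = 4),
  Total = c(40, 35, 30, 25, 500, 500, 500, 500)
)

# Slope of a fitted line per cause: average change in the count per year
slopes <- sapply(split(toy, toy$Cause), function(d) {
  unname(coef(lm(Total ~ Year, data = d))["Year"])
})
print(round(slopes, 2))
```

A clearly negative slope indicates a falling series, while a slope near zero indicates a roughly constant one.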

We used a pie chart to better analyze the ratios.

Code
# Pie chart with percentages
cause_names <- c("Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")
# Sum each cause over all years, treating missing counts as zero
slices <- sapply(cause_names, function(cause_name) {
  sum(as.numeric(a[[cause_name]]), na.rm = TRUE)
})

lbls <- cause_names
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls,"%",sep="") # add % to labels
Code
pie(slices,lbls, col=rainbow(length(lbls)),
   main="Pie Chart of Suicide Reasons",cex=0.8)

If we set aside the Unknown and Other categories, Illness is by far the dominant cause.
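The effect of leaving out the Unknown and Other categories can be made explicit by recomputing the percentages over the remaining causes only. The totals below are made up for illustration (not TÜİK data); with the real values, the named vector of per-cause sums would take the place of `toy_slices`.

```r
# Made-up totals per cause (not TÜİK data)
toy_slices <- c(Illness = 300, Family = 100, Economy = 100,
                Other = 250, Unknown = 250)

# Shares over all categories
share_all <- round(toy_slices / sum(toy_slices) * 100)

# Shares after dropping the categories that carry no stated reason
known <- toy_slices[!names(toy_slices) %in% c("Other", "Unknown")]
share_known <- round(known / sum(known) * 100)

print(share_all)
print(share_known)
```

In this toy case Illness accounts for 30% of all suicides but 60% of those with a stated reason, which shows how much the choice of denominator matters.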

Suicides of Males by Reason

To obtain more meaningful results, we separated the data by gender.

We used a line chart to examine the reasons behind male suicides.

Code
  a = filter(data,Age=="Total", Sex == "Male")
  a1 = data.frame(Year=a$Year,Total=a$Illness,Cause="Illness")
  a2 = data.frame(Year=a$Year,Total=a$Family,Cause="Family")
  a3 = data.frame(Year=a$Year,Total=a$Economy,Cause="Economy")
  a4 = data.frame(Year=a$Year,Total=a$Business,Cause="Business")
  a5 = data.frame(Year=a$Year,Total=a$Emotional,Cause="Emotional")
  a6 = data.frame(Year=a$Year,Total=a$Education,Cause="Education")
  a7 = data.frame(Year=a$Year,Total=a$Other,Cause="Other")
  a8 = data.frame(Year=a$Year,Total=a$Unknown,Cause="Unknown")
  cause = rbind(a1,a2,a3,a4,a5,a6,a7,a8,deparse.level = 0)
Code
# Convert the counts to numbers, treating missing values as zero
graph <- cause %>% 
    filter(Cause %in% c("Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")) %>%
    mutate(Total = coalesce(as.numeric(Total), 0))
  graph %>%
    ggplot( aes(x=Year, y=Total, group=Cause, color=Cause)) +
      geom_line() +
      ylab("Number of Suicides") +
      ggtitle("Number of Males' Suicides by Reasons")

Code
name = c("Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")
# Sum each cause over all years for males, treating missing counts as zero
slices2 = sapply(name, function(cause_name) {
  sum(as.numeric(a[[cause_name]]), na.rm = TRUE)
})
lbls2 <- name
pct <- round(slices2/sum(slices2)*100)
lbls2 <- paste(lbls2, pct) # add percents to labels
lbls2 <- paste(lbls2,"%",sep="") # add % to labels
Code
# Two pies side by side: males (left) and all suicides (right)
par(mfrow=c(1,2))

pie(slices2,lbls2, col=rainbow(length(lbls2)),
   main="Pie Chart of Males' \n Suicide by Reasons",cex=0.8)
pie(slices,lbls, col=rainbow(length(lbls)),
   main="Pie Chart of \n Suicide Reasons",cex=0.8)

Suicides of Females by Reason

Code
  a = filter(data,Age=="Total", Sex == "Female")
  a1 = data.frame(Year=a$Year,Total=a$Illness,Cause="Illness")
  a2 = data.frame(Year=a$Year,Total=a$Family,Cause="Family")
  a3 = data.frame(Year=a$Year,Total=a$Economy,Cause="Economy")
  a4 = data.frame(Year=a$Year,Total=a$Business,Cause="Business")
  a5 = data.frame(Year=a$Year,Total=a$Emotional,Cause="Emotional")
  a6 = data.frame(Year=a$Year,Total=a$Education,Cause="Education")
  a7 = data.frame(Year=a$Year,Total=a$Other,Cause="Other")
  a8 = data.frame(Year=a$Year,Total=a$Unknown,Cause="Unknown")
  cause = rbind(a1,a2,a3,a4,a5,a6,a7,a8,deparse.level = 0)
Code
# Convert the counts to numbers, treating missing values as zero
graph <- cause %>% 
    filter(Cause %in% c("Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")) %>%
    mutate(Total = coalesce(as.numeric(Total), 0))
  graph %>%
    ggplot( aes(x=Year, y=Total, group=Cause, color=Cause)) +
      geom_line() +
      ylab("Number of Suicides") +
      ggtitle("Number of Females' Suicides by Reasons")

Code
name = c("Illness","Family","Economy","Business","Emotional","Education","Other","Unknown")
# Sum each cause over all years for females, treating missing counts as zero
slices3 = sapply(name, function(cause_name) {
  sum(as.numeric(a[[cause_name]]), na.rm = TRUE)
})
lbls3 <- name
pct <- round(slices3/sum(slices3)*100)
lbls3 <- paste(lbls3, pct) # add percents to labels
lbls3 <- paste(lbls3,"%",sep="") # add % to labels
Code
# Two pies side by side: females (left) and all suicides (right)
par(mfrow=c(1,2))

pie(slices3,labels = lbls3, col=rainbow(length(lbls3)),
   main="Pie Chart of Females' \n Suicide by Reasons",cex=0.5)
pie(slices,lbls, col=rainbow(length(lbls)),
   main="Pie Chart of \n Suicide Reasons",cex=0.5)
