Analysis

Key Takeaways

  • The project focuses on a comprehensive analysis of crime dynamics in Turkey from 2011 to 2020. Utilizing the dataset “suc turu ve egitim durumuna gore ceza infaz kurumuna giren hukumluler.xls” from TÜİK, the analysis explores the relationships between crime rates, gender, and education levels of offenders. The primary goal is to gain insights into the demographic characteristics of criminals and contribute to the development of effective crime control policies.

  • The primary dataset is obtained from “suc turu ve egitim durumuna gore ceza infaz kurumuna giren hukumluler.xls” from TÜİK, which contains information on people who have been sent to prison. In addition, index data from 2011 to 2020 are used to categorise and analyse crime trends. The dataset is processed using R with packages such as readxl, tidyr and dplyr.

  • An important aspect of this analysis is to consider the gender and educational level of the criminals in order to identify patterns and correlations between these demographic characteristics and crime rates. The R code provided illustrates the detailed steps taken to process the data, including tasks such as cleaning, structuring, handling NAs and preparing the data for visualisation.

  • The analysis produces some interesting results, particularly when looking at crime trends over the period 2011-2020. The plotted data reveals different patterns of crime rates in different categories, with some showing fluctuations and others showing rapid changes, possibly influenced by legal changes or social dynamics. Gender analysis adds depth to the findings, revealing which crimes are more common among certain genders and how these patterns evolve over the years. The use of logarithmic scaling on the y-axis strengthens the visualisation, allowing for the different magnitudes of crime rates and facilitating a more nuanced understanding of the data.

  • The main outcome of this project shows some generalisations about crime. It can be seen that most of the crimes are committed by men (96%), the most common crime is theft (15.7%) and most of the criminals have only secondary school education (29.9%).

EDA Analysis

Initialization of environment
library(tidyr)
library(readxl)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(gghighlight)
library(tibble)
library(stringr)

dat <- read_xls("data/suc turu ve egitim durumuna gore ceza infaz kurumuna giren hukumluler.xls")
index2020 <- read_xlsx("data/indexes.xlsx", sheet = "2020-2018", col_names = FALSE)
index2017 <- read_xlsx("data/indexes.xlsx", sheet = "2017-2013", col_names = FALSE)
index2012 <- read_xlsx("data/indexes.xlsx", sheet = "2012-2011", col_names = FALSE)
index2020 <- sapply(index2020, as.character)
index2017 <- sapply(index2017, as.character)
index2012 <- sapply(index2012, as.character)

temp2020 <- sapply(index2020, as.character)
temp2017 <- c(sapply(index2017, as.character), rep(NA, times = 1))
temp2012 <- c(sapply(index2012, as.character), rep(NA, times = 4))

for (val in dat$'2011-2020') {
  if (identical(val,"2020")){
    suc2020 <- dat[20:59, 2]
    suc2020 <- suc2020[complete.cases(suc2020),]
    suc2020 <- cbind(index2020, suc2020)
    names(suc2020) <- c("type_of_crime", "num2020")
  }
  if (identical(val,"2019")){
    suc2019 <- dat[62:101, 2]
    suc2019 <- suc2019[complete.cases(suc2019),]
    suc2019 <- cbind(index2020, suc2019)
    names(suc2019) <- c("type_of_crime", "num2019")
  }
  if (identical(val,"2018")){
    suc2018 <- dat[104:143, 2]
    suc2018 <- suc2018[complete.cases(suc2018),]
    suc2018 <- cbind(index2020, suc2018)
    names(suc2018) <- c("type_of_crime", "num2018")
  }
  if (identical(val,"2017")){
    suc2017 <- dat[147:185, 2]
    suc2017 <- suc2017[complete.cases(suc2017),]
    suc2017 <- cbind(index2017, suc2017)
    names(suc2017) <- c("type_of_crime", "num2017")
  }
  if (identical(val,"2016")){
    suc2016 <- dat[189:227, 2]
    suc2016 <- suc2016[complete.cases(suc2016),]
    suc2016 <- cbind(index2017, suc2016)
    names(suc2016) <- c("type_of_crime", "num2016")
  }
  if (identical(val,"2015")){
    suc2015 <- dat[231:269, 2]
    suc2015 <- suc2015[complete.cases(suc2015),]
    suc2015 <- cbind(index2017, suc2015)
    names(suc2015) <- c("type_of_crime", "num2015")
  }
  if (identical(val,"2014")){
    suc2014 <- dat[273:311, 2]
    suc2014 <- suc2014[complete.cases(suc2014),]
    suc2014 <- cbind(index2017, suc2014)
    names(suc2014) <- c("type_of_crime", "num2014")
  }
  if (identical(val,"2013")){
    suc2013 <- dat[315:353, 2]
    suc2013 <- suc2013[complete.cases(suc2013),]
    suc2013 <- cbind(index2017, suc2013)
    names(suc2013) <- c("type_of_crime", "num2013")
  }
  if (identical(val,"2012")){
    suc2012 <- dat[357:391, 2]
    suc2012 <- suc2012[complete.cases(suc2012),]
    suc2012 <- cbind(index2012, suc2012)
    names(suc2012) <- c("type_of_crime", "num2012")
  }
  if (identical(val,"2011")){
    suc2011 <- dat[395:429, 2]
    suc2011 <- suc2011[complete.cases(suc2011),]
    suc2011 <- cbind(index2012, suc2011)
    names(suc2011) <- c("type_of_crime", "num2011")
  }
}

d2020 <- data.frame(rep(2020, times = nrow(suc2020)))
d2019 <- data.frame(rep(2019, times = nrow(suc2020)))
d2018 <- data.frame(rep(2018, times = nrow(suc2018)))
d2017 <- data.frame(c(rep(2017, times = nrow(suc2017)), rep(NA,times=1)))
d2016 <- data.frame(c(rep(2016, times = nrow(suc2016)),rep(NA,times=1)))
d2015 <- data.frame(c(rep(2015, times = nrow(suc2015)),rep(NA,times=1)))
d2014 <- data.frame(c(rep(2014, times = nrow(suc2014)),rep(NA,times=1)))
d2013 <- data.frame(c(rep(2013, times = nrow(suc2013)),rep(NA,times=1)))
d2012 <- data.frame(c(rep(2012, times = nrow(suc2012)),rep(NA,times=4)))
d2011 <- data.frame(c(rep(2011, times = nrow(suc2011)),rep(NA,times=4)))

df <- data.frame(d2020,d2019,d2018,d2017,d2016,d2015,d2014,d2013,d2012,d2011)

years <- data.frame(years = unlist(df, use.names = FALSE))
years <- na.omit(years)

type_frame <- data.frame(temp2020,temp2020,temp2020,temp2017,temp2017,temp2017,temp2017,temp2017,temp2012,temp2012)
types <- data.frame(type_of_crimes = unlist(type_frame, use.names = FALSE)) |> na.omit()

numbers <- data.frame(years = years, type_of_crime = types, na.omit(data.frame(gen_total = dat$...2[20:nrow(dat)], male = dat$...3[20:nrow(dat)], female = dat$...4[20:nrow(dat)])))
numbers$gen_total <- as.numeric(numbers$gen_total)
numbers$male <- as.numeric(numbers$male)
numbers$female <- as.numeric(numbers$female)
numbers[is.na(numbers)] <- 0
numbers <- pivot_longer(numbers, cols = c(male,female), names_to = "gender", values_to = "number")

gendered_numbers <- data.frame(rep(NA, times = 410))

for (i in 1:ncol(dat)){
  if (is.na(dat[16,i])){
    next
  }
  if (dat[16,i] == "Male" || dat[16,i] == "Female"){
    column_name <- paste( dat[16,i],i, sep = "")
    gendered_numbers[[column_name]] <- dat[20:429,i]
  }
}

gendered_numbers <- gendered_numbers[,2:19]

colnames(gendered_numbers) <- c("Male_t", "Female_t", "Male_illit", "Female_illit", 
                                "Male_lit", "Female_lit", "Male_pris", "Female_pris",
                                "Male_prie", "Female_prie", "Male_jhs", "Female_jhs",
                                "Male_hs", "Female_hs", "Male_bsc", "Female_bsc", 
                                "Male_unk", "Female_unk")

x <- 1:18

gendered_numbers[ , x] <- apply(gendered_numbers[ , x], 2,            # I CAN NOT BELIEVE I HAD TO DO THIS JUST TO CONVERT ACTUAL NUMBERS TO NUMERIC. R SUCKS SOMETIMES.
                                function(y) as.numeric(as.character(y)))

gendered_numbers <- gendered_numbers[rowSums(is.na(gendered_numbers)) != ncol(gendered_numbers),]
gendered_numbers[is.na(gendered_numbers)] <- 0
gendered_numbers <- data.frame(years, gendered_numbers)

education_cat <- pivot_longer(gendered_numbers, cols = -years, names_to = c("gender","education_level"),names_sep = "_", values_to = "number")
full_names <- data.frame(rep(NA, times = nrow(education_cat)))
for (i in 1:nrow(education_cat)){
  if (education_cat$education_level[i] == "t"){
    full_names[i,1] <- "total"
  }
  else if (education_cat$education_level[i] == "illit"){
    full_names[i,1] <- "illiterate"
  }
  else if (education_cat$education_level[i] == "lit"){
    full_names[i,1] <- "literate"
  }
  else if (education_cat$education_level[i] == "pris"){
    full_names[i,1] <- "primary school"
  }
  else if (education_cat$education_level[i] == "prie"){
    full_names[i,1] <- "primary education"
  }
  else if (education_cat$education_level[i] == "jhs"){
    full_names[i,1] <- "junior high school"
  }
  else if (education_cat$education_level[i] == "hs"){
    full_names[i,1] <- "high school"
  }
  else if (education_cat$education_level[i] == "bsc"){
    full_names[i,1] <- "higher education"
  }
  else if (education_cat$education_level[i] == "unk"){
    full_names[i,1] <- "unknown"
  }
}
education_cat <- data.frame(education_cat, full_names)
colnames(education_cat)[5] <- "full_names"
edu_names <- data.frame(rep(NA, times = nrow(education_cat)))
edu_names[which(education_cat$years == 2020),1] <- unlist(lapply(index2020, function(x) rep(x,times = sum(education_cat$years == 2020)/27)))
edu_names[which(education_cat$years == 2019),1] <- unlist(lapply(index2020, function(x) rep(x,times = sum(education_cat$years == 2020)/27)))
edu_names[which(education_cat$years == 2018),1] <- unlist(lapply(index2020, function(x) rep(x,times = sum(education_cat$years == 2020)/27)))
edu_names[which(education_cat$years == 2017),1] <- unlist(lapply(index2017, function(x) rep(x,times = sum(education_cat$years == 2017)/26)))
edu_names[which(education_cat$years == 2016),1] <- unlist(lapply(index2017, function(x) rep(x,times = sum(education_cat$years == 2017)/26)))
edu_names[which(education_cat$years == 2015),1] <- unlist(lapply(index2017, function(x) rep(x,times = sum(education_cat$years == 2017)/26)))
edu_names[which(education_cat$years == 2014),1] <- unlist(lapply(index2017, function(x) rep(x,times = sum(education_cat$years == 2017)/26)))
edu_names[which(education_cat$years == 2013),1] <- unlist(lapply(index2017, function(x) rep(x,times = sum(education_cat$years == 2017)/26)))
edu_names[which(education_cat$years == 2012),1] <- unlist(lapply(index2012, function(x) rep(x,times = sum(education_cat$years == 2012)/23)))
edu_names[which(education_cat$years == 2011),1] <- unlist(lapply(index2012, function(x) rep(x,times = sum(education_cat$years == 2012)/23)))
education_cat <- data.frame(education_cat, edu_names)
colnames(education_cat)[6] <- "crime_names"

The columns of data frames in this environment consists of “Years” (Year the crime was committed), “Number” (Number of times a criminal of given characteristics was prosecuted), “Gender” (Gender of the criminals), “Education Level” (Abbreviation of education levels), “Full Name” (Full names of said levels), “Type of Crimes” (Type of the crime committed), “Gender Total” (Male and female total in associated conditions).

The data is divided into two data frames with the difference of one containing education info and one not containing it because the plan at first was to just analyze the data based on gender and not education levels.

Exploratory Data Analysis (EDA) is provided below by Turkish Penal Institution Statistics for the year 2020.

Overall Trend

  • The total number of individuals in penal institutions decreased by 8.5% compared to the same date in 2019, reaching 266,831 on December 31, 2020.
Code
comp_2019_2020 <- filter(numbers, years == c(2020,2019) & type_of_crimes == c("Total")) |> 
  ggplot(aes(years, gen_total)) +
  geom_col() +
  scale_x_continuous(breaks=c(2019,2020)) +
  ylab("total number of crimes")

comp_2019_2020

Gender Distribution

  • 96.0% of the penal institution population consisted of males, while 4.0% were females.
Code
total_male_female <- filter(education_cat, years == 2020 & education_level == "t") |>
  ggplot(aes(x = "", y = number, fill = gender)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start=0) +
  theme(axis.line = element_blank(), 
        axis.text = element_blank(),
        axis.title.y = element_blank(),
        axis.line.y = element_blank(),
        axis.title.x = element_blank())+
  scale_fill_manual(values = c("violet", "skyblue"))

total_male_female

Population Composition

  • Convicted individuals made up 84.3% of the population, while individuals in pretrial detention constituted 15.7%.
  • The number of individuals in penal institutions per 100,000 people was 319 on December 31, 2020.

Age Analysis

  • The number of individuals aged 12 and above in penal institutions was 390 per 100,000 people in 2020.
  • The number of individuals entering penal institutions as children (aged 12-17) decreased by 23.6% to 1,283, and those committing crimes as children decreased by 21.4% to 10,234 compared to the previous year.

Entrance and Exit Statistics

  • In 2020, 258,401 individuals entered penal institutions, and 361,870 individuals left.
  • 95.9% of those entering and 96.4% of those leaving were males.

Crime Distribution

  • The most common crimes leading to detention were assault (15.7%), theft (15.2%), traffic offenses (5.9%), and violations of the Execution and Bankruptcy Law (5.3%).

  • As you can see from the plot below, when we compare overall ratio of these crimes to their values in 2020, traffic crimes are way higher in 2020. This is interesting when we consider that for most of the year, people were quarantined and most people were working from home or working flexibly.

Code
all_total <- filter(numbers, type_of_crimes == "Total")$gen_total |> sum()/2
total_assault <- filter(numbers, type_of_crimes == "Assault")$gen_total |> sum()/2
total_theft <- filter(numbers, type_of_crimes == "Theft")$gen_total |> sum()/2
total_traffic <- filter(numbers, type_of_crimes == "Traffic crimes")$gen_total |> sum()/2
total_law <- filter(numbers, type_of_crimes == "Opposition  to the  Bankruptcy  and Enforcement Law")$gen_total |> sum()/2
  
rate_assault <- total_assault/all_total
rate_theft <- total_theft/all_total
rate_traffic <- total_traffic/all_total
rate_law <- total_law/all_total
  
all_vs_2020 <- data.frame(type = c(rep("all", times = 4), rep("2020", times = 4)), ratio = c(rate_assault,rate_theft,rate_traffic,rate_law, 0.157, 0.152, 0.059, 0.053), crime_type = c("Assault", "Theft","Traffic crimes","Opposition to the Bankruptcy \nand Enforcement Law")) |>
  ggplot(aes(x = crime_type, y = ratio)) +
  geom_point(aes(color = type))+
  ggtitle("Most Committed Crimes in 2020 and\nTheir Ratios in 2020 vs All Time")
  
all_vs_2020

Educational Background

  • Among those entering for assault, 29.9% had completed middle school, 26.0% high school, and 21.4% primary education.
Code
data.frame(number = rowSums(data.frame(filter(education_cat, years == 2020 & crime_names == "Assault" & full_names != "total" & gender == "Male")$number,
                                       filter(education_cat, years == 2020 & crime_names == "Assault" & full_names != "total" & gender == "Female")$number)),
           education_level = filter(education_cat, years == 2020 & crime_names == "Assault" & full_names != "total" & gender == "Female")$education_level) |>
  ggplot(aes(education_level,number))+
  geom_point(color = "magenta", size = 2)+
  ggtitle("Education - Assault Relation")

  • For theft, 45.4% had completed middle school, 17.6% primary education, and 15.6% high school.
Code
data.frame(number = rowSums(data.frame(filter(education_cat, years == 2020 & crime_names == "Theft" & full_names != "total" & gender == "Male")$number,
                                       filter(education_cat, years == 2020 & crime_names == "Theft" & full_names != "total" & gender == "Female")$number)),
           education_level = filter(education_cat, years == 2020 & crime_names == "Theft" & full_names != "total" & gender == "Female")$education_level) |>
  ggplot(aes(education_level,number))+
  geom_point(color = "darkblue", size = 2)+
  ggtitle("Education - Theft Relation")

  • Criminals with a higher education background were more likely to commit assault (9.2%), while those with no formal education were more likely to commit theft (30.2%).
Code
data.frame(number = rowSums(data.frame(filter(education_cat, years == 2020 & full_names == "higher education" & gender == "Male" & crime_names != "Total" & crime_names != "Other crimes")$number,
                                       filter(education_cat, years == 2020 & full_names == "higher education" & gender == "Female" & crime_names != "Total" & crime_names != "Other crimes")$number)),
           crime_name = filter(education_cat, years == 2020 & full_names == "higher education" & gender == "Female" & crime_names != "Total" & crime_names != "Other crimes")$crime_names) |>
  ggplot(aes(crime_name,number))+
  geom_point() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), strip.text = element_text(size = 8))+
  ggtitle("Higher Education Tendencies")

Analysis of Crimes

Assault Crimes

There is an increasing trend between 2011 and 2013. Remains approximately stable in the years between 2013-2020. In general, an increase can be observed.

Bad Treatment

A slight increase is observed between 2011-2013. Between 2013-2018, a decrease and then an increase was observed and reached the same level. Afterwards, it continued with a slight increase.

Bribery

A rapid increase was observed between 2011-2013. There is a decrease in the number of crimes between 2013-2015. Until 2020, it first increased and then decreased, reaching the same crime level.

Contrary to Measures for Family Protection

Approximately the same level of offenses between 2011 and 2020.

Damage to Property

There was a general decrease in the number of crimes between 2013-2020.

Defamation

There is an increasing trend between 2011-2013. In the following years, an approximately constant number of offenses is observed. In general, an increase in the number of offenses is observed.

Embezzlement

There is an increase in the number of crimes in 2011-2013 and a slight decrease in 2013-2016. In the following years, it can be seen that the number of offenses has remained approximately constant. In general, there is an increase in the number of crimes.

Forestry

Between 2011 and 2015, there was an increase in the number of crimes at the same rate, and then it reached the same level with a decrease. In the following years, a general increase is observed.

Forgery

There is a slight increase in the number of offenses between 2011-2020.

Homicide

An increase is observed between 2011-2014. A decrease is observed in 2019-2020. In general, there is an increase in the crime rate.

Kidnapping

There is an increase between 2011-2013. In the following years, we can talk about approximately the same number of crimes and an increase is observed in general.

Opposition to the Bankruptcy and Enforsment Law

Between 2011-2019, there is a general decrease in the number of offenses.

Opposition to Cheque Law

A rapidly decreasing number of crimes was observed between 2011 and 2012.

Opposition to the Military Criminal Law

Stable in 2011-2012 and a significant decrease in 2012-2013. After 2013, an increasing trend is observed.

Other Crimes

In general, there is a slight increasing trend.

Prevention of Performance

There is a decreasing trend between 2013-2016. A slightly increasing trend is observed in the following years.

Use and Purchase of Drugs

Between 2011 and 2020, there appears to be a low rate of increase.

Factors such as economic difficulties or unemployment may be the main reason for this increase. Increased social pressures, increased stress levels and life difficulties, lighter sentences may increase drug use.

Traffic Crimes

The increase in general traffic density with urbanisation and population growth, inadequate urban planning and infrastructure deficiencies are among the reasons for the increase in traffic crimes.

Threat

Changes in social communication, social media and other communication technologies, police intervention, security policies and criminal sanctions can affect threat offences.

Theft

While theft crime increased between 2011-2014, it remained almost at the same level between 2014-2020. The increase in theft crimes between 2011-2014 may have led to an increase in security measures during this period. Increased police and improvements in security systems may have reduced theft crimes.

Swindling

Between 2011 and 2020, an increase rate is observed.

The reason for this increase may be the widespread use of the internet and cyber swindling crimes.

Smuggling

According to the years, smuggling offences appear to be decreasing and increasing in some years.Economic gains from smuggling activities may affect crime rates.

Sexual Crimes

While theft crime increased between 2011-2014, it remained almost at the same level between 2014-2020.

The increase in awareness of sexual crimes in the society and the increase in the rate of reporting crimes by victims may increase as crimes are recorded more.

Robbery

While robbery crimes increased between 2011-2014, they remained almost at the same level between 2014-2020.The economic situation is one of the most important reasons affecting robbery crimes. Factors such as economic insufficiency, unemployment or income inequality can lead to an increase in robbery offences.

Production and Commerce of Drugs

Drug production and trafficking increased briefly from 2011 onwards and then remained almost at the same level.

Approaches focussing on rehabilitation rather than punishment and changes in criminal justice policies can affect drug trafficking.

Trend Analysis

In general, when we analyze the total number of crime types over the years, we observe an increasing trend. We can say that the main reason for this is the increase in economic difficulties with the growing population and the gradual change in the social structure. In addition, the fact that data collection and record-keeping are being carried out more with advanced technology than in 2011 allows for more statistical data to be kept on these crime rates. This shows that there is a reason to consider for the increase in the number of crimes.

Reasons for the Increasing Trend

Economic Challenges

Economic recession, rising unemployment rates or economic uncertainty can lead to increased crime rates. When people are struggling with financial difficulties, the risk of crime often increases.

Social Changes

Urbanization, migration movements or changes in the social structure can affect crime rates. Especially in urbanized areas, crime rates often increase.

Social Support and Rehabilitation Services

Lack of effective social support, rehabilitation and reintegration services can lead to increased crime rates. It is important to reintegrate and support offenders into society.

Media and Cultural Influences

How the media portrays crimes, perceptions of offenders and social norms can influence crime rates. Media exaggeration of crimes or dissemination of misleading information can lead to an increase in crime rates.

Reasons for the Decreasing Trend

Effective Police and Security Measures

More effective police work, security measures and crime prevention programs can reduce crime rates.

Education and Awareness

Awareness and education programs about the negative effects of crime in society can contribute to lower crime rates.

Social Assistance and Support Programs

Economic and social support programs can reduce crime among disadvantaged groups and promote social cohesion.

Social Norms and Values

The attitude of social norms and values towards crime can influence crime rates. Positive social norms and values can contribute to lower crime rates.

Back to top