Data

Crime Dynamics: Analyzing Crime Rates, Gender, and Education Trends from 2011 to 2020

Our project will focus on the relationships between the gender and education levels of those who committed crimes in Turkey between 2011 and 2020, and in this context, it aims to help us better understand the sensitivity of crime to population-related characteristics.

Crime is a complex issue that affects the order and security of societies. Therefore, understanding changes in crime rates and uncovering the reasons behind these changes form the basis of creating effective crime-fighting strategies.

The main reason we chose this data is to understand the social impact by analyzing changes in crime rates and to determine crime profiles by examining the demographic characteristics of criminals. A comprehensive analysis that includes variables such as crime types, gender and education level will contribute to the development of more effective policies in the fight against crime.

First of all, we will examine in detail the changes in the rates of certain types of crimes. We aim to obtain a more detailed output about the security situation of the society by analyzing trends in crime categories such as theft, murder and rape.

Gender and education level are important factors in understanding the crime profile. Our study will focus on the relationships between gender and education levels of offenders and, in this context, will help us better understand the sensitivity of crime to demographic characteristics. The types of analysis aimed in the project consist of determining which crime types are more common in which year using Descriptive Statistics, observing which crime types a certain gender is more prone to and how their trends change over the year with Gender-Based Analysis, observing how the number of crime types changes over time with Time-Series Analysis, analyzing how a certain crime type depends on factors such as gender and year with Regression Analysis.

The resulting analyzes can contribute to efforts to create a safer society by providing information on identifying the factors underlying crime.

Overview of Structure

While the original data holds the education information, our version that is exported as the RData file omits it for now. The code we used to work on the data is as follows:

Code

library(tidyr)
library(readxl)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(gghighlight)
library(tibble)
library(stringr)

dat <- read_xls("data/suc turu ve egitim durumuna gore ceza infaz kurumuna giren hukumluler.xls")
index2020 <- read_xlsx("data/indexes.xlsx", sheet = "2020-2018", col_names = FALSE)
index2017 <- read_xlsx("data/indexes.xlsx", sheet = "2017-2013", col_names = FALSE)
index2012 <- read_xlsx("data/indexes.xlsx", sheet = "2012-2011", col_names = FALSE)
index2020 <- sapply(index2020, as.character)
index2017 <- sapply(index2017, as.character)
index2012 <- sapply(index2012, as.character)

temp2020 <- sapply(index2020, as.character)
temp2017 <- c(sapply(index2017, as.character), rep(NA, times = 1))
temp2012 <- c(sapply(index2012, as.character), rep(NA, times = 4))

for (val in dat$'2011-2020') {
  if (identical(val,"2020")){
    suc2020 <- dat[20:59, 2]
    suc2020 <- suc2020[complete.cases(suc2020),]
    suc2020 <- cbind(index2020, suc2020)
    names(suc2020) <- c("type_of_crime", "num2020")
  }
  if (identical(val,"2019")){
    suc2019 <- dat[62:101, 2]
    suc2019 <- suc2019[complete.cases(suc2019),]
    suc2019 <- cbind(index2020, suc2019)
    names(suc2019) <- c("type_of_crime", "num2019")
  }
  if (identical(val,"2018")){
    suc2018 <- dat[104:143, 2]
    suc2018 <- suc2018[complete.cases(suc2018),]
    suc2018 <- cbind(index2020, suc2018)
    names(suc2018) <- c("type_of_crime", "num2018")
  }
  if (identical(val,"2017")){
    suc2017 <- dat[147:185, 2]
    suc2017 <- suc2017[complete.cases(suc2017),]
    suc2017 <- cbind(index2017, suc2017)
    names(suc2017) <- c("type_of_crime", "num2017")
  }
  if (identical(val,"2016")){
    suc2016 <- dat[189:227, 2]
    suc2016 <- suc2016[complete.cases(suc2016),]
    suc2016 <- cbind(index2017, suc2016)
    names(suc2016) <- c("type_of_crime", "num2016")
  }
  if (identical(val,"2015")){
    suc2015 <- dat[231:269, 2]
    suc2015 <- suc2015[complete.cases(suc2015),]
    suc2015 <- cbind(index2017, suc2015)
    names(suc2015) <- c("type_of_crime", "num2015")
  }
  if (identical(val,"2014")){
    suc2014 <- dat[273:311, 2]
    suc2014 <- suc2014[complete.cases(suc2014),]
    suc2014 <- cbind(index2017, suc2014)
    names(suc2014) <- c("type_of_crime", "num2014")
  }
  if (identical(val,"2013")){
    suc2013 <- dat[315:353, 2]
    suc2013 <- suc2013[complete.cases(suc2013),]
    suc2013 <- cbind(index2017, suc2013)
    names(suc2013) <- c("type_of_crime", "num2013")
  }
  if (identical(val,"2012")){
    suc2012 <- dat[357:391, 2]
    suc2012 <- suc2012[complete.cases(suc2012),]
    suc2012 <- cbind(index2012, suc2012)
    names(suc2012) <- c("type_of_crime", "num2012")
  }
  if (identical(val,"2011")){
    suc2011 <- dat[395:429, 2]
    suc2011 <- suc2011[complete.cases(suc2011),]
    suc2011 <- cbind(index2012, suc2011)
    names(suc2011) <- c("type_of_crime", "num2011")
  }
}

d2020 <- data.frame(rep(2020, times = nrow(suc2020)))
d2019 <- data.frame(rep(2019, times = nrow(suc2020)))
d2018 <- data.frame(rep(2018, times = nrow(suc2018)))
d2017 <- data.frame(c(rep(2017, times = nrow(suc2017)), rep(NA,times=1)))
d2016 <- data.frame(c(rep(2016, times = nrow(suc2016)),rep(NA,times=1)))
d2015 <- data.frame(c(rep(2015, times = nrow(suc2015)),rep(NA,times=1)))
d2014 <- data.frame(c(rep(2014, times = nrow(suc2014)),rep(NA,times=1)))
d2013 <- data.frame(c(rep(2013, times = nrow(suc2013)),rep(NA,times=1)))
d2012 <- data.frame(c(rep(2012, times = nrow(suc2012)),rep(NA,times=4)))
d2011 <- data.frame(c(rep(2011, times = nrow(suc2011)),rep(NA,times=4)))

df <- data.frame(d2020,d2019,d2018,d2017,d2016,d2015,d2014,d2013,d2012,d2011)

years <- data.frame(years = unlist(df, use.names = FALSE))
years <- na.omit(years)

type_frame <- data.frame(temp2020,temp2020,temp2020,temp2017,temp2017,temp2017,temp2017,temp2017,temp2012,temp2012)
types <- data.frame(type_of_crimes = unlist(type_frame, use.names = FALSE)) |> na.omit()

numbers <- data.frame(years = years, type_of_crime = types, na.omit(data.frame(gen_total = dat$...2[20:nrow(dat)], male = dat$...3[20:nrow(dat)], female = dat$...4[20:nrow(dat)])))
numbers$gen_total <- as.numeric(numbers$gen_total)
numbers$male <- as.numeric(numbers$male)
numbers$female <- as.numeric(numbers$female)
numbers[is.na(numbers)] <- 0
numbers <- pivot_longer(numbers, cols = c(male,female), names_to = "gender", values_to = "number")

plot <- ggplot(numbers, aes(years, gen_total, color = type_of_crimes)) + 
  geom_line() + 
  scale_y_log10() +
  scale_x_continuous(breaks=seq(2011, 2020, 1)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1), strip.text = element_text(size = 8)) +
  gghighlight::gghighlight() +
  facet_wrap(~type_of_crimes) +
  ylab("Total Number of Crimes") +
  xlab("Years")

save(numbers, file = "crime_numbers.RData")

In order to make the data easier to work on, we have used the packages readxl, tidyr, and dplyr. Using readxl to access the data from its excel file format, and rest to process the exported data chunk by chunk so we could rid the data from its NA filled, unplottable structure.

Some ways the data is categorized can be seen by looking into this plot:

Don’t be shy to open the image in a new tab so you can zoom in on it!

So, as it can be seen from the plot, the types of crimes that are present in the data are given as the titles of the individual line plots. The x axis is years for each type of crime and y axis is the number of times that crime was committed, of course. The plot piece named “Total” is the total amount of times all crimes were committed.

Some of the crimes start or end abruptly as there are types of crimes redacted or added throughout the years such as “Opposition to cheque laws” which was only judged in 2011 and 2012.

Y axis of our plot is logarithmically scaled to compensate for the difference between the total number of times all the crimes for committed (Total) and the number of times different types of crimes were committed.

Warning

Help from ChatGPT has been used to cut back from time googling very specific data formatting problems. More context can be provided if needed.