Cristin-resultat-ID: 1564668
Sist endret: 13. februar 2018, 11:24
Resultat
Doktorgradsavhandling
2018

Advancing Neuro-Fuzzy Algorithm for Automated Classification in Large-scale Forensic and Cybercrime Investigations: Adaptive Machine Learning for Big Data Forensic

Bidragsytere:
  • Andrii Shalaginov

Utgiver/serie

Utgiver

Norges teknisk-naturvitenskapelige universitet
NVI-nivå 0

Serie

Doktoravhandlinger ved NTNU
ISSN 1503-8181
NVI-nivå 0

Om resultatet

Doktorgradsavhandling
Publiseringsår: 2018
Volum: 2018
Hefte: 57
Antall sider: 349
ISBN: 978-82-326-2906-0

Klassifisering

Vitenskapsdisipliner

Informasjons- og kommunikasjonsteknologi

Fagfelt (NPI)

Fagfelt: IKT
- Fagområde: Realfag og teknologi

Beskrivelse Beskrivelse

Tittel

Advancing Neuro-Fuzzy Algorithm for Automated Classification in Large-scale Forensic and Cybercrime Investigations: Adaptive Machine Learning for Big Data Forensic

Sammendrag

Cyber Crime Investigators are challenged by the huge amount and complexity of digital data seized in criminal cases. Human experts are present in the Court of Law and make decisions with respect to the digital data and evidence found. Therefore, it is necessary to combine automated analysis and human-understandable representation of digital data and evidences. Machine Learning methods such as Artificial Neural Networks, Support Vector Machines and Bayes Networks have been successfully applied in Digital Investigation & Forensics. The challenge however is in the fact that these methods neither provide precise human-explainable models nor can work without prior knowledge. Our research is inspired by the emerging area of Computational Forensics. We focus on the Neuro-Fuzzy rule-extraction classification method, a promising Hybrid Intelligence model. The contribution goes towards the improved performance of Neuro-Fuzzy in extracting accurate fuzzy rules that are human-explainable. These rules can be presented and explained in a Court of Law, which is better than a set of numerical parameters obtained from more abstract Machine Learning models. In our initial research on the Neuro-Fuzzy method, we found that its application in Digital Forensics was promising, but with a number of drawbacks. These include (i) poor performance in learning from real-world in comparison to other state of the art Machine Learning methods, (ii) a number of output fuzzy rules so large that no human expert can understand them, (iii) a strong model overfitting caused by the huge number of fuzzy rules, and (iv) an intrinsic learning procedure that neglects part of the data, which therefore becomes inaccurate. Due to this criticism, Neuro-Fuzzy method's latent potential has not been widely applied to the area yet. The contribution of this work is the following: (1) theoretical in the improvement of Neuro-Fuzzy method and (2) empirical in the experimental design using large-scale datasets in Digital Forensics domain. The entire study was conducted during 2013-2017 at the NTNU Digital Forensics Group. Add. 1. Neuro-Fuzzy was revised and therefore we first contributed to the Machine Learning domain and subsequently the large-scale Digital Forensics application. In particular, (i) we proposed exploratory data analysis to improve Self-Organizing Map initialization and generalization of the Neuro-Fuzzy method targeting large-scale datasets; (ii) we also improved the compactness and generalization of fuzzy patches, resulting in the increased accuracy and robustness of the method through a chi-square goodness of fit test; (iii) we constructed the new membership function based on Gaussian multinomial distribution that considers fuzzy patches representation as a statistically estimated hyperellipsoid; (iv) we reformulated the application of the Neuro-Fuzzy in solving multi-class problems rather than conventional two classes problems; (v) finally, we designed a new approach to model non-linear data using Deep Learning and Neuro-Fuzzy method that results in a Deep Neuro-Fuzzy architecture. Add. 2. The experimental study includes extended evaluation of the proposed improvements with respect to the challenges and requirements of a variety of different real-world applications, including: (i) state of the art datasets like the Android malware dataset, network intrusion detection KDD CUP 1999 and web application firewalls PKDD 2007 datasets. Moreover, community-accepted datasets from UCI collection were also used, including large-scale datasets such as SUSY and HIGGS. (ii) A new, novel large-scale collection of Windows Portable Executable 32-bit malware files was also composed as a part of this PhD work. It consists of 328,000 labelled malware samples that represent 10,362 families and 35 categories; these were further tested as non-trivial multi-class problems, neither sufficiently studied in the literature nor previously explored.

Bidragsytere

Andrii Shalaginov

  • Tilknyttet:
    Forfatter
    ved Institutt for informasjonssikkerhet og kommunikasjonsteknologi ved Norges teknisk-naturvitenskapelige universitet
1 - 1 av 1