Data Mining and Visualization

Major: Computer Science (Design and programming of intelligent systems and devices)
Code of subject: 6.122.12.O.024
Credits: 5.00
Department: Computer-Aided Design
Lecturer: Andriy KERNYTSKYY
Semester: 4
Mode of study: full-time
Aim of the discipline: The course "Systems of intellectual analysis and visualization of data" introduces modern data processing methods known as intelligent data analysis (Data Mining): the analytical study of large volumes of information in order to identify new, previously unknown and practically useful knowledge and patterns needed for decision-making. It surveys the methods, software products and tools used in Data Mining, examines practical examples of its application, and prepares students to work independently on solving problems with Data Mining and on developing intelligent systems. Data Mining is a multidisciplinary field that arose from, and continues to develop on the basis of, applied statistics, pattern recognition, artificial intelligence, database theory and related sciences.
Objectives: • master the basic principles of building data models; • become familiar with the concepts of Knowledge Discovery in Data and Data Mining (the "extraction" of knowledge); • learn to use methods of obtaining knowledge from large data sets effectively; • become familiar with the main types of problems that can be solved with the methods of intelligent data analysis; • gain practical skills in using the tools of intelligent data analysis to solve applied problems, and learn to interpret the results obtained. Studying the discipline forms and develops the following student competencies. General: • ability to apply knowledge in practical situations; • knowledge and understanding of the subject area and of professional activity; • ability to learn and master modern knowledge; • ability to search for, process and analyze information from various sources. Professional: • ability to use modern software tools for designing and researching intelligent data analysis systems; • ability to apply technologies for working with data stores and to carry out their analytical processing and intelligent analysis in order to ensure the reliable operation of information systems; • justify the choice of a specific type of model and method of intelligent data analysis when solving a given practical problem; • carry out the necessary pre-processing of data, determine the type of analysis task, solve it adequately by the selected method with optimally chosen parameters, evaluate the results, and draw meaningful conclusions and interpretations; • independently apply Data Mining algorithms when processing data; • independently develop and build data storage models; • independently conduct data analysis to identify knowledge; • independently use OLAP systems when processing databases.
The learning outcomes of this discipline detail the following program learning outcomes. Classification of competences according to the National Qualifications Framework (NQF): ability to reveal the scientific essence of problems in the professional field; ability to solve problems in the professional field based on analysis and synthesis. Knowledge: knowledge of the scientific and methodological foundations of creating and applying information technologies and information systems for automated information processing and management. Skills: ability to apply theoretical knowledge in practical situations in scientific activity. Communication: communicating in dialogue with the wider scientific community and the public in a given field of scientific and/or professional activity. Autonomy and responsibility: initiating innovative complex projects, with leadership and full autonomy during their implementation; social responsibility for the results of strategic decision-making; ability to self-develop and self-improve throughout life; responsibility for teaching others.
Learning outcomes: As a result of studying the discipline, the specialist should know: • the basic concepts, tasks and stages of data mining; • approaches to storing, presenting and processing information in modern information systems; • methods for model building and dependency analysis in large data sets; • modern software for the design and development of data mining systems. The trained specialist should be able to: • justify the choice of a particular type of model and method of data mining when solving a practical problem; • carry out the necessary preliminary data processing, determine the type of analysis problem, solve it adequately by the selected method with optimally chosen parameters, evaluate the results, and draw meaningful conclusions and interpretations; • use modern software to design and research data mining systems.
Required prior and related subjects: Previous courses: Organization of Data and Knowledge Bases. Related and continuing courses: Methods and Systems of Artificial Intelligence.
Summary of the subject: What intelligent data analysis is (Data Mining, Knowledge Discovery in Data). Data, information, knowledge. Historical review. Origin, prospects and problems of Data Mining. The concept of data. Object and attribute, sample, dependent and independent variables. Types of scales. Types of data sets. Concepts of database and DBMS. Concepts of "information" and "knowledge" and their comparison. Methods and stages of Data Mining. Stages of Data Mining and the actions performed within them. Classification of Data Mining methods. Comparative characteristics of some methods. Data Mining tasks. Problems of classification and clustering: the essence of each problem, the solution process, solution methods and applications; comparison of the two tasks. Forecasting problems. The concept of a time series, its components, forecasting parameters, types of forecasts. The data visualization task. Fields of application of Data Mining: the main areas of human activity where Data Mining technologies can be applied successfully. Concepts of Web Mining, Text Mining and Call Mining. Methods of classification and forecasting. Building a decision tree to solve a classification problem. The CORA and C4.5 algorithms. Basic ideas of the support vector method and the nearest neighbor method; advantages and disadvantages of these methods. Examples of problem solving. Methods of cluster analysis. Basics of cluster analysis, mathematical characteristics of a cluster. Two groups of hierarchical cluster analysis: agglomerative and divisive methods. Methods of finding association rules. The essence of the association rule mining task. The Apriori algorithm. The Data Mining process. Data preparation, the concepts of data quality and dirty data, stages of data cleaning, data editing. Stages of the Data Mining process related to the construction, verification, evaluation, selection and correction of models. Concepts of "model" and "modeling".
Description:
Topic 1. Fundamentals of data mining. Definition and scope of Data Mining. Tasks, models and methods of Data Mining. The concept of Business Intelligence.
Topic 2. Data, information, knowledge. The hierarchy of concepts "data - information - knowledge". Data attributes, types, values, properties of values. Measurements, scales. Metadata. Properties of information and knowledge.
Topic 3. Solving the regression problem. The essence of the prediction task. Methods for solving the regression problem. Linear regression, the least squares method.
Topic 4. Solving the classification problem. Formulation of the classification problem and presentation of results. Methods for constructing classification rules. Methods of building decision trees. Methods for constructing mathematical functions. Support vector, nearest neighbor and Bayesian methods. Analysis of multidimensional groups. Classification of objects when the data distributions are unknown. Methods for estimating classification errors.
Topic 5. Solving the problem of finding association rules. The association rule mining task and presentation of results. Sequential analysis. Types of association rule search tasks. Methods of presenting results. Algorithms for finding association rules: the Apriori method, FP-tree construction for frequent patterns. Construction of hash trees.
Topic 6. Solving the clustering problem. Formulation of the clustering problem and presentation of results. Types of clusters. Distance-based proximity measures. Basic clustering algorithms. Adaptive clustering methods.
Topic 7. Forecasting methods. Neural networks. The neural network method. Elements and architecture, the learning process and the overfitting phenomenon. The perceptron. An example of solving a problem with a neural network. Classifications of neural networks. Preparing data for training. Kohonen self-organizing maps. Examples of problem solving.
Topic 8. The process of knowledge discovery. The cycle of obtaining, pre-processing and analyzing data, interpreting the results and using them. Steps of the Data Mining process related to model building, validation, evaluation, selection and correction. Methods of initial data processing. Data Mining tools. Methods for exploring data structure: data visualization.
Topic 9. Data warehouses. Features of the data warehouse concept. Basic concepts of data warehousing. Tasks solved in data warehouses. Data warehouse architectures.
Topic 10. OLAP systems. The concept and model of OLAP data. The structure of the OLAP cube. Dimension hierarchies of OLAP cubes. Operations performed on a hypercube. Fact table. Dimension tables. Architecture of OLAP systems. OLAP client tools. OLAP server tools. Technical aspects of multidimensional data storage. Multidimensional OLAP (MOLAP). Relational OLAP (ROLAP). Hybrid OLAP (HOLAP). Multidimensional data analysis. Overview of SQL Server 2005 Analysis Services and SQL Server 2008 Analysis Services.
Assessment methods and criteria: The final assessment of knowledge takes the form of an examination. Laboratory classes (LC): 30; CT: 10; LC + CT: 40; written component: 50; oral component: 10.
Criteria for assessing learning outcomes: Teaching takes the form of lectures, laboratory work, control work and the student's independent work. Current assessment of students' knowledge covers: • systematic, high-quality and active completion of practical tasks; • systematic and timely completion of independent work assignments; • the quality of the control work.
Recommended books: 1. Han J. Data Mining: Concepts and Techniques (Second Edition) / J. Han, M. Kamber. – Morgan Kaufmann Publishers, 2006. – 800 p. 2. Witten I.H. Data Mining: Practical Machine Learning Tools and Techniques. – 3rd ed. / Ian H. Witten, Eibe Frank, Mark A. Hall. – Morgan Kaufmann Publishers, 2011. – 629 p. 3. Ланде Д.В., Субач І.Ю., Бояринова Ю.Є. Основи теорії і практики інтелектуального аналізу даних у сфері кібербезпеки: навчальний посібник. – К.: ІСЗЗІ КПІ ім. Ігоря Сікорського, 2018. – 297 с.