Methods of Big Data Processing
Major: Information Systems and Technologies
Code of subject: 7.126.01.O.003
Credits: 7.00
Department: Information Systems and Networks
Lecturer: Doctor of Sciences, Professor Andrii Berko
Semester: 1
Mode of study: full-time
Objectives: The study of the discipline is intended to develop the following competencies in students:
General competencies:
INT. The ability to solve problems of a research and innovation nature in the field of information systems and technologies.
ZK01. Ability to think abstractly, analyze, and synthesize.
ZK05. Ability to evaluate and ensure the quality of work performed in ICT.
Professional competencies:
SK01. Ability to develop and apply IST necessary for solving strategic and current tasks.
SK04. The ability to develop mathematical, information, and computer models of objects and informatization processes.
SK05. Ability to use modern data analysis technologies to optimize processes in information systems.
SK06. Ability to manage information risks based on the concept of information security.
SK07. The ability to develop and implement innovative projects in the field of ICT.
SK08. The ability to conduct scientific and scientific-pedagogical activities in the field of ICT.
Learning outcomes: As a result of studying the academic discipline, the student must be able to demonstrate the following learning outcomes:
PH05. Determine the requirements for ICT based on the analysis of business processes and the needs of interested parties, and develop technical specifications.
PH06. Justify the choice of technical and software solutions, taking into account their interaction and potential impact on solving organizational problems, and organize their implementation and use.
PH10. Provide high-quality cyber protection of ICT, plan, organize, implement, and monitor the functioning of information protection systems.
PH12. Plan and carry out scientific research in the field of ICT, formulate and test hypotheses, choose methods, substantiate conclusions, and present results.
PH13. Develop and teach special disciplines on information systems and technologies in institutions of higher education.
Program learning outcomes determined by the higher education institution:
PH14. To design, organize the implementation, use, and support of distributed and intelligent information systems of various kinds based on the analysis of organizational needs and capabilities.
Required prior and related subjects: Distributed information systems
Information resource integration technologies
Information technologies of computer networks
Summary of the subject: Concept and definition of Big Data. Development of the Big Data concept. Big data analysis techniques. Big data analysis technologies. Big data storage models. The MapReduce computing model. Big data processing tools. Application of Big Data in various domains.
Description: 1. The concept of Big Data
Concept and definition of Big Data. Properties of Big Data. Requirements for Big Data. The specifics of Big Data. Classification of big data. Structured data. Sources of big structured data. Relational databases in big data. Unstructured data. Sources of unstructured data. The role of CMS in big data management. Management of heterogeneous data. Integrating different types of data into a big data environment.
2. Evolution of Big Data.
The evolution of data management.
Stage 1: Creation of managed data structures
Stage 2: Website and Content Management
Stage 3: Big Data Management
Processing large volumes of data on mainframes. Prerequisites and factors behind the emergence of Big Data as a field. Formation and development of Big Data technologies. Subject areas where big data is applied. The current state and prospects for the development of Big Data.
3. Big data analysis techniques.
A/B testing. Classification. Cluster analysis. Crowdsourcing (data collection). Data migration and integration. Data mining. Assessment of data consistency (agreement). Genetic algorithms. Machine learning. Natural language processing. Network analysis. Optimization. Pattern recognition. Predictive modeling. Regression analysis. Signal processing. Spatial data analysis. Statistics. Simulation modeling. Time series analysis. Study of associative links. Study of functional relationships. Study of hidden connections.
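As a minimal illustration of one technique from the list above (regression analysis), the following sketch fits an ordinary least-squares line in pure Python; the data points are hypothetical, chosen only to demonstrate the calculation:

```python
# Ordinary least-squares fit of y = a*x + b (regression analysis).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Toy data roughly following y = 2x: the fitted slope should be close to 2.
a, b = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.0, 8.1])
```

At big data scale the same computation would be distributed (e.g. as a MapReduce job accumulating the sums), but the mathematics is identical.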
4. Big data management technologies
Operational databases. Relational DBMS in a big data environment (SQL-based relational databases).
Non-relational DBMS (key-value databases, document databases, columnar databases, graph databases, spatial databases).
Specialized Big Data repositories.
Streaming data.
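To make the key-value model from the list above concrete, a toy in-memory key-value store can be sketched in a few lines of Python; real systems such as Redis or DynamoDB add persistence, partitioning, replication, and networking on top of this basic contract:

```python
# A toy in-memory key-value store illustrating the NoSQL key-value model:
# opaque keys map to arbitrary values, with no schema and no joins.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
# Storing a dict as the value blurs into the document model, where the
# database can also index and query fields inside the value.
store.put("user:42", {"name": "Olena", "city": "Lviv"})
```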
5. The MapReduce computation model
The MapReduce paradigm. Origins of MapReduce.
Principles of the Map function. Principles of the Reduce function. Combination of Map and Reduce functions.
Optimization of MapReduce tasks. Hardware and network topology. Data synchronization.
MapReduce file system.
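The combination of the Map and Reduce functions described above can be sketched in plain Python. This word-count example (a standard textbook illustration, not from the course materials) makes the intermediate "shuffle" step explicit, grouping mapper output by key before the reduce phase:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: sum the counts emitted for one word.
    return (key, sum(values))

docs = ["big data big ideas", "data tools for big data"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real framework the documents live in a distributed file system, mappers and reducers run on many machines, and the shuffle moves data across the network, but the programming model is exactly this pair of functions.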
6. Big data processing tools
The Hadoop big data processing system. Principles of Hadoop. Hadoop Distributed File System (HDFS). HDFS NameNodes. Data representation in HDFS. Hadoop and MapReduce. The Hadoop ecosystem. Building a big data resource with the Hadoop ecosystem.
The Hadoop YARN resource and application management tool. The HBase big data store. Querying big data with Hive.
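Hadoop Streaming, shipped with the Hadoop distribution, lets any executable that reads standard input and writes standard output serve as a mapper or reducer. The sketch below shows a word-count mapper/reducer pair in that tab-separated format, with the cluster's sort-and-shuffle simulated by an in-memory `sorted` call:

```python
from itertools import groupby

def mapper(lines):
    # Emit "word\t1" for every word: the tab-separated key/value
    # format that Hadoop Streaming expects on stdout.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(lines):
    # Hadoop sorts mapper output by key before the reduce phase,
    # so equal words arrive consecutively and groupby can sum them.
    parsed = (line.rsplit("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local simulation of the map -> sort -> reduce pipeline.
result = list(reducer(sorted(mapper(["Big data", "big tools"]))))
```

In a real job these functions would read `sys.stdin` and print to stdout in two separate scripts, submitted to the cluster via the hadoop-streaming JAR; the simulation above stands in for HDFS and the distributed shuffle.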
7. Big data analytics
Definition of big data analysis. Using big data to obtain results.
Basic analytics. Advanced analytics. Operational analytics. Descriptive analytics. Predictive (forecasting) analytics. Prescriptive (recommendation) analytics. Monetization of analytics.
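The distinction between descriptive and predictive analytics can be shown with a toy Python sketch: descriptive statistics summarize what has already happened, while a simple moving-average forecast (a deliberately naive stand-in for real predictive models, on hypothetical data) projects the next value:

```python
from statistics import mean, stdev

sales = [120, 135, 128, 150, 142, 160]  # hypothetical monthly sales figures

# Descriptive analytics: summarize past observations.
summary = {"mean": mean(sales), "stdev": stdev(sales), "max": max(sales)}

# Predictive analytics (naive sketch): forecast the next month
# as the average of the last three observations.
forecast = mean(sales[-3:])
```

Prescriptive analytics would go one step further and recommend an action (e.g. how much stock to order) based on such a forecast.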
8. Application of Big Data in subject areas.
Environmental monitoring. Social processes. Public administration. Marketing. Trade. E-commerce. Medicine. Stock exchange activity. Politics.
Assessment methods and criteria: • Current control (45%): written reports on laboratory work, abstract, oral questioning
• Final control (55%, exam): Written-oral form.
Criteria for evaluating learning outcomes: Current control – completion and defense of laboratory work, oral and frontal questioning.
Final control: oral questioning, test-based control.
Procedure and criteria for awarding points and grades:
100–88 points (“excellent”) – awarded for a high level of knowledge of the component’s educational material (minor inaccuracies are allowed) from the main and additional recommended sources, the ability to analyze the phenomena studied in their interrelationship and development, to answer questions clearly, succinctly, logically, and consistently, and to apply theoretical provisions when solving practical problems;
87–71 points (“good”) – awarded for a generally correct understanding of the component’s educational material, including calculations, and for reasoned answers to the questions posed which nevertheless contain certain (insignificant) shortcomings, and for the ability to apply theoretical provisions when solving practical tasks;
70–50 points (“satisfactory”) – awarded for weak knowledge of the component’s educational material, inaccurate or poorly reasoned answers with violations of the sequence of presentation, and weak application of theoretical provisions when solving practical problems;
49–26 points (“not certified”, with the possibility of retaking the semester control) – awarded for ignorance of a significant part of the component’s educational material, significant errors in answering questions, and inability to apply theoretical provisions when solving practical problems;
25–0 points (“unsatisfactory”, with mandatory re-study) – awarded for ignorance of a significant part of the component’s educational material, significant errors in answering questions, inability to find one’s bearings when solving practical problems, and ignorance of the main fundamental provisions.
Recommended books: 1. Tom White // Hadoop: The Definitive Guide // O'Reilly Media, 2009.
2. Hadoop. Apache Software Foundation // http://hadoop.apache.org/
3. Klint Finley // Steve Ballmer on Microsoft's Big Data Future and More in This Week's Business Intelligence Roundup // ReadWriteWeb, 2011.
4. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al. // Bigtable: A Distributed Storage System for Structured Data // Google Lab, 2006.
6. Jeffrey Dean, Sanjay Ghemawat // MapReduce: Simplified Data Processing on Large Clusters // Google Inc., 2004.
7. Judy Qiu // Cloud Technologies and Their Applications // Indiana University Bloomington, 2010.
8. The Hadoop Distributed File System: Architecture and Design // http://hadoop.apache.org/common/docs/r0.17.2/hdfs_design.html
9. Ralf Lämmel // Google’s MapReduce Programming Model — Revisited // Microsoft Corp.
Unified appendix: Lviv Polytechnic National University ensures the realization of the right of persons with disabilities to obtain higher education. Inclusive educational services are provided through the accessibility service "Without Restrictions", whose purpose is to provide ongoing individual support for the educational process of students with disabilities and chronic diseases. An important instrument of the University's inclusive education policy is the Program for improving the qualifications of academic staff and teaching support staff in the field of social inclusion and inclusive education. Contact:
St. Starosolskikh, 2/4, 1st academic building, room 112
E-mail: nolimits@lpnu.ua
Websites: https://lpnu.ua/nolimits https://lpnu.ua/integration
Academic integrity: The policy on the academic integrity of participants in the educational process is based on adherence to the principles of academic integrity, taking into account the "Regulations on academic integrity at Lviv Polytechnic National University" (approved by the University Academic Council on June 20, 2017, minutes No. 35).