Identification of Vortex Information. Detection of fake news eruption time
DOI:
https://doi.org/10.33077/uw.24511617.sm.2024.1.761
Keywords:
fake news, harmful information, bigrams (letter pairs), fake news detection, information vortex, Big Data, AI, information refining
Abstract
The purpose of this study is to develop the Information Vortex Indicator (IVI), a procedure for detecting the moment at which an information vortex forms in textual data streams, and to validate its effectiveness. Research has established that the formation of this vortex coincides with the onset of the dissemination of fake news (FN) about a particular object (a person, organization, company, event, etc.). The primary aim of this detection is to minimize the time needed to respond to, or defend against, the adverse effects of the information turbulence caused by the spread of fake news.
Methodology: The study used instruments for analyzing Big Data information resources (Gogołek, 2019, 2022), including selected statistical and artificial intelligence techniques and tools, to detect vortex occurrence automatically and in real time. The efficacy of these tools was validated experimentally, which enables a reliable assessment of the time at which a vortex emerges. This assessment is quantified by the V-function (also referred to as a procedure or test), which formally describes the IVI procedure. The parameters of the V-function are derived from the distribution patterns of letter pair clusters within the textual information stream.
Conclusions: A comparison of manual (reference) and automatic detection of vortex emergence times confirmed an accuracy of over 80% in detecting the appearance of fake news. These results support the effectiveness of the IVI procedure and the usefulness of the selected tools for rapid, automated detection of information vortices, which herald the propagation of fake news. The study also demonstrates that IVI can be applied to continuous monitoring of information with significant media value across multiple multilingual data streams.
Originality: This research introduces a novel approach utilizing the distribution of letter pair clusters within information streams to detect the onset of information vortices, coinciding with the emergence of fake news. This methodology represents a unique contribution to the field, as prior research on this subject is limited.
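The abstract does not state the V-function itself, so the following is only a minimal sketch of the general idea of tracking letter pair distributions over a text stream. It assumes the stream has already been cut into consecutive windows of text, and it swaps in Jensen-Shannon divergence as a stand-in change measure. The names `letter_bigrams`, `js_divergence`, and `first_shift`, as well as the `baseline_size` and `threshold` parameters, are hypothetical and not taken from the study.

```python
from collections import Counter
import math


def letter_bigrams(text):
    """Count adjacent letter pairs inside each word (case-insensitive).

    Non-letter characters are dropped before pairing; pairs never span
    a whitespace boundary.
    """
    counts = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    return counts


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (log base 2, range 0..1) of two count tables.

    Assumption: this is an illustrative distance, not the paper's V-function.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        return 0.0
    div = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts.get(key, 0) / p_total
        q = q_counts.get(key, 0) / q_total
        m = 0.5 * (p + q)
        if p > 0:
            div += 0.5 * p * math.log2(p / m)
        if q > 0:
            div += 0.5 * q * math.log2(q / m)
    return div


def first_shift(windows, baseline_size=3, threshold=0.5):
    """Return the index of the first window whose letter pair distribution
    diverges from the pooled baseline by more than `threshold`, else None.

    `baseline_size` and `threshold` are hypothetical tuning parameters.
    """
    baseline = Counter()
    for window in windows[:baseline_size]:
        baseline.update(letter_bigrams(window))
    for i in range(baseline_size, len(windows)):
        if js_divergence(baseline, letter_bigrams(windows[i])) > threshold:
            return i
    return None
```

In this sketch the first few windows define what "normal" text looks like, and a later window is flagged when its letter pair profile moves far enough away from that baseline. A real detector would need a threshold calibrated against manually annotated reference times, as the comparison in the abstract suggests.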
References
Arutyunov, A., Borisov, L., Fedorov, S., Ivchenko, A., Kirina-Lilinskaya, E., Orlov, Y., Osminin, K., Shilin, S., & Zeniuk, D. (2016). Statistical Properties of European Languages and Voynich Manuscript Analysis. CoRR, abs/1611.09122. https://doi.org/10.20948/prepr-2016-52
Camps, J.-B., Clérice, T., & Pinche, A. (2021). Noisy medieval data, from digitized manuscript to stylometric analysis: Evaluating Paul Meyer’s hagiographic hypothesis. Digital Scholarship in the Humanities, 36(2), ii49–ii71. https://doi.org/10.1093/llc/fqab033
Gogołek, W. (2006). Hit z komputera. Polityka, 45. Retrieved from https://technopolis.polityka.pl/2006/program-na-hit
Gogołek, W., & Kuczma, P. (2013). Rafinacja informacji sieciowych na przykładzie wyborów parlamentarnych. Część 1. Blogi, fora, analiza sentymentów. Studia Medioznawcze, 2(53), 89–109.
Gogołek, W. (2019). Refining Big Data. Bulletin of Science, Technology & Society, 37(4), 212–217. https://doi.org/10.1177/0270467619864012
Gogołek, W. (2022). Big Data o mediach. Dominanty świata mediów. Studia Medioznawcze, 23(2), 1171–1180. https://doi.org/10.33077/uw.24511617.sm.2022.2.684
Gogołek, W., & Jaruga, D. (2016). Z badań nad systemem rafinacji sieciowej. Identyfikacja sentymentów. Studia Medioznawcze, 4(67), 103–111. https://doi.org/10.33077/uw.24511617.ms.2016.67.435
Gogołek, W., Jarzyńska, K., Żukowski, K., Wierzbicki, P., & Durlak, U. (2022). Citizen Big Data Refining on the example of the capital city of Warsaw. Urban Development Issues, 73, 08. https://doi.org/10.51733/udi.2022.73.08
Gomes, H. M., Grzenda, M., Mello, R., Read, J., Le Nguyen, M. H., & Bifet, A. (2023). A Survey on Semi-supervised Learning for Delayed Partially Labelled Data Streams. ACM Computing Surveys, 55(4), 1–42. https://doi.org/10.1145/3523055
Hirst, G., & Feiguina, O. (2007). Bigrams of Syntactic Labels for Authorship Discrimination of Short Texts. Literary and Linguistic Computing, 22(4), 405–417. https://doi.org/10.1093/llc/fqm023
Huang, J. (2020). Detecting Fake News With Machine Learning. Journal of Physics: Conference Series, 1693, The 2020 3rd International Conference on Computer Information Science and Artificial Intelligence (CISAI), 25–27 September 2020, Inner Mongolia, China. https://doi.org/10.1088/1742-6596/1693/1/012158
Koczkodaj, W. W., Mazurek, M., Pedrycz, W., Rogalska, E., Roth, R., Strzalka, D., … Zbyrowski, R. (2022). Combating harmful Internet use with peer assessment and differential evolution. 2022 International Conference on Electrical, Computer and Energy Technologies (ICECET), Prague, Czech Republic, 2022. https://doi.org/10.1109/ICECET55527.2022.9873437
Litvinova, T. A., Seredin, P. V., & Litvinova, O. A. (2015). Using Part-of-Speech Sequences Frequencies in a Text to Predict Author Personality: a Corpus Study. Indian Journal of Science and Technology, 8(S9), 93–97. https://doi.org/10.17485/ijst/2015/v8iS9/51103
Luo, M., & Mu, X. (2022). Entity sentiment analysis in the news: A case study based on Negative Sentiment Smoothing Model (NSSM). International Journal of Information Management Data Insights, 2(1), 100060. https://doi.org/10.1016/j.jjimei.2022.100060
Markov, A. A. (2006). An Example of Statistical Investigation of the Text Eugene Onegin Concerning the Connection of Samples in Chains. Science in Context, 19(4), 591–600. https://doi.org/10.1017/S0269889706001074
Meel, P., & Vishwakarma, D. K. (2020). Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, challenges and opportunities. Expert Systems with Applications, 153, 112986. https://doi.org/10.1016/j.eswa.2019.112986
Meel, P., & Vishwakarma, D. K. (2021). A temporal ensembling based semi-supervised ConvNet for the detection of fake news articles. Expert Systems with Applications, 177, 115002. https://doi.org/10.1016/j.eswa.2021.115002
Operation of police powers under the Terrorism Act 2000 and subsequent legislation: Arrests, outcomes, and stop and search, Great Britain, quarterly update to December 2022. (2023, March 9). Home Office. Retrieved June, 2023, from https://www.gov.uk/government/statistics/operation-of-police-powers-under-the-terrorism-act-2000-quarterly-update-to-december-2022/operation-of-police-powers-under-the-terrorism-act-2000-and-subsequent-legislation-arrests-outcomes-and-stop-and-search-great-britain-quarterly-u
Rohit, B. (2011, March 31). The 5 Models Of Content Curation [Blog post]. Retrieved from https://rohitbhargava.com/the-5-models-of-content-curation
Sanger, D. E., & Bumiller, E. (2011, May 31). Pentagon to Consider Cyberattacks Acts of War. The New York Times. Retrieved from https://www.nytimes.com/2011/05/01/us/politics/01cyber.html
Sękiewicz, J. (2012). Łańcuchy Markowa i ich zastosowanie w filogenetyce. Master's thesis, Faculty of Mathematics and Computer Science, Jagiellonian University. Retrieved from https://ruj.uj.edu.pl/xmlui/handle/item/182784?search-result=true&query=s%C4%99kiewcz&current-scope=&rpp=50&sort_by=score&order=desc
Shawkat, N., Simpson, J., & Saquer, J. (2022). Evaluation of Different ML and Text Processing Techniques for Hate Speech Detection. 2022 4th International Conference on Data Intelligence and Security (ICDIS), Shenzhen, China, 2022, 213–219. https://doi.org/10.1109/ICDIS55630.2022.00040
Školkay, A., & Filin, J. (2019). A Comparison of Fake News Detecting and Fact-Checking AI Based Solutions. Studia Medioznawcze, 20(4), 365–383. https://doi.org/10.33077/uw.24511617.ms.2019.4.187
Wang, T., Lu, K., Chow, K. P., & Zhu, Q. (2020). COVID-19 Sensing: Negative Sentiment Analysis on Social Media in China via BERT Model. IEEE Access, 8, 138162–138169. https://doi.org/10.1109/ACCESS.2020.3012595
Wierzbicki, P. (2022). Identyfikacja zmian frekwencji publikowanych wpisów na przykładzie Twittera. Segmentacja strumienia informacyjnego. Master's thesis, Big Data Management programme, Faculty of Journalism, Information and Book Studies, University of Warsaw.
Zhou, Z.-H. (2022). Open-environment machine learning. National Science Review, 9(8), nwac123. https://doi.org/10.1093/nsr/nwac123
License
Copyright (c) 2023 Włodzimierz Gogołek
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license.
Articles in Studia Medioznawcze are published under terms corresponding to the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).