Data Engineering and Semantics Research Unit: Projects
DAAD (2016): Transferring the Research Training Group experience in Social Media Research [with University of Duisburg-Essen, Germany]
The aim of this project is to transfer the Duisburg-Essen team's experience in social media research. The contribution focuses on semantic similarity and proximity methods and on the evaluation techniques used in this field. The resulting work was published in an international peer-reviewed journal. Two workshops were held, one in Tunisia and one in Germany. One end-of-studies engineering internship and two doctoral internships were carried out in Germany within the framework of this project.
DAAD (2017): Transferring the research networking and valuation experience as a bridge between academic and industrial issues [with University of Passau, Germany]
The objective of this project is to develop a service based on social data for monitoring the elderly. The project was carried out in collaboration with a research team from the University of Passau in Germany. The resulting work was published at an international conference. A workshop was held in Tunisia to present the German team's experience in developing H2020 projects. One end-of-studies engineering internship and two doctoral internships are being carried out in Germany as part of this project.
MoHESR (2020): Psycho-social surveillance and epidemiological prediction of COVID-19 in the Tunisian context
This project aims to create software applications based on open resources such as NIH semantic resources (e.g., PubMed, MeSH), open-source programs and data available on GitHub (e.g., COVID-19 datasets), and Wikimedia projects (e.g., Wikipedia, Wikidata) for psycho-social support during the COVID-19 pandemic in Tunisia:
  • A COVID-19 prediction system that will identify and predict the course of the epidemic using Artificial Intelligence algorithms.
  • A thematic social media analysis system that will be used for real-time identification of user interests in Tunisia regarding the epidemic (the SARS-CoV-2 virus).
  • A scientific article recommendation system that will suggest scientific publications to the ministry to respond to rumors or confirm true information.
The onset of the virus is often linked to the first symptoms. It is therefore possible to alert a person who may be carrying the virus so that they can take the necessary measures through conventional procedures. From a methodological point of view, the approach we intend to develop relies mainly on machine learning and deep learning algorithms that model the evolution of the COVID-19 epidemic across its different phases (identification, transmission). Statistical tools describing the spread of the SARS-CoV-2 virus provide highly relevant information for physicians and researchers, since they help limit the reach of the virus. They make it possible to propose decision-support scenarios for adequate treatment and, ultimately, to deduce the best course of action and the precautions to take once an outbreak is detected.
Knowledge bases are used to detect topics in data extracted from social networks, namely the COVID-19-centered topics discussed and propagated among users in Tunisia. A preprocessing step will transform textual data written in Arabic and Tunisian dialect into concepts belonging to biomedical ontologies. To this end, we use natural language processing techniques based mainly on hybrid approaches that combine statistical methods (n-grams, Bayesian networks, …) with deep learning (fastText, Word2Vec, …) to overcome certain problems related to the nature of the Arabic language.
Big Data technologies will be leveraged throughout the data pipeline, from ingesting large volumes of social data to parallel processing (MapReduce and Spark) of data stored in the Hadoop Distributed File System (HDFS). The Big Data architecture will rely on a range of services provided by the Datacenter and made available through an appropriate software layer. The data will be collected in streaming mode so that services can be offered in real time.
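As an illustration only, the statistical n-gram step and the map/reduce processing pattern mentioned above could be sketched in plain Python as follows. This is a minimal sketch, not the project's implementation: the choice of character n-grams (rather than word n-grams), the function names, and the toy posts are all assumptions, and a real deployment would run the same two phases on MapReduce or Spark rather than sequentially.

```python
from collections import Counter
from functools import reduce


def char_ngrams(text, n=3):
    """Return the character n-grams of a whitespace-normalized string."""
    normalized = " ".join(text.split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def map_phase(post, n=3):
    """Map step: turn one post into a partial n-gram count."""
    return Counter(char_ngrams(post, n))


def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_ngrams(posts, n=3):
    """Run the map and reduce steps sequentially over a batch of posts."""
    return reduce(reduce_phase, (map_phase(p, n) for p in posts), Counter())


# Toy input, invented for illustration only (hypothetical posts).
posts = ["كورونا في تونس", "covid tunis", "covid  tunisie"]
counts = count_ngrams(posts, n=3)
print(counts["cov"])  # "cov" occurs once in each of the two Latin-script posts
```

Because the reduce step is a plain merge of partial counts, the map step can be applied to posts independently, which is what makes this kind of counting straightforward to distribute.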
An ontology-driven solution is also used to handle the heterogeneity of data collected from different social networks. Such a solution has already been established in our previous research work. It will allow social data to be unified for storage in a Big Semantic Data context and a knowledge graph to be manipulated with network embedding methods. The knowledge-base-driven semantic technology provides a representation that is based on Semantic Web technologies and interoperable with biomedical ontologies, so that it can support the services of the recommendation system.
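To make the idea of ontology-driven unification more concrete, the sketch below maps records from two differently structured sources onto one shared set of properties and emits subject-predicate-object triples. It is only an assumption-laden illustration: the platform names, field names, and the `ex:` vocabulary are hypothetical placeholders, not the ontology used in the project.

```python
from dataclasses import dataclass

# Hypothetical mapping from platform-specific field names to shared
# ontology properties; the "ex:" terms are placeholders, not a real vocabulary.
FIELD_MAPPINGS = {
    "platform_a": {"id": "ex:identifier", "text": "ex:content", "user": "ex:author"},
    "platform_b": {"post_id": "ex:identifier", "message": "ex:content", "from": "ex:author"},
}


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def to_triples(platform, record):
    """Convert one platform-specific record into ontology-aligned triples."""
    mapping = FIELD_MAPPINGS[platform]
    # The field mapped to ex:identifier is used to build the subject name.
    id_field = next(f for f, p in mapping.items() if p == "ex:identifier")
    subject = f"ex:post/{platform}/{record[id_field]}"
    return [
        Triple(subject, prop, str(record[field]))
        for field, prop in mapping.items()
        if field in record and prop != "ex:identifier"
    ]


# Toy records, invented for illustration only.
a = to_triples("platform_a", {"id": 1, "text": "hello", "user": "u1"})
b = to_triples("platform_b", {"post_id": "x9", "message": "hello", "from": "u2"})
```

After this step both sources share the same predicates, so the resulting triples can be stored together and treated as edges of a single knowledge graph.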