F3: ID.10515 Epidemics Monitoring
10:30am - 11:30am
Session Chair: Yifang Ban
Session Chair: Chuanrong Li
Workshop: Land & Environment
Location: Sun Moon Room -2, 5.5 Floor, Junyi Dynasty Hotel
A Synthetic Approach for Estimating Distribution of Epidemic Disease Vectors
Ulster University, United Kingdom
Developing classification models that predict the distribution and density of epidemic disease vectors, such as snails, requires a large amount of training data. This data is constructed from satellite images using remote sensing methods and combined with estimates of vector distribution obtained from field surveys. The performance of such models depends on the quality and size of the training data, but obtaining large, high-quality training data is labour-intensive and costly. Building on the previous semi-supervised learning approach, in this report we present a synthetic approach for simulating environment training data from a labelled sample, together with a comparative analysis of the synthetic training data against real-world sample data. The report covers 1) a synthetic approach to simulating environment data for building prediction models; 2) a cumulative training approach for semi-supervised machine learning; and 3) comparative evaluation results on synthetic data and on remote sensing data covering the Dongting Lake region of China, provided by the Chinese partner.
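The abstract does not describe how the cumulative training approach works. One common reading of "cumulative" semi-supervised training is self-training, in which confidently pseudo-labelled samples are added to the training set round after round and never removed. The sketch below assumes that reading and uses a simple nearest-centroid classifier with a distance-margin confidence score. The function names, the classifier, and the `min_margin` threshold are all hypothetical and are not the authors' method.

```python
import math

def centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_with_margin(cents, x):
    """Return (label, margin), where margin is the gap between the two closest centroids."""
    dists = sorted((math.dist(x, c), label) for label, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
    return dists[0][1], margin

def cumulative_self_training(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Hypothetical self-training loop: confident pseudo-labels accumulate in the training set."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_train, y_train)  # retrain on the enlarged set each round
        accepted, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            (accepted if margin >= min_margin else remaining).append((x, label))
        if not accepted:
            break  # nothing confident enough; stop early
        for x, label in accepted:  # cumulative: once added, samples are never removed
            X_train.append(x)
            y_train.append(label)
        pool = [x for x, _ in remaining]
    return X_train, y_train, pool
```

For example, with labelled points `[0, 0]` ("low") and `[10, 10]` ("high") and unlabelled points `[1, 1]`, `[9, 9]` and `[5, 5]`, the first two are pseudo-labelled in round one, while the equidistant `[5, 5]` never reaches the margin threshold and stays in the unlabelled pool.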
A Comparison Between A Synthetic Over-Sampling Equilibrium And Observed Subsets Of Data For Schistosomiasis Disease Vector Classification
Ulster University, United Kingdom
In this work we present results of data mining and machine learning techniques that form the basis of our prediction model for snail density classification in relation to the epidemic disease Schistosomiasis. All experiments to date are components in the development of this prediction model. The disease harms the health of communities in affected areas as well as their crops and cattle. With early warning of the disease, local communities can be better prepared to deal with the consequences of an outbreak.
The study area for this research is centred on the Dongting Lake area in Hunan Province, China. Using remotely sensed data provided by our Chinese partners and the European Space Agency, we compare the snail density classification accuracy achieved with synthetically created instances, generated from our data, against that achieved with the original real-world data.
This report examines the relationship between using a snapshot sample of environment data for epidemic disease vector classification and constructing an enlarged synthetic dataset. The synthetic instances are created from our original real-world data using a modified version of the Synthetic Minority Over-Sampling Technique (SMOTE). The rationale for using SMOTE is that, although we potentially have access to vast sources of satellite imagery for classification and prediction, the available samples alone may not be sufficient to achieve the best possible classification accuracy. We carried out training and testing for each year of data, modifying SMOTE to achieve an equilibrium of snail density classes. This balances the sample and reduces the likelihood of overfitting.
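The authors' modification of SMOTE is not detailed in the abstract. As a rough illustration of the underlying idea, the sketch below applies standard SMOTE interpolation, placing each synthetic point at a random position on the segment between a real sample and one of its k nearest same-class neighbours. It repeats this until every snail density class matches the size of the largest class. The function name, the parameter `k`, and the class labels are hypothetical.

```python
import math
import random

def smote_equilibrium(X, y, k=3, seed=0):
    """Standard SMOTE interpolation, repeated until all classes match the largest class.

    Hypothetical sketch, not the authors' modified SMOTE.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(list(x))
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two real samples in the class
        for _ in range(target - len(rows)):
            base = rng.choice(rows)  # synthetic points are built only from real samples
            # k nearest neighbours of `base` within the same class, excluding itself
            neighbours = sorted(
                (r for r in rows if r is not base), key=lambda r: math.dist(base, r)
            )[:k]
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            X_out.append([b + gap * (n - b) for b, n in zip(base, nn)])
            y_out.append(label)
    return X_out, y_out
```

With four "dense" samples and two "sparse" samples, the function appends two synthetic "sparse" points lying on the segment between the two real sparse samples. Interpolating only within a class keeps synthetic points inside the region already occupied by that class. Simply duplicating samples would instead repeat identical points.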
The problem of incomplete data was first addressed at the previous Dragon Symposium workshop, where we examined initial data imputation methods and assessed the precision of the replacement values. The results showed that the proposed Double PreSuccession method was the most accurate at replacing values. In this report we also test this method against an alternative approach that uses regression to replace missing data. Both methods were analysed against the Weka ReplaceMissingValues filter as a benchmark. The method that produces the most accurate replacements will be used for missing values in future datasets.
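The Double PreSuccession method is not described in this abstract, so it is not sketched here. The comparison between regression-based imputation and a mean-replacement baseline can be illustrated as follows. For numeric attributes, Weka's ReplaceMissingValues filter substitutes the attribute mean; for nominal attributes, it substitutes the mode. The sketch assumes a single complete predictor column and a simple least-squares line. The column names and data are hypothetical. Accuracy is measured as RMSE on values deliberately hidden from an otherwise known column.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def impute(rows, target, predictor, method):
    """Fill None values in column `target` by mean replacement or by regression on `predictor`."""
    known = [r for r in rows if r[target] is not None]
    if method == "mean":  # baseline comparable to mean replacement of numeric attributes
        fill = statistics.fmean(r[target] for r in known)
        return [r[target] if r[target] is not None else fill for r in rows]
    if method != "regression":
        raise ValueError(f"unknown method: {method}")
    a, b = fit_line([r[predictor] for r in known], [r[target] for r in known])
    return [r[target] if r[target] is not None else a + b * r[predictor] for r in rows]

def rmse(pred, truth):
    """Root-mean-square error between imputed and true values."""
    return math.sqrt(statistics.fmean((p - t) ** 2 for p, t in zip(pred, truth)))
```

On toy data where `y = 2x + 1` and the values at `x = 1` and `x = 4` are hidden, regression recovers them exactly (RMSE 0). Mean replacement fills both with 6.0, giving an RMSE of 3.0 on the hidden values. Real environment variables are rarely this linear, so the gap between the methods would normally be smaller.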