Learning vocabulary with word books and flashcards is very popular among Japanese high school students. They buy word books that cover the vocabulary appearing in the entrance exams of the universities they want to apply to.
The Castdice team collects all the vocabulary from large sets of publicly available past exam papers using data mining and computer vision. In the mobile apps, users can browse this data and create word books targeting specific universities and levels.
- Public exam data is poorly organized and comes in various formats, such as PDFs and scanned images.
- After extracting the English sentences from each exam, we have to split them into words and convert each word to its dictionary form before it can be used.
- Time and resources are very limited: we have to develop 2 mobile apps quickly enough to start beta testing within 2 months.
We build a data mining module that creates the word database, and develop both mobile apps from a single Flutter codebase.
Data mining process
The data mining module uses web crawlers to collect exam data and OCR to convert scanned images to text, then applies word stemming to reduce each word to its dictionary form before inserting the word data into the database. The module can be run monthly to collect new exam data and update the word database.
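The normalization step can be illustrated with a minimal sketch. Because the target is the dictionary form, this step is closer to lemmatization than to pure suffix stripping, so the sketch checks each candidate against a list of known headwords. It assumes crawling and OCR have already produced plain English sentences tagged with a university and exam date; `HEADWORDS`, `IRREGULAR`, `SUFFIX_RULES`, `to_dictionary_form` and `build_word_index` are hypothetical names, and the tiny word lists stand in for a full dictionary. It is not the module's actual implementation.

```python
import re
from collections import defaultdict

# Hypothetical headword list; a real system would load a full dictionary.
HEADWORDS = {"study", "run", "be", "go", "student", "apply",
             "university", "test", "take", "make"}

# Irregular forms that suffix rules cannot recover (illustrative subset).
IRREGULAR = {"is": "be", "are": "be", "was": "be", "were": "be",
             "went": "go", "gone": "go", "took": "take", "taken": "take",
             "made": "make", "ran": "run"}

# Suffix rules tried in order: (suffix to strip, replacement to append).
SUFFIX_RULES = [("ies", "y"), ("ied", "y"), ("ing", ""), ("ing", "e"),
                ("ed", ""), ("ed", "e"), ("es", ""), ("s", "")]


def to_dictionary_form(token):
    """Return the headword for token, or None if it is not a known word."""
    word = token.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word in HEADWORDS:
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in HEADWORDS:
                return candidate
            # Undo a doubled final consonant, e.g. "runn" -> "run".
            if (len(candidate) >= 2 and candidate[-1] == candidate[-2]
                    and candidate[:-1] in HEADWORDS):
                return candidate[:-1]
    return None


def build_word_index(tests):
    """Map each headword to the set of (university, date) exams it appears in."""
    index = defaultdict(set)
    for test in tests:
        for sentence in test["sentences"]:
            for token in re.findall(r"[A-Za-z]+", sentence):
                headword = to_dictionary_form(token)
                if headword is not None:
                    index[headword].add((test["university"], test["date"]))
    return index


sample_tests = [{"university": "Univ A", "date": "2023-02-01",
                 "sentences": ["Students were studying for the test."]}]
print(sorted(build_word_index(sample_tests)))  # ['be', 'student', 'study', 'test']
```

Words not found in the headword list are dropped here; a production pipeline would more likely rely on a full dictionary or an NLP library's lemmatizer rather than hand-written rules.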
Creating word books by test history
Each word is labeled with the universities whose past exams used it, together with the date of each exam. Users can apply dynamic filters to create word books from their selected universities and test dates.
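A minimal sketch of such a filter, assuming each word carries a list of (university, exam date) appearances. `Appearance`, `WordEntry` and `build_word_book` are hypothetical names, and ranking the result by the number of matching appearances is an assumption made for illustration, not a documented behavior of the apps.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Appearance:
    university: str
    test_date: date


@dataclass
class WordEntry:
    headword: str
    appearances: list[Appearance] = field(default_factory=list)


def build_word_book(entries, universities=None, start=None, end=None):
    """Return headwords used in at least one exam matching every given filter.

    A filter left as None is not applied. Words are ordered by number of
    matching appearances (assumed ranking), then alphabetically.
    """
    selected = []
    for entry in entries:
        hits = [a for a in entry.appearances
                if (universities is None or a.university in universities)
                and (start is None or a.test_date >= start)
                and (end is None or a.test_date <= end)]
        if hits:
            selected.append((entry.headword, len(hits)))
    selected.sort(key=lambda pair: (-pair[1], pair[0]))
    return [headword for headword, _ in selected]


entries = [
    WordEntry("study", [Appearance("Univ A", date(2021, 2, 1)),
                        Appearance("Univ B", date(2022, 2, 1))]),
    WordEntry("apply", [Appearance("Univ B", date(2019, 2, 1))]),
    WordEntry("run", [Appearance("Univ A", date(2022, 2, 1)),
                      Appearance("Univ A", date(2023, 2, 1))]),
]
print(build_word_book(entries, universities={"Univ A"}, start=date(2021, 1, 1)))  # ['run', 'study']
```

Leaving any filter as `None` widens the selection, so the same function serves a single-university word book, a date-range word book, or a combination of both.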