Problem

Learning vocabulary with word books and flashcards is very popular among Japanese high school students. They buy different word books that cover the vocabulary from the tests of the universities they want to apply to.

SOLUTION

Project overview

The Castdice team collects vocabulary from large public test datasets using data mining and computer vision technology. In the mobile apps, users can browse and create word books that target various universities and levels from this data.

Challenges

  • Public test data is poorly organized and comes in various formats, such as PDFs and scanned images.
  • After collecting the English sentences from the tests, we have to convert the words into their dictionary forms before using them.
  • Time and resources were very limited: we had to develop 2 mobile apps quickly enough to start beta testing within 2 months.

Results

We built a data mining module that populates a word database and developed 2 mobile applications from a single codebase written in Flutter.

FEATURE HIGHLIGHTS

Data mining process

We implemented a data mining module that uses web crawlers for data collection and OCR for image-to-text conversion, then performs word stemming before writing the word data to the database. The module can be run every month to collect new data and update the word database.
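The normalization step at the end of this pipeline can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the `IRREGULAR` table, the `SUFFIX_RULES` list, and the function names are all hypothetical, and the rules are deliberately naive. A production module would more likely rely on an established stemming or lemmatization library.

```python
import re
from collections import Counter

# Hypothetical lookup for irregular forms (illustration only).
IRREGULAR = {"went": "go", "children": "child", "better": "good"}

# Naive (suffix, replacement) rules, tried in order (illustration only).
SUFFIX_RULES = [("ies", "y"), ("ied", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(word):
    """Reduce a word to an approximate dictionary form."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        # Do not strip the final "s" from words such as "class".
        if suffix == "s" and word.endswith("ss"):
            continue
        # Require at least three characters to remain after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


def extract_words(text):
    """Count normalized words in a block of OCR or crawler output."""
    return Counter(normalize(token) for token in tokenize(text))
```

Calling `extract_words` on the text of one test yields per-word counts that could then be written to the word database. Rules this simple will mis-handle many words, which is why the sketch is only meant to show where normalization sits in the flow.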

Creating word books by test history

Word data is labeled with the universities that used the word in past tests and the dates of those tests. Users can apply dynamic filters to create word books based on their selected universities and test dates.
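As a rough sketch of how such filtering could work, the Python below models each word with a list of (university, test date) labels and selects the words that match a user's choices. The `Usage` and `WordEntry` classes and the `build_word_book` function are assumptions made for illustration, not the app's real data model or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Usage:
    """One appearance of a word in a past test (hypothetical model)."""
    university: str
    test_date: date


@dataclass
class WordEntry:
    """A word together with every test it appeared in (hypothetical model)."""
    word: str
    usages: list


def build_word_book(entries, universities=None, start=None, end=None):
    """Return the sorted words that have at least one usage matching all filters.

    A filter left as None is not applied.
    """
    book = []
    for entry in entries:
        for usage in entry.usages:
            if universities and usage.university not in universities:
                continue
            if start and usage.test_date < start:
                continue
            if end and usage.test_date > end:
                continue
            book.append(entry.word)
            break  # one matching usage is enough to include the word
    return sorted(book)
```

For example, passing `universities={"University A"}` and `start=date(2020, 1, 1)` would return only the words that "University A" (a placeholder name) used in tests on or after that date.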

Technical stacks

  • Mobile architecture

    • FIREBASE
    • FLUTTER
  • Server architecture

    • PYTHON
    • GCP
