Introduction
The process of developing a data-driven model can be broadly abstracted into three key steps:
- Define Objective and Curate Data
- Develop and Train the Model
- Evaluate Performance
These three steps are then repeated until the desired performance is achieved.
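As a rough illustration, the loop can be sketched on a toy problem. Everything here is an assumption made for the example: the synthetic line-fitting task, the helper names (`curate_data`, `train_model`, `evaluate`, `develop`), the error target, and the rule of doubling the dataset each round. A real project would replace each helper with its own data pipeline, model, and metric.

```python
import random


def curate_data(n_samples, seed=0):
    """Step 1 (stand-in): draw noisy samples from the toy target y = 2x + 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


def train_model(data):
    """Step 2 (stand-in): fit a line by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(model, data):
    """Step 3 (stand-in): mean squared error on held-out data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)


def develop(target_mse=0.02, max_rounds=10):
    """Repeat curate -> train -> evaluate until the target is met or rounds run out."""
    test_set = curate_data(200, seed=123)  # fixed held-out set
    n_samples = 4
    history = []
    model = None
    for round_idx in range(max_rounds):
        train_set = curate_data(n_samples, seed=round_idx)
        model = train_model(train_set)
        mse = evaluate(model, test_set)
        history.append((n_samples, mse))
        if mse <= target_mse:
            break
        n_samples *= 2  # hypothetical policy: collect more data next round
    return model, history


model, history = develop()
for n_samples, mse in history:
    print(f"samples={n_samples:5d}  test MSE={mse:.4f}")
```

The stopping rule and the "collect more data" policy are the parts that vary most between projects; the research topics below can each be read as a smarter replacement for one of these stand-ins.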
Three steps may sound simple, but each one opens up interesting research topics:
- We often do not know which data is most informative or how much data is enough for the model. Active Learning can be used to measure the informativeness of samples and curate the dataset accordingly.
- To reduce annotation cost, Self-supervised or Semi-supervised methods can be used to train the model on unlabeled samples.
- Repeated data collection and model training can consume substantial computing resources. Meta-Learning or Continual Learning can be adopted to handle newly added data more efficiently.
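To make the first point concrete, here is a minimal sketch of one common Active Learning heuristic, entropy-based uncertainty sampling: samples whose predicted class distribution is closest to uniform are treated as the most informative and sent for annotation first. The sample IDs, the probabilities, and the function names are all made up for illustration, and entropy is only one of several possible informativeness measures.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_informative(predictions, budget):
    """Return the IDs of the `budget` samples the model is least certain about.

    `predictions` maps a sample ID to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs on four unlabeled samples (three classes).
unlabeled_predictions = {
    "sample_a": [0.98, 0.01, 0.01],  # confident prediction
    "sample_b": [0.34, 0.33, 0.33],  # almost uniform
    "sample_c": [0.70, 0.20, 0.10],
    "sample_d": [0.50, 0.49, 0.01],  # torn between two classes
}

to_annotate = select_most_informative(unlabeled_predictions, budget=2)
print(to_annotate)  # ['sample_b', 'sample_c']
```

With an annotation budget of two, the confident `sample_a` is skipped, which is where the cost saving comes from.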
Although not listed explicitly, good engineering is necessary at every step.
In my own journey, I struggled to balance performance against time constraints. Given limited resources, I chose to focus on developing a Minimum Viable Product (MVP) within the available time frame rather than pursuing the ideal model at the cost of extensive time and effort.
This choice is not necessarily the best fit for every circumstance; context plays a significant role in the decision.