Once I found my “gap,” I felt energized — for about a week.
Then reality hit.
Research isn’t a straight line. It’s trial and error, heavy on the error part.
I tried building my first model architecture by tweaking existing designs. I thought if I just stacked more layers or cranked up certain hyperparameters, magic would happen.
It didn’t.
My models became bigger and slower, the complete opposite of what I wanted.
Inference times were terrible.
Accuracy gains were negligible.
For weeks, it felt like pushing a boulder uphill only to watch it roll back down.
There were days I thought maybe I just wasn’t smart enough. Maybe computer vision wasn’t for me.
But with each failure, something shifted in my mind: I started paying closer attention.
Instead of just blaming the results, I asked:
- Why did this model become slower? (See the timing sketch after this list.)
- Why didn’t this attention mechanism help as expected?
- What assumptions was I making that didn’t actually hold true in practice?
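The first question, at least, had a measurable answer. Here is a minimal sketch of the kind of timing harness I mean, assuming a PyTorch model (the function name and defaults are illustrative, not from any specific codebase):

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, input_shape=(1, 3, 224, 224), runs=100, warmup=10):
    """Rough average forward-pass latency in milliseconds."""
    model.eval()
    device = next(model.parameters()).device
    x = torch.randn(*input_shape, device=device)
    for _ in range(warmup):        # discard one-time setup costs
        model(x)
    if device.type == "cuda":      # GPU kernels run async; sync before timing
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000.0
```

Comparing numbers like these before and after each tweak is what slowly replaced guessing with understanding.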
I realized that academic papers often highlight what worked, but rarely show how much didn’t work before it did.
I realized that real-world data — like disaster images — isn’t clean, and a model that looks great on tidy benchmarks won’t necessarily survive messy inputs.
Most importantly, I realized that understanding the principles — shift operations, efficient convolutions, memory bottlenecks — mattered much more than copying architectures.
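To make the first of those concrete: a shift operation moves feature-map channels spatially with zero FLOPs and zero parameters, leaving the learned work to a cheap 1x1 convolution. A toy PyTorch sketch of the idea (the five-group layout and the `ShiftBlock` name are my illustrative assumptions, not any specific paper’s recipe):

```python
import torch
import torch.nn as nn

class ShiftBlock(nn.Module):
    """Shift + pointwise-conv block: spatial mixing via parameter-free
    channel shifts, channel mixing via a cheap 1x1 convolution."""

    def __init__(self, channels: int, out_channels: int):
        super().__init__()
        # The 1x1 conv does all the learned work; the shift has no parameters.
        self.pointwise = nn.Conv2d(channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels into 5 groups: up, down, left, right, identity.
        out = torch.zeros_like(x)
        g = x.size(1) // 5
        out[:, 0*g:1*g, :-1, :] = x[:, 0*g:1*g, 1:, :]   # shift up
        out[:, 1*g:2*g, 1:, :]  = x[:, 1*g:2*g, :-1, :]  # shift down
        out[:, 2*g:3*g, :, :-1] = x[:, 2*g:3*g, :, 1:]   # shift left
        out[:, 3*g:4*g, :, 1:]  = x[:, 3*g:4*g, :, :-1]  # shift right
        out[:, 4*g:] = x[:, 4*g:]                        # rest: unshifted
        return self.pointwise(out)

# Quick shape check.
block = ShiftBlock(channels=20, out_channels=32)
y = block(torch.randn(1, 20, 56, 56))
print(y.shape)  # torch.Size([1, 32, 56, 56])
```

The point it illustrates: the job of a 3x3 spatial kernel (moving information between neighboring pixels) can be done essentially for free, so the entire FLOP budget goes to the 1x1 channel mixing.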
Slowly, my work evolved.
I started thinking not like someone chasing results, but like someone building understanding.
It wasn’t glamorous. It wasn’t fast.
But it was real progress.