Once I found my “gap,” I felt energized — for about a week.

Then reality hit.

Research isn’t a straight line. It’s trial and error, heavy on the error part.

I tried building my first model architecture by tweaking existing designs. I thought if I just stacked more layers or cranked up certain hyperparameters, magic would happen.

It didn’t.

My models became bigger and slower, the complete opposite of what I wanted.

Inference times were terrible.

Accuracy gains were negligible.

For weeks, it felt like pushing a boulder uphill only to watch it roll back down.

There were days I thought maybe I just wasn’t smart enough. Maybe computer vision wasn’t for me.

But with each failure, something shifted in my mind: I started paying closer attention.

Instead of just lamenting the results, I asked:

  • Why did this model become slower?
  • Why didn’t this attention mechanism help as expected?
  • What assumptions was I making that didn’t actually hold true in practice?

I realized that academic papers often highlight what worked, but rarely how much failed before it did.

I realized that real-world data — like disaster images — isn’t clean, and building models that look great on tidy benchmarks doesn’t mean they’ll survive messy inputs.

Most importantly, I realized that understanding the principles — shift operations, efficient convolutions, memory bottlenecks — mattered much more than copying architectures.
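
To give a flavor of what I mean by shift operations: instead of learning spatial filters, you slide groups of channels one pixel in different directions and let a cheap 1×1 convolution do the mixing. Here’s a minimal PyTorch sketch of that idea (purely illustrative; the `ShiftConv` class and the five-way grouping are my simplification for this post, not the architecture from my actual work):

```python
import torch
import torch.nn as nn

class ShiftConv(nn.Module):
    """Illustrative shift block: shift channel groups spatially,
    then mix them with a pointwise (1x1) convolution.

    The shift itself has zero parameters and zero FLOPs; all the
    learned computation lives in the 1x1 convolution.
    """

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels into five groups: up, down, left, right, identity.
        # (Assumes in_channels >= 5; leftover channels join the identity group.)
        g = x.size(1) // 5
        out = torch.zeros_like(x)
        out[:, 0*g:1*g, :-1, :] = x[:, 0*g:1*g, 1:, :]   # shift up
        out[:, 1*g:2*g, 1:, :]  = x[:, 1*g:2*g, :-1, :]  # shift down
        out[:, 2*g:3*g, :, :-1] = x[:, 2*g:3*g, :, 1:]   # shift left
        out[:, 3*g:4*g, :, 1:]  = x[:, 3*g:4*g, :, :-1]  # shift right
        out[:, 4*g:, :, :]      = x[:, 4*g:, :, :]       # identity (no shift)
        return self.pointwise(out)

# Quick sanity check: shapes are preserved, only the pointwise conv has weights.
block = ShiftConv(64, 64)
y = block(torch.randn(1, 64, 32, 32))  # -> torch.Size([1, 64, 32, 32])
```

The shift is just tensor indexing: zero parameters, zero multiply-adds. Understanding why that matters for memory and latency did far more for me than stacking layers ever did.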

Slowly, my work evolved.

I started thinking not like someone chasing results, but like someone building understanding.

It wasn’t glamorous. It wasn’t fast.

But it was real progress.