In the summer vacation of 2013, there was still about a month to go before the competition began.
"The training process of the model requires all weights, data and many intermediate processes to be put into the GPU for processing. Therefore, the size of the GPU memory is particularly important." Meng Fanqi sighed, "Even the flagship 690 we purchased is too small. Yes, it’s only 4G in size.”
Compared with the A100 80G, which the United States would later ban from sale to China, the 690 had one-twentieth the video memory, to say nothing of every other spec. For now, Meng Fanqi could only feed the model sixteen images per iteration.
"Sixteen pictures at a time, one cycle requires close to one million times to update the entire data set. And if you want to converge the model well, hundreds of cycles are indispensable."
Meng Fanqi had estimated that this version would take nearly twenty days to produce a result, and the final training run did indeed take about three weeks to converge to its current performance.
Fortunately, ImageNet had long since become the dataset every algorithm engineer had to tune on. Meng Fanqi had topped its leaderboard countless times himself, so he knew it inside out, including roughly where every hyperparameter should sit.
This saved him at least a month or two of precious time.
Even though a single training run took three weeks, Meng Fanqi still had a version of the model ready before the competition started.
Seeing that the trained model's final performance met expectations, Meng Fanqi finally felt the stone in his heart settle.
Over the past few months, his only real worry had been that the antiquated framework of this era would throw up some unexpected problem, leaving the final results short of theoretical expectations.
If that happened, the cost of locating the problem and testing a fix would be too high; and if it couldn't be resolved in time, it would throw off his entire initial plan.
The current result was a top-5 error rate of about 4.9%. This version was slightly worse than the numbers in the later papers, but it still beat the human benchmark cited by the competition.
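For reference, the top-5 metric counts a prediction as correct when the true label appears anywhere among the model's five highest-scoring classes. A minimal NumPy sketch (the array names are illustrative):

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of samples whose true label is NOT among the top five predictions.

    scores: (N, num_classes) class scores; labels: (N,) ground-truth indices.
    """
    top5 = np.argsort(scores, axis=1)[:, -5:]       # five highest-scoring classes
    hit = (top5 == labels[:, None]).any(axis=1)     # is the label among them?
    return 1.0 - hit.mean()
```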
Generally speaking, the exact data used in a competition isn't announced beforehand. ImageNet is the exception: with more than ten million images, nobody was going to throw the set away after one or two contests.
So the data changed very little from year to year, while the specific tracks, tasks, and judging criteria were adjusted often.
Although ImageNet actually accepted submissions in the off-season, and Meng Fanqi could have uploaded his results now and taken first place, the attention that would draw was nothing compared with the heat of the competition itself.
Meanwhile, Tang Juan was finally beginning to realize that events had drifted far from anything he had expected.
"I remember that I found that AlexNet's accuracy on this was less than 85. Now yours is over 95." Don Juan couldn't believe this fact when he came to check the results for the first time.
"Are you sure you're not mistaken? Don't fool me, brother. If you don't read enough, you can easily be deceived." Don Juan's mentality at the moment was very complicated. He really hoped that this was true, but because things seemed so good, he was very confused. Hard to believe.
"It's fake, I lied to you." Meng Fanqi rolled his eyes, "I added special effects, and they are all chemical ingredients."
"No way, I have seen with my own eyes that this performance has improved along the way." Tang Juan flipped through the model training log again, with a hint of grievance in his voice. He was already thinking about the scene where he hugged his thighs tightly and reached the pinnacle of his life.
It was the anxiety of a poor man fretting over gain and loss: he couldn't quite believe it, yet he dreaded finding out it was fake.
"Although I don't have the real answers to the test set, I did not use 5% of the training set as a verification method." Meng Fanqi can be said to have a clear understanding of the variance of this data set, 95% of which is not used. The data is used for training and 5% of the data is used for testing, which is a very safe and conservative ratio.
"In other words, as long as the 5% data is not much different from the test set data, your method can be ten percentage points better than last year's champion?" Don Juan was still in extreme shock. "Is it that simple? You all fell down before I even tried to do anything?"
Tang Juan felt like Yagami Light discovering for the first time that he could simply use the Death Note to erase his greatest rival, L. All the imagined effort, toil, and struggle never happened; none of it was necessary. Astonishing results had arrived before the contest even officially began.
"This is life. Success or failure may have nothing to do with you in many cases. Just get used to it." Meng Fanqi patted him on the shoulder. "It's okay if you don't get used to it this time. There is still a long road ahead. You will accustomed."
Because what can you do if you never get used to it? People who can't change their weight can only change their aesthetics.
Otherwise you'll spend the rest of your life tormenting yourself.
Now that this result had been achieved on 95% of the data, the next step was to add the remaining 5% back in and keep fine-tuning the model for a few days.
That way, the final result could be submitted directly in November.
Continuing to fine-tune a model that already performs this well takes far less than twenty-one days.
Within about two days, the new training log showed that the model's performance had essentially converged to a fixed value and barely fluctuated any further.
At that point, only one task remained for Meng Fanqi before the conference in Australia: completing the experimental data for the papers at hand.
To fill in the final missing piece of each article's puzzle.
By this point, Meng Fanqi had completed roughly seven articles. Four came out of the competition itself: DreamNet, the new model built on the residual idea, plus the related training techniques of batch normalization, the Adam optimizer, and Mix-up data augmentation.
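Of those training techniques, Mix-up is the easiest to show concretely. A minimal NumPy sketch of the idea (the alpha default and array shapes are illustrative assumptions, not the story's settings):

```python
import numpy as np

def mixup(x, y, alpha=0.2, rng=None):
    """Mix-up: train on convex blends of random sample pairs.

    x: images, shape (N, ...); y: one-hot labels, shape (N, num_classes).
    alpha: Beta-distribution parameter; 0.2 is a common choice.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # blend weight drawn from Beta(alpha, alpha)
    perm = rng.permutation(len(x))        # random partner for each sample
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]
```

Training then proceeds on the blended batch exactly as usual; the blending acts as a regularizer.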
Beyond those, Meng Fanqi had prepared groundbreaking work in three other directions, staking out three key areas.
Of the competition-related content, only the residual network truly counted as groundbreaking. The other three were excellent works in their respective directions, but could hardly be called the foundation of any subfield.
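The residual idea itself fits in a line: rather than forcing a stack of layers to learn a target mapping H(x) directly, let it learn the residual F(x) = H(x) - x and add the input back through a shortcut. A minimal sketch in modern PyTorch (an anachronism for 2013, used purely for illustration; DreamNet's actual architecture is the story's own invention):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = relu(F(x) + x), identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)   # the skip connection is the whole trick
```

Stacking many such blocks lets gradients flow through the shortcuts, which is what makes very deep networks trainable at all.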
Writing them up in detail had been a matter of necessity more than choice: to guarantee DreamNet's performance and training speed, Meng Fanqi had no alternative but to lean on those techniques.
And to make sure a result this important could be reproduced across the industry, he had to document the techniques thoroughly, hence the papers. Had there been a real choice, there would have been no hurry.
What he truly hoped to stake out early was, first, the generative adversarial network he had discussed with Dean Fu earlier: the most promising and elegant label-free learning method of the coming years, and a milestone that every later generative technology would find hard to sidestep.
Second, a real-time detection network built on new ideas, one that would dramatically raise both the speed and the accuracy of finding and identifying objects in images. It would become the most widely deployed image-detection technology of the future: face recognition, autonomous driving, industrial inspection, every one of them would hinge on this jump in speed.
Third, U-Net, the most concise and easy-to-use segmentation network, which would become the baseline for complex segmentation tasks and dominate medical imaging.
These three, together with the residual network, covered the four major areas of classification, detection, segmentation, and generation, occupying all four main tracks of image algorithms.
Choosing image-based technologies across the board made the whole thing look more plausible. As for language, speech, and multimodal fusion algorithms, he planned to roll those out a little later.