The host of the meeting smiled and announced Meng Fanqi's grand appearance.
Meng Fanqi had been so absorbed in calculating his share and imagining a bright future that he hadn't noticed how quickly time had passed.
He hurriedly closed Google's financial report and walked unhurriedly onto the stage.
Looking out at a hall full of AI scholars, all watching him with great anticipation, Meng Fanqi found that he was far less nervous than he had expected.
"Hello everyone. The topic of this report is deep residual learning applied to images. I know many of you are probably here mainly for its achievements in image recognition."
At this, Meng Fanqi paused for a second, and knowing laughter rang out from the audience.
"However, on such an occasion, I personally do not want to spend too much time on specific details. Relevant papers and codes will be open sourced one after another in the near future. Friends who are interested in the details can explore on their own."
After signing the contract with Google, Meng Fanqi no longer needed to keep all the papers to himself.
Taking advantage of the popularity of the press conference and this international conference, he planned to make it all public directly.
Of course, this content now had to be approved and reviewed internally by Google before it could be released, to make sure nothing harmed Google's interests.
However, since most of the technical content concerned image algorithms, Google's key interests were not yet involved.
Had he waited a few weeks, until after he had drawn up a series of strategies for Google's recommendation and advertising algorithms, Google would never have agreed to release this technology.
When core interests were not involved, Google had always been generous with such technologies. After Jeff and Hinton's review, Meng Fanqi was ready to publish some of the work he had saved up.
Before the conference began, several papers, headed by DreamNet, could already be read directly on arXiv.
Just now, while the other teams were presenting their methods, many people had been secretly reading the DreamNet papers, just as Meng Fanqi had been reading financial reports.
"Although the theme of the competition is image recognition and classification, opportunities for so many scholars to gather in one place are rare, so I hope to present some other work here as well."
Computer vision had three top conferences. One was the International Conference on Computer Vision, ICCV, the one Meng Fanqi was attending this time.
Another was the European Conference on Computer Vision, ECCV, a conference with a more European flavor.
Both were held only once every two years, in alternating years, so any given year offered just two such top-conference opportunities.
The only one held annually was CVPR, the Conference on Computer Vision and Pattern Recognition.
But CVPR was almost always held in the United States, and visas were often a problem.
Such an opportunity was rare, so of course Meng Fanqi wanted to promote his work as much as he could.
"Building on DreamNet's residual idea, I have not only made a huge breakthrough in image recognition and classification, but also derived several variants of it, such as generative networks, detection networks, and segmentation networks."
The competition results covered recognition and classification; the generative-network papers had been released, and after the launch at Baidu, everyone already understood their power.
As for the segmentation network, that was U-Net, released together with the DreamNet paper over the past two days. At this point it could be said that the basic paradigms of the major visual tasks had all been laid down by Meng Fanqi.
In the future, whether for recognition and classification, detection and segmentation, or transfer and generation, it would be hard to avoid these lightweight, easy-to-use methods.
"As you can see, once this idea swept the vision field, it made disruptive breakthroughs in all of the current main research directions."
Meng Fanqi had placed the main experimental conclusions of these papers on the second slide, hoping to shock everyone with the results first.
"Obviously, these algorithms have opened a huge gap over the runners-up in many fields, and a considerable part of the credit belongs to the revolution in network depth brought about by the residual idea."
"In 2010 and 2011, we were still using hand-designed SIFT and HOG features with SVM classifiers. In 2012, Alex's eight-layer AlexNet made a huge breakthrough."
"This year, the depth revolution triggered by the residual idea has made it possible to train neural networks of more than 150 layers."
"Deep neural networks are the basic engine and backbone of many tasks, especially visual tasks, which is why this idea was able to impact several mainstream tasks so quickly."
"From a structural point of view, there is nothing special about DreamNet. Compared with previous networks, where each layer was designed separately, I deliberately made it simple and repetitive."
The slide behind Meng Fanqi showed an extremely slender diagram: the DreamNet structure, more than a hundred layers deep.
When he zoomed in on its basic design, everyone saw that each layer was extremely simple, using only the most conventional operators.
As the long structure diagram scrolled past, the audience found no difference between the layers; the same block simply repeated itself.
Because scrolling through more than a hundred layers took so long, the effect was a bit comical on such a serious occasion, and bursts of laughter broke out in the venue.
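The "simple and repetitive" block on the slide can be sketched in a few lines. This is only an illustrative NumPy sketch of the residual idea, not DreamNet's actual code: dense matrix products stand in for convolutions, and all names and sizes here are made up.

```python
import numpy as np

def layer(x, w):
    # Stand-in for a convolution; a dense product keeps the sketch tiny.
    return x @ w

def residual_block(x, w1, w2):
    """One residual block: output = ReLU(x + F(x)).

    F is two weight layers with a ReLU in between; the "+ x" shortcut
    is the residual connection that makes extreme depth trainable.
    """
    out = np.maximum(layer(x, w1), 0.0)  # first layer + ReLU
    out = layer(out, w2)                 # second layer
    return np.maximum(out + x, 0.0)      # add the identity shortcut, then ReLU

# Stacking the same block over and over yields the slender,
# hundred-layer-deep structure shown on the slide.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
for _ in range(100):
    w1 = rng.standard_normal((8, 8)) * 0.01
    w2 = rng.standard_normal((8, 8)) * 0.01
    x = residual_block(x, w1, w2)
```

Because each block's output is the input plus a small correction, the signal survives a hundred blocks without vanishing, which is the whole point of the shortcut.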
"This begs the question: can we always achieve better results simply by making the network bigger and deeper?"
Meng Fanqi's question would have no theoretical answer even as of 2023, but it was obvious that enormous models were continuing to create miracle after miracle.
Whether in painting, dialogue, or image manipulation, they were nowhere near their limit.
"I would very much like to give you that answer clearly and theoretically, but my ability is limited, so I can only offer my own guess, which is: yes."
"I believe that as long as we have more and better GPUs, more and better data, as well as bigger models and better optimization methods, we can continue to create miracles."
"As for the obstacles earlier networks ran into with depth, I think the problem is not the network's capacity, but that we had not yet found a suitable way to optimize it."
Simply letting the network repeat a few more layers was something many people had tried; the results were plainly worse than before.
In traditional methods this was nothing strange, and many people wrote the phenomenon off as the curse of dimensionality or overfitting, without exploring it in any real depth.
"If you think about it for a moment, this is obviously a counter-intuitive phenomenon. For a deeper network, we can simply copy all the parameters of the smaller version, and as long as the extra layers do nothing, the model should at least be no worse."
"But in fact that is not the case. I believe many of you have observed this common phenomenon: the deeper model performs much worse."
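The thought experiment in these lines can be checked numerically: append "do nothing" layers to a shallow network and the output is unchanged, so the deeper network should in principle be at least as good. A minimal sketch, with illustrative names rather than code from any actual paper:

```python
import numpy as np

def plain_layer(x, w):
    # A plain (non-residual) layer: linear map followed by ReLU.
    return np.maximum(x @ w, 0.0)

def run_net(x, ws):
    for w in ws:
        x = plain_layer(x, w)
    return x

# Take a "trained" shallow net and append extra layers that do nothing.
# For plain layers, "do nothing" requires exact identity weight matrices;
# for residual blocks, it only requires the residual branch to output zero.
rng = np.random.default_rng(1)
ws = [rng.standard_normal((4, 4)) for _ in range(3)]
x = rng.standard_normal(4)

shallow_out = run_net(x, ws)

# Deeper net: same layers plus five extra identity layers.
deep_out = run_net(x, ws + [np.eye(4) for _ in range(5)])

# Since ReLU(v @ I) = v for nonnegative v, the deeper net matches exactly,
# yet in practice optimizers failed to find such solutions before
# residual connections made the identity the easy default.
assert np.allclose(shallow_out, deep_out)
```

The construction shows such a solution exists; the degradation Meng Fanqi describes means gradient descent simply was not finding it.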