Compared with the remarkable Tiger algorithm, the mobile-optimized ranking algorithm was slightly less effective.
Therefore, Meng Fanqi did not rush it into online testing, but waited to roll it out together with the update that incorporated the AI language-understanding model.
At present, recurrent neural networks (RNN) and long short-term memory networks (LSTM) were the usual tools for language problems. Both are old methods dating back to the end of the last century.
Simple and easy to use, the two of them flourished all the way until around 2017.
That is, until the Transformer appeared, the very T in ChatGPT.
The general view is that the Transformer displaced RNN and LSTM so quickly mainly because it is far more convenient to parallelize.
Parallelism across multiple devices is easy to achieve, and the core significance of that is making large-scale versions possible, which laid the foundation for eventual giant models such as ChatGPT.
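A minimal NumPy sketch of that contrast, with toy shapes and random weights chosen purely for illustration (not any published architecture): the RNN must step through time one position at a time, while self-attention covers the whole sequence in a few matrix multiplies that parallelize trivially.

```python
import numpy as np

T, d = 8, 4                      # toy sequence length and hidden size
x = np.random.randn(T, d)        # one input sequence

# RNN: each hidden state depends on the previous one, so the T steps
# form a chain that must run sequentially.
W_x, W_h = np.random.randn(d, d), np.random.randn(d, d)
h = np.zeros(d)
rnn_states = []
for t in range(T):               # the inherently serial loop
    h = np.tanh(x[t] @ W_x + h @ W_h)
    rnn_states.append(h)

# Self-attention: every position attends to every other position at once;
# there is no step-to-step dependency, so all T rows are computed together.
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d)                    # (T, T) in one matmul
scores -= scores.max(axis=1, keepdims=True)      # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
attn_out = weights @ V                           # (T, d), one parallel pass
```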
"In fact, the old version of RNN also has a way to do parallelism very well. There is a big misunderstanding about this matter in the field." Meng Fanqi frowned and thought.
In the original timeline, after the Transformer came out, everyone set aside research on the old methods and embraced the T method.
Yet in 2018, someone actually did achieve a high degree of parallelism in RNNs. Unfortunately, by then it was too late.
Had that discovery come a year earlier, RNN might have remained a competitor to the T method for a long while, and the world might even have seen a ChatRNN.
"The early T method required a lot of data, the various parameters were difficult to adjust, and the computing power required was huge." Even if Meng Fanqi made an improved version based on many methods that later matured, the T method was still troublesome in the early days.
"Fortunately, Google has no shortage of data and computing power, and I am also familiar with various classic parameter settings." Meng Fanqi first wrote a prototype version of the T method and conducted a test.
"However, limited by the memory of current graphics cards, the model cannot be made very large unless I specifically develop advanced parallel methods such as DeepSpeed."
Training the model on multiple cards may be for the sake of speed, or it may be because it cannot fit on one card.
Of the approaches, data parallelism is the simplest: different cards all do the same job, and each card stores a full copy of the model.
Only the input data differs. Once the different cards finish their computations, the results are integrated and the model is updated together.
It is like everyone using identical knives to cut different ingredients, then piling the chopped results together at the end.
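A minimal sketch of that "identical knives, different ingredients" idea, using NumPy and a toy linear model with squared loss (the model, data, and learning rate are all illustrative assumptions, nothing like the real workload):

```python
import numpy as np

def grad(w, X, y):
    # Gradient of mean squared error for the linear model X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

n_cards, d, lr = 4, 8, 0.1
rng = np.random.default_rng(0)
X_full = rng.standard_normal((64, d))
y_full = X_full @ rng.standard_normal(d)         # synthetic targets

w = np.zeros(d)                                  # one model, replicated on every "card"
shards = np.array_split(np.arange(64), n_cards)  # a different data slice per card

for step in range(200):
    # Each card computes a gradient on its own shard, independently
    # (concurrently on real hardware).
    local_grads = [grad(w, X_full[idx], y_full[idx]) for idx in shards]
    # Integration step: average the gradients (an all-reduce in practice),
    # after which every replica applies the identical update.
    w -= lr * np.mean(local_grads, axis=0)
```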
But sometimes the model simply does not fit on one card, and that situation is far more troublesome. It is as if one person cannot lift the knife at all, so many people must cooperate.
Either each layer is split across different cards, or different layers are assigned to different cards. Either way, multiple cards are combined to reproduce the effect of training on a single card.
Obviously, data parallelism is much easier than this model parallelism: it only requires copying the same model onto every card and feeding each card its own data.
Model parallelism has to split and merge according to the particular situation and settings, and mistakes creep in the moment you are careless.
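A minimal sketch of the inter-layer flavor, written in modern PyTorch (an anachronism for the 2013 setting; the layer sizes, two-card split, and device names are all illustrative assumptions, and it needs two visible CUDA devices to run):

```python
import torch
import torch.nn as nn

class TwoCardMLP(nn.Module):
    """Neither card holds the whole model: half the layers live on each."""

    def __init__(self):
        super().__init__()
        # The first half of the network is placed on card 0...
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        # ...and the second half on card 1.
        self.part2 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x):
        h = self.part1(x.to("cuda:0"))
        # The activation must be explicitly handed across cards -- this is
        # exactly the split-and-merge step where careless mistakes creep in.
        return self.part2(h.to("cuda:1"))

model = TwoCardMLP()
out = model(torch.randn(8, 1024))   # the output tensor lands on cuda:1
```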
Taking a look at the Google Brain servers, Meng Fanqi found several batches of 2013 GTX Titans inside. Those things were worth serious money.
Compared with other products of the time, 6 GB of video memory stood out.
Against the 4 GB flagship card Meng Fanqi had spent a fortune on himself, that extra 2 GB of video memory was enough for quite a few extra things.
Even so, trading speed for video memory, Meng Fanqi performed a great many repeated transfers of parameters and intermediate results between CPU and GPU.
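A minimal sketch of that speed-for-memory trade, again in anachronistic modern PyTorch with made-up layer sizes: only the active layer's weights occupy the card at any moment, at the price of a host-device copy per layer.

```python
import torch
import torch.nn as nn

# All sixteen layers start out parked in CPU RAM.
layers = [nn.Linear(4096, 4096) for _ in range(16)]

def forward_offloaded(x: torch.Tensor) -> torch.Tensor:
    x = x.to("cuda:0")
    for layer in layers:
        layer.to("cuda:0")         # pay a PCIe transfer to bring weights in
        x = torch.relu(layer(x))
        layer.to("cpu")            # evict immediately to free video memory
    return x

out = forward_offloaded(torch.randn(8, 4096))
# One layer's weights sit on the GPU at a time; every iteration costs an
# extra host-device round trip -- exactly the speed traded for memory.
```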
Even before he officially joined the company, Google Brain had already assigned him sixteen Titans, allocated exclusively to Meng Fanqi and available at any time.
In addition, there were thirty-two more GPUs on other nodes that could be applied for.
"At this time, there were not so many Google graphics cards, and this configuration is already quite generous."
Not only does it provide a unified configuration system and environment, but it also provides good multi-card parallel methods and examples.
In another two years, thousands or tens of thousands of TPUs will be standard equipment.
For Meng Fanqi to integrate AI into the search system, there were three main directions.
First, split the query into keywords and use language models to capture their real-world meaning, so as to rank the results better.
Second, scale up the model so that it gains a certain breadth of understanding, thereby expanding the range of content that can be searched.
Third, let the search engine better understand how different orderings of words change the intent of a query.
Of these, the second was the hard one to handle for now, while Meng Fanqi was very confident about the first and the third.
The traditional recurrent methods, RNN and LSTM, struggle to handle longer sentences properly, and their grasp of how sequence changes alter meaning is not sufficient.
Meng Fanqi's prototype T method held unique advantages on exactly these points.
Admittedly, the T method learns poorly from small datasets, each of its parameters is hard to fine-tune, and the overall training is difficult.
But none of that troubled Meng Fanqi, a veteran alchemist of model tuning. With the massive data Google had already prepared, he remained very confident in the method's effectiveness.
After committing all of his graphics card resources to training, Meng Fanqi wrapped up his roughly ten-day working stint at Google Shanghai on Christmas Eve, 2013.
Training the model would take a while, and the remaining two steps of the advertising algorithm would likely need another two weeks, stretching past New Year's Day.
Meng Fanqi was relieved: he had now essentially completed the technique that would pull in the most money in his early career.
Just as he was planning to found a company, and starting to scout office space and estimate equipment needs, an unexpected phone call disrupted his rhythm.
"Hello Mr. Meng, I am Li Kaifu's secretary at Innovation Factory. He would like to have a face-to-face talk with you, but due to physical reasons, it is difficult to travel. I wonder if it would be convenient for you to come over?"
Li Kaifu? He could be counted among the most senior Chinese figures in the Google system, having risen all the way to global vice president and the number one position in Google China.
Not only that, he had also held senior positions at Apple and Microsoft.
After his four-year contract expired in 2009, however, he resigned and set off on his own dream of backing young college graduates with an angel fund.
"Where is Teacher Li Kaifu now?" Meng Fanqi was relatively familiar with Li Kaifu's experience. It was probably in the early stages of his cancer, but he didn't know where he received treatment.
"Mr. Li Kaifu first received treatment in Baodao North City. If it is convenient, let's make an appointment? In fact, the treatment effect during this period was not particularly good, so Mr. Li has basically stopped participating in any meetings and company work. But he insisted on taking a day to chat with you."
"I have just finished what I was doing, and I can apply for the island entry permit tomorrow." Meng Fanqi felt a little strange. Although he had made a name for himself in the AI industry, for a senior of Li Kaifu's level, there seemed to be no right and wrong. No.
Especially considering that his current physical condition is not very good.
"But it should be completed in two weeks."
Meng Fanqi asked the secretary why, but she did not know the specifics. Suppressing his curiosity, he made an appointment to meet in mid-January.
Flying from Shanghai to Taoyuan outside Bei City takes barely two or three hours, actually closer than going to Yanjing. He had truly never been to Baodao, so going to see Li Kaifu and having a look around would be pleasant.
The only nuisance was the island entry permit, which works like a visa and is a pain to apply for.