Musk's main businesses lay in heavy industries such as automobiles and rockets. At this point in time, they actually had little to do with artificial intelligence.
However, he was a man with extremely cutting-edge, radical ideas. Building ordinary cars was not his style.

Tesla would not just be electric; it would also drive itself!
Not only that, but watching Meng Fanqi's string of breakthroughs in visual algorithms, he had come up with a bold idea.

He wanted to build a purely computer-vision system for Tesla's electric cars, without falling back on any other technical means.

This was the main reason he had come looking for Meng Fanqi again to push for a technological breakthrough.

Personally, he was quite satisfied with last time's results, but he had set his goal so high that they still fell short.
Autonomous driving had produced some decent results before deep learning took off, but most of them were based on radar and other sensors.

Such systems mainly used lidar or similar sensors to detect objects and measure the distance between those objects and the vehicle.

Musk, however, felt that this was nothing like the way humans operated vehicles, and that it was far too uncool.
Think about how a human drives.

When a person drives, it is almost purely visual; just looking is enough. The mirrors on a vehicle exist mainly to let the driver see around and behind the car.

Hearing occasionally plays an auxiliary role, a horn for instance, but it is not particularly critical. The visual system does most of the work.
Musk called this first-principles thinking. He wanted an intelligent system that drove the vehicle entirely by human logic rather than by relying on sensors; after all, humans possess no such superpowers.
A vision system, however, is built entirely on a large number of cameras and depends heavily on high-precision detection algorithms, which brings plenty of problems.

What if the system encounters something that never appeared in the dataset? Can it still be detected?

A lidar-based sensor approach can always pick up obstacles no matter what it runs into; its working principle is nothing like human vision, and at the very least it will not simply drive straight into something.

A purely vision-based intelligent system, on the other hand, must first run every image through the network and then analyze the result.

Once the analysis goes wrong and a misjudgment occurs, a collision will undoubtedly follow, and such an accident could easily cost lives.
Musk's radical technical strategy and preferences created one problem: the artificial intelligence algorithms had far too much to handle.

To abandon sensors entirely, on-board cameras had to be installed in every direction, so the vehicle could see clearly to the front, rear, left, and right.
Beyond that, there was another crucial matter: distance estimation.

Judging distance from a single picture is trivially easy for a human, but it is no easy task for an artificial intelligence vision algorithm.

Under current technical conditions, extremely complex annotation is needed to work out the distances between the various parts and pixels in a sample image.

After all, pictures are 2D planes, while autonomous driving is a task that demands a solid grasp of spatial distance.

A three-dimensional space has to be reconstructed from a large number of flat pictures taken at different angles, even a bird's-eye view of that three-dimensional space.
For now, though, that was all a castle in the air. Musk's reason for contacting Meng Fanqi again was simple: he hoped the neural network serving as the backbone could run faster, or with less computation.

Otherwise, as things stood, Tesla could hardly afford that amount of compute.
In fact, Musk did not hold particularly high hopes for this. In his view, the plan Meng Fanqi had given last time was already ridiculously good.

At a point in time when everyone had only just begun reproducing DreamNet and had not yet grasped the principle of residuals and its variants, Meng Fanqi had already run a great many experiments on different computing devices across various platforms.

The result: by optimizing the operator structure and adjusting the specific computation process, the parameter count of that core backbone network had been cut by nearly ten times.

The computation was that much faster, yet performance was unchanged. That alone was already remarkable.
Musk had raised the question casually, in private.

But his reputation was so great, and the things he had done so crazy, that when Meng Fanqi heard that rather deep, magnetic voice, he took it seriously.

He genuinely thought this was a very serious requirement.
"Autonomous driving really is heating up fast. Doing some optimization work specifically in this direction wouldn't be a loss."

While taking advantage of his rebirth to buy up the stocks of certain car companies, Meng Fanqi also began implementing a clever method for speeding up computation and saving memory.
This new optimization method was called structural re-parameterization of the network.

Over the past six months, the rapid leap in the performance of visual methods had come from the residual method Meng Fanqi proposed, which turns y = F(x) into y = F(x) + x.

The notation here is quite compact: a whole series of complex operations is abstracted as F(). In actual execution, this F() remains fairly complicated and often takes a while to compute.

But there is a catch at computation time. In the original y = F(x), once the operation begins there is no need to keep storing the variable x, because it has already been fed into F(x).

During the computation it turns into other intermediate variables and finally becomes the y we want.

In the residual method, however, y = F(x) + x, so the original input x cannot be discarded.

Memory has to stay occupied to hold this x, because it is still waiting to be added at the end.

In more complex tasks at higher resolutions, the size of this variable is considerable.
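The bookkeeping described above can be sketched in a few lines. This is only a toy illustration: the real F() would be a deep stack of network layers, and `np.tanh` and the constants here are arbitrary stand-ins.

```python
import numpy as np

def F(x):
    # Stand-in for a stack of layers. Once this first step runs,
    # the plain form y = F(x) no longer needs the original x.
    h = np.tanh(2.0 * x + 1.0)
    return 0.5 * h

x = np.ones(3)

# Plain form: x may be freed (or overwritten) as soon as F starts.
y_plain = F(x)

# Residual form: x must stay alive in memory until the final addition.
y_residual = F(x) + x
```

The extra memory cost is exactly one copy of x held across the whole of F(), which is what re-parameterization aims to eliminate.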
Is there any way to avoid this? And after avoiding it, can the performance gain from the residual method be preserved?

The answer, of course, is yes. It can be done.
The core idea of the structural re-parameterization Meng Fanqi planned to implement was the separation of model training from actual inference.

First construct one set of structures (generally used for training), then equivalently convert its parameters into another set of parameters (generally used for inference), thereby transforming the first set of structures into a second, equivalent one.
In real scenarios, training resources are generally abundant and available on large servers.

At inference time, computing resources are often limited, so everyone cares far more about cost and performance during inference.

During training, you want a larger structure with certain desirable properties, such as especially good performance and especially high accuracy.

At inference time, the structure is made smaller and faster, while remaining mathematically equivalent to the large one.
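The equivalence can be sketched with a minimal linear-algebra example. The assumption here is two parallel linear branches plus an identity shortcut; real networks fold convolution and normalization branches in the same spirit, but this toy matrix version shows the principle.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
x = rng.standard_normal(d)

# Training-time structure: two parallel branches plus an identity shortcut.
# Three separate operations, and x must be kept around for the shortcut.
y_train = W1 @ x + W2 @ x + x

# Inference-time structure: fold every branch into one matrix.
# A single multiplication, no stored shortcut, same mathematical result.
W_fused = W1 + W2 + np.eye(d)
y_infer = W_fused @ x
```

The fused form does one matrix multiply instead of three operations, yet the outputs match exactly; the conversion touches only the parameters, never the learned function.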
Meng Fanqi's new approach provided exactly this possibility. He believed that structural re-parameterization combined with lighter, lower-compute mobile networks would become a major catalyst in the field of autonomous driving.