Liquid AI’s New Model: Fast AI on Your Phone

Liquid AI’s LFM

Liquid AI recently introduced LFM2-VL, and it could change how we use AI. The new models let vision-and-language AI run directly on your smartphone, laptop, or even smartwatch. They are fast, up to twice as fast as comparable models. In other words, AI can be processed right on your phone instead of in massive data centers.

What is Liquid AI?

Liquid AI is a spin-off of MIT CSAIL, the university's well-known AI research lab. Unlike most companies, they do not build ever-larger versions of the same AI models. Instead, they are exploring new ways of designing AI. Their Liquid Foundation Models (LFMs) are grounded in math and signal processing, which makes them lightweight, fast, and flexible. Efficiency is their core product.

What’s New with LFM2-VL?

LFM2-VL is a family of vision-language models designed to respond quickly and run well on smaller devices.

There are two versions:

  • LFM2-VL 450M: The smaller model, meant for devices with very little memory.
  • LFM2-VL 1.6B: A more powerful version that still runs on a single GPU or a capable mobile device.


Normally, models this powerful would require large servers. These are a real step forward because they are faster than other models in the same class: according to Liquid AI, they run up to twice as fast on GPUs, which makes AI noticeably more responsive.

How are These Models Built?

Liquid AI built LFM2-VL differently. The model consists of three main components:

  • Language Model Backbone: The brain that handles text. The larger model is built on LFM2 1.2B, the smaller one on LFM2 350M.
  • Vision Encoder: This part handles images. Both versions use SigLIP2 NaFlex encoders. The larger one has roughly 400 million parameters for detailed image understanding; the smaller uses a lighter, faster 86-million-parameter encoder.
  • Multimodal Projector: This connects the vision and text parts. It also applies pixel unshuffle, a technique that trims redundant detail from image features.

The models handle images at native resolution up to 512×512 pixels. When a picture is larger, it is split into 512×512 squares, and the model examines each piece without losing detail. The larger version also looks at a small thumbnail of the entire picture, so it gets both the fine-grained details and the big picture.
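
To make the tiling idea concrete, here is a minimal sketch in Python. It illustrates the general technique, not Liquid AI's actual preprocessing code; the 512×512 tile size matches the article, but the helper name, file name, and thumbnail size are assumptions.

```python
# Illustrative sketch: split a large image into 512x512 tiles plus one
# downscaled thumbnail of the whole picture. Not Liquid AI's exact code.
from PIL import Image

TILE = 512  # tile size mentioned in the article

def tile_image(path):
    img = Image.open(path)
    w, h = img.size
    tiles = []
    for top in range(0, h, TILE):
        for left in range(0, w, TILE):
            # Crop one patch, clamped to the image borders
            tiles.append(img.crop((left, top, min(left + TILE, w), min(top + TILE, h))))
    # A small thumbnail gives the model a view of the whole scene
    # alongside the detailed tiles (used by the larger 1.6B variant).
    thumbnail = img.copy()
    thumbnail.thumbnail((TILE, TILE))
    return tiles, thumbnail

tiles, thumb = tile_image("large_photo.jpg")
print(f"{len(tiles)} tiles of up to {TILE}x{TILE}px, plus a {thumb.size} thumbnail")
```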

The multimodal projector uses pixel unshuffle to reduce the number of image tokens, which speeds up the model while preserving quality. The models are also tunable: you can trade speed against accuracy, which makes them very flexible.
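
For the curious, here is a rough sketch of what pixel unshuffle (also known as space-to-depth) does to a grid of image features. This is a generic PyTorch illustration of the technique, not Liquid AI's projector code; the feature shape and downscale factor are assumptions.

```python
# Generic pixel-unshuffle demo: trades spatial positions for channels,
# so the language model sees fewer image tokens.
import torch

# Pretend vision-encoder output: batch of 1, 1152 channels, 32x32 grid
# (i.e. 1024 image tokens before reduction). Shapes are illustrative.
features = torch.randn(1, 1152, 32, 32)

unshuffle = torch.nn.PixelUnshuffle(downscale_factor=2)
reduced = unshuffle(features)  # -> (1, 1152 * 4, 16, 16)

tokens_before = features.shape[2] * features.shape[3]  # 1024
tokens_after = reduced.shape[2] * reduced.shape[3]     # 256
print(tokens_before, "->", tokens_after)  # 4x fewer tokens to process
```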

Want to know more about the technology underneath LFM2-VL? Look into pixel unshuffle and multimodal projection.

Training the Models

The training was carefully staged. The backbone model was trained first. Then vision and language were merged, with the mix of text and image data varied gradually: mostly text at the start, a richer blend by the end, which gives the models a good balance of both. Finally, the models were fine-tuned for image understanding. In total, they processed roughly 100 billion multimodal tokens, drawn from open-source datasets as well as Liquid AI's own synthetic vision data.

How Well Do They Perform?

The models have good benchmarks:

  • On RealWorldQA, the 1.6B model scored 65.23, comparable to InternVL3.
  • On InfoVQA, it achieved 58.68.
  • On OCRBench, it scored 742.

The smaller version also held up well for its size, scoring 52.29 on RealWorldQA and 655 on OCRBench. The models are fast, too: in a test with a 1024×1024 image and a short prompt, they ran more than twice as fast as comparable models.

In the end, performance comes down to real uses. If you are building a smart camera or a robot, every second counts; cutting processing time from three or four seconds down to one or two makes a real difference.

Easy to Use

Liquid AI has made LFM2-VL easy to use with Hugging Face Transformers. They provide example code, support llama.cpp, and offer quantization, which lets you run the models in less memory using smaller data types. You can run small AI models on mobile with their LEAP platform, launched in July, which supports iOS, Android, and more. They also offer an app called Apollo where developers can test everything offline.
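
As a quick illustration, here is roughly what using the model through Hugging Face Transformers could look like. This is a minimal sketch that assumes the checkpoint is published as LiquidAI/LFM2-VL-1.6B and follows the standard image-text-to-text interface; check the official model card for the exact identifiers and recommended settings.

```python
# Minimal sketch of running LFM2-VL via Hugging Face Transformers.
# The repo name and chat format are assumptions; see the model card.
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model_id = "LiquidAI/LFM2-VL-1.6B"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

image = Image.open("photo.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# Build the prompt with the processor's chat template and generate a caption
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```

The 450M variant should follow the same pattern, and for llama.cpp you would load a quantized build instead of going through Transformers.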

Moving AI Off the Cloud

Liquid AI aims to cut down our reliance on the cloud. They want your device to run AI locally instead of sending data to large servers. That is better for:

  • Privacy
  • Cost
  • Speed

Licensing

The models are released under the open LFM1.0 license. It is based on Apache 2.0, but with conditions: companies with less than $10 million in revenue can use the models for research and commercial projects, while larger organizations must contact Liquid AI for a commercial license.

How Can These Models Be Used?

Liquid AI suggests uses like:

  • Real-time image captioning
  • Multimodal chatbots
  • Visual search
  • Robotics
  • IoT systems
  • Smart cameras

The aim is to shift AI out of giant cloud environments. Liquid AI is demonstrating that you can get high accuracy and real-time performance on devices we already own. This is where AI is headed: personal, fast, and cheap to run.
