
  • Doubao Big Model Team releases new Detail Image Caption evaluation benchmark to improve the reliability of VLM Caption evaluation
    AIxiv is the column where this site publishes academic and technical content. Over the past few years it has carried more than 2,000 reports covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, feel free to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com Current visual language models (VLMs) are evaluated mainly in question-answering (QA) format, but there is no reliable way to evaluate a model's basic understanding, such as its detail image caption performance. In response to this problem, the Chinese Academy of Sciences,
    AI 927 2024-07-18 20:10:02
  • Samsung's new Galaxy Z series products in China integrate the Doubao large model
    On July 17, Samsung Electronics released a new generation of Galaxy Z series products for the Chinese market. At the launch event, Samsung Electronics and Volcano Engine officially announced a partnership to bring the Doubao large model to the smart assistant and AI vision features of the Galaxy Z Fold6 and Galaxy Z Flip6 phones, improving the smart application experience. Samsung had previously announced in-depth cooperation with Google Gemini at its overseas product launches. In China, it selected vendors such as Volcano Engine as its large model partners. In addition to the AI functions already disclosed, such as Circle to Search, real-time translation, and recording transcription, this time
    AI 646 2024-07-18 20:07:33
  • Abandoning the visual encoder, this 'native version' multi-modal large model is also comparable to mainstream methods
    Diao Haiwen is a doctoral student at Dalian University of Technology, supervised by Professor Lu Huchuan. He is currently interning at the Beijing Academy of Artificial Intelligence (BAAI) under Dr. Wang Xinlong. His research interests include vision and language, efficient transfer of large models, and multimodal large models. Let’s make Cui together
    AI 423 2024-07-18 19:21:11
  • Are all these VLMs blind? GPT-4o and Sonnet-3.5 successively failed the 'vision' test
    Are the four major VLMs all like blind men groping an elephant? Ask the most popular SOTA models (GPT-4o, Gemini-1.5, Sonnet-3, Sonnet-3.5) to count how many times two lines intersect. Will they do better than humans? Probably not. Since the launch of GPT-4V, visual language models (VLMs) have brought large models a big step closer to the artificial intelligence we imagined. VLMs can understand images, describe what they see in language, and perform complex tasks based on that understanding. For example, given a picture of a dining table and a picture of a menu, a VLM can extract the number of beer bottles and the unit price on the menu from the two pictures, and calculate
    AI 690 2024-07-18 18:18:02
  • MotionClone: No training required, one-click cloning of video movements
    No training or fine-tuning is required: the motion of a reference video can be cloned into a new scene specified by a text prompt. Both global camera movement and local body movement can be cloned with one click. Paper: https://arxiv.org/abs/2406.05
    AI 1045 2024-07-18 17:06:12
  • A new track for humans to imitate AI, AI: When it comes to madness, you are my father
    A report from The Power of Machines. Editor: Yang Wen. AI has been led astray by humans! This world is so crazy... Recently, a wave of funny videos has popped up on social media in which real people cosplay as AI under the banner of AI, and Douyin even has a trending topic: the Human Imitation AI Contest. (The video comes from Douyin blogger "Guan Ni Luan Shi") Video link: https://mp.weixin.qq.com/s/1DVc8skecSsO0a9QcklZlw The formula is always the same: an old photo on the left, an "AI Repair" subtitle on the right, and an absurd, brainless "plot" that is actually acted out by real people. -1- AI: This is my first time being impersonated, and I didn't expect it to be even worse than my own work.
    AI 1788 2024-07-18 16:51:08
  • The inference efficiency of large models has been improved by 3 times without loss, and the University of Waterloo, Peking University and other institutions released EAGLE
    Large language models (LLMs) are increasingly used in many fields, but their text generation is expensive and slow. The inefficiency stems from how autoregressive decoding works: generating each token requires a full forward pass through an LLM with billions to hundreds of billions of parameters, which makes conventional autoregressive decoding slow. Recently, the University of Waterloo, the Vector Institute in Canada, Peking University, and other institutions jointly released EAGLE, which aims to speed up LLM inference while keeping the distribution of the model's output text unchanged. The method extrapolates the LLM's second-to-top-layer feature vectors, which significantly improves generation efficiency. Technical report: https://sites.google.com/view
    AI 1044 2024-07-18 14:43:48
  • To effectively evaluate the actual performance of Agent, the new online evaluation framework WebCanvas is here
    Pan Yichen: first-year master's student at Zhejiang University. Kong Dehan: Head of Model Algorithms at Cross Star Technology. Zhou Sida: a 2024 graduate of Nanchang University who will pursue a master's degree at Xidian University. Cui Cheng: a 2024 graduate of Zhejiang University of Traditional Chinese Medicine who will pursue a master's degree at Soochow University. Pan Yichen, Zhou Sida, and Cui Cheng jointly completed the research for this paper as algorithm interns at Cross Star Technology. In today's era of rapid technological development, large language models (LLMs) are changing the way we interact with the digital world at an unprecedented speed. LLM-based intelligent agents (LLM Agents) are gradually taking on everything from simple information search to complex web page operations.
    AI 636 2024-07-18 14:04:51
  • AKOOL supports the Cannes Advertising Awards and launches a revolutionary real-time digital human platform
    With Euro 2024 in full swing, a football match video created by the French telecommunications company Orange quickly became popular. The video shows Mbappe, Giroud, Griezmann... In fact, none of the athletes running on the pitch are real people; they are virtual characters generated by artificial intelligence. For its outstanding creativity and originality, the work won the "Oscar" of the creative advertising and marketing industry: the sports category award at this year's Cannes Lions International Festival of Creativity. AKOOL provided the core technical support for this award-winning work. The AI facial capture system they developed can accurately capture the subtle expressions and movements of human faces. With the support of carefully designed rendering technology, the virtual characters in the work
    AI 565 2024-07-18 09:26:11
  • 178 pages, 128 cases, comprehensive evaluation of GPT-4V in the medical field, still far from clinical application and practical decision-making
    Shanghai Jiao Tong University and Shanghai AI Lab released a 178-page review of GPT-4V medical cases, comprehensively revealing the visual performance of GPT-4V in the medical field for the first time. Driven by large-scale foundation models, artificial intelligence has made great progress recently, especially OpenAI's GPT-4. Its powerful question-answering and knowledge capabilities have sparked a Eureka moment in the AI field and drawn widespread public attention. GPT-4V(ision) is OpenAI’s latest multimodal foundation model. Compared with GPT-4, it adds image and voice input capabilities. This study aims to evaluate the performance of GPT-4V(ision) in multimodal medical diagnosis through case analysis. A total of 1
    AI 1262 2024-07-18 06:20:10
  • ICML 2024 AI for Math Workshop call for papers and challenge launched!
    ICML 2024 AI for Math Workshop: Workshop on Formal and Natural Language AI Mathematical Reasoning. Time: July 26/27, 2024. Location: Vienna, Austria, held simultaneously on site and online. Workshop homepage: https://sites.google.com/view/ai4mathworkshopicml2024/ Mathematical reasoning is among the most challenging and profound aspects of human intelligence. Over the course of its development, humans have devised various formal languages that can rigorously describe mathematical problems and proofs. In recent years, machine learning algorithms and large language models have been gradually approaching, or even surpassing, human performance on some mathematical reasoning tasks.
    AI 753 2024-07-18 05:36:50
  • Meta develops System 2 distillation technology, and the Llama 2 dialogue model task accuracy is close to 100%
    The researchers say that if System 2 distillation becomes an important feature of future continually learning AI systems, it could further improve performance on reasoning tasks where System 2 does not do so well. Large language model (LLM) strategies generally fall into two types: the immediate System 1 (fast response) and System 2 (slow thinking). System 2 reasoning favors deliberate thought: generating intermediate reasoning lets models (or humans) reason and plan in order to complete a task or respond to an instruction. System 2 reasoning requires effortful mental activity, especially in situations where System 1 (more automatic thinking) can go wrong. Therefore, System 1 is
    AI 1175 2024-07-18 05:07:20
  • To directly address the real AGI needs of Party A, the Artificial Intelligence Empowerment Industry Integration Development Forum was successfully held
    On July 6, the "2024 WAIC Artificial Intelligence Empowerment Industry Integration Development Forum" was held at the World Expo Exhibition and Convention Center. The forum focused on how artificial intelligence can empower new industrialization and promote industrial integration, and its program included leadership speeches, signing ceremonies, keynote speeches, the release of AI scenario requirements from central and state-owned enterprises, and roundtable discussions. Many central and state-owned enterprises and AI companies took part, including China Electronic Information Industry Development Research Institute, China Mobile Research Institute, Sinopec Shengli Oilfield, State Grid Customer Service Center, China Electronics Yuchuang, China Southern Power Grid Digital Grid Group, DAMO Academy, Baidu Smart Cloud, and AInnovation. Guests discussed the practical application of artificial intelligence in different fields, the development and application of large models, and intelligent operation and maintenance.
    AI 575 2024-07-18 03:14:57
  • How can trend-chasing AIGC marketers achieve a win-win between substance ('lizi') and appearance ('face')?
    Innovation and security of AIGC technology in the marketing field. In the past year, AI technology has set off a wave of change across all industries. The marketing world, always quick to follow trends, was among the first to embrace AIGC technology. Relevant data shows that in 2023 nearly half of China's advertisers applied AIGC technology in online marketing activities, and more than 90% of these applications focused on content creation and creative development. This new technology-driven advertising and marketing model is gradually taking shape, giving advertisers more ways to reduce costs and increase efficiency. However, while AIGC technology is making great strides in marketing, it also brings many challenges. For example, AIGC technology may introduce content risks when generating marketing materials, and heavily funded marketing campaigns may end up inadvertently benefiting illicit products. So,
    AI 874 2024-07-18 01:41:21
  • ICML 2024 | Gradient checkpointing too slow? LowMemoryBP saves GPU memory without slowing down, greatly improving the memory efficiency of backpropagation
    The first author of this paper is Yang Yuchen, a second-year master's student in the School of Statistics and Data Science at Nankai University, advised by Associate Professor Xu Jun of the same school. Professor Xu Jun’s team focuses on computer vision, generative AI, and efficient machine learning, and they are working on top
    AI 775 2024-07-18 01:39:51
