Table of Contents
Magika Introduction
Features of Magika
Performance of Magika
Magika Online Example
Get started with Magika quickly
Install magika
Using magika in the browser
Using magika in Node.js
Reference materials

1MB of magical AI detects millions of files with 99% accuracy!

Apr 08, 2024 am 09:22 AM

In web development, detecting a file's type before it is uploaded to the server is crucial. This step protects the server and its users by intercepting potentially malicious files, and it also confirms that uploaded files are complete and in the expected format, which improves data compliance. Timely feedback to users about rejected files also improves the user experience and avoids unnecessary confusion.

Previously, Brother Abao covered "How does JavaScript detect the type of file?" Now that we have entered the AI era, it is time to keep pace. Next, Brother Abao will introduce how to use Google's open source Magika[1] tool for accurate file type detection.


Magika Introduction

Magika is a novel AI-powered file type detection tool that relies on recent deep learning techniques to provide accurate detection. It uses a highly optimized custom Keras model that weighs only about 1MB and identifies files accurately within milliseconds, even when running on a single CPU.

In evaluations on over 1 million files and over 100 content types (covering binary and text file formats), Magika achieved over 99% precision and recall. Magika is used at scale to keep Google users safe by routing Gmail, Drive, and Safe Browsing files to the appropriate security and content policy scanners.

Features of Magika

  • Supports detection of more than 100 file types.
  • Offers several ways to use it: a Python command line tool, a Python API, and an experimental TFJS version.
  • After the model is loaded (this is a one-time overhead), inference time per file is approximately 5 milliseconds.
  • Near-constant inference time regardless of file size. Magika only uses a limited subset of file bytes.
  • Supports batch processing: multiple files can be passed to the command line and the API at the same time, and Magika batches them internally to speed up inference.
  • Trained on a dataset of over 25 million files across 100+ content types.
  • In a large-scale evaluation, Magika's average precision and recall exceeded 99%, outperforming existing methods.
  • Magika uses a per-content-type threshold system to determine whether to "trust" the model's prediction, or whether to return a generic label such as "Generic Text Document" or "Unknown Binary Data."
  • Supports three different prediction modes to adjust tolerance for errors: high confidence, medium confidence and best guess.
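The last two points can be illustrated with a small sketch. This is not Magika's actual implementation: the threshold values, the medium cutoff, and the function name resolveLabel are all hypothetical and only show how a per-content-type threshold and the three prediction modes might interact.

```javascript
// Hypothetical per-content-type thresholds; the values are invented for illustration.
const THRESHOLDS = { markdown: 0.75, javascript: 0.9, pdf: 0.5 };
const DEFAULT_THRESHOLD = 0.5;
// Hypothetical single cutoff used in medium-confidence mode.
const MEDIUM_CUTOFF = 0.5;

// Decides which label to report for a raw model prediction.
// `mode` is "high-confidence", "medium-confidence", or "best-guess".
function resolveLabel(label, score, isText, mode = "high-confidence") {
  const generic = isText ? "Generic Text Document" : "Unknown Binary Data";
  if (mode === "best-guess") return label; // always report the top prediction
  const threshold =
    mode === "medium-confidence"
      ? MEDIUM_CUTOFF
      : THRESHOLDS[label] ?? DEFAULT_THRESHOLD;
  // Fall back to the generic label when the score is not trusted.
  return score >= threshold ? label : generic;
}

console.log(resolveLabel("javascript", 0.8, true)); // Generic Text Document
console.log(resolveLabel("javascript", 0.8, true, "medium-confidence")); // javascript
console.log(resolveLabel("javascript", 0.8, true, "best-guess")); // javascript
```

The same score of 0.8 yields a generic label in the strictest mode and the specific label in the two more tolerant modes, which is the trade-off the modes are meant to expose.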

Performance of Magika


In terms of performance, Magika benefits from its AI model and large training data set: when evaluated on a 1M-file benchmark covering over 100 file types, its performance is approximately 20% higher than that of other existing tools. Broken down by file type, the improvement is greater for text files, including code files and configuration files that other tools may have trouble identifying.


Magika Online Example

Magika supports both browser and Node.js environments. You can visit the Web Demo[2] website to try it out.


Get started with Magika quickly

Install magika

    npm install magika

or

    pnpm add magika

Using magika in the browser

    import { Magika } from "magika";

    // Build an in-memory file and read its raw bytes.
    const file = new File(["# Hello I am a markdown file"], "hello.md");
    const fileBytes = new Uint8Array(await file.arrayBuffer());

    // Load the model (one-time cost), then classify the bytes.
    const magika = new Magika();
    await magika.load();
    const prediction = await magika.identifyBytes(fileBytes);
    console.log(prediction);
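To connect this with the upload check described at the start of the article, here is a minimal sketch of an allow-list gate. The helper isAllowedUpload and the stub classifier are hypothetical, and the sketch assumes the prediction object exposes a label string; check what your installed Magika version actually returns before relying on that shape.

```javascript
// `identify` is any async function that takes a Uint8Array and resolves to an
// object with a `label` string (assumed shape, not confirmed by the article).
async function isAllowedUpload(identify, fileBytes, allowedLabels) {
  const prediction = await identify(fileBytes);
  return allowedLabels.includes(prediction.label);
}

// Stub classifier so the sketch runs without the model. With a loaded Magika
// instance you would pass (bytes) => magika.identifyBytes(bytes) instead.
const stubIdentify = async (bytes) => ({
  label: bytes[0] === 0x23 ? "markdown" : "unknown", // 0x23 is "#"
});

const sampleBytes = new TextEncoder().encode("# Hello I am a markdown file");
isAllowedUpload(stubIdentify, sampleBytes, ["markdown", "txt"]).then((ok) => {
  console.log(ok ? "upload allowed" : "upload rejected");
});
```

Passing the classifier in as a parameter keeps the gate independent of how the model is loaded, so the same check can be reused in the browser and in Node.js.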

Using magika in Node.js

    import { readFile } from "fs/promises";
    import { MagikaNode as Magika } from "magika";

    // Read the file to classify into a buffer.
    const data = await readFile("some file");

    // Load the model (one-time cost), then classify the buffer.
    const magika = new Magika();
    await magika.load();
    const prediction = await magika.identifyBytes(data);
    console.log(prediction);
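Since loading the model is a one-time overhead, a long-running server would normally load it once and reuse it for every file. The following is a rough sketch of that idea; makeLazyClassifier is a hypothetical helper, and the stub stands in for a real Magika instance so the example runs on its own.

```javascript
// Wraps a loader so the classifier is created once and shared by all calls.
// `createClassifier` must resolve to an object with an identifyBytes method.
function makeLazyClassifier(createClassifier) {
  let ready = null;
  return async function identify(bytes) {
    if (ready === null) ready = createClassifier(); // one-time load
    const classifier = await ready;
    return classifier.identifyBytes(bytes);
  };
}

// Stub loader that counts how often it runs. With Magika you might pass:
//   async () => { const m = new Magika(); await m.load(); return m; }
let loadCount = 0;
const identify = makeLazyClassifier(async () => {
  loadCount += 1;
  return { identifyBytes: async (bytes) => ({ label: "stub", size: bytes.length }) };
});

Promise.all([identify(new Uint8Array(3)), identify(new Uint8Array(5))]).then(
  (predictions) => {
    console.log(predictions.map((p) => p.size)); // [ 3, 5 ]
    console.log("model loads:", loadCount); // model loads: 1
  }
);
```

Caching the promise rather than the loaded object means concurrent first requests still trigger only a single load.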

That concludes this introduction to Magika. If you want to know more, you can continue with the article Magika: AI powered fast and efficient file type identification[3].

Reference materials

[1]Magika: https://github.com/google/magika

[2]Web Demo: https://google.github.io/magika/

[3]Magika: AI powered fast and efficient file type identification: https://opensource.googleblog.com/2024/02/magika-ai-powered-fast-and-efficient-file-type-identification.html
