Metaflow

Date: 2024-12-29 11:57:10
Tags: ML, applications, scientists, data, DS, Metaflow

https://docs.metaflow.org/introduction/what-is-metaflow

What is Metaflow

Metaflow is a human-friendly Python library that makes it straightforward to develop, deploy, and operate various kinds of data-intensive applications, in particular those involving data science, ML, and AI. Metaflow was originally developed at Netflix to boost the productivity of data scientists who work on a wide variety of projects, from classical statistics to state-of-the-art deep learning.

Metaflow is available as open source under the Apache License, Version 2.0.

What does Metaflow do exactly?

Metaflow provides a unified API to the whole infrastructure stack that is required to execute data science projects from prototype to production. Take a look at this simple Metaflow flow that illustrates the concepts:

[Figure: a simple Metaflow flow illustrating the concepts]

You could use a separate tool for each of these layers, but many data scientists prefer a unified, thoughtfully designed library. This also minimizes the operational burden for the engineers who manage the infrastructure.

Why Metaflow

https://docs.metaflow.org/introduction/why-metaflow

1. Modern businesses are eager to utilize data science and ML

In the past, data scientists and ML engineers had to rely on a medley of point solutions and custom systems to build ML and data science applications.

Many data science opportunities


2. What is common in DS/ML applications?

Applications can be built more quickly and more robustly if they stand on a common, human-friendly foundation. But what should the foundation cover?

A solid foundation for all use cases


3. All DS/ML applications use data

Data may come in different shapes and sizes and may be loaded from various data stores. However, no matter what data is used, accessing and processing it shouldn't be too cumbersome.

Data


4. DS/ML applications need to perform computation

Some applications require a tremendous amount of compute power (think computer vision), while others make do with less. Regardless of the scale, all applications need to perform computation reliably. Thanks to cloud computing, data scientists and ML engineers should be able to utilize elastic compute resources without friction.

Compute


5. DS/ML applications consist of multiple interconnected parts

Consider an application that loads data, transforms it, trains a bunch of models, chooses the best-performing one, runs inference, and writes the results to a database. Multi-step workflows like this are the norm in ML. A workflow orchestrator is needed to make sure all steps get executed in order and on time.

Orchestration
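
To make the ordering requirement concrete, here is a minimal sketch in plain Python of what an orchestrator does with such a workflow. It is not Metaflow's API: the step functions, the toy data, and the `run_workflow` helper are all hypothetical, and the standard-library `graphlib` module (Python 3.9+) is used only to resolve a valid execution order from declared dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical steps mirroring the workflow described above.
# Each step reads from and writes to a shared context dictionary.
def load_data(ctx):
    ctx["raw"] = [1.0, 2.0, 3.0, 4.0]  # toy data, stands in for a real data store

def transform(ctx):
    largest = max(ctx["raw"])
    ctx["features"] = [x / largest for x in ctx["raw"]]

def train_models(ctx):
    # Toy "models": constant predictors, scored by mean absolute error.
    feats = ctx["features"]
    ctx["models"] = {
        guess: sum(abs(x - guess) for x in feats) / len(feats)
        for guess in (0.0, 0.5, 1.0)
    }

def choose_best(ctx):
    # Pick the model with the lowest error.
    ctx["best"] = min(ctx["models"], key=ctx["models"].get)

def run_inference(ctx):
    ctx["predictions"] = [ctx["best"]] * len(ctx["features"])

def write_results(ctx):
    ctx["database"] = list(ctx["predictions"])  # stand-in for a database write

# Each step maps to the set of steps that must finish before it starts.
DEPENDENCIES = {
    load_data: set(),
    transform: {load_data},
    train_models: {transform},
    choose_best: {train_models},
    run_inference: {choose_best},
    write_results: {run_inference},
}

def run_workflow(dependencies):
    """Run every step once, never before the steps it depends on."""
    ctx, executed = {}, []
    for step in TopologicalSorter(dependencies).static_order():
        step(ctx)
        executed.append(step.__name__)
    return ctx, executed

ctx, executed = run_workflow(DEPENDENCIES)
print(executed)
print(ctx["best"], ctx["database"])
```

A production orchestrator adds scheduling, retries, and parallel execution of independent steps on top of this basic dependency ordering; the sketch covers only the "in order" part.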


6. DS/ML applications evolve over time incrementally

Rarely is a real-world application built and deployed only once. Instead, a typical application is built gradually, through contributions by many people. The project needs to be tracked, organized, and versioned, which enables systematic and continuous improvement over time.

Versioning


7. DS/ML applications produce business value in various ways

To produce real business value, DS/ML applications can't live in a walled garden. They must integrate seamlessly with the surrounding systems: some applications enhance data in a database, some power internal dashboards or microservices, and others power user-facing products. There are many such ways to deploy ML in production. The more valuable the application, the more carefully it needs to be operated and monitored as well.

Deployment


8. DS/ML applications should leverage the best tools available

For many data scientists and ML engineers, the most rewarding part of the project is modeling. Using their domain knowledge and expertise, the modeler should be able to choose the best tool for the job among off-the-shelf libraries such as PyTorch, XGBoost, scikit-learn, and many others. Or, if necessary, they should be able to use a wholly custom approach.

Modeling


9. Metaflow covers the full stack of DS/ML infrastructure

Metaflow was originally created at Netflix, motivated by the realization that data scientists and ML engineers need help with all these concerns: any gap or friction in the stack can slow a project down drastically. Thanks to the common foundation provided by Metaflow, data scientists can iterate on ideas quickly and deploy them confidently by relying on a well-defined architecture and best practices shared by everyone on the team.

Full-stack Metaflow


10. Metaflow takes care of the plumbing, so you can focus on the fun parts

Metaflow provides a robust and user-friendly foundation for a wide spectrum of data-intensive applications, including most data science and ML use cases. Data scientists and ML engineers who know the basics of Python can build their own applications, models, and policies on top of it, while Metaflow takes care of the low-level infrastructure: data, compute, orchestration, and versioning.

Full stack triangles


11. Metaflow relies on systems that engineers know and trust

Metaflow was designed at Netflix to serve the needs of business-critical ML/DS applications. It relies on proven and scalable infrastructure which works for small and large organizations alike. Metaflow integrates with all the top clouds as well as with Kubernetes and systems around them in a responsible manner. It respects the security and other policies of your company, making engineering teams happy too.

Existing infrastructure


12. Metaflow is used by hundreds of innovative companies

Today, Metaflow powers thousands of ML/DS applications at innovative companies such as Netflix, CNN, SAP, 23andMe, Realtor.com, REA, Coveo, Latana, and hundreds of others across industries. Commercial support for Metaflow is provided by Outerbounds. To hear first-hand experiences from these companies and many others, join the Metaflow Slack.

Beginner Recommender Systems: Episode 3

https://docs.metaflow.org/api

https://docs.outerbounds.com/recsys-tutorial-L2/

 

From: https://www.cnblogs.com/lightsong/p/18638578
