Metaflow
https://docs.metaflow.org/introduction/what-is-metaflow
What is Metaflow
Metaflow is a human-friendly Python library that makes it straightforward to develop, deploy, and operate various kinds of data-intensive applications, in particular those involving data science, ML, and AI. Metaflow was originally developed at Netflix to boost the productivity of data scientists who work on a wide variety of projects, from classical statistics to state-of-the-art deep learning.
Metaflow is available as open-source under the Apache License, Version 2.0.
What does Metaflow do exactly?
Metaflow provides a unified API to the whole infrastructure stack that is required to execute data science projects from prototype to production. It covers the following layers of the stack:
- Modeling: You can use any Python libraries with Metaflow. Metaflow helps make them available in all environments reliably.
- Deployment: Metaflow supports highly available, production-grade workflow orchestration and other deployment patterns.
- Versioning: Metaflow keeps track of all flows, experiments, and artifacts automatically.
- Orchestration: Metaflow makes it easy to construct workflows and test them locally.
- Compute: Metaflow leverages your cloud account and Kubernetes clusters for scalability.
- Data: Besides managing the data flow inside the workflow, Metaflow provides patterns for accessing data from data warehouses and lakes.
You could use a separate tool for each of these layers, but many data scientists prefer a unified, thoughtfully designed library. A single library also minimizes the operational burden for the engineers who manage the infrastructure.
Why Metaflow
https://docs.metaflow.org/introduction/why-metaflow
1. Modern businesses are eager to utilize data science and ML
In the past, data scientists and ML engineers had to rely on a medley of point solutions and custom systems to build ML and data science applications.
2. What is common in DS/ML applications?
Applications can be built quicker and more robustly if they stand on a common, human-friendly foundation. But what should the foundation cover?
3. All DS/ML applications use data
Data may come in different shapes and sizes and may be loaded from various data stores. However, no matter what data is used, accessing and processing it shouldn't be too cumbersome.
4. DS/ML applications need to perform computation
Some applications require a tremendous amount of compute power - think computer vision - while others make do with less. Regardless of the scale, all applications need to perform computation reliably. Thanks to cloud computing, data scientists and ML engineers should be able to utilize elastic compute resources without friction.
5. DS/ML applications consist of multiple interconnected parts
Consider an application that loads data, transforms it, trains a bunch of models, chooses the best-performing one, runs inference, and writes the results to a database. Multi-step workflows like this are the norm in ML. A workflow orchestrator is needed to make sure all steps get executed in order and on time.
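The ordering guarantee described above can be illustrated with a small sketch. This is not Metaflow's API: it is a minimal, standard-library-only illustration, assuming a linear chain of steps, in which `graphlib.TopologicalSorter` plays the role of the orchestrator. All function names, the toy dataset, and the list standing in for a database are hypothetical.

```python
from graphlib import TopologicalSorter


def load(state):
    # Toy dataset (assumption): points on the line y = 2x + 1.
    state["raw"] = [(x, 2 * x + 1) for x in range(5)]


def transform(state):
    # Convert to floats, standing in for real feature processing.
    state["rows"] = [(float(x), float(y)) for x, y in state["raw"]]


def train(state):
    # "Train a bunch of models": score candidate slopes by squared error.
    state["models"] = {
        slope: sum((slope * x + 1 - y) ** 2 for x, y in state["rows"])
        for slope in (1.0, 2.0, 3.0)
    }


def choose_best(state):
    # Pick the candidate with the lowest error.
    state["best"] = min(state["models"], key=state["models"].get)


def infer(state):
    # Predict y for a new input x = 10 with the chosen model.
    state["prediction"] = state["best"] * 10.0 + 1


def write_results(state):
    # A plain list stands in for the database.
    state["database"].append(state["prediction"])


STEPS = {
    f.__name__: f
    for f in (load, transform, train, choose_best, infer, write_results)
}

# Each step maps to the set of steps that must finish before it runs.
DEPENDENCIES = {
    "load": set(),
    "transform": {"load"},
    "train": {"transform"},
    "choose_best": {"train"},
    "infer": {"choose_best"},
    "write_results": {"infer"},
}


def run_workflow():
    # Execute every step once, in an order that respects the dependencies.
    state = {"database": []}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        STEPS[name](state)
    return order, state
```

Calling `run_workflow()` returns the execution order together with the final state. A real orchestrator adds what this sketch leaves out, such as scheduling, retries, and running steps on remote compute.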
6. DS/ML applications evolve over time incrementally
A real-world application is rarely built and deployed only once. Instead, a typical application is built gradually, through contributions by many people. The project needs to be tracked, organized, and versioned, which enables systematic and continuous improvement over time.
7. DS/ML applications produce business value in various ways
To produce real business value, DS/ML applications can't live in a walled garden. They must be integrated with the surrounding systems seamlessly: Some applications enhance data in a database, some power internal dashboards or microservices, whereas some power user-facing products. There are many such ways to deploy ML in production. The more valuable the application, the more carefully it needs to be operated and monitored as well.
8. DS/ML applications should leverage the best tools available
For many data scientists and ML engineers, the most rewarding part of the project is modeling. Using their domain knowledge and expertise, the modeler should be able to choose the best tool for the job among off-the-shelf libraries, such as PyTorch, XGBoost, scikit-learn, and many others. Or, if necessary, they should be able to use a wholly custom approach.
9. Metaflow covers the full stack of DS/ML infrastructure
Metaflow was originally created at Netflix, motivated by the realization that data scientists and ML engineers need help with all these concerns: Any gaps or friction in the stack can slow down the project drastically. Thanks to a common foundation provided by Metaflow, data scientists can iterate on ideas quickly and deploy them confidently by relying on a well-defined architecture and best practices, shared by everyone in the team.
10. Metaflow takes care of the plumbing, so you can focus on the fun parts
Metaflow provides a robust and user-friendly foundation for a wide spectrum of data-intensive applications, including most data science and ML use cases. Data scientists and ML engineers who know the basics of Python can build their own applications, models, and policies on top of it, while Metaflow takes care of the low-level infrastructure: data, compute, orchestration, and versioning.
11. Metaflow relies on systems that engineers know and trust
Metaflow was designed at Netflix to serve the needs of business-critical ML/DS applications. It relies on proven and scalable infrastructure which works for small and large organizations alike. Metaflow integrates with all the top clouds as well as with Kubernetes and systems around them in a responsible manner. It respects the security and other policies of your company, making engineering teams happy too.
12. Metaflow is used by hundreds of innovative companies
Today, Metaflow powers thousands of ML/DS applications at innovative companies such as Netflix, CNN, SAP, 23andMe, Realtor.com, REA, Coveo, Latana, and hundreds of others across industries. Commercial support for Metaflow is provided by Outerbounds. To hear first-hand experiences from these companies and many others, join the Metaflow Slack.
Beginner Recommender Systems: Episode 3
https://docs.metaflow.org/api
https://docs.outerbounds.com/recsys-tutorial-L2/
From: https://www.cnblogs.com/lightsong/p/18638578