
    KDnuggets » News » 2020 » Feb » Tutorials, Overviews » 12-Hour Machine Learning Challenge: Build & deploy an app with Streamlit and DevOps tools ( 20:n05 )

    12-Hour Machine Learning Challenge: Build & deploy an app with Streamlit and DevOps tools


     
     

    This article will present the knowledge, process, tools, and frameworks required for completing a 12-hour ML challenge. I hope you can find it useful for your personal or professional projects.



    By , Engagement Lead at Dessa


     

    TL;DR: In this article, I want to share my learnings, process, tools, and frameworks for completing a 12-hour ML challenge. I hope you can find it useful for your personal or professional projects.

    Here is a table of contents to help you navigate:

    • Part 1: Find a Good Problem
    • Part 2: Define the Constraints
    • Part 3: Think, Simplify, & Prioritize
    • Part 4: Sprint Planning
    • Part 5: the App
    • Part 6: Lessons Learned
    • Bonus: Process & Tools for Lazy Programmers

    Disclaimer: this is not sponsored by Streamlit, any of the tools I mention, nor any of the firms I work for.


     

    Step 1: Find a Good Problem (The Christmas Problem)

     

     

    Well, Christmas. It used to be the time of the year when I hung out with my wife and puppy on the couch and binge-watched movies and shows.

    Then, this Christmas, something changed. For some reason, most of what I found on Netflix or YouTube seemed quite boring. Maybe I had reached a tipping point from watching similar content pushed by the recommendation algorithms, the algorithms that know me so well (maybe too well).

     

    I realized a problem: I am trapped by the recommendation algorithms that know me so well (this post takes a more design lens).

    I can’t seem to find stuff outside of the content bubble. Everything the algorithms think I am interested in has gradually become boring; it’s ironic. I want to get out!

    The point is this: find a problem that’s annoying enough. It doesn’t have to be curing cancer or eliminating hunger (if you can, bravo!), just something meaningful enough that you are willing to commit and get started.

     

    Step 2: Define Constraints (the 12 Hour Challenge)

     
    Inspired by my friend Matt’s 48-hour challenge of web-app development, I decided to do something similar, but for apps that have an ML component. In short, here are the constraints:

    • ~12 hours of total working time; they don’t need to be consecutive hours
    • must ship a usable and stable app for users other than myself
    • must have an ML component, but no unnecessary complexity
    • must share the work & learnings with others (a.k.a write this post)
    • (the experience must be fun)

    Why have a deadline? According to Matt:

    … having a deadline focuses individuals on prioritizing what they need to focus on in order to get their project to a workable state. Individuals must factor in the time it takes to design a project, to come up with a solution, deal with any unforeseen technicalities and everything in between to make it to the deadline.

    (So, why only 12 hours instead of 48? Well, I am not as intense as Matt. If you decide to do this, pick a time frame that works best for you and stick with it. The point is to execute and ship.)

    Here is my rough time budget for all the work that’s involved:

    • 2 hours: have a rough design of the app (e.g. research, UX, architecture).
    • 8 hours: re-design, build, and test the app iteratively.
    • 2 hours: write, edit, and publish this article.

     

    Step 3: Think, Simplify, Prioritize, and Repeat

     
    Before coding, I need to address a few important questions to 1) crystalize what exactly I need to build and 2) prioritize what to build in the 12 hours. Although not exhaustive, here are some guiding questions:

    Putting a Product hat on, who are the users? What do the users want and need? How do their needs differ by segment? Which user group should I target first? What are the features to address the needs? …

    Putting a Data Science hat on, what data do I need, and what is available? Do I need BI analytics or a predictive model? What business and model metrics should I use? How do I measure performance? …

    Putting a Designer hat on, what emotions does the app need to trigger? What colour scheme should I use? What does the user journey look like? Given the features, what is the best user interaction? …

    Putting an Engineer hat on, how many users does the app need to support at a time? What does the development-to-deployment process look like? Which technology stack balances prototyping speed and scalability? …

    Putting a Business hat on, how do I monetize the app? How do I grow and sustain the audience for the app? How do I minimize the cost of running the technology and operations? …

     

    As you can imagine, this exercise can get overwhelming quickly. Be sure to resist the urge to solve everything.

    Ultimately, here are the top three “user wants/needs” I can address and the corresponding features in 8 hours of development:

    • I want surprises: the app should be able to suggest movies I haven’t seen before or that differ from my normal viewing history. > “Today’s Pick & Filtering”
    • I need to choose: the app should be able to show a trailer and provide some information about the movie quality. > “Trailer and Rating”
    • I want to control: the app should offer a simple way for users to control how different the suggestions are. > “Filter Panel & Smart Exploration”

    Here are a few things I’d love to build, but de-prioritized:

    • user authentication / messaging
    • auto-emailing
    • back-up on the Cloud
    • multi-model recommendation
    • customer service bot

    With the features in mind, here is the rough architecture design of the solution, key components, and their interactions.

    Figure

    Conceptual Architecture Diagram, Author’s Work

     

    Note: this is the output of an iterative process. Your initial thinking might look very different. See Lessons Learned for tips on how to decide what to build and what not to.

     

    Step 4: Sprint Planning & Execute

     
    I decided to build this in four 2-hour sprints. Here are the rough outcomes of each sprint:

    Sprint 1: an automated development-to-deployment pipeline; a simple click-able “Today’s Pick” and filtering features served on Heroku.

    Sprint 2: Build out the ETL; a set of automatic test cases for the ETL; improved front-end with YouTube Trailers & Personalized section with dummy data. Run time optimization.

    Sprint 3: Build out API for Smart Exploration. Integrate with front-end with a dummy model. Research on modelling options. More run time optimization.

    Sprint 4: Refactor and optimize a KNN-based Collaborative Filtering model. Add modelling test cases. Code clean-up and more optimization.

     

    Step 5: Ta-dah.

     

     

    YAME was born. Now you can use YAME to find something interesting for your weekdays, weekends, date nights, and family gatherings. The app aims to provide the convenience of a search engine while offering control without overloading the users.

    Convenience: The landing page has five movies the system recommends. It updates daily. The algorithm picks movies across years and genres; it tries to be unbiased.

    Figure

    Today’s Pick with Trailers
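    One way to get five picks that refresh daily and spread across years and genres is to seed a random number generator with the date and sample at most one movie per (decade, genre) bucket. The sketch below illustrates that idea only; the bucketing rule, the `MOVIES` catalogue, its field names, and the `todays_pick` helper are assumptions for illustration, not YAME’s actual code.

```python
import random
from datetime import date

# Hypothetical toy catalogue; a real app would load this from its database.
MOVIES = [
    {"title": "Alien", "year": 1979, "genre": "Sci-Fi"},
    {"title": "Clue", "year": 1985, "genre": "Comedy"},
    {"title": "Heat", "year": 1995, "genre": "Crime"},
    {"title": "Amelie", "year": 2001, "genre": "Romance"},
    {"title": "Up", "year": 2009, "genre": "Animation"},
    {"title": "Paddington", "year": 2014, "genre": "Family"},
    {"title": "Arrival", "year": 2016, "genre": "Sci-Fi"},
    {"title": "Parasite", "year": 2019, "genre": "Thriller"},
]


def todays_pick(movies, day, n=5):
    """Return up to n movies for `day`, at most one per (decade, genre) bucket."""
    rng = random.Random(day.toordinal())  # same day -> same picks for everyone
    buckets = {}
    for movie in movies:
        key = (movie["year"] // 10, movie["genre"])
        buckets.setdefault(key, []).append(movie)
    keys = sorted(buckets)  # fixed order keeps the sampling reproducible
    chosen = rng.sample(keys, min(n, len(keys)))
    return [rng.choice(buckets[key]) for key in chosen]


picks = todays_pick(MOVIES, date(2020, 2, 1))
print([movie["title"] for movie in picks])
```

    Seeding with the date keeps the landing page stable within a day without storing any state, and sampling buckets rather than individual titles stops one crowded decade or genre from dominating the list.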

     

    Some control: if you don’t like what you see or just wonder what’s out there, you can choose the year and genre using the panel on the left.

    Figure

    Filtering
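    The filtering itself can be a small pure function, which also makes it easy to test. The sketch below assumes a list of dictionaries with `year` and `genre` fields; `CATALOGUE` and `filter_movies` are hypothetical names. In a Streamlit app, the two arguments could come from sidebar widgets such as `st.sidebar.slider` and `st.sidebar.multiselect`.

```python
# Hypothetical toy catalogue used only for this illustration.
CATALOGUE = [
    {"title": "Alien", "year": 1979, "genre": "Sci-Fi"},
    {"title": "Clue", "year": 1985, "genre": "Comedy"},
    {"title": "Amelie", "year": 2001, "genre": "Romance"},
    {"title": "Paddington", "year": 2014, "genre": "Family"},
    {"title": "Arrival", "year": 2016, "genre": "Sci-Fi"},
]


def filter_movies(movies, year_range=None, genres=None):
    """Keep movies inside year_range (inclusive) and in genres.

    A missing or empty filter means "no restriction" for that field.
    """
    low, high = year_range if year_range else (float("-inf"), float("inf"))
    wanted = set(genres) if genres else None
    return [
        movie
        for movie in movies
        if low <= movie["year"] <= high
        and (wanted is None or movie["genre"] in wanted)
    ]


recent_scifi = filter_movies(CATALOGUE, year_range=(2010, 2020), genres=["Sci-Fi"])
print([movie["title"] for movie in recent_scifi])  # prints ['Arrival']
```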

     

    More control without sacrificing convenience: If you really want something else, you can explore based on how “adventurous” you feel today with a simple interface. This UI allows users to have an option to choose. Users can decide what they might want to see without being cognitively overloaded.

    Figure

    Smart Exploration with KNN-based Collaborative Filtering
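    To make the idea concrete, here is a minimal sketch of user-based KNN collaborative filtering in plain Python. It is not YAME’s implementation. In particular, how the “adventurous” setting maps onto the model is an assumption: here it simply skips the closest neighbours, so suggestions come from viewers who are less like you. The `RATINGS` data and the `cosine` and `recommend` helpers are hypothetical.

```python
import math

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"Alien": 5, "Heat": 4},
    "bob": {"Alien": 5, "Heat": 4, "Blade Runner": 5},
    "cat": {"Alien": 2, "Amelie": 5},
    "dan": {"Heat": 1, "Paddington": 5},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (positive ratings)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[movie] * b[movie] for movie in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k=1, adventurous=0, n=3):
    """Suggest unseen movies from k neighbours of `user`.

    `adventurous` is the number of closest neighbours to skip (an assumed
    mapping): 0 uses the most similar users, larger values reach further out.
    """
    target = ratings[user]
    ranked = sorted(
        ((cosine(target, other), name)
         for name, other in ratings.items() if name != user),
        reverse=True,
    )
    neighbours = ranked[adventurous:adventurous + k]
    scores = {}
    for similarity, name in neighbours:
        for movie, rating in ratings[name].items():
            if movie not in target:
                # Weight each neighbour's rating by how similar they are.
                scores[movie] = scores.get(movie, 0.0) + similarity * rating
    return sorted(scores, key=lambda movie: (-scores[movie], movie))[:n]


print(recommend(RATINGS, "ann", adventurous=0))  # prints ['Blade Runner']
print(recommend(RATINGS, "ann", adventurous=1))  # prints ['Amelie']
```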

     

     

    Step 6: Lessons Learned

     
    1/ Be safe, be fast, be lazy. Automate tests before anything else. If you find yourself manually testing something regularly, invest a bit of time and automate it. Having automated tests and CircleCI saved me a lot of headaches. For an ML app, you should have two sets of tests: one for the application code (e.g. unit and integration tests), the other for model testing (e.g. minimum performance and edge cases). Dynamic test cases (inputs driven by random numbers) also help catch bugs in edge cases that are hard to anticipate.

    Figure

    Development-to-Deployment Workflow, author’s work
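    As an illustration of dynamic test cases, the sketch below checks properties that must hold for any input instead of hard-coding a handful of examples, and adds a separate edge-case test. The `recommend` function is a hypothetical stand-in for the app’s recommender. The tests use plain `assert` statements, so they can be run directly or collected by a test runner; a minimum-performance test would follow the same pattern, comparing a metric against a threshold.

```python
import random


def recommend(catalogue, seen, n=5, rng=None):
    """Hypothetical stand-in for the app's recommender: random unseen titles."""
    rng = rng or random.Random()
    unseen = [title for title in catalogue if title not in seen]
    return rng.sample(unseen, min(n, len(unseen)))


def test_recommend_with_random_inputs(trials=200, seed=0):
    """Dynamic test: generate many random inputs and check invariants."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    for _ in range(trials):
        catalogue = [f"movie-{i}" for i in range(rng.randint(0, 30))]
        seen = set(rng.sample(catalogue, rng.randint(0, len(catalogue))))
        n = rng.randint(0, 10)
        picks = recommend(catalogue, seen, n=n, rng=rng)
        assert len(picks) <= n                  # never more than requested
        assert len(set(picks)) == len(picks)    # no duplicates
        assert not seen & set(picks)            # nothing already watched
        assert set(picks) <= set(catalogue)     # only real titles


def test_recommend_edge_cases():
    """Edge cases: empty catalogue and a user who has seen everything."""
    assert recommend([], set()) == []
    assert recommend(["movie-0"], {"movie-0"}) == []


test_recommend_with_random_inputs()
test_recommend_edge_cases()
```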

     

    2/ Avoid the Kaggle Trap. Since I only budgeted ~4 hours to work on the ML component, the key is to build a just-good-enough model to validate the functionality and usefulness of the ML feature. It’s very easy to fall into the trap of “Kaggle Mode” (e.g. spending lots of time building complex models for small performance gains). I use a Model-UX analysis to help set the boundary. This analysis is not meant to be a scientific exercise, but a tool to keep you away from Kaggling.

    Figure

    Model-UX Analysis (Time to Develop), Author’s Analysis

     

    Note: the threshold for minimum model performance varies by use case. For example, a fraud detection app will likely need very good model performance to convince users of its usefulness.

    So, my strategy is to start with the simplest model: a “model” that’s driven by a random number generator. Although it sounds naive from a modelling standpoint, it adds the greatest value to the UX with the least amount of development time (~5 mins). Users can play with a Personalization feature that didn’t exist before. It doesn’t really matter whether it provides the “best” recommendation; the key is to validate the feature. Then I evolved the model into a rule-based and then a KNN-based Collaborative Filtering algorithm.
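    A rough sketch of this progression: if every model sits behind the same function signature, the front end can be validated with the random “model” first and a better model swapped in later without touching the UI code. All names here (`CATALOGUE`, `SEEN`, `random_model`, `rule_based_model`, `personalized_section`) and the “highest rated first” rule are hypothetical.

```python
import random
from typing import Callable, List

# A model is anything that maps (user, n) to a list of movie titles.
Recommender = Callable[[str, int], List[str]]

# Hypothetical data: average rating per title and what each user has seen.
CATALOGUE = {"Alien": 8.5, "Amelie": 8.3, "Heat": 8.3, "Paddington": 7.3, "Clue": 7.2}
SEEN = {"ian": {"Alien"}}


def unseen_titles(user: str) -> List[str]:
    return [title for title in CATALOGUE if title not in SEEN.get(user, set())]


def random_model(user: str, n: int) -> List[str]:
    """Iteration 1: a random number generator posing as a model."""
    candidates = unseen_titles(user)
    return random.sample(candidates, min(n, len(candidates)))


def rule_based_model(user: str, n: int) -> List[str]:
    """Iteration 2: a fixed rule (assumed here: highest average rating first)."""
    candidates = unseen_titles(user)
    return sorted(candidates, key=lambda title: (-CATALOGUE[title], title))[:n]


def personalized_section(user: str, model: Recommender, n: int = 3) -> List[str]:
    """What the front end calls; it does not care which model is plugged in."""
    return model(user, n)


print(personalized_section("ian", random_model))
print(personalized_section("ian", rule_based_model))  # prints ['Amelie', 'Heat', 'Paddington']
```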

    3/ Building is fun, prioritizing isn’t. Here are some tips to make it easier:

    1. Start with the most annoying and profitable problem (don’t care too much about profit in this exercise).
    2. Think of an ideal solution and budget how much time you need to build it; keep in mind that you will likely underestimate, but that’s okay.
    3. Cut the time to 1/3, re-think the solution, and see if you are comfortable implementing it without a significant amount of research (some research is still good for learning).
    4. Repeat until the scope fits into a 2- to 4-hour timeframe
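    The arithmetic behind steps 3 and 4 is worth seeing once: cutting to a third shrinks the scope quickly, so even a large first estimate fits a sprint after two or three rounds. The helper below is a hypothetical illustration; the 4-hour target comes from the upper end of the range in step 4.

```python
def cuts_needed(estimate_hours, target_hours=4.0, factor=3.0):
    """Count how many 'cut to 1/3' rounds bring an estimate within target_hours.

    Returns (rounds, remaining_hours).
    """
    if estimate_hours <= 0 or target_hours <= 0 or factor <= 1:
        raise ValueError("estimate and target must be positive, factor above 1")
    rounds = 0
    while estimate_hours > target_hours:
        estimate_hours /= factor  # re-scope the solution to a third of the time
        rounds += 1
    return rounds, estimate_hours


print(cuts_needed(36))  # prints (2, 4.0): 36 -> 12 -> 4 hours
print(cuts_needed(3))   # prints (0, 3): already fits
```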

    If you like and want to support YAME, please check out my page. The support will go towards covering the cost of running and improving YAME (e.g. server, website, etc.).

    I hope you enjoyed this post. I’d love to see your work if you decide to take on the 12-hour challenge.

    Until Next Time,
    Ian

     

     

    Bonus: Process & Tools for Lazy Programmers

     
    For anyone who’s interested (and got this far), I wanted a workflow that’s as automated as possible, so I can spend my time designing and coding instead of doing manual testing or moving code around. Everyone has their own preferences. The key for me is being able to iterate fast and be ready to scale.

    From a tech stack standpoint, here are the tools I chose (also a few alternatives):

    • Python as the programming language for general workflow, ETL, and modelling. (alternative: SQL for ETL, R for Modelling, and Java for workflow)
    • Streamlit as the front-end tool. It’s Python-based. Out of the box, it comes with most of the widgets I need for the User Experience, and it’s web- and mobile-friendly. It encouraged me to focus on user experience as much as modelling. Jupyter is great, but I feel like it tends to keep people in the Kaggle Trap. (alternative: Flask, Django, or React for the front end; Jupyter Notebook for Analysis and Model Experimentation)
    • as the back-end database tool. (alternative: GCP, AWS, Azure; note that SQLite doesn’t work with Heroku below if you want to follow the same setup)

    From a DevOps standpoint, here are the tools:

    • as the IDE (alternative: , )
    • for code versioning (alternative: )
    • for managing test cases and run automatic testing
    • CircleCI for Continuous Integration and Deployment (alternative: )
    • Heroku for web-hosting (alternative: Cloud solutions such as GCP, AWS, Azure, or Paperspace)

    If you are as lazy as I am as a programmer, I highly recommend investing the time upfront to set up this DevOps workflow. It saves lots of time on manual testing and deployment. More importantly, it really safeguards your codebase from stupid bugs.

    Finally, I used a simple Kanban board to keep track of the stuff I needed to do:

    Figure

    Kanban — Author’s work

     

    Note: I didn’t choose the alternatives because I wanted to avoid over-engineering and I wasn’t familiar enough with them to gain any efficiency.

    If you like this article, you may also like these:

    • How I cope with the boring days of deploying Machine Learning
    • How to Design and Implement Reinforcement Learning for the Real World
    • How not to apply Agile on an ML project
    • How to develop and manage a happy data science team
    • The numbers, five tactical solutions, and a quick survey
    • One Thing Many Data Scientists Don’t Think Enough About

     
    Bio: is Engagement Lead at Dessa, deploying machine learning at enterprises. He leads business and technical teams to deploy Machine Learning solutions and improve Marketing & Sales for the F100 enterprises.

    Reposted with permission.
