All posts by Max

Machine Learning Enthusiast, Foursquare Engineer, City Guide Ratings, Former Talk Radio Host, Marsbot's Mechanic. @marsbot, @swarmingnow

Episode 1: Bayesian Analysis of the Hawaii Missile Scare

I need to write a post about my recent trip to Cuba! But I got back around 7 and I had time to polish up the next Local Maximum episode.

In the rest of my discussion with Aaron, we discuss how Bayes’ rule can apply to news items like the Hawaii ICBM scare, the murder of DNC staffer Seth Rich, medical diagnosis, conspiracy theories, theology, and politics.

As you can see, I’m expanding the range of discussion a bit from the first half of the discussion – at first timidly but in the future boldly. Send your questions and comments to localmaxradio@gmail.com

The Local Maximum – Check out My New Podcast

Exciting news today! This is the launch day of my new podcast, “The Local Maximum”.  Yes, the day has finally arrived.

So far on my guest and solo lineup, I’ll be covering AI, Product Design, Future Technology, and Current Events. The overall blend of topics is still TBD, but I’m going to start with 10 episodes to get a handle on things.

The first episode is with my friend Aaron Bell, and we will be discussing Bayesian Inference, which will be a recurring theme on the show. Aaron also advised me in our show prep that I should have a show notes page. I’ll formalize that a bit more in the future, but for now this blog post is the show notes for episode 0.

In the software/startup world we often launch a minimal product to get the feedback loop started. This is exactly what I’m doing here. There are many things I’d like to improve. It’s a lot harder to explore complex ideas in an audio format than I thought! But that’s my goal for this project – and I’m going to continue in the pursuit.

To send a question that could be answered on or off the show, email localmaxradio@gmail.com. I’d love to hear your opinion on the first episode, and ideas on ways to make this podcast a success.

The Local Maximum is now available on iTunes. For now, I will host the mp3s here and on SoundCloud, and the feed is also available on Stitcher.

The book I mentioned in the show is The Theory that Would Not Die by Sharon Bertsch McGrayne.

 

Impress your Friends by Finding the Best Places

Designing the algorithm for Foursquare’s venue ratings is one of the best things I’ve worked on in my career. I hear people tell me that if they want to go to a good place, they make a cutoff on our 1-10 scale, say 8.5, and limit their choices to the select few elite places.

To me that sounds a little strict, but the fact remains that the Foursquare venue ratings are a great way to tell the difference between a good spot and a bad spot, and to assess the overall quality of a restaurant or bar before you go. Stephanie Yang and I spent a lot of time ensuring that our ratings are the best in the business, and I’d put these up against any venue rating system out there in terms of quality and accuracy.

Have you ever wondered how we do it? Well, we don’t give away all the secrets, but Stephanie and I wrote a blog post for the Foursquare Engineering blog called Finding the Perfect 10 where we break down some of the methods we use around venue ratings.


 

The Eclipse Experience

In catching up on my long backlog of potential blog posts, I want to talk about my trip this summer to see the total solar eclipse.

I knew this eclipse was coming for years, but I didn’t actually make my travel plans until a few weeks before. That wasn’t the best idea as it turns out that millions of people were visiting the same small band that stretches coast to coast to see totality. But – I made it work. I booked some flights to Louisville, Kentucky, and planned to see the eclipse down in Nashville.

 

Getting to Kentucky was messy; the flight was cancelled. I almost left the airport resigned to the fact that I wouldn’t be able to fly until the next day. At the last minute, American Airlines was able to book me a flight to Cincinnati, and from there I took an Uber to Louisville.

 

I had been proud of myself for coming up with the Louisville trick initially, but I wondered whether the lack of planning on my part was going to cause issues throughout the trip. Even my original shipment of eclipse glasses was cancelled and I had to rely on my second try with Amazon. Fortunately, the rest of the trip went smoothly.

 

For me living in New York City, it’s a treat to drive around with a rental car, particularly in a new and open area like in Kentucky and Tennessee. I enjoyed visiting the cities of Louisville, Bowling Green, and finally Nashville for the main event. As always, Foursquare directed me to the best places to see and eat. Louisville museums, Bowling Green shops and parks, and Nashville music were all great. Nashville got a little crowded with all the visitors, but if you’re willing to adjust and find backup plans it all works.

The one downside was staying in Glasgow the night before, because the hotels everywhere else had been booked far and wide. There’s really nothing there – and the bed I got wasn’t very comfortable. It was one of those cheap motels – but on that day they were charging a substantial premium. At least they had good wi-fi – I stayed up most of the night watching the latest episode of Game of Thrones – Beyond the Wall – and I was glued to my iPad, particularly during that dragon fight at the end!

For many people, watching something like an eclipse isn’t worth it. It may just be a natural phenomenon in the sky caused by the moon passing in front of the sun. But the event has a lot of historical and scientific significance. Ancient civilizations attached meaning to this event – both positive and negative. Almost a hundred years ago in 1919, an eclipse was used to confirm Einstein’s general theory of relativity. Because stars close to the edge of the sun are visible during a total eclipse, we can measure how the sun’s gravity bends their light and shifts their apparent position in the sky. Astronomers still gather data from eclipses to learn about the cosmos.

The eclipse itself – which I saw at Nashville’s Science Center – was a really cool communal experience. It started out as a really hot day, and as the moon covered more and more of the sun it became cooler and cooler. The spots of sunlight on the ground – which shine through the leaves – became crescent-shaped (see picture). Finally, when all the light was blocked, we saw what looked like a 360-degree sunset. It became dusk – and the crickets started coming out and chirping in the early afternoon! For a few minutes, everyone stopped and appreciated the view and the natural wonder that we had the privilege of catching firsthand.

I experienced everything except one part. At the last minute, a cloud passed over and I wasn’t able to see the moon covering the sun in totality. Fortunately, there’s another one nearby in 2024. Maybe I can plan that one a little further in advance!

Watch this Space in 2018

As the year wraps up, I just wanted to let you all know that I am planning a really exciting new project for 2018 that is going to take my “content-creation” to the next level. If you like to hear about probability, travel, technology, and all the topics I discuss – this will be something to look forward to!

I know that’s cryptic – but I wanted to tease it out before the new year. In addition, I’ve taken 2 really fascinating trips this year that I wanted to blog about. The first was my trip to Kentucky and Tennessee to see the eclipse, and the other was my trip to the Fort Mojave Indian Reservation to do some volunteer work in that community. I was heavily involved in planning the latter which meant that this blog was a bit neglected, but I hope to give a full report on it soon!

I have a whole bunch of posts lined up too – and at Foursquare I’ve been shopping some talks with the “Data” crew that I may want to share publicly.

But for tonight – I’m going to go out, celebrate, and possibly have my yearly slice at Koronet Pizza. Judging from the jackets, it clearly wasn’t as cold 3 years ago as it is tonight.

Happy New Year everyone!

 

The Idea of Subjective Probability

I’ve been deep in Bayesian analysis recently, and I want to discuss some of the philosophical foundations.

The background here is that there are roughly two camps of statistical thought: the Frequentists and the Bayesians.  They represent very different ways of thinking about the world.  I fall squarely on the Bayesian side. The purpose of this post isn’t to construct some grand argument. I just want to introduce a simple idea: Subjective Probability.

Just like the world of statistics is divided between the Frequentists and the Bayesians, the interpretation of probability is divided into objective and subjective. Objective probability is associated with the frequentists and subjective with the Bayesians.

The prime example of objective probability is a coin flip. Suppose that this is a fair coin and it produces heads on half of all flips. It is an objective property of the coin that it produces heads one out of every 2 times.

Let’s look at another example: a deck of cards. A standard deck is weighted to produce a heart a quarter of the time, and to produce a picture card 3/13ths of the time. Again, it’s helpful to think of the deck as yielding an objective probability – but this way of thinking is limiting.

For example, suppose you have a deck of cards on the table and you again want to assign a probability of seeing a picture card. You know that it’s 3/13, but you keep staring at the top card in that deck. You see the back of that card. You know it’s either a picture or it’s not. “What are you?” you say. As soon as you turn over that card, the probability either goes to 0 (it’s not a picture) or it goes to 1 (it is a picture).

What if you shuffled the deck and you happened to get a peek at the card on the bottom? You’d then change your expectations of what the top card is going to be. What if you caught a glimpse of that card, but you’re not exactly sure?

The probability now isn’t some inherent property of the deck; it’s a number in your mind that represents your expectation of the top card being a picture card. This number can take into account the inherent properties of the deck of course, but it can also take into account any other information you have as well as your experience.  For example, maybe you suspect the deck is rigged. Your belief about the deck might be different from someone else’s.
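To make the card example concrete, here is a minimal Python sketch of how the number changes with what you know. It assumes a standard 52-card deck with 12 picture cards; the function names and the "confidence" parameter for an uncertain glimpse are made up for illustration.

```python
from fractions import Fraction

PICTURE_CARDS = 12  # jack, queen, king in each of the 4 suits
DECK_SIZE = 52


def p_top_is_picture(bottom_was_picture=None):
    """Probability that the top card is a picture card.

    bottom_was_picture is None if you saw nothing, True if the bottom
    card you peeked at was a picture card, and False if it was not.
    """
    if bottom_was_picture is None:
        return Fraction(PICTURE_CARDS, DECK_SIZE)
    pictures_left = PICTURE_CARDS - (1 if bottom_was_picture else 0)
    return Fraction(pictures_left, DECK_SIZE - 1)


def p_top_after_glimpse(confidence_picture):
    """Blend both cases when you only caught a glimpse of the bottom card.

    confidence_picture is your own degree of belief (0 to 1) that the
    card you glimpsed was a picture card.
    """
    return (confidence_picture * p_top_is_picture(True)
            + (1 - confidence_picture) * p_top_is_picture(False))


print(p_top_is_picture())                    # 3/13
print(p_top_is_picture(True))                # 11/51
print(p_top_is_picture(False))               # 4/17
print(p_top_after_glimpse(Fraction(1, 2)))   # 23/102
```

Two people looking at the same deck can plug in different values here and both be reasoning correctly, which is the whole point of calling the probability subjective.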

Subjective probability applies much better in real-world forecasting situations. Let’s say you want to assign a probability to a particular candidate winning an election. In the end, they’ll either win or they’ll lose – but the probability you assign is an expectation of that event. You don’t need to be well informed to have a subjective expectation – but you want to set yourself up to have more accurate expectations as you gather more information.

Sometimes we assign binary expectations to an event. For example, if I am absolutely sure something will occur I will assign it a 1. If I believe it is impossible, I’ll assign it a 0. And then I make decisions based on that belief.

But it turns out that we can make better decisions by hedging. If I see on my phone that there’s a 30% chance of rain, maybe I won’t bring my umbrella but I’ll wear clothes that I don’t mind getting wet.

What does it mean to have a degree-of-belief of 30% rain? It’s not like we’re living in a frequentist world where that particular day can be repeated over and over again to get a fraction. This is a difficult concept to define, but another way to think about it is a ratio of expectations. If there’s a 30% chance of rain, that means that there’s a 70% chance of no-rain, and the ratio of expectations is 3:7. It’s related to the amount of risk we’re willing to take on a certain outcome.

When the event finally occurs, we can quantify how surprising that event was by taking the negative logarithm of the probability we assigned to it. For the example above, if it rains the surprise is -ln(0.3), or roughly 1.2. If it doesn’t rain, it’s -ln(0.7) or roughly 0.36.
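The surprise calculation is easy to check. This is a small sketch using the natural logarithm, as in the rain example; the function name is just an illustrative choice.

```python
import math


def surprise(p):
    """Surprise, in nats, of an event that was assigned probability p."""
    return -math.log(p)


p_rain = 0.3
print(round(surprise(p_rain), 3))      # 1.204 (it rained)
print(round(surprise(1 - p_rain), 3))  # 0.357 (it stayed dry)
```

The less likely you thought the outcome was, the bigger the number. An event you assigned probability 1 produces a surprise of exactly 0.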

Just because you’re very surprised doesn’t mean you were wrong to assign the probabilities that you did. It could be that your forecasting was really good given the information at hand, and a rare event occurred. But it’s generally true that if you are surprised less often after adjusting your methods for assigning probabilities, your new methods are probably better. In complex systems, there’s no optimal method – you can always add more data and computation. In simple games, there’s usually an optimal method – and the probabilities it produces can be thought of as objective probabilities.

Anyone can assign a subjective probability to an event. You’ll often hear in casual conversations remarks like “there’s a 20% chance we’ll be on time”. These probabilistic assignments are often made before any thought has been put into them. If you want to assign better probabilities, a good start is to follow some basic logic. For example, if X always leads to Y, the probability of Y must be greater than or equal to the probability of X. There’s also the indifference principle: if you have no information distinguishing two mutually exclusive events, then you should assign them equal probabilities.

And finally, there’s Bayes’ rule. This tells us how to update our beliefs when we are exposed to new information. This most important rule is how the idea of subjective probability gives rise to Bayesian inference.
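As a rough illustration of that update, here is Bayes’ rule applied to the rigged-deck suspicion from earlier. All of the numbers are assumptions chosen for the example: a 1-in-10 prior that the deck is rigged, and a rigged deck that shows a picture card on top 9 times out of 10.

```python
from fractions import Fraction


def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior belief in a hypothesis after seeing one piece of evidence.

    prior: belief in the hypothesis before the evidence
    likelihood_if_true: P(evidence | hypothesis is true)
    likelihood_if_false: P(evidence | hypothesis is false)
    """
    evidence = (prior * likelihood_if_true
                + (1 - prior) * likelihood_if_false)
    return prior * likelihood_if_true / evidence


prior_rigged = Fraction(1, 10)     # assumed prior that the deck is rigged
p_picture_rigged = Fraction(9, 10) # assumed behavior of a rigged deck
p_picture_fair = Fraction(3, 13)   # a fair deck, as computed above

posterior = bayes_update(prior_rigged, p_picture_rigged, p_picture_fair)
print(posterior)         # 13/43
print(float(posterior))  # about 0.30
```

Seeing one picture card on top moves the belief from 10% to about 30%: more suspicious than before, but far from certain.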

I actually witnessed SQL Injection

SQL injection is one of those hacks you can do on websites with really bad security practices. It can occur whenever your database query includes user input. If the user puts something you don’t expect, they can alter the database in ways that you don’t expect.

A funny example – which is kind of famous in engineering circles – is given in the webcomic XKCD.

Now about 10 years ago, I coded up a site called Stickymap. It was a local search site where users could post interesting locations in their neighborhood and leave a description. It was coded in PHP. You can secure PHP if you’re careful but it’s very difficult to do so. If you use PHP in your organization, there should be very specific rules around running SQL queries.

Well – one of my queries did not escape the user-generated data. And, long story short, someone changed every single venue name to “Bureau Veritas”. Every single one. In the world.

After I investigated, I don’t think that this was the intent. I think that the user was trying to add a (very spammy) description to a single venue that short-circuited the query so that the “WHERE clause” didn’t make it in. For those of you who don’t know, the WHERE clause in an UPDATE statement tells the database which items to update. If there is no WHERE clause, it’ll update everything. Pretty insane, right?! It should probably update nothing.
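To show how a missing WHERE clause can happen by accident, here is a small reconstruction in Python with an in-memory SQLite database. This is not the original Stickymap code (that was PHP); the table, the venue names, and the function names are invented for illustration. The safe version uses bound parameters, which is a different fix from manually escaping the input, but it closes the same hole.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory venues table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE venues (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO venues (name) VALUES (?)",
        [("Corner Cafe",), ("City Park",), ("Book Shop",)],
    )
    return conn


def rename_unsafe(conn, venue_id, new_name):
    # BAD: user input is pasted straight into the SQL string.
    query = f"UPDATE venues SET name = '{new_name}' WHERE id = {venue_id}"
    conn.execute(query)


def rename_safe(conn, venue_id, new_name):
    # GOOD: the driver binds the values, so input is never parsed as SQL.
    conn.execute("UPDATE venues SET name = ? WHERE id = ?",
                 (new_name, venue_id))


def names(conn):
    return [row[0] for row in
            conn.execute("SELECT name FROM venues ORDER BY id")]


# The stray quote ends the string early, and "--" starts a SQL comment
# that swallows the rest of the statement, including the WHERE clause.
user_input = "Bureau Veritas' --"

unsafe_db = make_db()
rename_unsafe(unsafe_db, 1, user_input)
print(names(unsafe_db))  # every venue is now named Bureau Veritas

safe_db = make_db()
rename_safe(safe_db, 1, user_input)
print(names(safe_db))    # only venue 1 changed, input stored as plain text
```

Notice that the input doesn’t have to be malicious. Any text that happens to contain a quote followed by comment characters is enough to do the damage.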

I wonder how that person/spammer felt after they did this. Were they shocked? Did they move on to another site? Who knows!?

Fortunately, I had enough backup data to restore the Stickymap database while I was in San Francisco. Of course this always happens when I’m in San Francisco away from my home computer!!

Furthermore, I plugged up the security hole on the site. It’s pretty cool that the security hole was left unexploited for 10 years and then all of a sudden was found. Who knows what problems we have lurking in our more critical systems? I like to hope those are more widely tested. You also want to see systems that hackers are constantly trying to exploit because that means that the owners of that system have been forced to plug the security holes. For example, I would rather trust software that’s been cracked and plugged a few times in the past than software that’s never been hacked but also never left out in the wild either.

Anyway – if someone out there wants to tell me there are more security holes in my site – let me know! But please try not to destroy Stickymap – it’s my fun mid-2000s space on the internet and a reminder of how far we’ve come on local search.

And if you are the accidental culprit and you come forward, I’ll either interview you for the blog, or I’ll owe you a beer!

Marsbot and Chatbots


I spoke about Marsbot a number of times on this blog, but I wanted to write my own (short) piece on what we did and why we did it. The short of it is that Marsbot is a personal assistant that tells you about all the best places around you and what to do there. The secret sauce is that you don’t have to put much into it to get use out of it – you just download the app on your phone (iPhone or Android) and it automatically discovers where you go and what you like. Sometimes it’ll ask you a question or two, but it also infers a lot automatically.

To get more information about it from a product perspective, I recommend that you check out both Dennis Crowley’s post on Medium and the one from Foursquare (and Marsbot) Product Manager Marissa Chacko.  You can also check out my talk at Talkabot in Austin. We all worked together on this for a while and are pretty psyched about the results.

Especially last December, when we got on Mashable’s 12 best Apps of the year. It’s nice to be on the same list as Pokemon Go – even though we have far fewer users.

Now that it’s been out for a while, here are a few of my takeaways from the experience.

1) Context is everything. Discoverability in the bot space isn’t going to be like discoverability in the app space. There probably won’t be a “bot store” and even if there is, breaking through will be as difficult as it is in the app stores. The winners are going to have to stand out and learn something very specific about users to help them complete a task (or have fun). Foursquare now has the Pilgrim SDK to allow other apps (and in the future hopefully bot platforms) to have the same superpowers that Marsbot has.

2) Natural Language Understanding (NLU) is the ability for a computer to understand human input. When it comes to bots, sophisticated NLU doesn’t mean much unless the backend code can actually act on that understanding. For example, suppose you text Marsbot to say that the recommendation is “too far”. An NLU system that gets that is only worth it if there’s a backend module where Marsbot can give a closer recommendation. (There is, by the way!) Therefore when it comes to bot design, I think the thing to focus on is what actions you want the bot to be able to take and expand on those. The NLU can be heuristic-based at first, and can one day be replaced by a sophisticated AI system, but only after a wide variety of actions are coded in the system.

3) I’m really into the conversational aspect of this. The hook for Marsbot is that it talks to you, not the other way around – but many of our users talk to Marsbot and seem to try to form a friendship with it. I imagine a seamless conversation where you can object to Marsbot’s recommendations (for both places and menu items) with reasons until it comes up with a solution. I mentioned this in my talk in Austin, and some of it is implemented (too far, too expensive) but Marsbot doesn’t understand more than 1 command at a time. It would take a bit of work to get a fully fleshed-out, human-like conversation working.

4) Marketing these bots and getting them to capture the public imagination is hard. Marsbot was lauded in the tech press, but the user numbers remain small. And even if you can build a bot with very large user numbers, how do you transition from being a fun curiosity to an indispensable tool that people rely on? I think a lot of bot-makers are doing some interesting things in the enterprise space where they can sell their technology to organizations. For the individual consumer space, the secret to the bot-hit is still elusive, but may be cracked someday!

5) You haven’t heard the last of this technology from Foursquare. I think that our Pilgrim SDK will power bots like Marsbot, and our NLP + recommendation powers will continue to grow. If you’re in the US, download Marsbot on your iPhone or Android device, and let me know how it goes (@maxsklar)!
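The “actions first, heuristic NLU second” idea from takeaway 2 can be sketched in a few lines of Python. This is not Marsbot’s actual code: the phrases, replies, and function names below are all hypothetical, and the actions are stubs standing in for real recommendation modules.

```python
def recommend_closer():
    # Stub for a backend action that finds a nearer venue.
    return "Here's a place closer to you."


def recommend_cheaper():
    # Stub for a backend action that finds a less expensive venue.
    return "Here's a place that's easier on the wallet."


# Heuristic "NLU": each phrase maps to an action the backend can perform.
# A phrase is only worth recognizing if an action exists to handle it.
ACTIONS = {
    "too far": recommend_closer,
    "too expensive": recommend_cheaper,
}


def handle_message(text):
    """Run the first action whose trigger phrase appears in the message."""
    normalized = text.lower()
    for phrase, action in ACTIONS.items():
        if phrase in normalized:
            return action()
    return "Sorry, I didn't catch that."


print(handle_message("That's too far for me"))
print(handle_message("Hmm, too expensive"))
print(handle_message("What's the weather?"))
```

Like the limitation described in takeaway 3, this sketch handles only one command per message. Swapping the phrase matching for a smarter model later would not require touching the action functions.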

Talkabot: The bot conference in Austin


Last month, I also attended the Talkabot Conference in Austin, TX. I gave a 30 minute presentation on Marsbot. I guess you could say I am on a national tour! This time, I shifted focus to how we’re adopting everything we’ve learned about user context (location stops, taste likes, time) to send messages that are really useful.

The conference was great – it was actually my first time in Austin. I had a very warm reception from the team at Howdy and on the last day got to go out on a trek to the Salt Lick for some BBQ with the founders of Kip and reps from Slack. At the conference itself, there was a lot of talk about developing standards for chat bots, and building tools and platforms upon which these bots will be built. I loved ordering coffee from a bot barista on Kik who, in addition to giving you coffee, also pitches you his screenplays.


Even though chat bots have existed for a while, there is a sense that we’ve hit an inflection point and some of the killer apps are coming. There is hope that chat could be the next great platform for innovation.

I’ve been asked to share my slides. Here they are: talkabot-marsbot-presentation-1

If that wasn’t fun enough, Marsbot actually attended the Foursquare Halloween party last night!

Marsbot Slides for Industry Talk at RecSys 2016

I recently attended the 2016 conference on Recommender Systems at MIT with my Foursquare coworkers Stephanie Yang and Enrique Cruz. We had several contributions – 2 posters and a 20 minute industry talk on Marsbot.

Marsbot is a character in your pocket that acts as a text-based service for local recommendations. I’ve been working on it for a while, and we were able to do a full launch a couple months ago. I have so much to say about this project and I hope to expand on it more on this blog soon!

For now, a bunch of people have asked me to post slides from my talk at RecSys so I will post them here. I hope the video of the talk becomes available soon.

PDF of the slides:
recsys-marsbot-presentation