projects on Roy Tang

TriviaStorm: Text and Answer parsing

hello@roytang.net (Roy Tang) — Sun, 03 Feb 2019 05:56:56 +0000

A while back I started a Twitter trivia bot as a weekend project. That bot is still up and running on Twitter, so you can check it out there!

But today, I thought I’d write about the answer-checking mechanism used by the bot. It was a bit interesting to me because it was the first nontrivial use I had for Django’s unit testing framework. I’m not too keen on unit testing web functionality (something I still have to learn), but this seemed an appropriate first use of a unit test framework for several reasons:

the bot had to be able to handle a wide variety of answers
there were a lot of test cases to check and a single checking function handling everything - I couldn’t risk breaking previously working tests
inputs were discrete and outputs were easily checkable
I needed to be able to add new test scenarios all the time as more problematic answers were provided

The project currently isn’t open source, but I did make a gist of the tests.py I used here.

The check function basically accepts three parameters:

a checking mode (currently only supports EXACT and ALL_ANYORDER)
the answer phrase provided by the player
a set of valid answers accepted for the question

There are a number of test cases already handled:

checking should be case-insensitive
articles should be ignored if they’re at the start of the answer phrase
numbers should be acceptable for the spelled-out versions, e.g. “7” should be accepted for “seven”, and vice versa
some minor soundex (phonetic matching) support (via Python Jellyfish)
handling of questions that support multiple answers. This is what ALL_ANYORDER is for - it means that all the given answers must be provided, but they can be in any order, e.g. if the valid answer set is "Huey", "Dewey" and "Louie", then "Louie Huey Dewey" should be accepted as an answer
nonalphanumeric characters should be treated as whitespace, except in some special cases
special case: contractions like "don't" or "can't" should be treated as if they were a single term like "dont" or "cant" instead of "don t" or "can t"
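
As a rough illustration of the rules listed above (this is not the bot's actual code, which isn't public), a check function could look something like the sketch below. The names check_answer, normalize, ARTICLES and NUMBER_WORDS are all made up for this example, the number mapping only covers zero to ten, and the soundex matching is left out entirely.

```python
import re

# Mode names come from the post; everything else here is a hypothetical sketch.
EXACT = "EXACT"
ALL_ANYORDER = "ALL_ANYORDER"

ARTICLES = {"a", "an", "the"}
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}


def normalize(phrase):
    """Turn an answer phrase into a list of comparable tokens."""
    text = phrase.lower()  # case-insensitive matching
    # Special case: drop apostrophes inside words so "don't" becomes "dont".
    text = re.sub(r"(?<=[a-z])['’](?=[a-z])", "", text)
    # Every other non-alphanumeric character is treated as whitespace.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    tokens = text.split()
    # Ignore an article only when it starts the phrase.
    if len(tokens) > 1 and tokens[0] in ARTICLES:
        tokens = tokens[1:]
    # Map spelled-out numbers to digits so "seven" and "7" compare equal.
    return [NUMBER_WORDS.get(token, token) for token in tokens]


def check_answer(mode, given, valid_answers):
    """Return True if the player's phrase is acceptable under the given mode."""
    given_tokens = normalize(given)
    if mode == EXACT:
        # Any one of the valid answers may match.
        return any(given_tokens == normalize(valid) for valid in valid_answers)
    if mode == ALL_ANYORDER:
        # Every valid answer must appear, in any order.
        required = sorted(t for valid in valid_answers for t in normalize(valid))
        return sorted(given_tokens) == required
    raise ValueError("unknown checking mode: %s" % mode)
```

With this sketch, check_answer(ALL_ANYORDER, "Louie Huey Dewey", ["Huey", "Dewey", "Louie"]) is accepted, while "Batman vs Superman Dawn of Justice" is still rejected for "Batman v Superman: Dawn of Justice" because "vs" and "v" are different tokens.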

The answer checking definitely still isn’t perfect, but I’m pretty happy with where it’s at right now. There is also definitely an element of subjectivity as to which answers should be accepted. One time a player complained that his answer "Batman vs Superman Dawn of Justice" should count for "Batman v Superman: Dawn of Justice", but for this particular question I had chosen not to allow “vs” for “v” because that was the actual movie title, which might be unreasonable now that I think about it!

I do know that I need to implement better “synonym” handling, i.e. mapping of "v"<->"vs" and other terms like "mr"<->"mister" or "natl"<->"national". The problem with handling things like that is that the number of possible phrase combinations expands when multiple such terms are found in the same answer, so it can’t scale too well. I suppose I need to normalize the answer sets at the time the question is defined. What do you know, I figured out how to do something just by writing a blog post!
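
One way the normalization idea might look, as a minimal sketch: map every synonym to a single canonical token, apply that mapping once when the question is defined and once to each incoming answer, and compare the results. That avoids generating every combination of variants. The names CANONICAL and canonical_key are hypothetical, and the choice of which spelling counts as canonical is arbitrary.

```python
import re

# Hypothetical synonym table: each variant maps to one canonical token.
CANONICAL = {
    "vs": "v",
    "mister": "mr",
    "natl": "national",
}


def canonical_key(phrase):
    """Lowercase, tokenize, and replace each token by its canonical form."""
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    return " ".join(CANONICAL.get(token, token) for token in tokens)


# Normalize the valid answers once, when the question is defined...
stored_answers = {canonical_key("Batman v Superman: Dawn of Justice")}

# ...then normalize each player answer the same way before comparing.
player_answer = "Batman vs Superman Dawn of Justice"
is_match = canonical_key(player_answer) in stored_answers
```

Because both sides go through the same mapping, an answer containing several synonym terms still needs only one lookup per token.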

I do have a bunch of other enhancements planned for the trivia bot, including support for Slack and Discord, and a longer time frame roadmap, but I’m not sure when I can commit more time to it. Still, it’s turned out to be a pretty fun endeavor, and I’m hoping it leads to something cool!

Django Blog Application

hello@roytang.net (Roy Tang) — Sun, 28 Oct 2018 05:02:37 +0000

Ten years ago this month, I started studying Django by trying to build my own blog application. I found the code lying around while I was going through some backups lately. It’s way out of date, since it uses an early version of Django. I thought of bringing it up to speed, but that didn’t seem practical. Instead, for archival purposes, I cleaned it up a bit and uploaded the code to a GitHub repo. (Helpful GitHub immediately warned me that having a very old version of Django was a security risk lol). There’s a lot more information in the README.md of that repo. I actually used this as my main blog engine for a while before I decided the maintenance effort wasn’t worth it and switched to WordPress.

Tangent: I’m not actually 100% happy with WordPress, and when I found this old code I was tempted to maybe try building Yet Another Blog Application (TM), except maybe using the opportunity to learn some other framework. My 2018 personal to-do list is already way too long though.

Weekend Project: Twitter Trivia Bot

hello@roytang.net (Roy Tang) — Thu, 23 Feb 2017 01:30:00 +0000

I had been meaning to try writing a Twitter bot for a while now. I figured a trivia bot would be pretty easy to implement, so I spent some time over a couple of weekends to rig one together.

It’s (mostly) working now: the bot is active as triviastorm on Twitter, with a supporting webapp deployed at https://triviastorm.net/. The bot tweets out a trivia question once every hour. It will then award points to the first five people who give the correct answer. The bot will only recognize answers given as a direct reply to the tweet with the question, and only those submitted within the one-hour period.
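
The scoring rules described above can be sketched roughly as follows. This is only an illustration under assumptions: the Reply tuple, pick_winners and the is_correct callback are hypothetical names, the replies are plain in-memory data rather than anything fetched from the Twitter API, and counting each user at most once is a guess rather than something the bot is known to do.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical in-memory record of a reply tweet.
Reply = namedtuple("Reply", "user in_reply_to created_at text")


def pick_winners(question_id, asked_at, replies, is_correct,
                 max_winners=5, window=timedelta(hours=1)):
    """Return the users who answered correctly first, up to max_winners."""
    deadline = asked_at + window
    winners = []
    # Walk the replies from earliest to latest.
    for reply in sorted(replies, key=lambda r: r.created_at):
        # Only direct replies to the question tweet count.
        if reply.in_reply_to != question_id:
            continue
        # Only replies inside the one-hour window count.
        if not (asked_at <= reply.created_at <= deadline):
            continue
        # Assumption: each user can win at most once per question.
        if reply.user in winners:
            continue
        if is_correct(reply.text):
            winners.append(reply.user)
            if len(winners) == max_winners:
                break
    return winners
```

Here is_correct stands in for whatever answer-checking function the bot uses.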

Some technical details:

My scripting language of choice for the past few years has been Python 2.7. I’m using Tweepy to interact with the Twitter API, PyMySQL to connect to the database, and Flask to run the webapp. I haven’t used Flask in some time, but it’s still very straightforward. I actually had a harder time configuring the webapp with mod_wsgi on my host.

The main problem with a trivia system is that you need a large and high-quality set of questions. Right now the bot is using a small trivia set of around a thousand questions that I got from a variety of sources. If I want to leave this bot running for a while, I’m going to need a much larger trivia set. However, reviewing and collating the questions is a nontrivial task. Hopefully I can add new questions every so often.

Feel free to follow the bot and help test it out. I’d be grateful!