Arjun Chidrawar ’24


Whooo’s Reading, San Diego, CA

Over the summer, I worked as a Data Science Intern for Whooo’s Reading. Established in 2013, the company is part of the education technology sector and focuses on K-12 education. Its goal is to give schools an alternative to traditional multiple-choice questions and a more thoughtful, out-of-the-box approach to teaching. Whooo’s Reading gives students open-ended questions that better test their understanding of the content. The company does not market to individual students; instead, it targets entire school districts so that this style of learning can be implemented in every English-speaking classroom in their schools.

Finishing a presentation on a machine learning model for open-ended questions.

I was charged with improving the model’s ability to detect whether a student restated the question in their own answer. As I consider myself a beginner-to-intermediate programmer, I had to learn a lot of new concepts along the way, but after two weeks I put together a 25-minute presentation of my findings for both the CEO and the CTO. I spent the next two to three weeks creating a usable data frame from which the machine learning model could learn. I had to take a dataset of around 60,000 student responses and sample 80 responses for each of 180 questions. This produced a dataset of 14,880 rows, each of which was unique and filtered by the length of the student’s response.
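A minimal sketch of what this sampling step might look like, using only the Python standard library. The function name `sample_per_question`, the `(question_id, answer_text)` tuple layout, and the `min_words` length threshold are all assumptions made for illustration; the actual filter and data frame columns used in the project are not shown here.

```python
import random
from collections import defaultdict


def sample_per_question(responses, per_question=80, min_words=5, seed=0):
    """Sample up to `per_question` unique, long-enough answers per question.

    `responses` is a list of (question_id, answer_text) tuples.
    `min_words` is a hypothetical length filter, not the project's real one.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    # Group answers by question; a set drops exact duplicate answers.
    by_question = defaultdict(set)
    for question_id, text in responses:
        cleaned = text.strip()
        if len(cleaned.split()) >= min_words:
            by_question[question_id].add(cleaned)

    sampled = []
    for question_id in sorted(by_question):
        pool = sorted(by_question[question_id])  # sort for a stable order
        k = min(per_question, len(pool))  # a question may have fewer answers
        for text in rng.sample(pool, k):
            sampled.append((question_id, text))
    return sampled
```

Each returned row is a unique (question, answer) pair, so the result can be loaded directly into a data frame for the next step.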

After creating the dataset, I created pairs of rows to be used for annotation in Mechanical Turk. Put simply, I took the 14,880 data points from earlier and gave each one 20 unique “buddies.” Multiplying 14,880 rows by 20 buddies produced a new dataset of roughly 296,000 rows, each of which held a unique pair of data points. I could then upload this dataset to Mechanical Turk, where real users around the world would go through each row, compare the two responses, and decide which one restates the question better. This process is called the “annotation” of data.
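One way the pairing could be done is sketched below. This is an illustration under stated assumptions, not the code used in the project: the function name `make_pairs` is hypothetical, any two rows are assumed to be eligible partners, and the shuffle-then-offset scheme is simply one technique that guarantees no pair appears twice, in either order.

```python
import random


def make_pairs(items, buddies=20, seed=0):
    """Pair every item with `buddies` distinct partners, with no repeated pair.

    Items are shuffled, then each one is paired with the next `buddies`
    items in the shuffled order (wrapping around at the end).
    """
    n = len(items)
    if n <= 2 * buddies:
        # With too few items the wrap-around would repeat a pair.
        raise ValueError("need more than 2 * buddies items for unique pairs")

    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # randomize who ends up next to whom

    pairs = []
    for pos in range(n):
        for offset in range(1, buddies + 1):
            first = items[order[pos]]
            second = items[order[(pos + offset) % n]]
            pairs.append((first, second))
    return pairs
```

The output always has `len(items) * buddies` rows, and each row can be shown to an annotator as one side-by-side comparison.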

In retrospect, the number of new things I learned in such a short amount of time has really surprised me. Without weekly check-ins and support from my supervisors and, earlier on, my fellow Williams peers, this process would have been significantly harder. I had a lot of fun during this internship, and I recently found out that the software I have been working on may be sold to other companies that have shown interest in it, which is really exciting. After this internship, I am seriously considering switching to a CS major now that I have seen the sheer power and influence machine learning and programming have on the modern world. This internship has really opened my eyes and given me direction for what I want to do after Williams.

I want to thank Bill McCalpin ’79 for truly making this all possible. I would also like to thank the ’68 Center for Career Exploration for providing me with this opportunity.