Home » HappyPancake 🌟 AI Research · Newsletter · ML Labs · About

Distributing Work

A season of vacations starts. This week was the last time when our team was online at the same time. Tomas takes a vacation starting from the next week. Pieter is probably going to take his as soon as he gets through bike exams (wishing him the best of luck). I'll travel to Georgia next week, while working remotely and taking longer weekends.

Obviously, we want to stay productive during this period and move forward on our project. There are things that usually require full consensus: important decisions about design, specific feature requirements, everything that involves multiple packages at once. Last week was spent going through these things in advance to make sure we have plenty of non-blocking work queued up for the next month.

More Features

We have some basic features implemented in the system so far. Software design evolved a bit to support them all while keeping things simple.

At this point, if HappyPancake2 were a brand-new product, I'd recommend going live (e.g. in stealth mode) as soon as possible in order to start getting real-world feedback from the users.

No amount of testers and visionaries can replace knowledge and insights coming from the real world feedback. Duty of software developers is to make his happen as soon as possible and then iterate, incorporating lessons learned.

However, HappyPancake2 is special - it is already used by thousands of users, so there is already plenty of feedback. We know quite well which features are necessary, which could be discarded and which enhancements we could try next.

Hence we can keep on working on this project without releasing it. Tomas has all the domain knowledge we need right now.

We are planning to introduce these features next:

  • Interests - tags that members can add to their profile, allowing other people to find them by interests (and potentially allowing us to provide better matching);
  • blocking - allowing a member to ignore another one (removing him or her from all search results and blocking communications);
  • online list;
  • abuse reports on content with admin review queues;
  • favorite profiles.

During the week Pieter focused his efforts on developing review functionality, which is one of the most important features in our system.

Node.js

We are planning to make a slight tech change in our stack by implementing front-end in node.js (which is something Tomas explored last week). This is a relatively small change to the existing system - http endpoints will need to return JSON instead of rendered HTML, so the cost is relatively low. Benefits are:

  • better separation of concerns in our design;
  • ability to use Rendr (render backbone.js apps on the client and the server).

This would turn our existing code into back-end with an API, serving JSON requests and streams to node.js front-end. Such separation allows to have more flexibility in UI while introducing a much better testing to the back-end.

Behavior Testing

Thanks to the switch from HTML endpoints to JSON, I started introducing package behavior tests to our system last week. These tests set and verify expectations about public contracts exposed by packages. This is quite simple to do:

  1. Given a set of events and dependencies
  2. When we execute an action (usually calling a JSON endpoint)
  3. Expect certain assertions to be true.

In the longer term I hope to convert these tests to self-documenting expectations (like I did in my previous .NET projects). Ability to have up-to-date documentation of the code that is expressed in human-readable language can be a powerful thing for keeping project stake-holders involved in the project. This means better feedback and faster iterations.

Code looks like this in golang:

func (x *context) Test_given_nancy_flirts_bob_when_GET_bobs_alerts(c *C) {
    s := run_nancy_flirts_bob(x)

    r := x.GetJson(s.bobId, "/alerts")
    c.Assert(r.Code, Equals, http.StatusOK)

    var m model
    r.Unmarhal(&m)
    c.Check(m.Title, Equals, "Alerts")
    c.Check(m.HasMore, Equals, false)

    c.Assert(m.Items, HasLen, 1)
    i1 := m.Items[0]

    c.Check(i1.Member.Nickname, Equals, "nancy")
    c.Check(i1.Unread, Equals, true)
    c.Check(i1.Member.IsOnline, Equals, true) // since we have allOnline

    c.Check(x.Service.AnyUnread(s.bobId), Equals, false)
}

where nancy flirts bob scenario is a simple code setting up preconditions on the system:

func run_nancy_flirts_bob(x *context) (info *nancy_flirts_bob) {
    info = &nancy_flirts_bob{hpc.NewId(), hpc.NewId()}
    x.Dispatch(hpc.NewRegistrationApproved(
        hpc.NewId(),
        info.bobId,
        "bob",
        hpc.Male,
        hpc.NewBirthday(time.Now().AddDate(-23, 0, 0)),
        "email",
        hpc.NoPortraitMale))

    x.Dispatch(hpc.NewRegistrationApproved(
        hpc.NewId(),
        info.nancyId,
        "nancy",
        hpc.Female,
        hpc.NewBirthday(time.Now().AddDate(-22, 0, 0)),
        "email",
        hpc.NoPortraitFemale))

    x.Dispatch(&hpc.FlirtSent{hpc.NewId(), info.nancyId, info.bobId})
    return
}

The Truth is Born in Argument

I can't be grateful enough to Pieter who has enough patience to go with me through the design discussions in cases when we disagree about something. Talking things through with him is one of the reasons why our design stays simple, clear and capable of future evolution.

Design Game

A lot of our work resembles some sort of puzzle, where we have to do 3 things:

  • find names and words that let us communicate better (we are a distributed team from different countries);
  • discover ways to break down large problem into small coherent parts (team is too small to be able to tackle huge problems);
  • decide on optimal order in which these parts could be handled (our time is limited and has to be applied to the areas where it will make the biggest impact for the project).

The hardest part is deciding which things have to be done right now and which can be deferred till some point in the future. In some cases implementing a feature without all the necessary data at hand can be a waste of time, in other cases, this could lead to a deeper insight required to move forward.

We try to optimize implementation chain a lot - bringing most rewarding and easy features ("low hanging fruites") and depreriotizing ones that are less beneficial for the project. That is an ongoing process required for applying our limited time most efficiently.

For example, previously we pretended to store photos in our system. We simply passed around urls pointing to photos from the original version of HappyPancake. That was a good decision (defer functionality as long as possible), but time came to implement it.

During last weeks Pieter pushed new media module and spent some time integrating it with the other our services. This brought new insights to how we are going pass around this information through events. We also know how we could host and scale such such module in production (deploy to multiple nodes and rsync between them).

Anything related to performance is another example of things we deferred.

"Big Data"

So far our development intentionally focused on software design while deferring any potential performance optimizations. Now it is time to start learning about the actual numbers and real-world usage.

At the end of the week I went back to Visual Studio to start writing an extractor utility. This tool merely connects to the original database and saves some data to a compact binary representation (compressed stream of protobuf entities). Then, I started working on the golang code which will scan through that data, producing a stream of events which could be passed to our development project.

It is recommended to use such two-step data processing (dump data store to intermediary format and then iterate on data dumps) whenever you are working with large datasets coming from a live system. This decouples the process from production systems, reducing the impact and allowing to have faster iterations.

We started working on Finland database, which is one of our smaller installations, yet the already is a bit of data to process. For example, there are more than 1200000 messages, taking 230MB in compressed binary form, 11MB of member data and 2MB of user flirts. Sweden is 100-50 times larger than that.

This might seem like a lot of data, however it is not so. Our entire Sweden dataset, if encoded properly, could fit on a single smart-phone and be processed by it. It's just large. However, since we didn't introduce many performance considerations into our design yet (aside from keeping it scalable), some tuning will be necessary.

I haven't worked with large datasets for more than half a year, so I'm really looking forward to get back in this field. Real-time reactive nature of the data makes this even more interesting and exciting.

Published: July 06, 2014.

Next post in HappyPancake story: Smarter Development

🤗 Check out my newsletter! It is about building products with ChatGPT and LLMs: latest news, technical insights and my journey. Check out it out