Starting and completing a weekend project is often challenging. Feature creep, tech stack choices, and unwarranted technical excellence (100% test coverage for a weekend project, anyone?) are your enemies, pushing the timeline beyond one weekend, which in turn usually means the project never gets done.
After countless such projects, I got to a place where I can confidently estimate a small project, define its features, and use tech as a tool to complete it, preventing ideological concepts and unnecessary perfectionism from getting in my way.
This weekend I built a small project that will help me find my next apartment, and it makes a great case study for all those lessons.
I'm looking for a new apartment to rent. In my local market, most landlords advertise only in private Facebook groups, each dedicated to a different city or area. For me, that meant seven different groups to cover my search area.
Going through the ads in Facebook groups is a mess:
- You need to visit each group page individually on every "scan".
- Posts with recent comments get bumped up, making you wade through posts you have already seen.
- There is no easy way to filter posts.
- It's tiresome, which might lead to me missing great, time-constrained opportunities (these are high-demand locations).
Before we dive in, if you are interested in the end result, my code is freely available on GitHub. OK, let's go.
Manually scanning multiple Facebook groups every few hours is a long, error-prone process. My automated solution had these requirements:
- Visit all the groups and collect all the ads.
- Remember which ads it has already sent me, and filter those out.
- Filter ads by some text, in my case "3 rooms", to keep only relevant ads.
- Send the ads to me in a way that is easy to share with my partner.
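The text-filter requirement, for example, boils down to a case-insensitive substring check. A minimal sketch (the `{ text }` ad shape here is my illustration, not necessarily the project's exact structure):

```javascript
// Keep only ads whose text mentions the search phrase, e.g. "3 rooms".
// The ad object's shape is illustrative.
function matchesSearch(ad, phrase = '3 rooms') {
  return ad.text.toLowerCase().includes(phrase.toLowerCase());
}
```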
I wanted to solve each of these requirements in the most pragmatic way possible. My top priority was to make it work and be done within the span of a weekend. In the rest of this article I'll share my technical decisions and reflect on why I made them.
Scanning the Facebook groups
My initial thought was to use the Facebook API to retrieve the posts in an easy-to-consume way. Unfortunately, some of the groups I needed are defined as "secret", and you cannot get their posts through the API.
With the API out of the question, I had to resort to web scraping. I have strong experience with Node.js, so going that route was a no-brainer. In the Node ecosystem, Puppeteer came up, and I started using it.
Puppeteer's default is a headless browser. Every run is a clean start with no prior state (cookies, sessions, etc.), which means I would have to authenticate to Facebook every time. Automating authentication seemed out of scope; logging in to Facebook is a one-time thing anyway.
Puppeteer can also drive a full, regular Chrome session, connected through the debugging socket. That let me open Chrome, log in to my Facebook account manually, and then have my script use that Chrome instance. Not everything has to be solved by programming; I had goals to keep: functionality and the time constraint.
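Attaching to an already-running Chrome looks roughly like this. It's a sketch: the port and group URL are placeholders, and it assumes Chrome was started with remote debugging enabled.

```javascript
// Start Chrome yourself first, e.g.:
//   google-chrome --remote-debugging-port=9222
// then log in to Facebook in that window. The script attaches to it.
async function openGroupPage(groupUrl) {
  // Required lazily so this sketch stays a single self-contained snippet.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.connect({
    browserURL: 'http://127.0.0.1:9222', // Chrome's debugging endpoint
    defaultViewport: null,               // keep the real window's size
  });
  const page = await browser.newPage();
  await page.goto(groupUrl, { waitUntil: 'networkidle2' });
  return page;
}
```

Because the script connects rather than launches, the logged-in session with its cookies is reused for free.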
I looked at the generated HTML, searching for classes or attributes that didn't seem randomly generated, leaning mostly on "data-testid" attributes, as they are widespread and stable, and using them to query-select the elements holding the data I needed.
Is it resilient? Not at all. As someone who ran a company focused on data gathering through scraping, I know how fragile it is. In fact, in the week between writing the bot and writing this post, the scraping already broke, and I had to make one selector more defensive.
But it is good enough, and when things break, fixing them takes very little effort. The upside of a more resilient solution is too small compared with the time it would take to build.
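The extraction step, given a Puppeteer page, looked roughly like this. The selector values here are made-up illustrations; Facebook's real attributes differ and change over time.

```javascript
// Pull the text and permalink out of each post container on the page.
// Both selectors are illustrative stand-ins, not Facebook's real ones.
async function extractAds(page) {
  return page.$$eval('[data-testid="post-container"]', (nodes) =>
    nodes.map((node) => {
      const link = node.querySelector('a[href*="/posts/"]');
      return {
        text: node.innerText,
        url: link ? link.href : null,
      };
    })
  );
}
```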
One of my requirements was that the bot wouldn't send an ad twice. For that, I needed a data storage layer to keep a list of all the ads already seen.
I wanted to keep it simple: before sending me an ad, check whether its URL is in the already-seen list. If it isn't, add it to the list and send the ad.
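The core of that check is a few lines. A sketch, with the list of seen URLs passed in:

```javascript
// Given freshly scraped ads and the URLs already sent, return only the
// new ads and record their URLs so they are never sent twice.
function filterNewAds(ads, seenUrls) {
  const seen = new Set(seenUrls);
  const newAds = ads.filter((ad) => !seen.has(ad.url));
  for (const ad of newAds) seenUrls.push(ad.url);
  return newAds;
}
```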
Figuring out how to do this raised some questions. Should I go with an SQL database? No, that would be overkill, even SQLite. A file-based approach, just writing to a file? That sounds more reasonable. Maybe a JSON file, so it's easy to read.
I knew I didn't want to manage this myself. I wanted the ease of an SQL-like API without the setup hurdles, so I went hunting for libraries. Soon enough, I found lowdb.
Lowdb is a file-based (JSON) database powered by lodash. One look at the code examples, then a look at the tests, made it clear this was what I was looking for.
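Getting started with lowdb (the classic lodash-chain v1 API) is only a few lines. The file and collection names here are my own choices, not necessarily the project's:

```javascript
// Open (or create) the JSON file and make sure the collection exists.
// Uses lowdb v1's FileSync adapter; required lazily so the sketch stays
// a self-contained snippet.
function openDb(file = 'db.json') {
  const low = require('lowdb');
  const FileSync = require('lowdb/adapters/FileSync');
  const db = low(new FileSync(file));
  db.defaults({ seenAds: [] }).write(); // create the collection on first run
  return db;
}
```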
A data access layer
I chose to write an access layer over the data storage instead of accessing it directly everywhere, giving me a small API of my own.
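A sketch of what such an access layer looks like over a lowdb instance (the function names here are illustrative, not necessarily the project's exact ones):

```javascript
// Thin access layer: the rest of the code talks to these functions and
// never touches lowdb's lodash chains directly.
function createAdsStore(db) {
  return {
    wasSeen: (url) => db.get('seenAds').includes(url).value(),
    markSeen: (url) => db.get('seenAds').push(url).write(),
    allSeen: () => db.get('seenAds').value(),
  };
}
```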
I added this layer for two reasons: testability and speed of development. I'll get to the testability part in the next section. The speed of development shows because the data layer is used in several places, and the DRY benefit of changing one function outweighs editing the code in place everywhere for every change.
I didn't start with this abstraction; I started by using the library directly. It quickly became clear that as the data structure changed during development, updating every call site was time-consuming and error-prone.
The only tests I wrote were for the data layer abstraction. I used tests as a means to an end, not as the end itself.
A weekend project is small enough that manual testing is possible and, for the most part, time-efficient. Some parts, though, change so frequently that they do call for the time-saving benefit of automated tests.
The data storage layer is one such part. The fast pace at which data structures changed and requirements revealed themselves led to constant changes there. Add multiple possible states, and every manual pass to check I hadn't broken something quickly grew to more than 10 minutes.
In this case, writing automated tests that check the expected end result saved me a lot of time. I could iterate quickly on the code using the tests and skip the repeated manual checks, speeding up my development significantly.
Telegram bot - why write a service
The last thing to resolve was where I would receive the messages. A Telegram bot was an easy choice, since my partner and I both use Telegram, and it's easy to add a bot to a group chat.
I quickly wrapped a node-telegram library I found in a service in my code and used it. Wrapping the Telegram library's API in a service made the code more organized and easier to navigate: it replaces Telegram-specific lingo with my own generic lingo and encapsulates the Telegram-specific bot initialization logic.
A service has the added benefit of easier drop-in replacement with other message channels (email, a WhatsApp bot, etc.). But that's not why I did it. The service abstraction is cheap enough and has a high ROI in code organization and manageability; I didn't need the drop-in-replacement benefit to justify it.
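The service itself can be thin. A sketch using node-telegram-bot-api (one common choice; the post doesn't name the exact package), with the token and chat id assumed to come from configuration:

```javascript
// Thin messaging service: generic names on the outside, Telegram
// specifics on the inside. node-telegram-bot-api is an assumed library
// choice; it's required lazily to keep the sketch self-contained.
function createMessenger(token, chatId) {
  const TelegramBot = require('node-telegram-bot-api');
  const bot = new TelegramBot(token, { polling: false }); // send-only bot
  return {
    // One generic method the rest of the code calls; Telegram never leaks out.
    sendAd: (ad) => bot.sendMessage(chatId, `${ad.text}\n\n${ad.url}`),
  };
}
```

Swapping in email or a WhatsApp bot later would only mean writing another object with the same `sendAd` method.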
Wrap up - key lessons for weekend projects
Working on this project, I wasn't actively thinking about everything mentioned here. These are lessons I learned before, and I acted on them automatically while trying to get things done.
- Not everything should be solved by automation. If automating something is much harder than doing it manually, do it manually.
- Solutions can be fragile when the alternative requires much more preparation work. Prefer fragile and simple over resilient and complex.
- Leverage open source as much as possible. Get to your functionality goal and don't let tech distract you.
- Use automated tests as a tool to enhance your ability to deliver. Don't test everything; test when it benefits your speed of iteration.
- Write code in the way that is comfortable for you. Use "unnecessary" abstractions if their cost is low and they give you a better dev experience.