Search The Deck

Filtering tags is so 90’s

When I was putting together Klotho’s pitch deck, I followed the classic sections everyone suggested: introduction, problem, solution, target market, market size, competition, go-to-market strategy, product or service, team, financials, funding, milestones, and conclusion.

I searched for slide decks, but it was difficult to find the specific sections I wanted to learn from. Opening tens of decks and scrolling through to find that one relevant slide felt wasteful.

That’s when I decided to create a tool that would make it easier to search inside the decks.

Searching Decks

I needed to find pitch-deck slides of a certain type, like the ‘Problem’ slide or ‘Vision’ slide.

There are several pitch-deck sites out there, and with a little scraping I collected 15k+ slides, totaling 5 GB of data. The only way to find the relevant slides without manually tagging them was to OCR the images and search the words inside the slides.

However, the resolution of the slides was too low, and the OCR library Tesseract wasn’t able to recognize the text in them.

To solve this, I used Upscayl, an open source GPU-based upscaler, to upscale each slide to 4x its original size. This produced a 150 GB set of images that was ready for OCR.

Running Tesseract on 150 GB of images on a single machine proved slow, so to speed things up I wrote a Lambda-based, event-driven Klotho application to parallelize the scan.
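The fan-out idea can be sketched on a single machine with a worker pool. This is not the Klotho application described here, which ran on Lambda; it is a minimal local analogue, and `run_ocr`, `ocr_all`, and `max_workers` are hypothetical names, with `run_ocr` standing in for a real Tesseract call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_ocr(image_path: str) -> str:
    """Hypothetical stand-in for a real OCR call (e.g. invoking Tesseract)."""
    return f"text from {image_path}"


def ocr_all(
    image_paths: Iterable[str],
    ocr: Callable[[str], str] = run_ocr,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Run `ocr` over every path concurrently and map each path to its text."""
    paths = list(image_paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        texts = list(pool.map(ocr, paths))
    return dict(zip(paths, texts))
```

Threads are enough if the OCR work happens in a subprocess; for in-process, CPU-bound work a process pool would be the closer fit.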

Klotho

The application gets an image path and passes it to a function that runs Tesseract on it. That function then passes the detected text and image path to another klotho::exec_unit, which resizes and optimizes the image into a smaller yet still high-resolution WebP file.

To upload the images, I used the klotho::persist capability to create a data store backed by S3, and manually uploaded the 150 GB of images.

The event-driven flow used the klotho::pubsub capability. The processed image was then written into the same klotho::persist’ed object store under a different path, and the path and detected text were saved into a klotho::persist’ed key-value store.
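The shape of that flow can be sketched with plain Python objects. This does not use Klotho's annotations or APIs: `InMemoryBus` is a hypothetical synchronous stand-in for the pub/sub capability, the two dicts stand in for the object store and key-value store, and the topic names, the `processed/` prefix, and the choice to key the text by the processed path are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InMemoryBus:
    """Tiny synchronous pub/sub stand-in; not Klotho's pubsub API."""
    handlers: Dict[str, List[Callable[[dict], None]]] = field(default_factory=dict)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.handlers.get(topic, []):
            handler(event)


def build_pipeline(
    bus: InMemoryBus,
    object_store: Dict[str, bytes],
    kv_store: Dict[str, str],
    ocr: Callable[[bytes], str],
    optimize: Callable[[bytes], bytes],
) -> None:
    """Wire two stages together: OCR first, then optimize and persist."""

    def on_image_uploaded(event: dict) -> None:
        # Stage 1: detect text, then hand off to the next stage via an event.
        text = ocr(object_store[event["path"]])
        bus.publish("text-detected", {"path": event["path"], "text": text})

    def on_text_detected(event: dict) -> None:
        # Stage 2: store the optimized image under a new path and record its text.
        out_path = "processed/" + event["path"]
        object_store[out_path] = optimize(object_store[event["path"]])
        kv_store[out_path] = event["text"]

    bus.subscribe("image-uploaded", on_image_uploaded)
    bus.subscribe("text-detected", on_text_detected)
```

Passing `ocr` and `optimize` in as functions keeps the wiring separate from the image work, so either stage can be swapped without touching the other.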

The processed data set was only 2 GB in size.

Fast search

In order to create a fast, searchable data set, I used Algolia to index the text results from the OCR. Facets such as the startup name and the public image URL for the slide made the UI easy to construct.
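As a rough picture of what gets indexed, here is a minimal local sketch of the records and of a facet count over search hits. It does not call Algolia; `build_records`, `search`, and `facet_counts` are hypothetical helpers that only mimic the idea with a substring match. `objectID` is the identifier field Algolia records use, while `text`, `startup`, and `image_url` are assumed attribute names.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def build_records(slides: List[Record]) -> List[Record]:
    """Turn OCR results into flat records, one per slide."""
    return [
        {
            "objectID": slide["path"],        # unique ID per slide
            "text": slide["text"],            # OCR output, the searchable part
            "startup": slide["startup"],      # assumed facet attribute
            "image_url": slide["image_url"],  # used by the UI to show the slide
        }
        for slide in slides
    ]


def search(records: List[Record], query: str) -> List[Record]:
    """Naive case-insensitive substring match, standing in for a real index."""
    needle = query.lower()
    return [r for r in records if needle in r["text"].lower()]


def facet_counts(records: List[Record], facet: str) -> Dict[str, int]:
    """Count how many records fall under each value of `facet`."""
    return dict(Counter(r[facet] for r in records))
```

A hosted index adds ranking, typo tolerance, and speed on top of this; the record shape is the part that carries over to the UI.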

For the front end, React, Next.js, NextUI, static building, and Klotho’s klotho::static_unit capability made a great combo running on AWS’s S3 + CloudFront CDN. Because the 15k results exceeded Algolia’s free tier, we decided to sponsor the project.

What I liked

  • The Upscayl GPU-based upscaler quality was impressive despite the low resolution of the sources.
  • I enjoyed the developer experience of building the cloud system with Klotho (though as one of the founders, I’m biased). I used the open source klotho::persist and klotho::static_unit and the pro klotho::pubsub and klotho::exec_unit capabilities to construct the larger system in a few hours with virtually no infra/platform work – maximum productivity!
  • Tesseract’s OCR produced quality results and worked well in a Lambda-based environment.
  • Algolia’s APIs were seamless to work with, and their starter React components for the front-end UI worked as expected.

What I disliked

  • I couldn’t figure out how to run Upscayl in a cloud environment, so I wound up not automating it. That means Search the Deck isn’t fully automated (yet), and adding or updating decks still involves manual steps.
  • Manually collecting all the slides from all the websites felt unnecessary. Similar projects pop up all the time, and there’s no point in each of them re-scraping the same decks.
  • The developer experience for klotho::static_unit was experimental at the time of writing, but I wanted to use it anyway. This wound up being useful input for its next iteration.

Opening the data set

We’ll be releasing the image data set as a download, or hosting it in a repository so people can contribute to it. That way, the next person who wants to create a fun new version can use that central set, and everyone benefits.

Open Source

I wasn’t originally planning to make this an open source project, but it seems like it would be really useful to make it available to everyone.

Help us get 1000+ GitHub stars within a week and we’ll prioritize the effort to open source it.

Now go and Search the Deck!

Serverless vs. Microservices: Two Sides of the Same Coin

I just wrote a piece about the confusion the Internet creates around Serverless and Microservices. The “Serverless vs. Microservices” debate presents a dilemma between two supposedly incompatible strategies that must be fundamentally at odds with each other. In reality, they are as similar as two flavors of ice cream – you might prefer chocolate chip, but strawberry will work just as well.

Check it out at the official Klotho blog.

Cloud computing architecture for the next ten years – Part 2

Cloud development has become prohibitively complex, and the current generation of solutions has low-level interfaces that require extensive investment from developers and operators to learn, configure, assemble, and scale properly. For a new architectural shift to occur, we need approaches that absorb the cognitive load rather than merely streamline it.

Maintain benefits from existing architectures

There has been a continuous discussion among backend and service developers about whether things should be built using one strategy (monoliths) or the other (microservices). There are no one-size-fits-all solutions, because there’s always a trade-off involved.

Monolithic development offers high productivity, ease of deployment, and a straightforward observability story. Microservices offer flexibility in fault isolation, resource tuning and team autonomy. Unfortunately, microservice-based architectures usually involve piecing things back together – back into the monolith’s basic architecture, but with duct tape. As a result, the benefits of neither solution are fully realized.

When building Klotho, we zoomed out and asked, “What aspects of computer engineering can we apply to bridge that gap?” We concluded that a key characteristic of the new architecture must be the convenience of monolithic development, coupled with an adaptive system that leverages the benefits found in microservices architectures. Most importantly, it has to reduce the cognitive load for developers while maintaining configurability and control for operators.

By focusing on developer and operator intent, we created a solution based on ease of use through separation of concerns. Using three different programming constructs (Capabilities, Requirements, and Directives), developers and operators can specify which parts of the application should be cloud-aware, what additional tradeoffs Klotho should consider for the application, and what specific overrides are required.

Solution: Developers should write code the way they know best. We leverage their intent early on to determine what backend wiring and analysis is done behind the scenes to properly meet their needs. Requirements and Directives give developers and operators finer tuning and control without requiring changes to the code.

Read the rest of the post on the official blog:

Cloud computing architecture for the next ten years

In computing, bigger and more ambitious dreams have always been realized by pushing the limits. Cloud computing is no exception; parallel computing, cluster computing, grid computing, and edge computing are all continuously expanding what we consider to be possible. But they also make development more difficult.

Cloud computing is now in the phase of streamlining complexity. There are several examples of integrated solutions that are optimized for certain workloads or development models: Google’s Anthos, Amazon’s Outposts, Azure’s Stack Hub, and Hashistack.

These solutions bundle together building blocks necessary for larger-scale applications and systems, but they present complicated low-level interfaces that require developers and operators to configure, learn, assemble, and scale appropriately.

It’s similar to the evolution toward reduced complexity in programming languages: punch cards, assembly, C, C++, Java …

Continuous improvement keeps happening, but at some point, an architecture shift emerges that addresses the accumulation of complexity.

In our first blog post on Klo.Dev, we take a look at a few principles that we view as critical for this architectural shift to emerge, and what we need from products to effectively take us into the new world of cloud computing: