If you’re looking for requests ;-), I would love an ECS (and specifically Fargate) emulator that actually ran Docker containers locally as though they were in ECS
I think I can understand why this wasn’t addressed for so long: in the vast majority of cases, if your db is exposed at the network level to untrusted sources, then you probably have far bigger problems?
It's also very tricky to do given the current architecture on the server side, where a single-threaded process handles the connection and uses (for all intents and purposes) sync IO.
In such a scenario, listening (and acting) on cancellation requests arriving on the same connection becomes very hard, so fixing this goes way beyond "just".
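To make the "single-threaded process doing sync IO" difficulty concrete, here's a toy Python sketch. This is entirely hypothetical, my own simplified model rather than the actual server's code: a handler that does a blocking read, then runs the "query" synchronously. A cancel message sent mid-query just sits unread in the socket buffer until the query finishes, so nothing can act on it in time.

```python
import socket, threading, time

# Toy model of a single-threaded, sync-IO connection handler (hypothetical,
# not any real server's code). While the "query" runs, nothing reads the
# socket, so a CANCEL sent mid-query is only seen after the query completes.

def handler(conn):
    while True:
        msg = conn.recv(1024)        # blocking read: one request at a time
        if not msg:                  # peer closed the connection
            break
        if msg == b"CANCEL":
            # by the time we read this, the query it targeted already finished
            conn.sendall(b"cancel ignored: nothing running\n")
            continue
        time.sleep(0.5)              # stand-in for synchronous query execution;
                                     # nothing here ever looks at the socket
        conn.sendall(b"done: " + msg + b"\n")

server, client = socket.socketpair()
worker = threading.Thread(target=handler, args=(server,))
worker.start()

client.sendall(b"SELECT ...")
time.sleep(0.1)                      # let the handler pick up the query first
client.sendall(b"CANCEL")            # arrives while the "query" is running

out = client.recv(1024)
while b"cancel ignored" not in out:
    out += client.recv(1024)
print(out)                           # the query result arrives before the
                                     # cancel request is even read

client.close()
worker.join()
server.close()
```

The fix directions all involve breaking that coupling somehow (a second connection for cancels, non-blocking IO, or another thread/process watching the socket), which is why it's an architectural change rather than a patch.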
I think it's partly tongue in cheek, because when "big data" was over hyped, everyone claimed they were working with big data, or tried to sell expensive solutions for working with big data, and some reasonable minds spoke up and pointed out that a standard laptop could process more "big data" than people thought.
> For our first experiment, we used ClickBench, an analytical database benchmark. ClickBench has 43 queries that focus on aggregation and filtering operations. The operations run on a single wide table with 100M rows, which uses about 14 GB when serialized to Parquet and 75 GB when stored in CSV format.
Processing data that cannot be processed on a single machine is fundamentally a different problem than processing data that can be processed on a single machine. It's useful to have a term for that.
As you say, single machines can scale up incredibly far. That just means 16 TB datasets no longer demand big data solutions.
I get your point, but I don’t know if big data is the right term anymore.
Many people like to think they have big data, and you kinda have to agree with them if you want their money. At least in consulting.
Also you could go well beyond a 16TB dataset on a single machine. You assume that the whole uncompressed dataset has to fit in memory, but many workloads don’t need that.
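For what it's worth, the "doesn't have to fit in memory" point can be shown with a trivial streaming aggregation. This is a hypothetical sketch (the `amount` column and file name are made up): only one row is ever held in RAM, so the file on disk can be arbitrarily larger than memory.

```python
import csv
import io

# Hypothetical out-of-core aggregation: stream rows one at a time, keep only
# the running total in memory. Works the same for a 16 TB file as for 3 rows.
def total_sales(lines):
    total = 0.0
    for row in csv.DictReader(lines):
        total += float(row["amount"])   # "amount" is a made-up column name
    return total

# In-memory stand-in; a real run would pass e.g. open("huge.csv") instead.
sample = io.StringIO("amount\n1.5\n2.5\n4.0\n")
print(total_sales(sample))  # → 8.0
```

Anything expressible as a streaming or chunked computation (sums, filters, group-bys with bounded key cardinality, sorts via external merge) falls into this category, which covers a lot of real workloads.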
How many people in the world have datasets that big to analyse within a reasonable time?
I think the definition of big is smaller than that. Mine was "too big to fit on a maxed-out laptop", effectively >8TB. Our photo collection is bigger than that, it's not 'big data'.
Or one could define it as too big to fit on a single SSD/HDD, maybe >30TB. Still within the reach of a hobbyist, but too large to process in memory and needs special tools to work with. It doesn't have to be petabyte scale to need 'big data' tooling.
Short answer is no, not as far as I am aware/can reason about it
In more detail: by my understanding there are two techniques for making zip bombs…
The first is nested ZIPs, which leverage the fact that some unZIP programs recursively extract member files. stream-unzip doesn’t do this (although you could probably use stream-unzip as a component in a vulnerable recursive ZIP parser if you really wanted to… but I would argue that’s not the responsibility of stream-unzip)
The second technique is overlapping member files, but this depends on them overlapping as defined by the central directory at the end of the ZIP, which stream-unzip does not use
But if you are accepting files from an untrusted source, then you should validate the size of the uncompressed data as you unZIP (which you can do as you go, along with validating any other properties of the data)
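To illustrate the validate-as-you-extract idea, here's a hedged sketch using Python's stdlib zipfile rather than stream-unzip's own API (and a made-up size cap): each member is decompressed in chunks, and extraction aborts as soon as the running total exceeds the cap, so a tiny archive can never silently expand into something enormous.

```python
import io
import zipfile

MAX_UNCOMPRESSED = 10 * 1024 * 1024  # hypothetical cap: 10 MiB per member

def safe_extract(zip_bytes):
    """Decompress each member in chunks, aborting once the cap is hit,
    instead of trusting the sizes declared in the archive metadata."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            seen = 0
            chunks = []
            with zf.open(info) as member:
                while True:
                    chunk = member.read(64 * 1024)
                    if not chunk:
                        break
                    seen += len(chunk)
                    if seen > MAX_UNCOMPRESSED:
                        raise ValueError(f"{info.filename} exceeds size cap")
                    chunks.append(chunk)
            results[info.filename] = b"".join(chunks)
    return results

# A small, well-behaved archive extracts fine...
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("ok.txt", b"hello")
print(safe_extract(buf.getvalue()))  # → {'ok.txt': b'hello'}
```

...while a member that blows past the cap raises long before the expansion completes. (This sketch reads the central directory like any zipfile-based tool; the chunked size check is the point, and it applies equally to a streaming parser.)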
> without regard for the maintenance burden 1, 2, 5, 10 years down the road.
To me software craftsmanship isn't just about the code, it's about engineering use of time.
In general you shouldn't knowingly make choices that will cause pain in the future, but if avoiding that pain increases the chance of the project not making it to the future, is that really the better option? Finding out enough information to make the judgement call between long-term/far-future pain and short-term benefits is all part of the craftsmanship.
> I don't blame agile. But I do kind of blame Agile™
(Loving the phrasing here! I think I'm right on board, especially if we're talking Scrum/Scum-ish)
To answer this: I suspect that trying to change what certain words/phrases mean to people en masse is extremely difficult, to the point of impossibility in most cases. However, we each have the power to be clearer in the words we use, so they are understood by the people we're communicating with.
> engineering quality matters
But this, to me, suggests there is some sort of absolute definition of quality, when it's actually much more nuanced. Nothing is inherently "bad quality"; rather, choices have certain consequences, which may or may not happen, may or may not be acceptable in certain circumstances, and might not even be knowable until the future. This, I think, is the point I'm trying to make: there is no absolute definition of engineering quality, and I suspect the term "technical debt" all too often suggests there is.
Have to admit the lazy thing threw me, but I can see how the “doing less” I’m arguing for could be taken that way. The “less” is not about avoiding handling edge cases that are possible now, but about avoiding putting in layers of code to handle cases possible only in some future versions of the code (with some limited exceptions that I mention at the bottom of the post)
In fact, it’s crossing my mind that people might not want to be accused of being lazy, and that is a motivation to over-engineer solutions.