If you’re looking for requests ;-), I would love an ECS (and specifically Fargate) emulator that actually ran Docker containers locally as though they were in ECS
I think I can understand why this wasn’t addressed for so long: in the vast majority of cases, if your db is exposed at the network level to untrusted sources, then you probably have far bigger problems?
It's also very tricky to do given the current architecture on the server side, where a single-threaded process handles the connection and uses (for all intents and purposes) sync IO.
In such a scenario, listening (and acting) on cancellation requests arriving on the same connection becomes very hard, so fixing this goes way beyond "just".
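To make the "single-threaded process doing sync IO" difficulty concrete, here's a toy Python sketch. This is entirely hypothetical, my own simplified model rather than the actual server's code: a handler that does a blocking read, then runs the "query" synchronously. A cancel message sent mid-query just sits unread in the socket buffer until the query finishes, so nothing can act on it in time.

```python
import socket, threading, time

# Toy model of a single-threaded, sync-IO connection handler (hypothetical,
# not any real server's code). While the "query" runs, nothing reads the
# socket, so a CANCEL sent mid-query is only seen after the query completes.

def handler(conn):
    while True:
        msg = conn.recv(1024)        # blocking read: one request at a time
        if not msg:                  # peer closed the connection
            break
        if msg == b"CANCEL":
            # by the time we read this, the query it targeted already finished
            conn.sendall(b"cancel ignored: nothing running\n")
            continue
        time.sleep(0.5)              # stand-in for synchronous query execution;
                                     # nothing here ever looks at the socket
        conn.sendall(b"done: " + msg + b"\n")

server, client = socket.socketpair()
worker = threading.Thread(target=handler, args=(server,))
worker.start()

client.sendall(b"SELECT ...")
time.sleep(0.1)                      # let the handler pick up the query first
client.sendall(b"CANCEL")            # arrives while the "query" is running

out = client.recv(1024)
while b"cancel ignored" not in out:
    out += client.recv(1024)
print(out)                           # the query result arrives before the
                                     # cancel request is even read

client.close()
worker.join()
server.close()
```

The fix directions all involve breaking that coupling somehow (a second connection for cancels, non-blocking IO, or another thread/process watching the socket), which is why it's an architectural change rather than a patch.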
I think it's partly tongue in cheek, because when "big data" was over hyped, everyone claimed they were working with big data, or tried to sell expensive solutions for working with big data, and some reasonable minds spoke up and pointed out that a standard laptop could process more "big data" than people thought.
> For our first experiment, we used ClickBench, an analytical database benchmark. ClickBench has 43 queries that focus on aggregation and filtering operations. The operations run on a single wide table with 100M rows, which uses about 14 GB when serialized to Parquet and 75 GB when stored in CSV format.
Processing data that cannot be processed on a single machine is fundamentally a different problem than processing data that can be processed on a single machine. It's useful to have a term for that.
As you say, single machines can scale up incredibly far. That just means 16 TB datasets no longer demand big data solutions.
I get your point, but I don’t know if big data is the right term anymore.
Many people like to think they have big data, and you kinda have to agree with them if you want their money. At least in consulting.
Also you could go well beyond a 16TB dataset on a single machine. You assume that the whole uncompressed dataset has to fit in memory, but many workloads don’t need that.
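For what it's worth, the "doesn't have to fit in memory" point can be shown with a trivial streaming aggregation. This is a hypothetical sketch (the `amount` column and file name are made up): only one row is ever held in RAM, so the file on disk can be arbitrarily larger than memory.

```python
import csv
import io

# Hypothetical out-of-core aggregation: stream rows one at a time, keep only
# the running total in memory. Works the same for a 16 TB file as for 3 rows.
def total_sales(lines):
    total = 0.0
    for row in csv.DictReader(lines):
        total += float(row["amount"])   # "amount" is a made-up column name
    return total

# In-memory stand-in; a real run would pass e.g. open("huge.csv") instead.
sample = io.StringIO("amount\n1.5\n2.5\n4.0\n")
print(total_sales(sample))  # → 8.0
```

Anything expressible as a streaming or chunked computation (sums, filters, group-bys with bounded key cardinality, sorts via external merge) falls into this category, which covers a lot of real workloads.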
How many people in the world have datasets that big to analyse within a reasonable time?
I think the definition of big is smaller than that. Mine was "too big to fit on a maxed-out laptop", effectively >8TB. Our photo collection is bigger than that, it's not 'big data'.
Or one could define it as too big to fit on a single SSD/HDD, maybe >30TB. Still within the reach of a hobbyist, but too large to process in memory and needs special tools to work with. It doesn't have to be petabyte scale to need 'big data' tooling.
Short answer is no, not as far as I am aware/can reason about it
In more detail: by my understanding there are two techniques for making zip bombs…
The first is nested ZIPs, which leverage the fact that some unZIP programs recursively extract member files. stream-unzip doesn’t do this (although you could probably use stream-unzip as a component in a vulnerable recursive ZIP parser if you really wanted to… but I would argue that’s not the responsibility of stream-unzip)
The second technique is overlapping member files, but this depends on them overlapping as defined by the central directory at the end of the ZIP, which stream-unzip does not use
But if you are accepting files from an untrusted source, then you should validate the size of the uncompressed data as you unZIP (which you can do as you go, along with validating any other properties of the data)
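To illustrate the validate-as-you-extract idea, here's a hedged sketch using Python's stdlib zipfile rather than stream-unzip's own API (and a made-up size cap): each member is decompressed in chunks, and extraction aborts as soon as the running total exceeds the cap, so a tiny archive can never silently expand into something enormous.

```python
import io
import zipfile

MAX_UNCOMPRESSED = 10 * 1024 * 1024  # hypothetical cap: 10 MiB per member

def safe_extract(zip_bytes):
    """Decompress each member in chunks, aborting once the cap is hit,
    instead of trusting the sizes declared in the archive metadata."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            seen = 0
            chunks = []
            with zf.open(info) as member:
                while True:
                    chunk = member.read(64 * 1024)
                    if not chunk:
                        break
                    seen += len(chunk)
                    if seen > MAX_UNCOMPRESSED:
                        raise ValueError(f"{info.filename} exceeds size cap")
                    chunks.append(chunk)
            results[info.filename] = b"".join(chunks)
    return results

# A small, well-behaved archive extracts fine...
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("ok.txt", b"hello")
print(safe_extract(buf.getvalue()))  # → {'ok.txt': b'hello'}
```

...while a member that blows past the cap raises long before the expansion completes. (This sketch reads the central directory like any zipfile-based tool; the chunked size check is the point, and it applies equally to a streaming parser.)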
> without regard for the maintenance burden 1, 2, 5, 10 years down the road.
To me software craftsmanship isn't just about the code, it's about engineering use of time.
In general you shouldn't knowingly make choices that will cause pain in the future, but if avoiding that pain increases the chance of the project not making it to the future, is that really the better option? Finding out enough information to make the judgement call between long-term/far-future pain and short-term benefits is all part of the craftsmanship.
> I don't blame agile. But I do kind of blame Agile™
(Loving the phrasing here! I think I'm right on board, especially if we're talking Scrum/Scum-ish)
To answer this: I suspect that trying to change what certain words/phrases mean to people en masse is extremely difficult, to the point of impossibility in most cases. However, we each have the power to be clearer in the words we use, so they are understood by the people we're communicating with.
> engineering quality matters
But this, to me, suggests there is some sort of absolute definition of quality, when it's actually much more nuanced. Nothing is inherently "bad quality"; rather, choices have certain consequences, which may or may not happen, may or may not be acceptable in certain circumstances, and might not even be knowable until the future. This, I think, is the point I'm trying to make: there is no absolute definition of engineering quality, and I suspect the term "technical debt" all too often suggests there is.
Have to admit the lazy thing threw me, but I can see how the “doing less” I’m arguing for could be taken that way. The “less” is not about avoiding handling edge cases that are possible now, but about avoiding putting in layers of code to handle cases possible only in some future versions of the code (with some limited exceptions that I mention at the bottom of the post)
In fact, it’s crossing my mind that people might not want to be accused of being lazy, and that is a motivation to over-engineer solutions.