Then they get a relay of all that aggregated information out of it.
What we would like to do is, if a message describes a particular place and we already know the place, convert it into some kind of geotag. We’re geotagging it.
There’s an enrichment, exactly. Then, of course, the reported time may not be the same as the time that it actually happened. You also want that part of the ETL, the time part, not just the space part.
The end result that we want, of course, is a layer on top of a publicly visible map, like OpenStreetMap, with something that lets people query what the disaster situation is like around their neighborhood.
Of course, the EMIC System is already under a lot of stress performance‑wise from the existing governmental users, and the existing aggregation, and the queries.
The existing hardware would not support any more ETL. What we said last meeting, even though we didn’t mention EMIC specifically, is that because it’s all in this huge relational database, we need to somehow get it to export in bulk regularly, with a clean data description language that we can amend as the enrichment goes on.
For example, out of every message, we currently only extract time and text. At some point we would like to extend this to a TopoJSON-compatible description: when a message talks about a road, we want the road there, which is a line or a path, and so on. We don’t have the complete listing yet.
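To make the road example concrete, here is a minimal sketch of what a TopoJSON topology for one road might look like, written as a Python dict. The coordinates, the "roads" object name, and the properties are illustrative placeholders, not EMIC’s actual output:

```python
# A minimal TopoJSON topology describing one road as a LineString.
# All names, coordinates, and property values are made up for this sketch.
road_topology = {
    "type": "Topology",
    "objects": {
        "roads": {
            "type": "GeometryCollection",
            "geometries": [
                {
                    "type": "LineString",
                    "arcs": [0],  # index into the shared "arcs" table below
                    "properties": {"name": "Example Road"},
                }
            ],
        }
    },
    # Without a "transform" member, arc coordinates are absolute lon/lat pairs
    # rather than quantized deltas.
    "arcs": [[[120.30, 22.62], [120.31, 22.63], [120.33, 22.64]]],
}
```

The point of the shared `arcs` table is that adjacent features (two roads meeting at a junction, say) can reference the same arc instead of duplicating coordinates.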
We want the DDL to be able to grow, and when we grow it we want the API to stay clean, compatible with existing consumers but also versioned in some way.
There are already users in other government agencies who broadcast push notifications, and we don’t want those services to stop. It would be silly to rebuild that. We want this to continue, but against a clean API from now on, which is where a data schema comes in. It’s already communicating over HTTP anyway, so why not structure it?
What we’re talking about is taking the DDL here and saying, "Every hour, get me a data package of everything that happens around EMIC, and then publish it as open data packages," and then sending it to a second system called the NCDR system. The NCDR system is currently for disaster prediction and reduction. The R stands for reduction, I think.
While NCDR doesn’t have the official mandate to publish alerts around disasters, their developers are in‑house, and they’ve got some good ArcGIS developers there. They can very easily do visualizations that not only look beautiful but are also very useful.
What they don’t have at the moment is first‑class access to the aggregated database of EMIC.
Data packages are obviously a way for people who don’t have a relationship with each other, let’s say, to nevertheless fully consume the other person’s data with the confidence that it will not break. What we are saying is that on the national open‑data platform, we would now house a schema for the data package.
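For readers unfamiliar with the format: a data package is described by a `datapackage.json` descriptor in the Frictionless Data style. A minimal sketch follows; the package name, resource path, and field names are assumptions for illustration, not EMIC’s real schema:

```python
import json

# Illustrative datapackage.json descriptor. Package, resource, and field
# names are placeholders, not EMIC's actual schema.
descriptor = {
    "name": "emic-disaster-reports",
    "version": "1.0.0",  # bump as the DDL grows, keeping existing fields stable
    "resources": [
        {
            "name": "reports",
            "path": "reports.csv",
            "schema": {
                "fields": [
                    {"name": "reported_at", "type": "datetime"},
                    {"name": "occurred_at", "type": "datetime"},
                    {"name": "text", "type": "string"},
                    {"name": "lon", "type": "number"},
                    {"name": "lat", "type": "number"},
                ]
            },
        }
    ],
}

print(json.dumps(descriptor, indent=2))
```

Because the descriptor travels with the data, a consumer who has never spoken to the publisher can still parse the package and detect when a promised field disappears.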
The EMIC will start producing structured data, and then the existing National Development Council’s open data validation team...I don’t know whether you’ve talked to these people.
They already have a data quality program this year. They will use machines to ensure that the EMIC keeps honoring its open data promise.
Note that it’s not checking some API description. It’s just plain open data.
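One way such a validation team can machine-check that published data keeps its promise is to test every record against the declared fields. A minimal sketch in plain Python, with field names and types that are assumptions rather than the real schema:

```python
from datetime import datetime

# Declared fields for one resource, in the spirit of a Table Schema.
# Names and types here are illustrative assumptions.
FIELDS = {"reported_at": "datetime", "text": "string", "lon": "number", "lat": "number"}

def valid(record):
    """Check that a record has exactly the declared fields with the declared types."""
    if set(record) != set(FIELDS):
        return False
    for name, ftype in FIELDS.items():
        value = record[name]
        if ftype == "datetime":
            try:
                datetime.fromisoformat(value)
            except (TypeError, ValueError):
                return False
        elif ftype == "number" and not isinstance(value, (int, float)):
            return False
        elif ftype == "string" and not isinstance(value, str):
            return False
    return True

ok = {"reported_at": "2016-02-06T03:57:00", "text": "road blocked", "lon": 120.3, "lat": 22.6}
bad = {"reported_at": "yesterday", "text": "road blocked", "lon": 120.3, "lat": 22.6}
print(valid(ok), valid(bad))  # True False
```

Run on every published package, a check like this flags a broken promise (a renamed field, a malformed timestamp) without any human having to read the data.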
Now we have NCDR consuming this data, and during disaster we want the frequency to pick up.
We also want to do enrichments at the NCDR, because there are now developers in‑house. We can tell them to send all sorts of push notifications, as long as they’re not called official government alerts, which are outside of their purview. They can do a lot more value‑adding with, for example, the weather data, water dam control data, and so on.
Also, they already partner with Google and other vendors using the standard CAP protocol. Then they can publish not with their own API, but with the existing web‑based APIs for disaster recovery, to the consumers of Facebook Safety Check.
And Google, and whatever. This is standardized. We don’t need to do anything here. It can also publish its own enriched data as a kind of derived data set, as also open data on the national data platform. People who want data don’t really need to look at the raw EMIC data.
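CAP here is the OASIS Common Alerting Protocol, an XML format. A minimal CAP 1.2 alert can be sketched like this; the identifier, sender address, timestamps, and event are placeholders, not a real alert:

```python
import xml.etree.ElementTree as ET

# OASIS CAP 1.2 namespace.
CAP_NS = "urn:oasis:names:tc:emergency:cap:1.2"

def cap_alert():
    """Build a minimal CAP 1.2 alert; all values are placeholders."""
    alert = ET.Element(f"{{{CAP_NS}}}alert")
    # Required <alert> children.
    for tag, text in [
        ("identifier", "EXAMPLE-0001"),
        ("sender", "ncdr@example.gov.tw"),
        ("sent", "2016-02-06T04:00:00+08:00"),
        ("status", "Actual"),
        ("msgType", "Alert"),
        ("scope", "Public"),
    ]:
        ET.SubElement(alert, f"{{{CAP_NS}}}{tag}").text = text
    # One <info> block with its required children.
    info = ET.SubElement(alert, f"{{{CAP_NS}}}info")
    for tag, text in [
        ("category", "Geo"),
        ("event", "Earthquake"),
        ("urgency", "Immediate"),
        ("severity", "Severe"),
        ("certainty", "Observed"),
    ]:
        ET.SubElement(info, f"{{{CAP_NS}}}{tag}").text = text
    return ET.tostring(alert, encoding="unicode")

print(cap_alert())
```

Because CAP is what Google Public Alerts and similar services already ingest, publishing in this shape is exactly the "we don’t need to do anything here" part: the downstream plumbing exists.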
They can look at the enriched data that NCDR will already have: not only the raw fields but also the enriched data fields. That’s essentially what we planned with this meeting, without me telling you the use case. That’s essentially the idea.
It’s kind of naive.
The "run" here is an arbitrary dot‑separated identifier that means something to the tools. It doesn’t have a typology?
So it’s designed like a Docker Compose file?
So all the drawings here that we do can be described by a less naive, but still usefully inspectable, variant of a data package pipeline.
Sure, of course.
Thank you for the consultation, wizard.
Really, this is great. Before this case, we’ve never really dealt with true cross‑ministry regular pipelines.
We dealt with different units in the National Development Council, and that’s working pretty well with the regulation preview and also visualization of the national budget.
These are managed by different units within the NDC, so it’s easy, and they have a common IT department.
This afternoon is the first case where we’re going in and saying to two different ministries, "Hey, you’re going to do things the new way." I’m pretty excited. If this can be made to work and is well documented, we can do this with more ministries.
Of course. It’s our basic procurement data.
This shouldn’t take long to find; it’s between 2 and 3 billion TWD per year.
It’s got to be less than 0.1 percent, so I don’t think anyone...
Exactly. No one bothered counting.
Actually, when I look at the raw numbers now, a lot of it is in services. It’s in building an IT system or maintaining an IT system, not in software licensing.
We do have a breakdown.
It’s mostly hiring services, it’s building, it’s maintaining, and very little on licensing actually.
It will vary depending on whether it’s a generic or a brand‑name drug.
There’s a fixed portion that goes to SMEs.
Yeah. There’s a fixed SME portion.
Just a second. Let me double‑check before I say anything silly.
There was a huge push around OpenOffice in 2013, which is what I was trying to find, because there was this justification paper that said how much it saves in licensing costs over Microsoft Office subscriptions and so on.
I couldn’t quite find the report, but the original justification was very close to what you said: to promote a local community around the then‑OpenOffice, now‑LibreOffice project, and to get people in touch with the Chinese text‑processing part of it instead of waiting for Microsoft’s next release.
The main argument was that we could save this much on licensing costs, which I was trying to find but couldn’t.
I’m pretty sure the IT decision makers know about this, but they don’t usually consider it practical. [laughs]
They have heard of this argument a lot.
I don’t think it’s personal. It’s insecure: not HTTPS, and embedded from an HTTPS site.
That is what it is. [laughs]
You probably want to run Let’s Encrypt on your website.