HPCC Systems 9.14 Platform Release Announcement
We're thrilled to unveil the latest evolution of the HPCC Systems® Platform. Version 9.14.0 delivers powerful new capabilities that reduce cloud expenses, accelerate performance, and enhance the user experience across the board.
The Release Notes page contains a complete list of changes in this release, but here are a few noteworthy features:
Workunit Summary Page in ECL Watch
ECL Watch now displays a Workunit Summary (ECL Watch → Operations / Topology → WU Summary) that aggregates all the workunits (WUs) within a given date range into unique error/warning groups. This is intended to help operations teams spot new error patterns on a release-by-release basis before they grow into larger issues.
A New ECL Function: REGEXEXTRACT
This built-in function allows you to extract text from a STRING, a UTF-8 string, or a UNICODE string based on a regular expression you provide. Essentially, it filters the text, returning the portions that match the pattern and the remaining, non-matching text separately.
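To illustrate the general idea of splitting text into matching and non-matching portions, here is a minimal conceptual sketch using Python's re module. It is only an analogy for the behavior described above, not the ECL REGEXEXTRACT syntax, and the sample text and pattern are invented for illustration:

```python
import re

# Invented sample text and pattern, purely for illustration.
text = "2025-06-01 12:00:01 WARN disk usage at 91% on node thor-1"
pattern = re.compile(r"\b(?:WARN|ERROR)\b")

matches = pattern.findall(text)    # portions that match the pattern -> ['WARN']
remainder = pattern.sub("", text)  # the text with the matched portions removed

print(matches)
print(remainder)
```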
Platform API Enhancement
The platform now supports configuring block compression when reading via the API: compressed files now honor the blockedFileIOKB option set on the storage plane. If you are using the Azure API to access a storage plane, this can reduce the cost of file I/O. For more details, see blockedFileIOKB.
Managed Observability Helm Chart
This is a self-contained observability solution for the HPCC Platform. It leverages the Elastic Stack (Elasticsearch, Kibana, APM Server) and the OpenTelemetry Collector to provide tracing observability data for HPCC clusters. The project delivers a streamlined, secure, and automated way to deploy, configure, and access an observability infrastructure for testing and development using Helm charts.
Rowservice Random Sampling Support
This allows users of the rowservice to randomly sample datasets instead of reading the entire set. For example, a data scientist developing a model against a large dataset does not need the entire file; a representative slice is enough. With the new sampling feature, they can iterate faster by reading only a small portion of the data while developing a new model.
Sampling is random across the entire file. For example, you could request the dataset be sampled at a rate of 0.01, which would return 1% of the file. The specific records are chosen at random, so the sample won't inadvertently overrepresent some pattern within the file: essentially, with a sample rate of 0.01, the algorithm picks one record at random from every group of 100.
There is also support for setting the recordSamplingSeed used by the random generator, so you can get a reproducible sample.
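To make the block-based sampling described above concrete, here is a short Python sketch of the idea: one record chosen at random from every 1/rate records, with a fixed seed for reproducibility. This is only an illustration of the concept, not the rowservice implementation, and the function and parameter names are invented:

```python
import random

def block_sample(records, rate, seed=None):
    """Illustrative sketch: pick one random record from each block of
    1/rate records, so roughly `rate` of the input is returned."""
    rng = random.Random(seed)        # fixed seed -> reproducible sample
    block_size = round(1 / rate)     # e.g. rate=0.01 -> blocks of 100 records
    block = []
    for rec in records:
        block.append(rec)
        if len(block) == block_size:
            yield block[rng.randrange(block_size)]
            block = []
    if block:                        # final partial block, if any
        yield block[rng.randrange(len(block))]

# With rate=0.01 and a fixed seed, the same ~1% sample is returned on every run.
sample = list(block_sample(range(10_000), rate=0.01, seed=42))
print(len(sample))   # 100
```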