AUTOSCALING IN ALCHEMISCALE TECHNICAL DESIGN DOCUMENT
The autoscaling API allows custom, platform-specific manager services to perform automatic horizontal scaling of compute resources.
Background
Currently, the number of alchemiscale compute services must be manually managed by administrators. This is tedious and error-prone, especially when balancing across a heterogeneous set of compute platforms.
Objectives
To reduce the need for administrators to micromanage compute services, we aim to define a protocol enabling the implementation of a "compute manager" application.
These compute managers will allow compute services to be horizontally scaled based on demand, as determined by the number of claimable tasks in the statestore.
Requirements
- Compute service growth can be made automatic, dictated by a compute manager's settings and statestore contents.
- Compute managers can be signaled to stop creating new services.
- Scaling decisions will be based on throughput estimates and the current number of compute services registered.
- A maximum number of instances will be enforced to prevent resource exhaustion.
Design
A compute manager is responsible for communicating with the alchemiscale compute API to determine whether to allocate more resources in response to the number of claimable tasks. The compute API will issue a signal (and an optional data payload) indicating how the manager should proceed:
- GROW (int) - The compute manager is allowed to scale up. The number of claimable tasks is included along with the signal.
- SKIP (None) - The compute manager should skip this cycle and try again later.
- SHUTDOWN (string) - The compute manager should shut down. A string is provided to indicate why the shutdown was requested and should ideally be written to the compute manager log before shutting down.
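As a minimal sketch (assuming the ComputeManagerClient proposed later in this document, with the signal values shown above), a manager cycle might dispatch on these signals as follows:

def run_cycle(client, logger, scale_up) -> bool:
    """Run one manager cycle; return False once shutdown is requested."""
    instruction, payload = client.get_instruction()
    if instruction == "GROW":
        # payload is the current number of claimable tasks
        scale_up(n_claimable=payload)
    elif instruction == "SKIP":
        # nothing to do this cycle; poll again later
        pass
    elif instruction == "SHUTDOWN":
        # payload is the server's stated reason for the shutdown
        logger.info("shutdown requested by server: %s", payload)
        return False
    return True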
To scale up, compute services, which claim tasks and execute their associated protocols, are created programmatically. In addition to the number of tasks, a compute manager can also query the statestore about the compute services it was responsible for creating. Compute services are created by a compute manager from a predefined compute service settings template file.
We will assume that a compute manager won't have direct access to the compute services it creates. This mostly accommodates systems running on a queuing system, such as Slurm. Because of this, created services must be able to cleanly shut themselves down. This can be triggered by a time constraint, the number of executed tasks, or the number of empty task claims, all of which are configurable when creating the compute service. Therefore, while the compute manager has the responsibility of up-scaling, the created services are responsible for down-scaling.
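For illustration only, the three down-scaling triggers might appear in a service settings template along these lines (the field names here are assumptions, not the actual alchemiscale settings keys):

from dataclasses import dataclass

@dataclass
class ServiceShutdownSettings:
    # wall-clock lifetime limit for the service, in seconds
    max_time: int = 4 * 3600
    # shut down after this many tasks have been executed
    max_tasks: int = 100
    # shut down after this many consecutive claims that return no task
    max_empty_claims: int = 10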
A note on uniqueness and manager identities
It is possible that an administrator will erroneously deploy duplicate, competing managers. Requiring a unique ID to deploy a compute manager allows the alchemiscale server to limit communication to one instance of a manager with a given ID. This unique ID will also be tracked by the compute services created by the compute manager, allowing managed services to be re-associated with a restarted compute manager. Should a manager fail unexpectedly, the alchemiscale server will not allow the registration of a new manager until the previous registration has been flushed.
To effectively differentiate compute managers sharing the same manager ID, a UUID (v4) is introduced. This will allow new compute managers to take over the responsibilities of compute managers that were unable to cleanly shut down or that cannot communicate with the server due to network issues. Only a compute manager with the correct manager ID and UUID will be given the affirmative signal for compute upscaling.
The two identifiers will be combined into a single identifier matching the pattern {manager_id}-{uuid4}.
Implementation
Introducing autoscaling will require a new client, an additional model in the statestore, new API endpoints, and an abstract class for implementing new compute managers. Additionally, ComputeServiceRegistration nodes in the statestore will gain an additional field referencing the name of a compute manager.
A new client supporting compute managers
While compute managers overlap significantly with compute services, the AlchemiscaleComputeClient provides functionality that shouldn't be exposed to the manager, such as task claiming. We will define a new, minimal client that fulfills the requirements of a compute manager.
class ComputeManagerClient(AlchemiscaleBaseClient):
    def register(self) -> ComputeManagerID:
        ...

    def deregister(self) -> ComputeManagerID:
        ...

    def heartbeat(self) -> ComputeManagerID:
        ...

    def get_instruction(self) -> tuple[ComputeManagerInstruction, int | str | None]:
        ...

    def get_registered_compute_services(self) -> list[ComputeServiceID]:
        ...
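A manager session using this client might look roughly like the following (the constructor arguments mirror other alchemiscale clients and are assumptions here, as is the 60-second poll interval):

import time

client = ComputeManagerClient(
    "https://compute.example.org",  # compute API URL (hypothetical)
    "my-manager-identity",
    "my-api-key",
)
client.register()
try:
    while True:
        client.heartbeat()
        instruction, payload = client.get_instruction()
        if instruction == ComputeManagerInstruction.SHUTDOWN:
            break
        # ... act on GROW/SKIP as sketched in the design section ...
        time.sleep(60)
finally:
    client.deregister()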
where ComputeManagerID is a subclass of str:
class ComputeManagerID(str):
    # The canonical string form of a UUID4 is 36 characters and itself
    # contains hyphens, so we slice from the right rather than splitting
    # the whole string on '-'.
    @property
    def uuid(self) -> str:
        return self[-36:]

    @property
    def name(self) -> str:
        return self[:-37]
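For example (values are arbitrary), splitting works even when the manager name itself contains hyphens:

import uuid

compute_manager_id = ComputeManagerID(f"slurm-manager-{uuid.uuid4()}")
print(compute_manager_id.name)  # 'slurm-manager'
print(compute_manager_id.uuid)  # the trailing 36-character UUID4 string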
and the ComputeManagerInstruction is a StrEnum, reflecting the signals specified in the design section:
from enum import StrEnum

class ComputeManagerInstruction(StrEnum):
    GROW = "GROW"
    SKIP = "SKIP"
    SHUTDOWN = "SHUTDOWN"
Statestore representation
A compute manager must be registered with the alchemiscale statestore, similarly to how compute services are registered.
class ComputeManagerRegistration:
    compute_manager_id: str
    registered: datetime
    heartbeat: datetime
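The heartbeat timestamp gives the server a basis for deciding when a dead manager's registration may be flushed (per the uniqueness discussion above). One plausible mechanism, sketched here with an assumed staleness threshold:

from datetime import datetime, timedelta, timezone

# the threshold is an assumption; it would presumably be configurable
STALE_AFTER = timedelta(minutes=10)

def is_stale(registration: ComputeManagerRegistration) -> bool:
    # a registration whose heartbeat is too old may be flushed, freeing
    # its manager ID for re-registration
    return datetime.now(timezone.utc) - registration.heartbeat > STALE_AFTER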
New API methods
New methods will be added to the compute API:
- register_computemanager - Attempt to register a compute manager with the statestore. If a compute manager with the same ID is already registered and has not been flushed, the registration is rejected.
- deregister_computemanager - Deregister a compute manager from the statestore. This leaves any ComputeServiceRegistration nodes resulting from the manager intact.
- created_compute_service_ids - Given a ComputeManagerID, return a list of ComputeServiceID strings that were created by that compute manager.
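Hypothetically, these could take roughly the following shape in the FastAPI style of the existing compute API (the paths, signatures, and return types are assumptions):

from fastapi import APIRouter

router = APIRouter()

@router.post("/computemanager/{compute_manager_id}/register")
def register_computemanager(compute_manager_id: str) -> str:
    ...

@router.post("/computemanager/{compute_manager_id}/deregister")
def deregister_computemanager(compute_manager_id: str) -> str:
    ...

@router.get("/computemanager/{compute_manager_id}/computeservices")
def created_compute_service_ids(compute_manager_id: str) -> list[str]:
    ...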
Compute manager abstract class
The compute manager abstract class structure largely follows the current implementation of the SynchronousComputeService.
class ComputeManager:
    compute_manager_id: ComputeManagerID
    heartbeat_interval: int
    sleep_interval: int
    client: ComputeManagerClient
    service_settings_template: bytes

class ComputeManagerSettings:
    ...
The implementation requires a new interface, ComputeManager, which defines the following methods:
- register
- request_claimable
- check_managed_compute_services
- create_compute_service (abstract)
- introspect_local (abstract)
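As an illustration of where the abstract methods come in, a platform-specific subclass for a Slurm cluster might look roughly like this (the subclass, script name, and method bodies are hypothetical):

import subprocess

class SlurmComputeManager(ComputeManager):
    def create_compute_service(self):
        # submit a batch job that launches a compute service built from
        # the predefined service settings template
        subprocess.run(["sbatch", "launch_compute_service.sh"], check=True)

    def introspect_local(self):
        # inspect the local Slurm queue for jobs this manager submitted
        ...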
The register method communicates to the alchemiscale compute API the unique ID given to the compute manager. If a compute manager with that ID already exists in the statestore, None is returned.
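A minimal sketch of that behavior, assuming the client surfaces a registration conflict as an exception (the exception name is hypothetical):

class ComputeManagerExistsError(Exception):
    """Hypothetical error for an already-registered manager ID."""

class ComputeManager:
    # ... attributes as above ...
    def register(self) -> ComputeManagerID | None:
        try:
            return self.client.register()
        except ComputeManagerExistsError:
            return None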
Changes to ComputeServiceRegistration and the compute service register method
ComputeServiceRegistration nodes will need to keep track of the compute managers responsible for their creation. To support this at the statestore level, we need only add another string field, which can optionally store a compute manager name.
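Concretely, the model change amounts to something like the following (the field name is an assumption):

class ComputeServiceRegistration:
    # ... existing fields ...
    # name of the compute manager that created this service;
    # None for services deployed manually by an administrator
    compute_manager_name: str | None = None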
Testing and validation
Since this feature defines the interface through which a specific compute manager interacts with the database and creates compute services, the majority of testing is left to those specific implementations.
Risks
Since this feature interacts directly with the allocation of high-performance computing resources, implementation errors and deployment misconfigurations can manifest as very real, potentially ballooning financial costs. Communicating these risks and putting appropriate guardrails in place is paramount.