Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching
Since local LLM inference on resource-constrained edge devices imposes a severe performance bottleneck, this paper proposes distributed prompt caching...
Tell Me What To Learn: Generalizing Neural Memory to be Controllable in Natural Language
Modern machine learning models are deployed in diverse, non-stationary environments where they must continually adapt to new tasks and evolving knowle...
Towards Autonomous Graph Data Analytics with Analytics-Augmented Generation
This paper argues that reliable end-to-end graph data analytics cannot be achieved by retrieval- or code-generation-centric LLM agents alone. Although...
Attention Is All You Need
Introduces the Transformer architecture, built entirely on attention mechanisms, which became the foundation of modern language models.
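The Transformer's core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)V. A minimal NumPy sketch (illustrative only; the function name and use of plain arrays are my own, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores are scaled by sqrt(d_k) to keep softmax inputs well-conditioned.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a convex combination of the value rows.
    return weights @ V
```

In the full architecture this is applied in parallel across multiple heads, each with its own learned projections of Q, K, and V.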
Bigtable
Google's distributed storage system for structured data.
Dynamo
Amazon's distributed key-value storage system designed for high availability and eventual consistency.
Google File System
Google's scalable distributed file system designed for large distributed data-intensive applications.
Kafka
LinkedIn's distributed streaming platform designed for high-throughput, low-latency data streaming.
MapReduce
Google's programming model for simplified data processing on large clusters, built around user-supplied map and reduce functions.
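The model can be illustrated with the classic word-count example: map emits (key, value) pairs, an intermediate shuffle groups values by key, and reduce folds each group. A minimal single-process sketch (function names are my own; Google's system runs these phases distributed across a cluster):

```python
from collections import defaultdict

def map_fn(line):
    # Map phase: emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce phase: fold all counts for one word.
    return sum(values)

def run_mapreduce(lines):
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # One reduce call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

print(run_mapreduce(["the quick fox", "the fox"]))
# → {'the': 2, 'quick': 1, 'fox': 2}
```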
Raft
A consensus algorithm for managing a replicated log, designed to be easier to understand than Paxos while providing equivalent functionality and efficiency.
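One concrete rule from the paper is the election restriction: a server grants its vote only to a candidate whose log is at least as up-to-date as its own, comparing the term of the last entry first and log length second. A minimal sketch of just that comparison (the function name is my own):

```python
def log_is_up_to_date(candidate_last_term, candidate_last_index,
                      voter_last_term, voter_last_index):
    # Raft's election restriction: the log ending in the higher term wins;
    # if the last terms are equal, the longer log wins.
    if candidate_last_term != voter_last_term:
        return candidate_last_term > voter_last_term
    return candidate_last_index >= voter_last_index
```

This check is what guarantees that any elected leader already holds every committed entry.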
