#low-latency #lru-cache #resp #redis #protocols #framework #table

jupiter

Jupiter is a library for providing high-throughput ultra low latency services via the RESP protocol as defined by Redis

47 releases (17 stable)

3.1.4 Jul 22, 2024
3.1.3 Jun 3, 2024
3.1.2 Aug 23, 2023
3.1.1 Mar 28, 2023
0.0.1 Nov 11, 2019

#22 in Caching

MIT license

545KB
9K SLoC

JUPITER

Jupiter is a framework for wrapping compute or memory intense components for providing them as high throughput and ultra low latency services to applications built on managed runtimes like node.js, Java, Ruby. Jupiter uses the RESP protocol used by Redis to maintain a low overhead when communicating with the host application.

We at scireum use Jupiter in conjunction with our open source Java framework SIRIUS to build web based applications. sirius-biz provides a bunch of tooling to connect a Java application to Jupiter and also to maintain and monitor operations.

Next to providing a framework for custom services, Jupiter also provides some common modules:

  • LRU-Cache: An size constraint cache with an intelligent refresh strategy which can be used to maintain low latency response times by employing a coordinated asynchronous cache update pattern (see LRU.XGET). It also supports cache-coherence (flushes) by secondary keys (see LRU.PUTS / LRU.REMOVES).
  • InfoGraphDB: Provides a fast and flexible static database for master data. Using the Repository this can be used to load master data from e.g. an S3 Bucket or a git repository into fast lookup tables or code sets. These permit to perform all kinds of lookups, reverse-lookups, "search as you type" searches and automatic translation management (even for tables with thousands of rows / structured documents).
  • Repository: The repository is used to fetch files from various sources and invoking appropriate loaders so that the data can be used (e.g. as IDB table or set). More info on loaders: repository

More info and a detailed description can be found in the Documentation.

Deployment

Deployment-wise we heavily settle on Docker, therefore we simply log to stdout and use SIGHUP to detect a shutdown request.

Although, Jupiter is intended to be used as library to build custom services, a standalone docker image is provided under Jupiter IO - IO as in the moon of jupiter, not I/O operations :).

Jupiter IO has all modules (as listed above) enabled. Therefore, you can have a go by calling:

> docker run -p 2410:2410 scireum/jupiter-io:latest &
> redis-cli -p 2410

> PING or
> SYS.COMMANDS

Note that the two volumes /jupiter/config and /jupiter/repository should be mounted to docker volumes or external directories so that the data is kept alive once the container is updated.

Configuration

The configuration is loaded from settings.yml - modification of this file are detected and distributed within the framework. Also, the application specific config can be pushed in by sending SYS.SET_CONFIG.

A basic configuration would only specify the ip and port to bind to (if the default settings aren't feasible):

server:
    host: "0.0.0.0"
    port: 2410

Repository

As a single repository might be shared by multiple applications which always sync all files into their Jupiter server, files can be put into namespaces and the config can determine which namespaces are enabled:

repository:
    namespaces: ["core", "test", "foo" ]

LRU Cache

Each cache can be configured to limit its size and control its lifetime parameters:

caches:
    my_cache:
        # Specifies the maximal number of entries to store
        size: 1024
        # Specifies the maximal amount of memory to use (in bytes).
        # Supports common suffixes like: k, m, g, t
        max_memory: 1g
        # Specifies the soft time to live. After this period, an entry is considered stale
        # and will not be delivered by LRU.GET. However, LRU.XGET will deliver this entry
        # but mark it as stale. Supports common suffixes like: s, m, h, d
        soft_ttl: 15m
        # Specifies the hard time to live. After this persiod, neither LRU.GET nor LRU.XGET
        # will deliver this entry.
        hard_ttl: 1d
        # Specifies the refresh interval for LRU.XGET. If this command delivers a stale entry
        # (as defined by soft_ttl), it indicates that the entry is stale an should be
        # refreshed. However, once this has to be signalled to a client, it will no longer
        # request a refresh from other clients until either the entry has been refresehd or
        # this refresh interval has elapsed.
        refresh_interval: 30s

Commands

If all modules are enabled, the following commands are available.

Core Module

  • SYS.COMMANDS lists all available commands.
  • SYS.CONNECTIONS lists all active client connections.
  • SYS.KILL terminates the connection to the client with the given ip.

Repository

  • REPO.SCAN re-scans the local repository contents on the local disk. This automatically happens at startup and is only required if the contents of the repository are changed by an external process.
  • REPO.FETCH file url instructs the background actor to fetch a file from the given url. Note that the file will only been fetched if it has been modified on the server since it was last fetched.
  • REPO.STORE file contents stores the given string contents in a file.
  • REPO.FETCH_FORCED file url also fetches the given file, but doesn't perform any "last modified" checks as REPO.FETCH would.
  • REPO.LIST lists all files in the repository. Note that this will yield a more or less human-readable output whereas REPO.LIST raw will return an array with provides a child array per file containing filename, filesize, last modified.
  • REPO.DELETE file deletes the given file from the repository.
  • REPO.INC_EPOCH immediately increments the epoch counter of the foreground actor and schedules a background tasks to increment the background epoch. Calling this after some repository tasks have been executed can be used to determine if all tasks have been handled.
  • REPO.EPOCHS reads the foreground and background epoch. Calling first REPO.INC_EPOCHand then REPO.EPOCHS one can determine if the background actor is currently working (downloading files or performing loader tasks) or if everything is handled. As INC_EPOCH is handled via the background loop, the returned epochs will differ, as long as the background actors is processing other tasks. Once the foreground epoch and the background one are the same, one can assume that all repository tasks have been handled.

LRU Cache

  • LRU.PUT cache key value will store the given value for the given key in the given cache.
  • LRU.PUTS cache key value secondary_key1 .. secondary_keyN will store the given value for the given key in the given cache. Note that the value can only be queried using the given key, but it cann be purged from the cache using one of the given secondary key using LRU.REMOVES.
  • LRU.GET cache key will perform a lookup for the given key in the given cache and return the value being stored or an empty string if no value is present.
  • LRU.XGET cache key will behave just like LRU.GET. However, its output is a bit more elaborate. It will always respond with three values: ACTIVE, REFRESH, VALUE. If no value was found for the given key, ACTIVE and REFRESH will be 0 and VALUE will be an empty string. If a non-stale entry was found, ACTIVE is 1, REFRESH is 0 and VALUE will be the value associated with the key. Now the interesting part: If a stale entry (older than soft_ttl but younger than hard_ttl) was found, ACTIVE will be 0. For the first client to request this entry, REFRESH will be 1 and the VALUE will be the stale value associated with the key. For all subsequent invocations of this command, REFRESH will be 0 until either the entry was updated (by calling LRU.PUT) or if the refresh_interval has elapsed since the first invocation. Using this approach one can build "lazy" caches, which refresh on demand, without slowing the requesting client down (stale content can be delivered quickly, if the application accepts doing so) and also without overloading the system, as only one client will typically try to obtain a fresh value instead of all clients at once.
  • LRU.REMOVE cache key will remove the value associated with the given key. Note that the value will be immediately gone without respecting any TTL.
  • LRU.REMOVES cache secondary_key will remove all values which were associated with the given secondary key using LRU.PUTS.
  • LRU.FLUSH cache will wipe all contents of the given cache.
  • LRU.STATS will provide an overview of all active caches. LRU.STATS cache will provide detailed metrics about the given cache.
  • LRU.KEYS cache filter can be used to retrieve all keys which contain the given filter (in their key). Note that the filter can also be omitted. However, only the first 100 matches will be returned in either case.

InfoGraphDB

  • IDB.LOOKUP table search_path filter_value path1 path2 path3 Performs a lookup for the given filter value in the given search path (inner fields separated by ".") within the given table. If a result is found, the values for path1..pathN are extracted and returned. If no path is given, the number of matches is returned. If multiple documents match, only the first one is returned. Note that if a path matches an inner object (which is especially true for "."), the result will be wrapped as JSON. Note that IDB.LOOKUP is case-sensitive by default. However, if a fulltext index is placed on the field being queried, a case-insensitive lookup can be performed if the given filter_value is already lowercase. This might be used e.g. for reverse lookups to find a code for a given text in a certain (or any) language. See repository for a description of loaders which ultimately define the tables in IDB and their indices.
  • IDB.ILOOKUP table primary_lang fallback_lang search_path filter_value path1 Behaves just like IDB.LOOKUP. However, of one of the given extraction paths points to an inner map, we expect this to be a map of translation where we first try to find the value for the primary language and if none is found for the fallback language. Note that, if both languages fail to yield a value, we attempt to resolve a final fallback using xx as language code. If all these attempts fail, we output an empty string. Note that therefore it is not possible to return an inner map when using ILOOKUP which is used for anything other than translations. Note however, that extracting single values using a proper path still works. See IDB.LOOKUP for details when this is case-sensitive and when it isn't.
  • IDB.QUERY table num_skip max_results search_path filter_value path1 Behaves just like lookup, but doesn't just return the first result, but skips over the first num_skip results and then outputs up to max_result rows. Not that this is again limited to at most 1000. See IDB.LOOKUP for details when this is case-sensitive and when it isn't.
  • IDB.QUERY table primary_lang fallback_lang num_skip max_results search_path filter_value path1 Provides essentially the same i18n lookups for IDB.QUERY as IDB.ILOOKUP does for IDB.LOOKUP. See IDB.LOOKUP for details when this is case-sensitive and when it isn't.
  • IDB.SEARCH table num_skip max_results search_paths filter_value path1 Performs a search in all fields given as search_paths. This can either be comma separated like "path1,path2,path3" or a "*" to select all fields. Note that for a given search value, this will match case-insensitive and also for prefixes of a detected word within the document (the selected fields). Everything else behaves just like IDB.QUERY. Also note that a fulltext index has to be present for each field being queried.
  • IDB.ISEARCH table primary_lang fallback_lang num_skip max_results search_paths filter_value path1 Adds i18n lookups for the generated results just like IDB.IQUERY or IDB.ILOOKUP.
  • IDB.SCAN table num_skip max_results path1 path2 path3 Outputs all results by skipping over the first num_skip entries in the table and then outputting up to max_resultsrows.
  • IDB.ISCAN table primary_lang fallback_lang num_skip max_results path1 path2 path3 Again, behaves just like IDB.SCAN but provides i18n lookup for the given languages.
  • IDB.LEN table reports the number of entries in the given table
  • IDB.SHOW_TABLES reports all tables and their usage statistics.
  • IDB.SHOW_SETS reports all sets and their usage statistics.
  • IDB.CONTAINS set key1 key2 key3 reports if the given keys are contained in the given set. For each key a 1 (contained) or a 0 (not contained) will be reported.
  • IDB.INDEX_OF set key1 key2 key3 reports the insertion index for each of the given keys using one-based indices.
  • IDB.CARDINALITY set reports the number of entries in the given set

PyRun Kernels

  • PY.RUN kernel json sends the given JSON to the given kernel and returns the result.
  • PY.STATS provides an overview of all active kernels.

Sources

  • The library parts can be found in jupiter-rs
  • The example runtime Jupiter IO can be found in jupiter-io

Contributions

Contributions as issues or pull requests are always welcome. Please sign-off all your commits by adding a line like "Signed-off-by: Name " at the end of each commit, indicating that you wrote the code and have the right to pass it on as an open source.

License

Jupiter is licensed under the MIT License:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Dependencies

~22–36MB
~624K SLoC