Large-scale AI/ML feature serving at low cost with caching

The proliferation of AI-model-driven decisions over the past few years has come with a myriad of major production challenges, one of which is enriching AI models with context at high scale (i.e., low latency, high throughput) at a low cost.

For example, this challenge is faced by model builders creating recommendation systems that personalize feeds and need features for millions of users and hundreds of thousands of products in order to make >10^5 predictions per second.

To tackle the challenge of serving features at large scale and low cost, we're excited to announce the Tecton Serving Cache, a server-side cache designed to dramatically reduce the infrastructure costs of feature serving for machine learning models at high scale. By simplifying feature caching with the Tecton Serving Cache, model builders get an effortless path to boosting both performance and cost efficiency as their systems scale to deliver ever greater impact.

In this post, we'll discuss the different use cases for caching features, give a preview of how to use the Tecton Serving Cache, and share some benchmark results that show up to an 80% p50 latency reduction and up to a 95% cost reduction compared to the baseline of a common AI feature-retrieval pattern.

Tecton Serving Cache use cases

Common AI applications, such as recommendation systems, personalized search, customer targeting, and forecasting, can benefit significantly from caching. These large-scale apps are willing to accept slightly stale feature values from the cache in exchange for major reductions in latency and cost.

For example, in an e-commerce site's product recommendation model, one of the features could be the average number of daily cart additions by customers for a given product over a week. This feature is unlikely to change dramatically within an hour, so it could safely be cached for an hour.
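To make the staleness tradeoff concrete, here is a quick illustration with made-up numbers (not real benchmark data): even a sudden burst of activity barely moves a week-long average, which is why serving an hour-old cached value is acceptable.

```python
# Hypothetical daily cart additions for one product over the past week.
daily_cart_adds = [120, 98, 143, 110, 132, 101, 126]
avg_daily_adds = sum(daily_cart_adds) / len(daily_cart_adds)

# Even 10 extra additions today shift the weekly average by only
# 10/7 ~= 1.4, so an hour of cache staleness barely changes the feature.
daily_cart_adds[-1] += 10
shifted_avg = sum(daily_cart_adds) / len(daily_cart_adds)
print(round(avg_daily_adds, 2), round(shifted_avg, 2))  # 118.57 120.0
```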

More generally, caching features can be beneficial in the following scenarios:

High-traffic, low-cardinality key reads: Caching is ideal for use cases with high traffic (>1k QPS) where the same keys are read many times. Repeated access to the same keys means a high cache hit rate, which reduces response times and the access cost per request.

Complex feature queries: Caching is beneficial for features that are slow or expensive to compute, such as features with large aggregation windows that require long computation times (>100ms).

Enabling the Tecton Serving Cache in two simple steps
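As a back-of-the-envelope illustration of why hit rate drives both savings (our own arithmetic with assumed latencies, not Tecton's benchmark figures), the expected per-request latency and store cost fall roughly linearly with the cache hit rate:

```python
def expected_latency_ms(hit_rate, cache_ms=1.0, store_ms=20.0):
    # Weighted average of fast cache hits and slower online-store reads.
    return hit_rate * cache_ms + (1 - hit_rate) * store_ms

def store_cost_fraction(hit_rate):
    # Only cache misses reach the online store, so store cost
    # scales with the miss rate.
    return 1 - hit_rate

# At a 95% hit rate, only 5% of requests pay the store's latency and cost.
print(round(expected_latency_ms(0.95), 2))   # 1.95
print(round(store_cost_fraction(0.95), 2))   # 0.05
```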

To use the Tecton Serving Cache, simply add two pieces of configuration: one on the Feature View and one on the Feature Service that contains the cached Feature View. Model builders can then rely on the Tecton Serving Cache to retrieve precomputed feature values from memory.

from tecton import CacheConfig, batch_feature_view, FeatureService

cache_config = CacheConfig(max_age_seconds=3600)

@batch_feature_view(cache_config=cache_config)
def my_cached_feature_view():
    ...

fs = FeatureService(
    name="cached_feature_service",
    feature_views=[my_cached_feature_view, ...],
    online_serving_enabled=True,
    enable_online_caching=True,
)

The max_age_seconds setting in the CacheConfig determines the maximum number of seconds a feature value will be cached before it expires. The enable_online_caching setting determines whether the Feature Service will attempt to retrieve cached values from its cached Feature Views. If a Feature View with cache options set is part of a Feature Service with caching disabled, that Feature View will not retrieve cached values.

The include_serving_status=true metadata option on a request helps verify that a value is being served from the cache: the server's response metadata will include a status field indicating whether or not the value was retrieved from the cache.

Benchmarks
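The lookup behavior described above can be sketched as a toy model in plain Python (this is our own illustrative sketch, not Tecton's implementation): a cached value is used only when caching is enabled at the service level and the entry is younger than max_age_seconds, and the returned metadata mimics the status field surfaced by include_serving_status.

```python
import time

def serve_feature(key, cache, store, max_age_seconds, caching_enabled):
    """Toy model of the documented behavior, not Tecton's code."""
    now = time.time()
    if caching_enabled and key in cache:
        value, written_at = cache[key]
        if now - written_at < max_age_seconds:
            # Fresh cache hit: skip the online store entirely.
            return value, {"status": "CACHED"}
    # Caching disabled, key missing, or entry expired: read the store
    # and refresh the cache entry.
    value = store[key]
    cache[key] = (value, now)
    return value, {"status": "MISS"}
```

For example, the first request for a key reports a miss, a repeat request within the max age reports a cache hit, and a service with caching disabled always reads from the store.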

As...

