Why WOS왜 WOS인가

Memory that works in every language — and cuts your LLM bill ~100×.모든 언어에서 작동하는 기억 — 그리고 LLM 비용을 ~100× 절감.

Store and recall user memories with one API. Pure semantic retrieval — no keyword matching, no LLM in the loop. Same accuracy in any language, at a fraction of the tokens.하나의 API로 사용자 기억을 저장하고 회수합니다. 순수 의미 검색 — 키워드 매칭도, 루프 속 LLM도 없습니다. 어떤 언어에서도 같은 정확도, 토큰은 일부만.

~100×

lower LLM costLLM 비용 절감

every language모든 언어

identical recall동일한 검색 품질

~320ms

p50 retrievalp50 검색 지연

85.2%

LongMemEval-S

~1,200

tokens / query쿼리당 토큰

Two pillars competitors can't structurally copy경쟁사가 구조적으로 따라올 수 없는 두 축

Multilingual.다국어. Pure semantic retrieval, zero keyword/BM25 — identical recall in 한국어 · 日本語 · 中文 · English. Systems built on English BM25 break on CJK.순수 의미 검색, 키워드/BM25 전혀 없음 — 한국어 · 日本語 · 中文 · English에서 동일한 검색 품질. 영어 BM25 기반 시스템은 CJK에서 무너집니다.
Cost.비용. Bounded retrieval — your LLM gets ~1,200 relevant tokens, not the full history. That's where the ~100× saving comes from.제한된 검색 — LLM에 전체 히스토리가 아니라 ~1,200개의 관련 토큰만 전달. 여기서 ~100× 절감이 나옵니다.

Pick a tab on the left왼쪽에서 탭을 선택하세요 to see each advantage in detail — start with the cost story.— 각 강점을 자세히 볼 수 있어요. 비용 스토리부터 보세요.

Cost비용

You pay us $1. We save you ~$100 on your LLM.당신은 $1을 내고, LLM에선 ~$100을 아낍니다.

WOS feeds your LLM ~1,200 tokens per query — a bounded, relevant slice — instead of stuffing the full history into every prompt. The gap is enormous, and it grows with your history.WOS는 매 프롬프트에 전체 히스토리를 욱여넣는 대신, 쿼리당 ~1,200개의 관련 토큰만 LLM에 전달합니다. 그 격차는 막대하고, 히스토리가 커질수록 더 벌어집니다.

LLM cost per 1,000 queries1,000 쿼리당 LLM 비용

User history사용자 히스토리100K

Queries / month월 쿼리 수1,000

Your LLM사용 LLM

45× cheaper — you save $244/mo더 저렴 — 월 $244 절감

Without WOSWOS 없이$250.00

With WOSWOS 사용$5.50

Every $1 spent on WOS saves ~$98 on the LLM. Bigger history or a pricier model → bigger ROI.WOS에 쓴 $1마다 LLM에서 ~$98을 아낍니다. 히스토리가 크거나 모델이 비쌀수록 ROI가 커집니다.

Where the savings come from절감이 나오는 곳

Without WOS you stuff the whole history into each prompt — 100K tokens × $2.50/1M = $0.25 every single query.WOS 없이는 매 프롬프트에 전체 히스토리를 넣습니다 — 100K 토큰 × $2.50/1M = $0.25, 매 쿼리마다.
With WOS you ingest once ($1/1M), then each query is a tiny retrieve ($2/1M × 1,200) plus your LLM on just ~1,200 tokens.WOS는 한 번만 적재($1/1M)하고, 이후 각 쿼리는 작은 검색($2/1M × 1,200)과 ~1,200 토큰에 대한 LLM 호출뿐입니다.
The fewer tokens your LLM reads, the less you pay — and WOS keeps that number flat as memory grows.LLM이 읽는 토큰이 적을수록 비용이 줄고, WOS는 기억이 커져도 그 수치를 일정하게 유지합니다.

Reduction factor = history ÷ 1,200. 25K → 21× · 100K → 83× · 200K → 167×.감축 배수 = 히스토리 ÷ 1,200. 25K → 21× · 100K → 83× · 200K → 167×.

Multilingual다국어

Every language, the same accuracy.모든 언어, 같은 정확도.

Retrieval is pure semantic — embeddings only, zero keyword or BM25 matching. So recall quality is identical whether your users write in 한국어, 日本語, 中文, or English.검색은 순수 의미 기반입니다 — 임베딩만 쓰고 키워드나 BM25 매칭은 전혀 없습니다. 그래서 사용자가 한국어, 日本語, 中文, English 무엇으로 적어도 검색 품질이 동일합니다.

Most memory systems lean on BM25 or entity matching to boost their numbers. Those are English-shaped: they break on CJK morphology, spacing, and script. WOS never touches keywords, so there's nothing to break.대부분의 기억 시스템은 점수를 올리려고 BM25나 엔터티 매칭에 기댑니다. 그건 영어에 맞춰진 방식이라 CJK의 형태소·띄어쓰기·문자 체계에서 무너집니다. WOS는 키워드를 아예 건드리지 않으므로 무너질 것이 없습니다.

LongMemEval is English-only — it literally can't measure this. The multilingual edge is structural, not a benchmark trick.LongMemEval은 영어 전용이라 이 점을 측정조차 못 합니다. 다국어 강점은 벤치마크 트릭이 아니라 구조적인 것입니다.

Latency지연

~320ms — and it stays there.~320ms — 그리고 그대로 유지됩니다.

Retrieval latency barely moves as memory grows, because there's no language model in the path — just embeddings and a reranker.검색 경로에 언어 모델이 없고 임베딩과 리랭커만 있기 때문에, 기억이 커져도 지연이 거의 변하지 않습니다.

320ms

at 144K memories stored14.4만 기억 저장 시

400ms

at 1.4M memories stored140만 기억 저장 시

A 10× larger store adds about 80ms. Your users feel a snappy assistant, not a database scan.저장량이 10배 늘어도 약 80ms만 추가됩니다. 사용자는 데이터베이스 스캔이 아니라 빠릿한 어시스턴트를 경험합니다.

Architecture구조

No LLM to store or search.저장에도 검색에도 LLM이 없습니다.

Storing and searching use embeddings only — cheap, fast, and deterministic. A language model never sits in the retrieval path, so there's nothing to hallucinate, rate-limit, or run up a bill.저장과 검색은 임베딩만 사용합니다 — 저렴하고 빠르고 결정론적입니다. 언어 모델이 검색 경로에 들어가지 않으므로 환각도, 레이트 리밋도, 비용 폭증도 없습니다.

Deterministic. The same query returns the same memories every time — which is why our benchmark variance comes only from the reader model.결정론적. 같은 쿼리는 매번 같은 기억을 반환합니다 — 그래서 벤치마크 편차가 리더 모델에서만 발생합니다.
Cheap at scale. No generation cost to store or retrieve, so your bill tracks storage — not model usage — as memory grows.대규모에서도 저렴. 저장·검색에 생성 비용이 없어, 기억이 커져도 비용이 모델 사용량이 아니라 저장량을 따라갑니다.

Proof근거

An honest, reproducible 85.2%.정직하고 재현 가능한 85.2%.

85.2% on LongMemEval-S, averaged over 5 independent runs (σ 1.1%, none cherry-picked), graded by the benchmark's canonical GPT-4o judge.LongMemEval-S에서 85.2%, 5회 독립 실행 평균(σ 1.1%, 체리피킹 없음), 벤치마크 표준 채점자인 GPT-4o로 채점.

Competitors' headline numbers are usually self-graded — the same model writes and grades the answer — with a benchmark-fitted prompt and English keyword retrieval. Under the canonical judge, strong systems land in the 57–73% range, so an honest 85.2% is genuinely high. We publish the scoring code and reader prompt so anyone can reproduce it.경쟁사의 대표 수치는 대개 자가 채점입니다 — 같은 모델이 답을 쓰고 채점하죠 — 벤치마크에 맞춘 프롬프트와 영어 키워드 검색까지 더해서요. 표준 채점자로 보면 강력한 시스템도 57–73% 범위에 머무릅니다. 그래서 정직한 85.2%는 실제로 높은 값입니다. 채점 코드와 리더 프롬프트를 공개해 누구나 재현할 수 있습니다.

See the full report전체 리포트 보기

For developers개발자

Three calls: store, recall, answer.세 번의 호출: 저장, 회수, 답변.

One API. The recall() call is the differentiator — it returns short-term, long-term, and surrounding context in a single round-trip, ready to drop into your prompt.하나의 API. 핵심은 recall() 호출입니다 — 단기·장기 기억과 주변 컨텍스트를 한 번의 왕복으로 돌려줘, 프롬프트에 바로 넣을 수 있습니다.

Store저장

add() your user's facts and turns. Embedded on the way in — no LLM.add()로 사용자의 사실과 대화를 저장합니다. 적재 시 임베딩 — LLM 없음.

Recall회수

recall() returns short-term + long-term + context in one call — about ~1,200 tokens.recall()이 단기 + 장기 + 컨텍스트를 한 번에 반환합니다 — 약 ~1,200 토큰.

Answer답변

Feed that bounded context to your LLM — any provider, your key.그 제한된 컨텍스트를 당신의 LLM에 전달하세요 — 어떤 제공사든, 당신의 키로.

from wontopos import Client
mem = Client(api_key="wos-...")
mem.add("she prefers tea over coffee", user_id="alice")
ctx = mem.recall("what does alice drink?", user_id="alice")  # short + long + context

SDKsSDK

Five languages, one surface.5개 언어, 하나의 인터페이스.

The same simple methods everywhere: add · search · recall · update · delete.어디서나 같은 단순한 메서드: add · search · recall · update · delete.

Python — pip install wontopos
TypeScript / JavaScript — npm install wontopos
Go — go get github.com/wontopos/go-sdk
Rust — cargo add wontopos

The differentiator is recall() — one call gives short-term + long-term + context, ready for the prompt.차별점은 recall()입니다 — 한 번의 호출로 단기 + 장기 + 컨텍스트를, 프롬프트에 바로 쓸 수 있게.

Scale확장

Beyond the context window.컨텍스트 윈도를 넘어서.

WOS recalls from histories of 1.4M tokens — far larger than any LLM context window — and still hands back a tight ~1,470-token slice.WOS는 어떤 LLM 컨텍스트 윈도보다도 큰 140만 토큰 히스토리에서도 회수하고, 여전히 ~1,470 토큰의 짧은 조각만 돌려줍니다.

Your agent's memory isn't capped by what fits in a prompt. It keeps everything and retrieves only what matters — the first published LongMemEval-M numbers back this up.에이전트의 기억은 프롬프트에 들어가는 양에 갇히지 않습니다. 전부 보관하고 중요한 것만 회수합니다 — 최초 공개된 LongMemEval-M 수치가 이를 뒷받침합니다.

Privacy프라이버시

Private, and yours.비공개, 그리고 당신의 것.

Your data stays in your store. We never train on it, view it, or reuse it — we only organize it so you can retrieve it.데이터는 당신의 저장소에 머뭅니다. 저희는 그것으로 학습하지도, 열람하지도, 재사용하지도 않습니다 — 회수할 수 있게 정리만 합니다.

BYOK. Your LLM key is sent per request and never stored.BYOK. LLM 키는 요청마다 전달되며 저장되지 않습니다.
Isolated. Memories are scoped per account, then per user_id.격리. 기억은 계정별, 그리고 user_id별로 분리됩니다.
GDPR delete & self-host. One call wipes a user; run the engine in your own environment if you prefer.GDPR 삭제 & 셀프 호스팅. 한 번의 호출로 사용자를 삭제하고, 원하면 엔진을 자체 환경에서 운영할 수 있습니다.