Memory engine · live · p50 320ms

Long-term memory for AI agents, in every language.

The same recall quality in any language, in ~320ms (p50), no matter how much you store. No keyword tricks and no LLM in the retrieval path: an honest 85.2% on LongMemEval-S.

We never train on, view, or use your data; we only organize it.

Get your API key · Read the docs

curl -X POST api.wontopos.com/api/v1/memory/recall

Same quality in Korean, Japanese, Chinese, and English · SDKs in 5 languages · self-hostable

Identical recall quality across:

한국어 · 日本語 · 中文 · English · Español · Français · Deutsch

Why teams choose WOS

Three things that don't break as your memory grows.

Pure semantic recall

No keyword or BM25 matching, so recall quality is identical across languages. Mix 한국어, 日本語 and English freely; retrieval doesn't care.

CJK recall stays on par with English

No LLM in the path

Store, search and recall call only the embedding model, never a language model. Retrieval is deterministic, fast, and cheap. You pay for embeddings, not generation.

Embeddings only · deterministic
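To make "embeddings only, deterministic" concrete, here is a minimal sketch of similarity-based retrieval. It is not the WOS implementation: the vectors are hand-written stand-ins for what a real embedding model would produce, and `cosine` and `top_k` are hypothetical helpers. It only illustrates that ranking by cosine similarity involves no sampling, so the same query over the same store returns the same result every time.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, store, k=2):
    """Return the k stored texts most similar to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    # Sort by score (descending), then by text, so ties break the same way every run.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [text for _, text in scored[:k]]

# Toy store: (text, pretend embedding). Real embeddings have hundreds of dimensions.
STORE = [
    ("she prefers tea over coffee", [0.90, 0.10, 0.00]),
    ("그녀는 커피보다 차를 좋아한다", [0.88, 0.12, 0.02]),
    ("the deploy failed on Friday", [0.00, 0.20, 0.95]),
]

query = [0.85, 0.15, 0.05]  # pretend embedding of "what does alice drink?"
results = top_k(query, STORE)
print(results)
```

In this toy data the English and Korean sentences about tea sit close together in vector space, so both rank above the unrelated memory regardless of the query's language.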

Bounded retrieval

A query returns a small, bounded slice: a median of ~1,200 tokens and never more than 1,700, regardless of how much is stored. Input cost stays flat as memory grows.

<1% of stored history per query
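A small arithmetic sketch of why a bounded slice keeps input cost flat. The 1,700-token cap is the figure quoted above; the stored-history sizes are made-up illustration values, and `prompt_tokens` is a hypothetical helper, not part of any SDK.

```python
MAX_RECALL_TOKENS = 1_700  # upper bound quoted above

def prompt_tokens(stored_tokens: int, bounded: bool) -> int:
    """Tokens added to the prompt for one query."""
    if bounded:
        # A bounded recall never exceeds the cap, however much is stored.
        return min(stored_tokens, MAX_RECALL_TOKENS)
    # Naive alternative: paste the whole history into the prompt.
    return stored_tokens

for stored in (10_000, 200_000, 1_000_000):  # hypothetical history sizes
    full = prompt_tokens(stored, bounded=False)
    capped = prompt_tokens(stored, bounded=True)
    share = capped / stored
    print(f"stored={stored:>9,}  full-history={full:>9,}  bounded={capped:,} ({share:.2%})")
```

Under these assumptions the worst-case share per query drops below 1% once the stored history exceeds 170,000 tokens, and it keeps shrinking as the history grows.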

Try it live · runs on your key

Don't take our word for it. Ask the engine yourself.

This guide runs on your own LLM key, not ours. WOS injects the persona and everything it knows about the product and this page as memory; the model just talks. In other words, we show you the engine by using it.

Persona

Product guide: factual, developer-friendly. We hold it; you don't write it.

Memory

WOS system info + this page's content, pre-loaded as recall context.

Session

Not saved. This chat lives in your browser only; refresh and it's gone. Only real users get persistent memory. Max 25 turns.

Your key

Stays in this tab only, never stored on our servers. Available after login.

WOS Guide · offline

memory: loaded · persona: product guide

recall ready

Hi, I'm the WOS product guide, running on a connected LLM with everything about WOS loaded as memory. Ask me anything, or tell me to take you somewhere on the page.

Live demo · responds via WOS-injected context · 0 / 25 turns


320ms

Median retrieval latency (p50)

~1.2k

Tokens returned per query

0

LLM calls in the retrieval path

Native SDKs · Py / TS / JS / Go / Rust

Retrieval latency (engine only, no answer model): 170ms to 580ms range, 320ms median

Benchmark · LongMemEval-S

WOS 1, measured in the open.

WOS 1 · 85.2% 5-run mean · 0 cherry-picks

85.2%

5-run mean · σ 1.1% · runs: 86.2 / 84.2 / 85.0 / 83.8 / 86.6

Reader model: Claude Opus 4.8

Judge: GPT-4o · temp 0

Reproducible: scripts + prompts public

Single-session user: 98.9

Single-session assistant: 96.4

Knowledge update: 91.3

Preference inference: 86.0

Temporal reasoning: 80.8

Multi-session: 73.8

We report the average of 5 runs with every run shown, not a single best result. The retrieval engine is deterministic; run-to-run variation comes only from the reader model. Full prompts, raw per-run scores, and the method are in the report.
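The headline figures can be rechecked from the five per-run scores listed above. This short sketch uses only Python's standard library. One assumption: it uses the population standard deviation (dividing by n), which rounds to the σ shown above; the sample form (dividing by n-1) would round to 1.2 instead, and the page does not say which one the report uses.

```python
from statistics import mean, pstdev

runs = [86.2, 84.2, 85.0, 83.8, 86.6]  # per-run scores listed above

avg = mean(runs)
sigma = pstdev(runs)  # population standard deviation (divide by n)

print(f"mean = {avg:.1f}%  sigma = {sigma:.1f}%")
```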

View the full report

Built for developers

Three lines to give your agent a memory.

One-call recall

short-term + long-term + context, ready for your prompt.

Same surface, 5 languages

add · recall · search · supersede · forget.

Self-host or hosted

your host, your keys, your data residency.

    
Python

from wontopos import Client

mem = Client(api_key="wos-...")
mem.add("she prefers tea over coffee", user_id="alice")

# one call → short-term + long-term + context
ctx = mem.recall("what does alice drink?", user_id="alice")

TypeScript

import { Client } from "wontopos";

const mem = new Client({ apiKey: "wos-..." });
await mem.add("she prefers tea over coffee", { userId: "alice" });

// short + long + context, ready for the LLM
const ctx = await mem.recall("what does alice drink?", { userId: "alice" });

curl

# store a memory
curl -X POST https://api.wontopos.com/api/v1/memory/store \
  -H "X-API-Key: wos-..." -H "Content-Type: application/json" \
  -d '{"user_id":"alice","content":"she prefers tea over coffee"}'

# recall: one call, ready for your prompt
curl -X POST https://api.wontopos.com/api/v1/memory/recall \
  -H "X-API-Key: wos-..." -H "Content-Type: application/json" \
  -d '{"user_id":"alice","query":"what does alice drink?"}'

Go

mem := wontopos.New("wos-...")
mem.Add(ctx, "she prefers tea over coffee", wontopos.User("alice"))

// short + long + context in one call
res, _ := mem.Recall(ctx, "what does alice drink?", wontopos.User("alice"))

Rust

use wontopos::Client;

let mem = Client::new("wos-...");
mem.add("she prefers tea over coffee", "alice").await?;

// short + long + context, one call
let ctx = mem.recall("what does alice drink?", "alice").await?;
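Where no SDK is available, the curl calls above translate to any HTTP client. The sketch below uses only Python's standard library and mirrors the endpoint, header, and JSON fields from the curl example. The helper name `build_recall_request` is ours, not part of the SDK, and the response shape is not documented on this page, so the code assumes nothing about it beyond it being JSON.

```python
import json
import urllib.request

API_BASE = "https://api.wontopos.com/api/v1"  # base URL from the curl example

def build_recall_request(api_key: str, user_id: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /memory/recall."""
    body = json.dumps({"user_id": user_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/memory/recall",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_recall_request("wos-...", "alice", "what does alice drink?")

# Sending is left commented out so the sketch runs without network access.
# with urllib.request.urlopen(req, timeout=10) as resp:
#     ctx = json.load(resp)  # assumed to be JSON; its fields are not shown on this page
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.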

Trust & transparency

Your memory is yours. We hold it; we never look inside.

Never viewed

No human and no model of ours reads your stored memories. Ever.

Never trained on

Your data is never used to train any model, ours or anyone else's.

Forget on command

One call erases a memory or an entire user. GDPR-ready, no traces.

BYOK & isolated

LLM keys are sent per request, never stored. Memories are isolated per account and user.

For vibe coders & agents

One file your AI can read instantly.

Download a compact, LLM-friendly spec of the whole API. Drop it into your IDE or any coding agent and start building, with no docs spelunking.

wontopos-llms.txt · 1.4 KB

# Wontopos (WOS) — Long-term memory for AI agents
> Pure semantic retrieval. No LLM in the path.
> Bounded, fixed-size recall. We never train on,
> view, or use your data.

## Core endpoints
- /memory/store    {user_id, content}
- /memory/search   {user_id, query}
- /memory/recall   {user_id, query}  # one-call ctx
...
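As an illustration of how a script or agent might consume such a spec, the sketch below pulls the endpoint lines out of the excerpt shown above. The parsing approach (a regular expression over `- /path {fields}` lines) is an assumption about the file's layout based only on this excerpt; the real file is longer and may differ. `parse_endpoints` is a hypothetical helper.

```python
import re

# Excerpt copied from the preview above; the full file continues past this point.
SPEC_EXCERPT = """\
## Core endpoints
- /memory/store    {user_id, content}
- /memory/search   {user_id, query}
- /memory/recall   {user_id, query}  # one-call ctx
"""

# Matches lines such as "- /memory/store    {user_id, content}".
ENDPOINT_LINE = re.compile(r"^-\s+(?P<path>/\S+)\s+\{(?P<fields>[^}]*)\}", re.MULTILINE)

def parse_endpoints(spec: str) -> dict[str, list[str]]:
    """Map each endpoint path to its list of request fields."""
    return {
        m["path"]: [field.strip() for field in m["fields"].split(",")]
        for m in ENDPOINT_LINE.finditer(spec)
    }

endpoints = parse_endpoints(SPEC_EXCERPT)
for path, fields in endpoints.items():
    print(path, fields)
```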

Give your agent a memory that doesn't forget.

Free to start. ~320ms recall, every language, bounded cost. Your data stays yours.