Large language models are getting very good at understanding words. They can retrieve documents, call tools, reason over context, summarize long threads, write code, and produce fluent responses. But there is a layer of human communication that most AI systems still handle poorly: timing.

A message after five seconds does not mean the same thing as a message after five hours. A quick reply during a debugging session should not be treated the same way as a user returning to a task the next day. A short “ok” after a long technical explanation might mean understanding, overload, fatigue, dismissal, or simply “continue.”

Humans interpret these timing signals naturally, even if imperfectly. We notice pauses. We notice when someone is replying quickly. We notice when a conversation has gone cold. We know when to recap and when a recap would be annoying.

Most AI systems do not.

That is the problem TiM is built to explore.

What Is TiM?

TiM stands for Timing-to-Directive Modeling.

It is a small runtime layer that looks at recent conversation timing and UI interaction signals, then produces a single lightweight directive for the main language model.

TiM does not decide facts. TiM does not choose tools. TiM does not replace retrieval, reranking, memory, or safety policy. TiM does not control the model.

Instead, TiM lightly nudges the model’s conversational posture.

For example, TiM may tell the model:

  • Continue directly, but keep the pace steady and focused.
  • Begin with a brief recap, then continue directly.
  • Keep the reply short and conversational.
  • Reduce branching and give one clear next step.
  • Slow down and prioritize clarity; the user may be processing.
  • Keep the answer compact and avoid introducing new threads.

The idea is simple: same intelligence, better entrance.

The downstream model still decides what to say. TiM helps decide how the model should enter the next turn.

Why Timing Matters

Human communication is not just semantic. It is temporal.

A fast back-and-forth exchange usually rewards brevity. A long gap may require context restoration. A user who starts typing quickly but sends much later may be processing, editing, or hesitating. A user who backgrounds the browser and returns hours later may need a different response than a user who stayed active the entire time.

This is especially important in AI systems that feel conversational. Without timing awareness, models often make awkward choices.

  • They recap when the user is still live and moving quickly.
  • They fail to recap after a long break.
  • They over-explain when the user is fading.
  • They give too many branches when the user needs one clear next step.
  • They respond with full detail when the user’s pace suggests they want speed.
  • They respond too briefly when the user is clearly doing deep work.

TiM is an attempt to give the runtime a small amount of chronemic awareness: awareness of timing, pauses, rhythm, interruption, and resumption.

The Core Design

The current TiM prototype uses observable runtime signals. These include:

  • Assistant message timestamp.
  • User message timestamp.
  • Time between assistant message and user reply.
  • Assistant message length and estimated burden.
  • Estimated reading-adjusted latency.
  • Typing start time.
  • Typing duration.
  • Edit and correction behavior.
  • Tab blur and focus behavior.
  • Browser or app backgrounding.
  • Device class.
  • Recent conversation rhythm.
  • Local latency baseline.
  • Resume and silence indicators.

Those signals are converted into a recent timing window. A small classifier then predicts one directive from a controlled set of runtime behaviors.
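One of the less obvious signals above is "estimated reading-adjusted latency." The idea is that raw reply latency overstates how long a user actually paused, because some of the gap was spent reading the assistant's message. Below is a minimal sketch of that adjustment. The field names and the 240-words-per-minute reading rate are illustrative assumptions, not TiM's actual schema or calibration.

```python
from dataclasses import dataclass

# Hypothetical signal container; field names are illustrative, not TiM's real schema.
@dataclass
class TimingWindow:
    reply_latency_s: float    # time between assistant message and user reply
    assistant_words: int      # length of the assistant message the user was reading
    typing_duration_s: float  # how long the user spent composing
    backgrounded: bool        # tab blur / app backgrounding during the gap

def reading_adjusted_latency(window: TimingWindow,
                             words_per_minute: float = 240.0) -> float:
    """Subtract an estimated reading time from the raw reply latency.

    240 wpm is a rough adult silent-reading rate used here as an
    assumption; a real system might calibrate this per user.
    """
    est_reading_s = window.assistant_words / words_per_minute * 60.0
    return max(0.0, window.reply_latency_s - est_reading_s)

w = TimingWindow(reply_latency_s=95.0, assistant_words=300,
                 typing_duration_s=12.0, backgrounded=False)
print(reading_adjusted_latency(w))  # 300 words ≈ 75 s of reading → 20.0 s of "real" pause
```

The point of the adjustment is that a 95-second gap after a long explanation is not the same signal as a 95-second gap after a one-line answer.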

The final output is intentionally simple.

Directive ID: 6
Directive text: Continue directly, but keep the pace steady and focused.
Confidence: 0.94

That directive can then be injected into the final response prompt as a hidden runtime instruction.

TiM directive:
Continue directly, but keep the pace steady and focused.

The main model does not need to see the raw timing data. It only receives the behavioral cue.
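The injection step can be sketched in a few lines. This is an assumption-laden illustration: the directive table, the function name, and the confidence floor (below which no directive is injected) are all invented here, not TiM's actual implementation.

```python
# Hypothetical directive table; the ID and text mirror the example output above.
DIRECTIVES = {
    6: "Continue directly, but keep the pace steady and focused.",
}

def inject_directive(system_prompt: str, directive_id: int, confidence: float,
                     min_confidence: float = 0.5) -> str:
    """Append the TiM directive as a hidden runtime instruction.

    The downstream model receives only the behavioral cue, never the raw
    timing data. Below the (assumed) confidence floor, nothing is injected
    and the model enters the turn unmodified.
    """
    if confidence < min_confidence:
        return system_prompt
    return f"{system_prompt}\n\nTiM directive:\n{DIRECTIVES[directive_id]}"

prompt = inject_directive("You are a helpful assistant.", 6, 0.94)
print(prompt)
```

A confidence floor like this is one way to keep the nudge light: when TiM is unsure, the safest directive is no directive at all.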

Where TiM Belongs in the Stack

TiM is most effective when placed after context assembly and immediately before final response generation.

In my current runtime architecture, the flow looks like this:

  1. User message.
  2. Session state update.
  3. Retrieval, embeddings, and reranking.
  4. Tool selection and execution.
  5. Final context assembly.
  6. TiM timing inference.
  7. Directive injection.
  8. Final LLM response.

This placement matters.

If TiM runs too early, it could accidentally influence retrieval or tool planning. For example, “keep the answer compact” should not mean “skip the relevant documents” or “do not call the tool.” Retrieval and tool use should remain truth-oriented and task-oriented.

If TiM runs too late, it can only post-process an answer that has already been written.

The best place for TiM is right before generation, where it can influence the shape of the response without interfering with the substance.

Current Deployment

The current TiM deployment is running inside a local LAN inference stack.

The system includes:

  • A Gemma 4 31B IT model instance.
  • Qwen3-Reranker-8B.
  • Qwen3-Embedding-8B.
  • Approximately 4K-dimensional embedding configuration.
  • A four-node local system.
  • 10GbE networking.
  • Django backend.
  • React frontend.
  • Custom UI instrumentation for TiM signals.

The custom frontend collects the timing and interaction signals TiM needs, including typing behavior, timestamps, focus state, and backgrounding. Those signals are sent to the backend, where TiM builds the timing window and emits a directive before final model generation.
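On the backend, the raw frontend events have to be reduced to the features TiM consumes. A sketch of that reduction, assuming a hypothetical payload shape (the real instrumentation schema is not shown in this post):

```python
# Hypothetical UI-signal payload; all timestamps are millisecond epochs.
def build_timing_window(payload: dict) -> dict:
    """Derive TiM's timing features from raw frontend events."""
    reply_latency = payload["user_sent_at"] - payload["assistant_sent_at"]
    typing_duration = payload["user_sent_at"] - payload["typing_started_at"]
    return {
        "reply_latency_s": reply_latency / 1000.0,
        "typing_duration_s": typing_duration / 1000.0,
        "backgrounded": payload.get("blur_events", 0) > 0,  # tab blur / backgrounding
        "edits": payload.get("backspace_count", 0),         # correction behavior
    }

window = build_timing_window({
    "assistant_sent_at": 1_000_000,
    "typing_started_at": 1_004_000,
    "user_sent_at": 1_020_000,
    "blur_events": 1,
})
print(window["reply_latency_s"])  # 20.0
```

Keeping this reduction on the backend means the classifier sees a small, stable feature set even if the frontend instrumentation evolves.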

This is not a cloud benchmark setup. It is a working local runtime deployment designed to test whether timing-to-directive modeling improves the feel of a real AI assistant.

Early Observations

The early results are promising.

In live use, TiM is doing what it was designed to do: lightly nudge the model without taking over.

The assistant feels more aligned with my workflow and intent, especially after multiple turns. It adapts more naturally to pace and tempo. When I am moving quickly, it tends to give shorter, faster answers without me having to ask. When I am working through something more deeply, it is more willing to provide longer, detailed responses without me explicitly prompting for detail.

Recap and resume behavior is also moving in the desired direction. After longer gaps, the assistant is more likely to re-anchor or briefly restore context. During active flow, it is less likely to waste time recapping what is already obvious.

That is the real goal: not to make the assistant artificially terse or artificially verbose, but to make its response style feel better timed.

I am currently auditing TiM outputs manually. My real-world estimate is that TiM chooses the right timing directive around 85% of the time.

That number is important. I do not actually want 100%.

Human timing interpretation is imperfect. If TiM were perfectly deterministic, it would probably feel unnatural. A system that always “knows” exactly what a pause means would be overconfident in a domain where even humans disagree. On the other hand, if accuracy drops much below roughly 80%, the timing layer starts to feel awkward.

The current behavior is close to the target: imperfect, but useful.

Why Imperfection Is Part of the Design

Timing interpretation is ambiguous by nature.

  • A long pause can mean the user was interrupted.
  • It can mean the user was reading.
  • It can mean the user was thinking.
  • It can mean the user was overloaded.
  • It can mean the user left and came back.
  • It can mean the user is fading out of the conversation.
  • A short reply can mean agreement.
  • It can mean impatience.
  • It can mean clarity.
  • It can mean confusion.
  • It can mean dismissal.
  • It can mean “keep going.”

Because of that, TiM is not built as a hard-coded rule table. It is trained as a probabilistic classifier. Internally, it produces a distribution over possible directives. The runtime still emits one directive, but the model can represent uncertainty through confidence, margin, entropy, and neighboring directives.

That uncertainty is not a bug. It is part of the point.

For example, these directive boundaries are naturally fuzzy:

  • Steady continuation versus live conversational flow.
  • Concise answer versus compact close-out.
  • Overload versus careful processing.
  • Soft resume versus strong resume.
  • Task re-anchor versus stable continuation.

The model should not collapse these into brittle rules. It should learn the neighborhood of plausible responses and nudge the downstream LLM toward the best fit.
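The uncertainty measures mentioned earlier (confidence, margin, entropy) fall directly out of a softmax distribution over directives. A small illustration, with a made-up four-directive distribution sitting near one of those fuzzy boundaries:

```python
import math

# Illustrative only: the directive set and probabilities here are invented.
def uncertainty(probs: list[float]) -> dict:
    """Summarize a softmax distribution over directives."""
    ranked = sorted(probs, reverse=True)
    return {
        "confidence": ranked[0],              # top-1 probability
        "margin": ranked[0] - ranked[1],      # gap to the runner-up
        "entropy": -sum(p * math.log(p) for p in probs if p > 0),
    }

# Two neighboring directives nearly tied, e.g. near a fuzzy boundary.
probs = [0.48, 0.41, 0.06, 0.05]
m = uncertainty(probs)
print(m["margin"])  # small margin → neighbors are nearly interchangeable
```

A runtime could use a small margin as a cue to treat the top two directives as equally plausible, rather than committing hard to the winner.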

TiM and Agentic AI

Although TiM started as a conversational timing layer, the same idea applies to agentic AI.

Agentic systems also operate over time. They call tools, wait for results, retry failed actions, run multi-step workflows, and sometimes continue working after the user has stopped actively participating.

In that setting, timing is not only about user replies. It can also include:

  • Tool-call duration.
  • Retry count.
  • Time since the last user-visible update.
  • Elapsed task duration.
  • Stalled progress.
  • Loop depth.
  • Handoff delay.
  • Long-running operation gaps.
  • User interruption during an active task.

A timing-to-directive layer could help an agent decide how to communicate its next step.

For example:

  • If a tool chain has stalled, give one clear recovery path.
  • If a long task has completed, summarize progress before presenting the result.
  • If the agent has retried several times, stop branching and ask for confirmation.
  • If the user returns mid-task, re-anchor the task state.
  • If progress is steady, avoid unnecessary interruption.
  • If the task has drifted, collapse scope and restate the objective.
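For concreteness, the agentic examples above can be sketched as a mapping from task timing state to a communication directive. This is deliberately speculative: the thresholds and directive strings are invented for illustration, and a real TiM layer would learn this mapping probabilistically rather than hard-code it.

```python
# Speculative task-chronemics sketch; thresholds and wording are invented.
# A real timing layer would be a learned classifier, not a rule table.
def agent_directive(retries: int, stalled_s: float,
                    user_returned_midtask: bool) -> str:
    if user_returned_midtask:
        return "Re-anchor the task state before continuing."
    if retries >= 3:
        return "Stop branching and ask for confirmation."
    if stalled_s > 120.0:
        return "Give one clear recovery path."
    return "Continue; avoid unnecessary interruption."

print(agent_directive(retries=0, stalled_s=300.0, user_returned_midtask=False))
# → "Give one clear recovery path."
```

As in the conversational case, the directive shapes how the agent communicates its next step, not what the next step is.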

This generalizes TiM from conversational chronemics to task chronemics.

In both cases, the underlying idea is the same: treat time as runtime context.

TiM does not need to control the agent. It only needs to help the agent decide how to enter the next step.

What TiM Is Not

TiM is not a replacement for a better base model. It is not a memory system. It is not a reranker. It is not a tool router. It is not a safety filter. It is not a personality layer.

TiM is a small runtime policy model that interprets timing and emits a behavioral cue.

That distinction matters. If TiM starts deciding content, it becomes too powerful and too risky. Its value comes from being narrow.

It helps the model answer questions like:

  • Should I recap?
  • Should I stay brief?
  • Should I slow down and clarify?
  • Should I avoid opening new threads?
  • Should I give one next step instead of several options?
  • Should I maintain live flow?

That is enough.

Why This Matters

Most AI systems are optimized around content: better answers, better retrieval, better reasoning, better tools. Those are important. But the feel of an AI system also depends on timing.

A technically correct response can still feel wrong if it enters the conversation poorly.

A user who is moving quickly does not always want a long explanation. A user returning after hours may need context restored. A user who has faded out probably does not need new branches. A user who is carefully processing may need clarity rather than speed.

TiM gives the runtime a way to respond to those signals.

It is a small layer, but it changes the texture of the interaction.

Current Status

TiM is currently deployed in my local assistant stack and is being audited in live use.

The early conclusion is that timing-to-directive modeling is viable. It does not need to be perfect to be useful. In fact, some imperfection is desirable because timing interpretation itself is naturally uncertain.

The most important observation so far is qualitative: the model feels more aligned with my pace.

That is the outcome TiM was built for.

Not a smarter model. A better-timed one.

References and Background Reading

The TiM concept is informed by research in chronemics, computer-mediated communication, typing behavior, reading burden, and interruption science.

Key references include:

  • Kalman, Ravid, Raban, and Rafaeli on response latency in asynchronous computer-mediated communication.
  • Stivers et al. on human turn-taking and timing in conversation.
  • Lew et al. on response latency and conversational contingency in online chat.
  • Brysbaert on adult reading rates.
  • Dhakal et al. on desktop typing behavior from large-scale keystroke data.
  • Palin et al. on mobile typing behavior.
  • Altmann and Trafton on memory for goals.
  • Monk, Trafton, and Boehm-Davis on interruption and resumption.
  • Leroy on attention residue after task switching.

TiM is an attempt to turn those timing insights into a practical runtime layer for AI systems.