# AI == Compression

<br>

# [Mayday 2025](https://www.cbs8.com/article/news/local/ucsd-health-workers-set-to-walk-off-the-job-wednesday-morning/509-e5472039-79d9-4e2f-b9bd-beb775ebb8c2)

<br>

#### Hobson Lane

#### UCSD RET

--v--

## [clueo.net](https://indica.clueo.net/)

#### Job Crowder

and

#### Alicia Chen

<br>

## Engineers

- Taylor Kirk
- Jason O'Dell

--v--

## RET - Research Experience for Teachers

Teaching AI to San Diego teachers and students

### [Gary Cottrell](https://cseweb.ucsd.edu/~gary/)

### UCSD CS Dept

with NSF support

---

## Agenda

1. **[Assignment](#student-assignment)**
2. [Get serious](#get-serious)
3. [Decision trees](#decision-trees)
4. [Hybrid code networks](#hybrid-code-networks)
5. [State-space models](#state-space-models)
6. [Python tricks](#python-tricks)

--v--

## Student assignment

1. Text adventures<br>
1.1 **[Why](#why-text-adventure)**<br>
1.2 **[Example](#plan)**<br>
1.3 **[Natural language](#natural-language)**<br>
1.4 **[Auto-grade](#auto-grade)**<br>
1.5 **[REPL-grade](#repl-grade)**<br>

--v--

## [Text adventures](#student-assignment)
<br>

#### Build a **Text Adventure (Chatbot)** from scratch
<br>

- keywords: `if`, `print()`, `input()`, `=`
- bonus keywords: `def`, `in`, `str.*()`

--v--

## [Why text adventure?](#agenda)

GenX got their start w/ text adventures

- _DnD_
  - Character sheets
  - Die rollers
  - Dungeon Masters
- _Oregon Trail_
- _Colossal Cave Adventure_
- _Rogue_

--v--

## [Plan](#agenda)

#### Student design
<a href="student-text-adventure-diagram-tree.drawio.svg"><img src="student-text-adventure-diagram-tree.drawio.svg" height="800" /></a>

#### [Game plan == dialog plan]()

--v--

## Example

```python
print('You are a student at Mesa and you want to go swimming!')
resp = input("You're in SD. Which way do you go (N/S)? ")
# resp[:1] is the first letter, or '' if the player just pressed Enter
if resp[:1] == 'N':
    resp = input("You're in LA. Which way now (E/W)? ")
    if resp[:1] == 'E':
        print("You're in Vegas. You lose ;(")
    elif resp[:1] == 'W':
        print("You're at the beach! You win! :-)")
    else:
        print('Invalid direction in LA. You lose!')
elif resp[:1] == 'S':
    resp = input("You're in TJ. Which way now (E/W)? ")
    if resp[:1] == 'E':
        print("You're in the desert. You lose ;(")
    elif resp[:1] == 'W':
        print("You're at the coast! You win! :-)")
    else:
        print('Invalid direction in TJ. You lose!')
else:
    print('Invalid direction in SD. You lose!')
print('Game Over')
```

--v--

## Natural language
<br>

**More natural UX**
<br><br>

`.lower()`

`.strip()`

<br>

`.replace(' ', '')`

`.startswith(answer)`

<br>

`answer in message`

**[Keyword search]()**
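
A minimal sketch of how these string methods might be combined to normalize a player's reply. The helper names `normalize`, `matches`, and `mentions` are hypothetical and not part of the assignment:

```python
def normalize(message):
    """Lowercase the reply and remove all spaces (leading, trailing, inner)."""
    return message.lower().strip().replace(' ', '')


def matches(message, answer):
    """True if the normalized reply starts with the expected answer."""
    return normalize(message).startswith(answer)


def mentions(message, keyword):
    """Keyword search: True if the keyword appears anywhere in the reply."""
    return keyword in message.lower()


print(matches('  N orth ', 'n'))                # True
print(matches('West', 'n'))                     # False
print(mentions('Let us head NORTH!', 'north'))  # True
```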

--v--

## Auto-grade

- **Pylint** code quality score
- Open loop integration test

```bash
$ python -c 'print("N"); print("W")' | python game.py
$ cat player_input.txt | python game.py
```
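
The same open-loop test could be scripted in Python. This is only a sketch: the embedded `GAME` source is a hypothetical stand-in for a student's `game.py`, and `play` is an illustrative helper, not an existing grading tool:

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical two-question game standing in for a student's game.py
GAME = '''
resp = input("Which way (N/S)? ")
if resp.strip().lower().startswith("n"):
    resp = input("Which way now (E/W)? ")
    if resp.strip().lower().startswith("w"):
        print("You win!")
    else:
        print("You lose!")
else:
    print("You lose!")
'''


def play(script_path, moves):
    """Feed all moves to the game at once and return everything it printed."""
    result = subprocess.run(
        [sys.executable, script_path],
        input='\n'.join(moves) + '\n',
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'game.py')
    with open(path, 'w') as f:
        f.write(GAME)
    winning_output = play(path, ['N', 'W'])
    losing_output = play(path, ['N', 'E'])

print('You win!' in winning_output)  # True
print('You win!' in losing_output)   # False
```

"Open loop" here means the moves are fixed in advance: the test never reads the game's prompts before choosing the next move.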

--v--

## Close the loop with `pexpect`

```python
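import pexpect


def select_action(text):
    """Placeholder player bot: always heads north."""
    return 'north'


child = pexpect.spawn('python game.py', encoding='utf-8')

while True:
    # Wait for a prompt ending in ': ', or stop when the game exits or stalls
    index = child.expect(['.*: ', pexpect.EOF, pexpect.TIMEOUT], timeout=5)
    if index != 0:
        break
    text = child.after  # the game output matched up to the prompt
    command = select_action(text)  # 'north'
    child.sendline(command)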

--v--

## REPL-Grade

### <h3 style="color:red;">R</h3>
### <h3 style="color:white;">E</h3>
### <h3 style="color:white;">P</h3>
### <h3 style="color:white;">L</h3>

--v--

## Thoughtful feedback

### **E**
### **P**
### **L**

--v--

## Thoughtful feedback

### **R** ead

### **P**
### **L**

--v--

## Thoughtful feedback

### **R** ead
### **E** val

### **L**

--v--

## Thoughtful feedback

### **R** ead
### **E** val
### **P** rint

--v--

## Mindful grading

### **R** ead
### **E** val
<h2 style="color:red;"><b>P</b> lay</h2>

---

## Agenda

1. [Assignment](#student-assignment)
2. **[Get serious](#get-serious)**
3. [Decision trees](#decision-trees)
4. [Hybrid code networks](#hybrid-code-networks)
5. [State-space models](#state-space-models)
6. [Python tricks](#python-tricks)

--v--

## Get serious

2. Business applications<br>
2.1 **[Serious examples](#examples)**<br>
2.2 **[Vibin examples](#vibin)**<br>
2.3 **[Serious vibin](#serious-vibin)**<br>
2.4 **[Serious grounding](#serious-grounding)**<br>

--v--

## Examples

- Phone tree
- Autocomplete
- Therapy - e.g. ELIZA, Woebot
- Chat ops
- Search

--v--

## Vibin

Full monty naked LLMs

- _Grok_ - famously guardrail free
- _ChatGPT_ - relaxing guardrails
- _DeepSeek_ - exposed customer data in a public database within days of launch <sup>[7](#links)</sup>

--v--

## Serious vibin

- RAG & LAS - LLM augmented search
- CLI assistant (``shy-sh``)
- Vibe assistant
   - _Copilot_
   - _Cursor_
   - _Aider_
- Homework copypasta
    - _Claude Code_
    - _Gemma_

--v--

## Generalization & abstraction

- Lossy concept compression
- Concept representation matters
- Stereotypes
- Biases

--v--

## JPG compression is not AI

#### Transformer compression is not AI

--v--

## Serious grounding

Grounding with NLP & Data Science

- Classification
- Class-specific templates
- Entity extraction
- Slot filling
- Templates with slots filled
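
The pipeline above can be sketched in a few lines of plain Python. The intents, keywords, regular expressions, and templates below are hypothetical placeholders, and a real system would likely use a trained classifier and entity extractor instead:

```python
import re

# Hypothetical intents, keywords, and reply templates for illustration only
KEYWORDS = {
    'hours': ['open', 'close', 'hours'],
    'refund': ['refund', 'money back', 'return'],
}
TEMPLATES = {
    'hours': 'We are open 9am-5pm on {day}.',
    'refund': 'I have started a refund for order {order_id}.',
    'unknown': 'Sorry, I did not understand. Could you rephrase?',
}
SLOT_PATTERNS = {
    'day': r'\b(monday|tuesday|wednesday|thursday|friday)\b',
    'order_id': r'#(\d+)',
}


def classify(message):
    """Classification: pick the first intent with a matching keyword."""
    text = message.lower()
    for intent, words in KEYWORDS.items():
        if any(word in text for word in words):
            return intent
    return 'unknown'


def extract_slots(message):
    """Entity extraction: pull slot values out of the message with regexes."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = re.search(pattern, message.lower())
        if match:
            slots[name] = match.group(1)
    return slots


def respond(message):
    """Slot filling: fill the class-specific template, or fall back."""
    intent = classify(message)
    try:
        return TEMPLATES[intent].format(**extract_slots(message))
    except KeyError:  # a required slot was missing from the message
        return TEMPLATES['unknown']


print(respond('Can I get a refund for order #1234?'))
print(respond('Are you open on Friday?'))
```

Every reply comes from a fixed template, so the bot cannot say anything the author did not write.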

--v--

## Grounding Approaches

- RLHF (Reinforcement Learning from Human Feedback)
- Fine tuning (LoRA) - forgets common sense
- More context

#### Better approaches

- Less context + small models
- Hybrid

--v--

## More context
- RAG
- Few shot interpolation
- Examples of what you want
- Examples of what you don't want
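
One way to pack both kinds of examples into a prompt, as a rough sketch. The example dialog turns and the `build_prompt` helper are hypothetical, and the resulting string would still need to be sent to whichever LLM you use:

```python
# Hypothetical examples; in practice these would come from your own data
GOOD_EXAMPLES = [
    ('I failed my test.', 'It sounds like that test result was discouraging.'),
]
BAD_EXAMPLES = [
    ('I failed my test.', 'You should have studied harder.'),
]


def build_prompt(user_message, good=GOOD_EXAMPLES, bad=BAD_EXAMPLES):
    """Assemble a few-shot prompt from positive and negative examples."""
    lines = ['Reply like the GOOD examples and never like the BAD examples.', '']
    for label, examples in (('GOOD', good), ('BAD', bad)):
        for message, reply in examples:
            lines.append(f'{label} example')
            lines.append(f'User: {message}')
            lines.append(f'Bot: {reply}')
            lines.append('')
    lines.append(f'User: {user_message}')
    lines.append('Bot:')
    return '\n'.join(lines)


print(build_prompt('My boss yelled at me today.'))
```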

--v--

## Therapist training

--v--

## Artificial Intelligence?

There's only one problem with AI...

--v--

## Artificial Intelligence?

There's only one problem with AI...

It's trained to not be dumb:

Only knows about tokens, NOT:

- Numbers
- Math
- Logic
- Counting
- Physical objects
- Common sense logic
- Geometry
- Rules

--v--

## Why?

- Incorrect generalizations
- Representation problem

--v--

## Generalization

Generalization == lossy compression

- False negatives (metadata, forensics)
- False positives (watermarks)

--v--

## Good compression

<a href="tornado-before-after.png"><img src="tornado-before-after.png" height=768 width=2048/></a>
Mayfield, KY candle factory - before and after a tornado

--v--

## Overgeneralization

- Ignores critical details
- Retains irrelevant noise
- Retains incorrect generalizations (biases)

--v--

## LLM reasoning

- Guesses wildly - "out-of-distribution" sampling
- Fuzzifies user intent
   - Adds ambiguity
   - Fuzzy database (semantic search)

---

## Agenda

1. [Assignment](#student-assignment)
2. [Get serious](#get-serious)
3. **[Decision trees](#decision-trees)**
4. [Hybrid code networks](#hybrid-code-networks)
5. [State-space models](#state-space-models)
6. [Python tricks](#python-tricks)

--v--

## 3. Decision trees

3. Critical decisions<br>
3.1 **[Attention to NL](#attention-to-NL)**<br>
3.2 **[Vibin examples](#vibin)**<br>
3.3 **[Decision tree](#decision-tree)**<br>
3.4 **[Bayes net](#bayes-net)**<br>

--v--

## Attend to NL?

[![decision-fork-ignore-message-from-larissa](decision-fork-ignore-message-from-larissa.JPG)](decision-fork-ignore-message-from-larissa.JPG)

--v--

## Wrong NL attention

--v--

## Decision tree
<a href="Decision-trees-for-the-Start-up-milk-classification-task-with-a-low-A-medium-B.png"><img src="Decision-trees-for-the-Start-up-milk-classification-task-with-a-low-A-medium-B.png" height="1000"/></a>

FDA classification of milk nutrition
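
A hand-built decision tree is just the text adventure's nested `if` statements stored as data. In this sketch the feature names, thresholds, and class labels are invented for illustration and are not taken from the figure:

```python
# Hypothetical tree: feature names and thresholds are invented for illustration
TREE = {
    'feature': 'fat_percent',
    'threshold': 2.0,
    'below': 'low',
    'above': {
        'feature': 'protein_percent',
        'threshold': 3.2,
        'below': 'medium',
        'above': 'high',
    },
}


def classify(sample, node=TREE):
    """Walk from the root to a leaf, taking one branch per decision."""
    while isinstance(node, dict):
        branch = 'below' if sample[node['feature']] < node['threshold'] else 'above'
        node = node[branch]
    return node


print(classify({'fat_percent': 1.0, 'protein_percent': 3.5}))  # low
print(classify({'fat_percent': 3.5, 'protein_percent': 3.0}))  # medium
print(classify({'fat_percent': 3.5, 'protein_percent': 3.4}))  # high
```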

--v--

## Bayes net

#### Expert system for diagnosing cancer
<a href="tuberculosis-cancer-baysean-network.png"><img src="tuberculosis-cancer-baysean-network.drawio.png" height="800" /></a>
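
To show the kind of arithmetic a Bayes net performs, here is a sketch of inference by enumeration on a three-node chain. The structure is a simplification of the diagram and every probability is a made-up number, so treat the output as an illustration rather than a medical estimate:

```python
# Hypothetical probabilities for a 3-node chain: Smoker -> Cancer -> X-ray
P_SMOKER = 0.3
P_CANCER_GIVEN_SMOKER = {True: 0.10, False: 0.01}
P_XRAY_GIVEN_CANCER = {True: 0.90, False: 0.05}


def joint(smoker, cancer, xray):
    """Joint probability as a product of each node's conditional table."""
    p = P_SMOKER if smoker else 1 - P_SMOKER
    p_cancer = P_CANCER_GIVEN_SMOKER[smoker]
    p *= p_cancer if cancer else 1 - p_cancer
    p_xray = P_XRAY_GIVEN_CANCER[cancer]
    p *= p_xray if xray else 1 - p_xray
    return p


def p_cancer_given_positive_xray():
    """Inference by enumeration: sum out the unobserved Smoker node."""
    with_cancer = sum(joint(s, True, True) for s in (True, False))
    without_cancer = sum(joint(s, False, True) for s in (True, False))
    return with_cancer / (with_cancer + without_cancer)


prior = sum(joint(s, True, x) for s in (True, False) for x in (True, False))
posterior = p_cancer_given_positive_xray()
print(f'P(cancer) = {prior:.3f}')
print(f'P(cancer | positive x-ray) = {posterior:.3f}')
```

Unlike a decision tree, the net keeps track of uncertainty: a positive X-ray raises the probability of cancer without jumping to a yes/no answer.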

--v--

## Learned decision trees

- Random forest
- Neuromorphic programming (deep learning)
    - Decision root ball
- Bayesian belief networks (_The Book of Why_)

---

## Agenda

1. [Assignment](#student-assignment)
2. [Get serious](#get-serious)
3. [Decision trees](#decision-trees)
4. **[Hybrid approach](#hybrid-code-networks)**
5. [State-space models](#state-space-models)
6. [Python tricks](#python-tricks)

--v--

## 4. Hybrid approach

4. Decision logic + deep learning<br>
4.1 **[Hybrid code networks](#hybrid-code-networks)**<br>
4.2 **[Bamba](#bamba)**<br>
4.3 **[State space model](#state-space-model)**<br>

--v--

## Hybrid Code Networks

--v--

## Bamba

### Bamba-v2-9B <sup>[8](#links)</sup>

- State Space Model (SSM) & Transformer layers
- IBM, Princeton, CMU, UIUC
- Fully open source (data, code, weights)
- Based on Mamba2
- Subquadratic scaling

### VS Llama-3.1-8B

- 2.5x faster (latency & throughput)
- 5x less data (3T vs 15T tokens)

--v--

## State-space Model

- Mimic the "time cells" in the hippocampus
- [Better representation (AIMA)](https://csd.cmu.edu/course/15281/s24)

- Predict sequences
- Feedback control systems
- Scale nearly linearly with sequence length

--v--

## Control system

### Discrete SSM <sup>[9](#links)</sup>

<div>

### `x[t+1] = A * x[t] + B * u[t]`
### `y[t]   = C * x[t] + D * u[t]`

</div>
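
The two update equations can be simulated directly. This sketch assumes a scalar state (so `A`, `B`, `C`, and `D` are plain numbers) and uses hypothetical coefficients:

```python
def simulate(a, b, c, d, inputs, x0=0.0):
    """Run a scalar discrete state-space model and return the outputs y[t]."""
    x = x0
    outputs = []
    for u in inputs:
        outputs.append(c * x + d * u)  # y[t] = C * x[t] + D * u[t]
        x = a * x + b * u              # x[t+1] = A * x[t] + B * u[t]
    return outputs


# Hypothetical coefficients: a stable system (|a| < 1) driven by a unit step
y = simulate(a=0.5, b=1.0, c=1.0, d=0.0, inputs=[1.0] * 6)
print(y)  # [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375]
```

Each step costs the same amount of work no matter how long the sequence is, which is why the total cost grows linearly with sequence length.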

--v--

## NLPiA Hybrid Networks with Feedback

---

## Convomeld

- Merge conversation logs to create a dialog tree
- Plot the dialog plan network (graph)
- Load dialog plan to `networkx`
- Create drawio dialog plan diagrams w/ drawpy
- Execute the dialog plan

#### [gitlab.com/tangibleai/community/convomeld](https://gitlab.com/tangibleai/community/convomeld)
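
To make "execute the dialog plan" concrete, here is a sketch that stores the text adventure as a graph in a plain dictionary and walks it. This is not the convomeld API; the `PLAN` layout and the `run` helper are assumptions made for illustration:

```python
# Hypothetical dialog plan: each state has a prompt and keyword-triggered edges
PLAN = {
    'sd': {'say': "You're in SD. Which way (N/S)?", 'next': {'n': 'la', 's': 'tj'}},
    'la': {'say': "You're in LA. Which way (E/W)?", 'next': {'e': 'vegas', 'w': 'beach'}},
    'tj': {'say': "You're in TJ. Which way (E/W)?", 'next': {'e': 'desert', 'w': 'beach'}},
    'vegas': {'say': "You're in Vegas. You lose ;(", 'next': {}},
    'desert': {'say': "You're in the desert. You lose ;(", 'next': {}},
    'beach': {'say': "You're at the beach! You win! :-)", 'next': {}},
}


def run(plan, replies, state='sd'):
    """Walk the plan with a scripted list of replies; return the transcript."""
    transcript = [plan[state]['say']]
    for reply in replies:
        edges = plan[state]['next']
        key = reply.strip().lower()[:1]
        if key not in edges:  # terminal state or unrecognized reply
            break
        state = edges[key]
        transcript.append(plan[state]['say'])
    return transcript


for line in run(PLAN, ['North', 'west']):
    print(line)
```

Because the plan is data rather than nested `if` statements, the same dictionary could be plotted, merged with other conversation logs, or loaded into a graph library.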

--v--

## [indica.clueo.net](https://indica.clueo.net/)

- Django admin wrapper
- Conversation logs
- Exercises
- LLM prompts

### TODO:

- API for ConvoMeld

--v--

## Reflect-a-bot

--v--

## ipython aliases

#### `~/.ipython/profile_default/startup/my_aliases.ipy`
```python
%alias meld meld
%alias subl subl
%alias which which
%alias wc wc
%alias find find
%alias curl curl
%alias grep grep
```

--v--

### Links

1. ["PAIR: ... Counselor Reflection Scoring in Motivational Interviewing", Min et al.](https://aclanthology.org/2022.emnlp-main.11.pdf)
2. ["Building a Motivational Interviewing Dataset", Pérez-Rosas et al.](https://aclanthology.org/W16-0305.pdf)
3. ["Explaining Bayesian Networks in Natural Language using Factor Arguments", Sevilla et al., Oct 2024](https://www.researchgate.net/publication/385176693_Explaining_Bayesian_Networks_in_Natural_Language_using_Factor_Arguments_Evaluation_in_the_medical_domain/fulltext/6719c020edbc012ea138df93/Explaining-Bayesian-Networks-in-Natural-Language-using-Factor-Arguments-Evaluation-in-the-medical-domain.pdf)
4. [Decision-trees-for-the-Start-up-milk-classification](https://www.researchgate.net/profile/Daniel-Lefebvre-2/publication/238725014/figure/fig3/AS:669294299971611@1536583604771/Decision-trees-for-the-Start-up-milk-classification-task-with-a-low-A-medium-B.png)
5. [Indica Django App by Job Crowder & Jason](https://indica.clueo.net/)
6. [On the Biology of LLMs by Anthropic](https://transformer-circuits.pub/2025/attribution-graphs/biology.html)
7. [DeepSeek breach - customer data public ClickHouse DB](https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepseek-database-leak)
8. [Bamba-v2-9B on huggingface](https://huggingface.co/blog/ibm-ai-platform/bamba-9b-v2)
9. [en.Wikipedia.org/wiki/State-space_representation](https://en.Wikipedia.org/wiki/State-space_representation)

---