ML Incubation Palo Alto

Fredrick Chew

Camille Girabawe

Joanna Li

Hameesh Manadath

Alexander Schaefer

28.07.2017

Machine Learning - Approaches

Supervised Learning
Learning with a labeled training set
Unsupervised Learning
Discovering patterns on unlabeled data
Reinforcement Learning
Learning based on reward/feedback and inputs/actions into the environmental

What is Reinforcement Learning

System/Agent

Environment

OBSERVATIONS

ACTIONS

From [numbers] to Actions

Like DeepMind, we use a gaming environment for experimentation.
DeepMind's Atari code is overwhelming to experiment. We use REINFORCEjs (on JS Games)
Goal is to simply maximized score
One system/algo that learns to operate
on 3 different environments.

JS Pong

Environment State:

pong coord X, Y
paddle's Y

Action Space:

Move up
Move down
Stop

Goal/Rewards:

Lose a ball: -1
Score a goal: +1
Zero reward

What to cache ?

HOT

COLD

Workload: 1, 2, 0, 4, 3, 5 ...1, 2, 0, 4...

fast access

reward = +10

slow access

reward = -10

data movement

reward = -1

Environment State:

Last query data element

Emulator inspired by Data-Aging

Tuning HANA's parameter

Single Query Template: analytical query that is not cached

Adjusting HANA's max_concurrency based on predicted workload as a POC

Environment State:

current max_concurrency# [1 to 63]

Action Space:

increase cpu by 1 core
do nothing
decrease cpu by 1 core

Goal/Rewards:

punish agent if it violates CPU boundary:
- 0.2 x

Learning based on Rewards/Goals

Dopamine the reward signal for the human brain
Intuition: RL mimics this process

DQN Formula:

Key take away:
We are formalize "Learning through failures" via rewarding signals

Be careful on how you reward !:
Unintended consequences may happen. Next game show this effect

RL Hyerparameters

Epsilon: Exploration vs Exploitation
Alpha: Learning rate (NN)
Gamma: Greediness

Just one year ago

Google DeepMind published a paper in 2015 Nature

Human-level control through deep reinforcement learning

Google bought DeepMind for 500 million

JS Breakout

Environment State:

ball's X,Y
paddle's X

Action Space:

Move left
Move right
Stop

Goal/Rewards:

paddle missed ball: -1
paddle bounce ball: +1
zero reward for rest

Agent learnt a nice strategy to maximized reward with least movement

ML Incubation Palo Alto

Pong

BreakOut

What2Cache

ReinforceJS

fredrick.chew@sap.com

Demos: