trecvid 2016 : concept localization
Post on 20-Jan-2017
TRECVID-2016
Concept Localization : Overview
George Awad
NIST
Dakota Consulting, Inc
TRECVID 2016
• Goal
• Make concept detection more precise in time and space than the current shot-level evaluation.
• Encourage the design of more reusable concept detectors that are independent of context.
• Task
• For each of the 10 new test concepts, NIST provided a set of ~1000 shots.
• Any shot may or may not contain the target concept.
• For each I-frame within the shot that contains the target, return the x,y coordinates of the upper-left (UL) and lower-right (LR) vertices of a bounding rectangle containing all of the target concept and as little more as possible.
• Systems were allowed to submit more than one bounding box per I-frame, but only the one with the maximum F-score was scored.
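The rule of scoring only the best box per I-frame can be sketched as follows. This is a minimal illustration, not NIST's scoring code: the `Box` type, the function names, and the convention that coordinates are inclusive pixel indices are all assumptions made for the example, and the F-score used is the pixel-overlap F-score between a submitted box and a single ground-truth box.

```python
from typing import NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle (hypothetical type for this sketch)."""
    x1: int  # upper-left x
    y1: int  # upper-left y
    x2: int  # lower-right x
    y2: int  # lower-right y

    def area(self) -> int:
        # Assumption: coordinates are inclusive pixel indices.
        return max(0, self.x2 - self.x1 + 1) * max(0, self.y2 - self.y1 + 1)


def overlap_area(a: Box, b: Box) -> int:
    """Number of pixels shared by two boxes (0 if they do not intersect)."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1),
                min(a.x2, b.x2), min(a.y2, b.y2))
    return inter.area()


def pixel_prf(submitted: Box, truth: Box) -> Tuple[float, float, float]:
    """Pixel-level precision, recall and F-score of one submitted box."""
    shared = overlap_area(submitted, truth)
    precision = shared / submitted.area() if submitted.area() else 0.0
    recall = shared / truth.area() if truth.area() else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def best_box(candidates: Sequence[Box], truth: Box) -> Box:
    """Return the candidate with the highest pixel F-score for this I-frame."""
    return max(candidates, key=lambda box: pixel_prf(box, truth)[2])
```

For example, with a 10x10 ground-truth box, a candidate covering its left half would be preferred over a same-sized candidate that only overlaps one corner.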
10 new evaluated concepts
• Animal
• Bicycling*
• Boy
• Dancing*
• Explosion_fire*
• Instrumental_musician*
• Running*
• Sitting_down*
• Skier*
• Baby
* dynamic/action concepts
NIST Evaluation framework
• Testing data
• IACC.2.A-C (600 h, used between 2013 and 2015 in the semantic indexing (SIN) task)
• About 1000 shots per concept were sampled from the ground truth (G.T), with true-positive (TP) clips of max = 300, avg = 178, min = 12.
• A total of 9,587 shots and 2,205,140 I-frames were distributed to systems.
• Human assessors were given all the I-frames (55,789 images in total) of all TP shots to create the ground truth, drawing a bounding box around the concept if it exists.
• Human assessors had to watch the video clips the images came from to verify the concepts.
Evaluation metrics
• Temporal localization: precision, recall and F-score based on the judged I-frames.
• Spatial localization: precision, recall and F-score based on the located pixels representing the concept.
• Precision, recall and F-score are averaged across all I-frames, for each concept and for each run, for both temporal and spatial localization.
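The temporal metrics above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the official evaluation script: I-frames are identified by hypothetical string IDs, submitted frames outside the judged set are ignored, and a run-level score is taken as the plain mean of the per-concept scores.

```python
from statistics import mean
from typing import Dict, Set, Tuple

PRF = Tuple[float, float, float]  # (precision, recall, F-score)


def prf(true_positives: int, n_submitted: int, n_truth: int) -> PRF:
    """Precision, recall and F-score from raw counts."""
    precision = true_positives / n_submitted if n_submitted else 0.0
    recall = true_positives / n_truth if n_truth else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def temporal_prf(submitted: Set[str], truth: Set[str], judged: Set[str]) -> PRF:
    """Score the I-frames a system returned for one concept.

    Assumption: only judged I-frames count, so unjudged submissions are dropped.
    """
    scored = submitted & judged
    return prf(len(scored & truth), len(scored), len(truth))


def mean_over_concepts(per_concept: Dict[str, PRF]) -> PRF:
    """Average precision, recall and F-score over concepts for one run."""
    precisions, recalls, f_scores = zip(*per_concept.values())
    return mean(precisions), mean(recalls), mean(f_scores)
```

A system that returns two correct I-frames and one wrong one, out of four ground-truth I-frames, gets precision 2/3 and recall 1/2 under this sketch. The spatial metrics have the same shape, with pixel counts in place of I-frame counts.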
Participants (Finishers: 3 out of 21)
• 3 teams submitted 11 runs
• TokyoTech: Tokyo Institute of Technology
• NII_Hitachi_UIT: National Institute of Informatics; Hitachi, Ltd.; University of Information Technology
• UTS_CMU_D2DCRC: University of Technology, Sydney; CMU; D2DCRC
Temporal localization results by run (sorted by F-score)
[Figure: bar chart of the mean per run across all concepts, on a 0 to 1 scale. Series: I-frame F-score, I-frame Precision, I-frame Recall.]
Temporal Localization results
[Figure: three bar-chart panels labeled 2013, 2014 and 2015, each showing the mean per run across all concepts on a 0 to 1 scale. Run labels that survive in the source (truncated as shown): CCNY_sub1.result.txt to CCNY_sub4.result.txt, insightdcu.DCU_Lo…, MediaMill_Qualco… (x4), PicSOM.PicSOM_… (x4), TokyoTech.run_tok… (x4), Trimps_1.txt, Trimps_2_NEG_04…, Trimps_3_NEG_N…, Trimps_3_NOC_01…]
2016 (mainly action) >> 2013 & 2014 (mainly objects)
ONLY TP shots were given to systems to localize.
Spatial Localization results by run (sorted by F-score)
[Figure: bar chart of the mean per run across all concepts, on a 0 to 1 scale. Series: Mean Pixel F-score, Mean Pixel Precision, Mean Pixel Recall.]
Harder than temporal localization
Spatial Localization results
[Figure: three bar-chart panels labeled 2013, 2014 and 2015, each showing the mean per run across all concepts on a 0 to 1 scale. Run labels that survive in the source (truncated as shown): CCNY_sub1.result.txt to CCNY_sub4.result.txt, insightdcu.DCU_Lo…, MediaMill_Qualco… (x4), PicSOM.PicSOM_L… (x4), TokyoTech.run_tok… (x4), Trimps_1.txt, Trimps_2_NEG_04…, Trimps_3_NEG_N…, Trimps_3_NOC_01…]
2016 (actions) > 2013 (objects)
2016 (actions) ~ 2014 (objects)
ONLY TP shots were given to systems to localize.
Results per concept, top 10 runs
[Figure: two panels, Temporal localization (y-axis: F-score) and Spatial localization (y-axis: Mean F-score), both on a 0 to 1 scale. Series: the runs ranked 1 to 10 and the Median.]
Most concepts perform better in temporal compared to spatial localization
A lot of resemblance between same concepts
Results per concept across all runs
[Figure: two scatter plots on 0 to 1 axes. Temporal localization: Recall against Precision. Spatial localization: Mean Recall against Mean Precision. Labeled points: baby, Inst_musi, bicycling.]
Temporal localization: Many systems submitted a lot of non-target I-frames, while few found a good balance.
Spatial localization: Submitted bounding boxes approximate G.T boxes in size with some overlap. Many systems are good in finding the real box sizes.
General Observations
• Consistent observations in the last 4 years
• Temporal localization is easier than spatial localization.
• Systems report approximate G.T box sizes.
• Performance on action/dynamic concepts is higher than on the object concepts tested in 2013-2014.
• Assessing action/dynamic concepts proved challenging for the human assessors in many cases.
• Lower finishing percentage of teams compared to sign-ups
Next team talks
• TokyoTech
• UTS_CMU_D2DCRC