Capturing social representations
Joint extraction of topics and sentiments:
an application to probabilistic topic models
Julien Velcin1
joint work with S. Loudcher1, L. Khouas2 and M. Dermouche1,2
1 Université de Lyon, ERIC Lab (Lyon 2), France
2 AMI Software R&D, Montpellier, France
But the world is changing
- 20 billion photos shared to date
- 36M pages
- 1.28 billion active users
- >50M pages
- 255M active users
- 500M tweets a day
- 50 billion indexed pages
- 60M articles, 61M comments per month
- 125 prof. emails sent/received a day
- 187M active users
- 44k applications a day
- +1 billion users
- 100 h of video added every minute
- 40M active users
Note: these statistics date back to 2014
Today
Risks and benefits of Big Data:
- Volume
- Variety
- Velocity
etc. (Veracity…)
For textual data:
- semantic gap
- language is a living thing
- curse of dimensionality
Representation and categories
•  Philosophy, logic
–  Necessary and Sufficient Conditions (Aristotle, antiq.)
–  Family resemblance (Wittgenstein, 1958)
•  Psychology, linguistics
–  Cognitive representations (Rosch, 1973-1978)
–  Linguistics (Lakoff, 1987)
•  Social sciences
–  Social representations (Lippmann, 1922) (Moscovici, 1961)
Main assumption
Machine learning (in particular, clustering) will play a crucial role for studying these representations
•  What: topic learning
•  How: opinion mining
•  Who: role identification
•  Where: community detection
•  When: evolutionary models
→ Let’s do data science!
Outline of my talk
•  Background on topic learning and opinion mining
•  Joint modeling of topic and opinion
•  An application to stock market prediction
•  Conclusion and perspectives
Google News
Pulseweb project: http://pulseweb.cortext.net
Why topic learning?
Discovering latent structures
(Blei, 2011)
Topic modeling provides methods for automatically organizing, understanding, searching, and summarizing large electronic archives.
•  discover the hidden themes that pervade the collection
•  annotate the documents according to those themes
•  use annotations to organize, summarize, and search the texts
Opinion mining
•  Computational study of opinions, sentiments and emotions expressed in texts
•  In my talk, I will reduce the scope to polarity detection (e.g., negative vs. positive)
Newspaper headlines
Some excerpts of NYSK dataset1:
« IMF director general Dominique Strauss-Kahn stands before judge Melissa Jackson in Manhattan Criminal Court… »
« …And her Spanish counterpart, Elana Salgado, gave her support for the victim of his alleged sexual assault. »
« Though Strauss-Kahn may be innocent - and insists he is »
1 https://archive.ics.uci.edu/ml/datasets/NYSK
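Polarity detection in its simplest form can be sketched with a word lexicon. The snippet below is only an illustration: the lexicon entries, the `polarity` function name and the extra "neutral" outcome are assumptions made for the example, not part of the talk.

```python
import re

# Hypothetical toy lexicon (assumption): real systems rely on curated resources.
LEXICON = {"good": 1, "great": 1, "reliable": 1, "bad": -1, "poor": -1, "broken": -1}

def polarity(text):
    """Sum the lexicon scores of the tokens and map the total to a label."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no evidence either way (an addition for this sketch)

examples = [
    "A great phone with a good screen",
    "Poor battery and bad support",
    "It arrived on Monday",
]
labels = [polarity(e) for e in examples]
```

Such a counter ignores negation, irony and topic, which is part of the motivation for modeling sentiment jointly with topics.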
Blogs
Bulletin boards
« Today EDF is pushing to extend the operating life of its power plants: 40, 50, 60 years. The risk situation is thus getting worse as the installations, materials, electrical circuits and sensors age, so the plants operate under conditions that drift away from those foreseen by their designers. The prevention principle is therefore not applied, since the public authorities, without making it official, accept that such a technology can be operated with such risks. We are approaching the 25th anniversary of the Chernobyl disaster, let us remember! »
« I don't know where you made that up. There is an increase of 2.9% on July 1 next. This increase will be valid, and will be final, until July 1, 2012, which seems reasonable to me. And it allows EDF to cope with its constraints, the public service constraints imposed on it, as well as a number of investments. Beyond that, we have started this work on the safety of our power plants, in light of what has just happened in Japan. This will obviously be an opportunity to raise the safety level of all nuclear installations, so there will be investments to make. In this context, there is a European obligation to open the electricity sector to competition. »
Product reviews
explicit sentiment
sentiment expressed in natural language
Dealing with both issues
Topics
pLSA [Hofmann, 99]
LDA [Blei et al., 03]
NMF [Lee & Seung, 99]
…
+ Sentiments
TSM [Mei et al., 07]
JST [Lin & He, 09]
ASUM [Jo & Oh, 11]
…
Usually, post-processing w.r.t. ‘‘time’’ → missing topic-time and sentiment-time correlation
[Figure: Topic-Sentiment dynamics; axis label: Strength]
Hidden for obvious reasons
Text representation
•  Bag of words assumption
•  Classical preprocessing steps
–  stemming
–  remove stopwords
–  remove numbers, punctuation marks
•  Our input data: a “simple” count matrix
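The preprocessing steps and the resulting count matrix can be sketched in a few lines. This is a minimal illustration under stated assumptions: the stopword list is a toy one, and `naive_stem` is a crude suffix stripper standing in for a real stemmer (such as Porter's); neither comes from the talk.

```python
import re
from collections import Counter

# Hypothetical stopword list (assumption), for illustration only.
STOPWORDS = {"the", "a", "of", "and", "is", "in", "to"}

def naive_stem(token):
    """Crude suffix stripping; a placeholder for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, drop numbers and punctuation, remove stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())  # keeps alphabetic runs only
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

def count_matrix(docs):
    """Return (vocabulary, matrix) where matrix[d][v] counts term v in document d."""
    bags = [Counter(preprocess(d)) for d in docs]  # bag of words: order is discarded
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[w] for w in vocab] for bag in bags]

docs = ["The markets are rising in 2014.", "Rising prices and rising risks!"]
vocab, matrix = count_matrix(docs)
# vocab  -> ['are', 'market', 'price', 'ris', 'risk']
# matrix -> [[1, 1, 0, 1, 0], [0, 0, 1, 2, 1]]
```

Each row is one document and each column one stemmed term; this matrix is the only input a bag-of-words topic model sees.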
Joint analysis of topic, sentiment and time
Time-aware Topic x Sentiment model
[Figure legend: Positive, Negative]
Different models, different assumptions
LDA
ASUM (Jo et al., 2011)
Reverse-JST (Lin et al., 2009)
TTS (Dermouche et al., 2014)
[Figure labels: topic, opinion, Time modality, Sentiment modality]
TTS – graphical model
z: topics   w: words   s: sentiment labels   t: timestamps
[Figure: graphical model with an LDA core, a Sentiment modality and a Time modality]
Estimate:
φ : p (w | s, z)
π : p (s | z)
ψ : p (t | s, z)
→ Gibbs Sampling process
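Once a Gibbs sampler has assigned a topic z and a sentiment s to every token, the three distributions φ, π and ψ are typically read off from smoothed counts. The sketch below shows only that read-out step, not the sampler itself; the function name and the smoothing constants `beta`, `gamma` and `mu` are assumptions made for the example.

```python
from collections import Counter

def estimate_parameters(assignments, vocab, sentiments, timestamps,
                        beta=0.01, gamma=0.01, mu=0.01):
    """Estimate phi = p(w|s,z), pi = p(s|z), psi = p(t|s,z) from token assignments.

    assignments: iterable of (word, timestamp, sentiment, topic), one per token.
    beta, gamma, mu: assumed symmetric smoothing constants (hypothetical values).
    """
    n_z, n_zs, n_zsw, n_zst = Counter(), Counter(), Counter(), Counter()
    for w, t, s, z in assignments:
        n_z[z] += 1
        n_zs[z, s] += 1
        n_zsw[z, s, w] += 1
        n_zst[z, s, t] += 1

    topics = sorted(n_z)
    S, V, T = len(sentiments), len(vocab), len(timestamps)
    pi = {(s, z): (n_zs[z, s] + gamma) / (n_z[z] + S * gamma)
          for z in topics for s in sentiments}
    phi = {(w, s, z): (n_zsw[z, s, w] + beta) / (n_zs[z, s] + V * beta)
           for z in topics for s in sentiments for w in vocab}
    psi = {(t, s, z): (n_zst[z, s, t] + mu) / (n_zs[z, s] + T * mu)
           for z in topics for s in sentiments for t in timestamps}
    return phi, pi, psi

# Toy assignments (word, timestamp, sentiment, topic), as a sampler might produce.
assignments = [
    ("good", 1, "POS", 0), ("great", 1, "POS", 0), ("bad", 2, "NEG", 0),
    ("price", 1, "NEG", 1), ("price", 2, "NEG", 1),
]
vocab = ["bad", "good", "great", "price"]
sentiments = ["POS", "NEG"]
timestamps = [1, 2]
phi, pi, psi = estimate_parameters(assignments, vocab, sentiments, timestamps)
```

Each dictionary is normalized over its first key component, so for instance the values of `pi` for a fixed topic sum to one over the sentiment labels.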
TTS – parameters estimation (sketch)
•  Joint probability:
•  Use it for deriving the marginal probability p(s, z | ·) and the parameter updates:
and so on.
•  Integrate opinionated lexicon knowledge
•  Weight the temporal dimension by 1/n_d
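As a hedged sketch of the joint probability: assuming each token is generated by drawing a topic z from a per-document distribution (written θ_d here, a symbol introduced only for this illustration), then a sentiment s given z, and finally a word w and a timestamp t given (s, z), the per-token joint would factorize as below. This is one factorization consistent with the three distributions φ, π and ψ named in the talk, not necessarily the exact expression the authors use, and it leaves out the 1/n_d weighting.

```latex
p(w, t, s, z \mid d)
  \;=\; \underbrace{p(z \mid d)}_{\theta_{d}}
  \;\underbrace{p(s \mid z)}_{\pi}
  \;\underbrace{p(w \mid s, z)}_{\varphi}
  \;\underbrace{p(t \mid s, z)}_{\psi}
```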
Evaluation scheme
•  Two datasets
–  MDS (Amazon reviews) → ≈ 29,000 reviews
–  NYSK (Dominique Strauss-Kahn case) → ≈ 10,000 newswires
•  Accuracy of sentiment prediction at document level
→ not the main purpose of TTS model
•  KL distance between ‘‘estimation’’ and ‘‘reality’’
–  Qs = distance between topic distributions over sentiments (π)
–  Qt = distance between topic-sentiment distributions over time (ψ)
Building the gold standard Qs
[Figure: 18 items labeled with topics z1 to z4, seven of them with z1]
π
p(POS | z1) = 5/7
p(NEG | z1) = 2/7
z1 labeled positive
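The gold-standard computation for z1 and the KL-based score can be reproduced on a toy example. The sketch below assumes that KL is taken from the gold distribution to the estimated one (the talk does not state the direction), and the `estimated` values are hypothetical model output invented for the illustration.

```python
import math
from collections import Counter

def topic_sentiment_distribution(labels, topic, sentiments=("POS", "NEG")):
    """Empirical p(s | topic) from (topic, sentiment) gold labels."""
    counts = Counter(s for z, s in labels if z == topic)
    total = sum(counts.values())
    return {s: counts[s] / total for s in sentiments}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a shared discrete support; eps guards against log(0)."""
    return sum(p[k] * math.log((p[k] + eps) / (q[k] + eps)) for k in p if p[k] > 0)

# Seven items assigned to z1: five positive, two negative (as on the slide).
gold_labels = [("z1", "POS")] * 5 + [("z1", "NEG")] * 2
gold = topic_sentiment_distribution(gold_labels, "z1")

estimated = {"POS": 0.6, "NEG": 0.4}  # hypothetical model estimate of pi for z1
q_s = kl_divergence(gold, estimated)
majority_label = max(gold, key=gold.get)  # the dominant sentiment for z1
```

A score of zero would mean the estimated π matches the gold distribution exactly; averaging this quantity over topics gives one possible reading of Qs.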
Results – accuracy on MDS
ASUM is best for topic-sentiment modeling
TTS is best for topic-sentiment-time modeling
Note that γ can be estimated dynamically (Dermouche et al., 2015)
Results – examples on MDS
On the second case study: NYSK
Stock movement prediction
•  Joint work with T.H. Nguyen and K. Shirai (Japan Advanced Institute of Science and Technology)
In the literature
•  Some authors claim that the task is unfeasible…
–  Efficient Market Hypothesis (EMH) (Fama et al., 1969)
–  No possible prediction above 50% (Walczak, 2001)
•  and others claim the opposite (Bollen et al., 2011) (Vu et al., 2012) with accuracy at ~56% often reported as satisfying (Schumaker & Chen, 2009) (Si et al., 2013)
•  Most research is focused on only one stock and the number of transaction dates is usually low
•  Some attempts to integrate information from the social media (Zhang et al., 2011) (Xi et al., 2013) (Nguyen et al., 2015)
•  Our goal: predict the stock price movement (up or down) = classification problem
•  By using:
–  machine learning (here, SVM)
–  classical historical data from stock markets
–  text/opinion mining for extracting “mood” information from bulletin boards
Thien’s proposals
•  Based on the JST model (Lin and He, 2009)
–  use JST model to include (topic, opinion) features into the classification scheme
–  features: price_{t-1}, price_{t-2}, jst_{i,j,t-1}, jst_{i,j,t} (opinion s ∈ {-, o, +}, topic z ∈ {1…50})
•  Simple aspect-based:
–  explicit extraction of (topic, opinion) pairs, by using POS tagging, SentiWordNet…
–  features: price_{t-1}, price_{t-2}, Asent_{j,t-1}, Asent_{j,t}, I_{j,t}, I_{j,t-1}
I_{j,t-1}: relative importance of topic j at t-1
Asent_{j,t}: average of sentiment toward topic z at t (transaction date level)
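The aspect-based feature vector can be assembled as follows. This is a sketch under assumptions: the function name, the toy numbers, and the labeling rule (1 when the price at t is strictly above the price at t-1, else 0) are illustrative choices, not taken from the talk.

```python
def build_examples(prices, asent, importance):
    """Build (features, labels) for up/down classification.

    prices[t]        : closing price at transaction date t
    asent[t][j]      : average sentiment toward topic j at date t
    importance[t][j] : relative importance of topic j at date t
    Feature order follows the slide: price_{t-1}, price_{t-2},
    Asent_{j,t-1}, Asent_{j,t}, I_{j,t}, I_{j,t-1} for every topic j.
    """
    features, labels = [], []
    for t in range(2, len(prices)):  # two past prices are needed
        row = [prices[t - 1], prices[t - 2]]
        row += asent[t - 1] + asent[t] + importance[t] + importance[t - 1]
        features.append(row)
        labels.append(1 if prices[t] > prices[t - 1] else 0)  # assumed "up" rule
    return features, labels

# Toy data for four transaction dates and two topics (hypothetical values).
prices = [10.0, 10.5, 10.2, 10.8]
asent = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4], [0.2, 0.2]]
importance = [[0.5, 0.5], [0.6, 0.4], [0.7, 0.3], [0.4, 0.6]]
features, labels = build_examples(prices, asent, importance)
```

The resulting rows and labels are what a binary classifier such as an SVM would be trained on, one example per transaction date.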
Message board dataset
18 stocks, historical (adjusted close) prices from Yahoo Finance and message boards from July 23, 2012 to July 19, 2013
Results
Conclusion
•  TTS: a model for joint topic-sentiment dynamics (Dermouche et al., ICDM 2014)
–  Best topic-sentiment-time modeling at the corpus level
•  TS: a static version of TTS with a dynamic estimate of γ (Dermouche et al., SAC 2015)
•  Data are available (NYSK dataset on the UCI repository)
•  The C++ source code is available on request
•  Future work on parameter setting:
–  improving hyper-parameters setting
–  estimating the number k of topics
–  more complex interactions between (topic, sentiment) pairs
Some references
Dermouche, M., J. Velcin, S. Loudcher, L. Khouas. A joint model for topic-sentiment evolution over time. Proceedings of the IEEE International Conference on Data Mining (ICDM), 2014.
Dermouche, M., L. Khouas, J. Velcin, S. Loudcher. A Joint Model for Topic-Sentiment Modeling from Text. Proceedings of the 30th ACM/SIGAPP Symposium On Applied Computing (SAC), data mining track, 2015.
Jo, Y., A.H. Oh. Aspect and sentiment unification model for on-line review analysis. Proceedings of the fourth ACM international conference on Web Search and Data Mining (WSDM), ACM, 2011.
Lin, C., Y. He, R. Everson, S. Ruger. Weakly supervised joint sentiment-topic detection from text. IEEE Transactions on Knowledge and Data Engineering (TKDE), 24(6):1134–1145, 2012.
Nguyen, T.H., K. Shirai, J. Velcin. Sentiment analysis on social media for stock movement prediction. Expert Systems with Applications (ESWA), 42(24):9603–9611, 2015.
Wang, X., A. McCallum. Topics over time: a non-Markov continuous-time model of topical trends. Proceedings of the 12th ACM SIGKDD international conference on Knowledge Discovery and Data mining (KDD), pages 424–433. ACM, 2006.
Thank you
[julien.velcin@univ-lyon2.fr]