- Have spent most of last couple of weeks writing my Confirmation Report / Preparing for Confirmation / Writing CM Talk etc
- CM talk was yesterday, felt it went fairly well, reaction was good!
- Have my interviewy thing this afternoon for confirmation
- Have also kept the vertex refinement work ticking over, having significant success with running the refinement on every candidate. This is pretty much ready to open at least a draft Pandora PR I think! Only thing I really intend to do with it now is do a little hand-wavey tuning of some of the numbers I hard-coded in.
- CI cried when the geometry changed, SBN then took a week to move to the new root after larsoft so lots of angry error messages! As of yesterday we’re back though. Might actually get around to looking at the event-by-event stuff soon so will take a look at Ryan’s macro.
- Presented at TPC Reco, lots of useful comments on things to try. Major thing I wanted to try this week was using the same ideas from the refinement in candidate creation.
- I developed a method where for each 5cm region in which there are candidates I use the refinement method to create one single candidate for that region. This has worked well in massively reducing the number of candidates but I think it’s still having slightly too much effect on performance. Note I haven’t tested this through the selection yet (my metric is just vertex error for the best candidate). My feeling is that some performance will be gained back in selection purely by reducing the number of ‘bad’ candidates but this might be wrong, and I don’t think it’ll be enough.
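
A minimal sketch of the "one candidate per 5cm region" idea, for illustration only. It assumes the regions are cells of a fixed grid and uses a plain average as a stand-in for the refinement step; `merge_by_region` and `mean_position` are hypothetical names, not Pandora functions.

```python
import math
from collections import defaultdict


def mean_position(group):
    """Stand-in for the refinement step: the plain average of the group."""
    n = len(group)
    return tuple(sum(coords) / n for coords in zip(*group))


def merge_by_region(candidates, region_size=5.0, combine=mean_position):
    """Return one candidate per occupied grid cell of side region_size (cm)."""
    cells = defaultdict(list)
    for position in candidates:
        # Assumption: regions are cells of a regular grid, not a sliding window.
        key = tuple(math.floor(c / region_size) for c in position)
        cells[key].append(position)
    return [combine(group) for group in cells.values()]


candidates = [(0.5, 0.5, 0.5), (1.5, 1.5, 1.5), (7.0, 0.0, 0.0)]
print(merge_by_region(candidates))
```

A fixed grid can split two nearby candidates that sit either side of a cell boundary, so a real implementation might prefer a distance-based grouping.
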
- This also then gave me the idea of running the refinement immediately after the selection. This would then mean the improved vertex would be used in all the downstream reco and therefore have more effect. Began testing this this morning (plots are numu then nue):
- This looks good to me. I will clean it up and use the CI for a quick test, but I think this is probably the way forward. Its implementation would be pretty simple and easily testable.
- Have started an outline for my 10month review report but should really start focusing on that now as a priority!
- Got my first LArSoft PR merged which was fun. Andy C and Maria got in touch about trying the new features in a retraining for DUNE FD so I’ve given them the information they asked for.
- On the vertex refinement stuff, big thanks to Dom for his suggestion of trying a matrix technique rather than the pair-wise iteration I was doing before. This has worked really well as it’s allowed me to weight the impact of different clusters in a better way, which has made quite a difference to the performance (numu then nue).
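
The entry doesn't spell out the matrix technique, so the following is only one common formulation and may differ from what was implemented: treat each cluster as a weighted straight line and solve for the point that minimises the weighted sum of squared perpendicular distances to all lines in a single linear solve. It is written in 2D for brevity; `fit_vertex` and the `(point, direction, weight)` input format are assumptions.

```python
import math


def fit_vertex(lines):
    """Weighted least-squares point closest to a set of 2D lines.

    lines: iterable of (point, direction, weight), with point and direction
    given as (x, y) tuples. Solves A p = b with
    A = sum_i w_i (I - u_i u_i^T) and b = sum_i w_i (I - u_i u_i^T) a_i.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy), w in lines:
        norm = math.hypot(dx, dy)
        ux, uy = dx / norm, dy / norm
        # Projector onto the direction perpendicular to the line.
        m11, m12, m22 = 1.0 - ux * ux, -ux * uy, 1.0 - uy * uy
        a11 += w * m11
        a12 += w * m12
        a22 += w * m22
        b1 += w * (m11 * px + m12 * py)
        b2 += w * (m12 * px + m22 * py)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique vertex")
    # Cramer's rule for the symmetric 2x2 system.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Two lines crossing at (1, 2): a horizontal one and a vertical one.
print(fit_vertex([((0.0, 2.0), (1.0, 0.0), 1.0), ((1.0, 0.0), (0.0, 1.0), 1.0)]))
```

Unlike a pair-wise iteration, every cluster enters the same solve, so its influence is controlled by a single weight.
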
- I’ve been trying out the same approach in candidate creation but there are obviously more challenges. Firstly, the directions of smaller clusters are far less informative of the vertex position so deciding which clusters to use is difficult. Secondly, unlike for refinement, you don’t have a “current” vertex position to use as a base. Finally, this method would just produce one candidate so I’m going to try a “drop out” style method where you only use a certain number of clusters and try different combinations. I’m hoping this might get around the outlier problem as well.
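
A rough sketch of how a “drop out” style scheme could produce several candidates from one set of clusters: fit every combination of `n_keep` clusters separately. The names are hypothetical, each cluster is reduced to a single 2D position, and a plain centroid stands in for the real fit.

```python
from itertools import combinations


def centroid(clusters):
    """Stand-in for the real fit: average of the cluster positions."""
    n = len(clusters)
    return tuple(sum(coords) / n for coords in zip(*clusters))


def dropout_candidates(clusters, n_keep, fit=centroid):
    """Return one candidate per combination of n_keep clusters."""
    return [fit(subset) for subset in combinations(clusters, n_keep)]


clusters = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(dropout_candidates(clusters, n_keep=2))
```

Combinations that leave out an outlying cluster give candidates unaffected by it, which is the hoped-for benefit. The number of combinations grows quickly with the number of clusters, so some cap or random sampling of subsets would probably be needed.
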
- I haven’t actually tried reducing the number of candidates again since my naff first attempt a few weeks ago but I will go back to that later this week.
- Presenting at TPC Reco today so will hopefully get some useful input.
- It’s been a while! Been on holiday and have been attending Warwick Week, WIN2021, RAL Advanced Grad Lectures etc etc
- BDT changes have been approved and from my understanding will be in this week’s larsoft release. They won’t be active in SBND yet but the functionality will be there.
- The delay on activating them is because I wanted to spend a little longer on the two other vertexing strands I’ve been working on.
- Firstly I’ve been looking at methods to rationalise the final vertex based on the 3D reconstruction. Have tried a series of methods, of which quite a few look promising in the numu context…
- Not so much in the nue…
- Will keep working on this. Want to look at what the problem is with nue, can I be more careful with which clusters I use? Is there much overlap between methods, i.e. are they improving the same events? They all share the same functionality for events with just one 3D object, briefly looked and this seems weaker, what can I do with that?
- Have also been using some of the same ideas in the candidate creation stage. No plots here I’m afraid but good progress, am able to make extra candidates that are better than previous ones in quite a few events but also currently producing more wrong candidates as well so want to look at reducing this before seeing what effect it has.
- PR has been submitted for LArContent. Received some helpful review comments from Andy C this morning which I have begun responding to. I want to produce training samples with the crossing candidates in them. I will wait until the review process on this PR is done though, so that I know the code being used is final.
- Andy C & John also requested that I write out the procedure I used for training and put some of the extra python functions I made into a PR for LArMachineLearningData so I will need to put a couple of hours into that soon.
- Have started writing a simple algorithm that uses the final PFOs in a slice to try and nudge the vertex into a position that agrees with all of them. Andy mentioned using this to ensure the PFOs all share a vertex object as well, currently they all have subtly different positions.
- My hope is the ideas in this will be usable in candidate creation as well (clearly in 2D not 3D).
- Spent a lot of the week fighting CI fires following the events of last Thursday ;) Everyone seemed to choose the week when all the automation was down to want lots of checks doing! Anyway as of yesterday most of the automation is back, praise be to Vito. The reco validation has been getting lots of genuine use now though which is really nice to see (thanks Dom!)
- Gray has asked me to help draft some of the new SBN Young bye-laws, this is what you get for speaking in meetings… lesson learnt!
- Marking marking marking, one more week to go…
- A combination of presentations, marking and the bank holiday, so not masses of code work!
- Was working on automating the production of the input files for the validation workflow, this has hit some stumbling blocks with a new jobsub version not playing nicely with POMS/project_py.
- Presented the BDT work at the SBN and Pandora meetings. Am starting to prepare a PR with the added functionality so far. This will be a significant amount of work I think.
- Started looking at whether we can reduce the candidates produced initially. Tried a simple method (trying to choose the “best” candidate in a certain area and chucking the others). This, probably unsurprisingly, worked well in cutting candidate numbers but hit the performance a bit too, so will look at something more sophisticated.
- In the process found a setting we currently have turned off in SBND which makes candidates at crossing points as well as at end points. Turning this on gave a performance improvement of ~2.3% for the nue sample. This is without retraining the BDT to account for the new candidates either. I’m hoping there is little overlap with the BDT improvements but will test this out this week at some point.
- Mainly a week of consolidating and understanding the results I showed last week.
- Have validated my best BDT so far. Total improvement 68.0% –> 72.1% for nueCC and 71.7% –> 74.8% for numuCC. About 2% of this is from new variables, the rest from changing architecture and improvements in upstream reconstruction.
- More details at the SBN TPC Reco meeting later, be there or be square.
- Planning on spending a week or so with my head out of BDT-land looking at other vertexing jobs we’ve talked about like rationalising the list of candidates before selection.
- Added the ability to weight the data points such that each physics event contributes “1” to the training. While I was doing this I scaled up my dataset for the proper training (meant to be 100k each, nue I only got 81.3k successfully so I matched that with numu). The validation accuracy scores are only so useful, so I wanted to check them against the smaller nue sample I had from earlier testing. This would also show any gains from just retraining.
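
As a sketch of what "each physics event contributes 1" could look like: give every training row a weight of one over the number of rows from its event. The function name and the idea of tagging rows with an event ID are assumptions, not the actual training code.

```python
from collections import Counter


def event_weights(event_ids):
    """Per-row weights such that the weights within each event sum to 1."""
    counts = Counter(event_ids)
    return [1.0 / counts[event_id] for event_id in event_ids]


# One event with four rows, one event with a single row.
event_ids = ["evt_a", "evt_a", "evt_a", "evt_a", "evt_b"]
print(event_weights(event_ids))
```

Many training libraries accept weights like these as a per-sample argument (for example `sample_weight` in scikit-learn's `fit` methods); whether the training scripts here do so in the same way is not stated in the entry.
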
- Results probably not surprising. Gain of 0.8% events within 1cm from retraining and just 0.5% from adding weighting so if anything weighting has slightly negatively impacted.
- Went away and added a few more things to the training files, including some relational information between the two candidates (distance between them, hits & energy in a box between them).
- Have now trained all sorts of combinations on the full dataset. Highlights are: relational info definitely helps (~0.4% gain in validation accuracy), energy variables do make a good change (can be >1%), structures with more trees and larger depth obviously learn more from combinations of variables (shared variables seem to drive down the overtraining based on KS test scores).
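
For reference, a small sketch of the kind of KS check that can flag overtraining: compare the BDT score distribution on the training sample with the one on an independent test sample. The entry doesn't say how the KS scores were computed, so this is a generic two-sample statistic with made-up scores, not the actual procedure.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between the ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both ECDFs past every value equal to x before comparing.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


train_scores = [0.10, 0.35, 0.60, 0.80, 0.95]  # illustrative values only
test_scores = [0.15, 0.30, 0.55, 0.70, 0.90]
print(ks_statistic(train_scores, test_scores))
```

A large statistic means the model scores its training data differently from unseen data. `scipy.stats.ks_2samp` computes the same statistic and also returns a p-value.
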
- First image is the original BDT just retrained. Second is my best not overtrained combination so far. Still have a little more tinkering to do and will then need to train an equivalent region BDT but will attempt to do a validation run soon to really see what the changes in validation accuracy mean per event.
- Have finished a CI workflow that uses all the reco modules we have so far. Presented it at the CI & Validation meeting last week. Is usable in a feature branch and should be usable in develop soon!
- Updated some variables to use different clusters for different tasks based on the work I showed last week.
- Found some further quirks in the event shape variables. 1. the updated version is still questionable as it’s using the “z” coord interchangeably between the three views. 2. it uses all hits so is very susceptible to outlying hits completely ruining the event shape.
- Produced a test training sample (20k + 20k) to start messing around with the BDT. Wrote a few new python methods into the Pandora MVA helper functions to do things like remove features. First pass of trying configurations shows virtually no effect from any of my changes / new variables. Little bit deflating…
- Using Ed’s architecture (100 trees, max depth 2) which he chose to eliminate overtraining issues. Ran with 1000 trees and max depth 3 to try and give it more chance to learn from the new variables, will come back to overtraining later.
- Slightly more variation but nothing to write home about… the reverse really…
- Did, however, look into something I mentioned to Andy a couple of weeks ago. There is a massive misweighting in the training samples inherent in the number of candidates in the event (which is very tied to the event type & energy). The two events below have 1085 and 29 candidates respectively.
- The number of candidates looks like this for nue events (much lower for numu but similar shape).
- I then wanted to see how much effect this might be having…
- This definitely needs looking at. Two pronged approach I think, one is to look at reducing candidate numbers, there is no way we need > 1000 candidates to represent the possible vertex locations in an event. The other is to introduce some sampling mechanism to at least somewhat rebalance the data.
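
One simple form the sampling mechanism could take is a cap on the number of training rows kept per event. This is a sketch with hypothetical names and an arbitrary cap of 100, using the 1085 and 29 candidate counts quoted above as the example.

```python
import random
from collections import defaultdict


def cap_per_event(rows, max_per_event, seed=0):
    """Randomly keep at most max_per_event rows from each event.

    rows: iterable of (event_id, features) pairs.
    """
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    by_event = defaultdict(list)
    for event_id, features in rows:
        by_event[event_id].append(features)
    kept = []
    for event_id, group in by_event.items():
        if len(group) > max_per_event:
            group = rng.sample(group, max_per_event)
        kept.extend((event_id, features) for features in group)
    return kept


rows = [("busy", i) for i in range(1085)] + [("quiet", i) for i in range(29)]
balanced = cap_per_event(rows, max_per_event=100)
print(len(balanced))
```

With the cap, the busy event contributes about 3.4 times as many rows as the quiet one instead of about 37 times. The cap value would need tuning, and random sampling could drop the few good candidates in a busy event, so the true-vertex candidate may need protecting.
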
- CI work was pushed along nicely by the need to use Ed’s validation modules. That particular workflow is now up and running. Going forward I think it’s worth integrating the other reco validation wfs such that it can be operated as one trigger with one set of files.
- Presented at TPC Reco. Useful discussion about what to include in the training data set so I will begin creating that this week.
- Other useful suggestions on variables which I have / will pursue. Ed’s charge ratio suggestion unfortunately didn’t gain us much.
- Found the reason that I was struggling to find local clusters for some vertices was to do with the use of the sliding fits. Sliding fits are only built for clusters with 12 or more hits. These are then the only clusters used for most of the tools. This makes sense for getting reliable directions but perhaps could be loosened in other scenarios.
- Looked at what this looks like in event displays…
- Based on that I wanted to try and quantify this a little…
- Variables using clusters with 4+ hits are on the way, hopefully they will show some marginal improvement.
- CI PFP Validation workflow is up and running and I’m testing the ZeroMode change with it today.
- Most of the week spent digging into some of the “features” of some of the BDT variables.
- Lots of the asymmetry variables have spikes at 1 & 2 as well as 3 (they are sums across the three planes). These occur due to “bad planes” where the calculation fails. Attempted a couple of methods to smooth this effect out.
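
The entry doesn't say which smoothing methods were tried. One possibility, sketched here under the assumption that a failed plane can be flagged (represented as `None`), is to average over the valid planes and rescale to the three-plane sum so that a bad plane no longer pulls the total down to 1 or 2.

```python
def summed_asymmetry(per_plane, n_planes=3):
    """Sum of per-plane values, rescaled to account for failed planes.

    per_plane: one value per plane, with None marking a plane where the
    calculation failed (an assumed convention for this sketch).
    """
    valid = [value for value in per_plane if value is not None]
    if not valid:
        return None  # no usable plane, so no meaningful value
    return n_planes * sum(valid) / len(valid)


print(summed_asymmetry([1.0, None, 1.0]))  # rescaled instead of spiking at 2
print(summed_asymmetry([0.5, 0.5, 0.5]))
```
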
- Verified that the cap on the vertex energy is already present at the point pandora reads in the hit information
- Discovered that the reason the “event showeryness” and “shower asymmetry” variables were always returning the same value was to do with a small error in the neutrino pass xml
- Still tinkering with ways of better calculating a dE/dx variable as I’m not overly happy with it. Have kept different versions so that once I have a testing file I can try the BDT with the various permutations.
- The xml error has sent me down a slight rabbit hole getting Ed’s pfp validation analyzers plugged into the CI which is taking slightly longer than I intended but will be useful to have done.
- Presented at DUNE UK Software meeting last week. Feel it went well and there were useful suggestions of avenues to pursue. One of which was looking more at charge-based information which was something Andy and I had been discussing too.
- Came up with some very quick and easy variables to test if this has legs. These first plots are for the “regional” stage of the BDT which is already good. Like most variables they lose a lot of their separation when you look at them in the context of the “vertex” stage of the BDT but the dEdx asym still looks like it might have some potential.
- Clearly it’s very bumpy. These features are due to the fact that the final value is a sum of the value for each of the three planes. These features are common in the other variables that use this base “asymmetry” method. I want to look at whether we can improve these at all by only considering “valid planes” etc.
- Plan to be spending a chunk of time this week digging into some of the odd features of these plots and thinking about more sophisticated ways of calculating something similar.
- Have updated my feature branch for the event shape bug fix re: the discussion last Tuesday. Has raised quite a few questions that will crop up down the road if we do end up significantly changing any variables/which variables go into the BDT.
- Some more CI work pootling along in the background.
- Have Ed’s scripts for the vertex BDT set up and working. Have tinkered with some of the parameters etc to get a bit of intuition on how it works.
- Also means I have plots of the variables that go into the BDT now which helps with understanding.
- Spent a while working out exactly how the variables are calculated, will start playing with the inputs this week (turn off / add variables etc).
- CI workflow now set up to use detsim files as the input. Will do a full stats run this week but that was the last major thing I wanted to get sorted for that.
- Went to lots of NuTel talks last week.
- Started last week by correcting the cheating issue Dom pointed out last week.
- Presented vertexing work at TPC Reco, lots of discussion so plenty of things for me to look at in response.
- Ed sent me a load of info about the BDT and various automation scripts, so I’m planning on spending today and tomorrow getting that set up and having a little tinker to make sure I understand it.
- Managed to fit a little bit more CI work in, the weird genie issues seem to have been resolved so I can go back to actually making sure the tests work as we want them to.
- Listening in to bits of Neutrino Telescope when I can.
- Highlight of my week was, of course, bumping into Chris twice on our ‘daily exercises’ at the weekend!
- Based on the themes I saw in the event displays I made some quick (and ugly) plots that confirmed that the themes I mentioned last week are present across the whole sample.
- Had a look at the numu sample. Less obvious themes, often the moderate errors are due to some level of merging around the vertex and there are also events in which the vertex ends up at the end of a proton track, but fewer than in nue.
- Realised that a couple of the events I looked at in the numu sample could be salvaged by the high angle tracking as I was still using the MCP2020A sample. They were, which was nice! This might widen the performance gap between numu and nue slightly.
- Set up a reconstruction workflow which uses the cheated vertex selection to check how much is recoverable by selecting the right candidate. Checked this on some of the events we looked at last week, and it noticeably salvaged a lot of them, including improving the downstream reconstruction.
- Have simulated a 20k sample for nue (and doing the same for numu) with and without cheating to compare across a whole sample.
- Spent Friday getting a standalone Pandora build working, will start looking at a few simple things like turning off the z-prior and seeing what happens.
- Got a bit sick of fighting larsoft/pandora event displays to compare truth reco. Inspired by Chris, I’ve made a similar “home-made” event display to his to speed my event checking up.
- One of the most common topologies I found in the nue sample was the vertex being placed at the other end of a proton’s trajectory. Often these were relatively short protons although there were examples of longer protons as well. Proton reinteraction / scatter points were also a relatively common occurrence.
- Definitely possibilities to think about – introduce some charge based information to recognise Bragg peaks? Or similarly recognise jumps in charge profile at the vertex? Could very short proton tracks cause trouble though?
- Also saw some backwards going wiggly electrons with vertices at the end of the electron “track”
- Also found a sigma(c)++ displaced vertex @Chris @Niam
- Next steps are to do the same for the numu sample and find some ways to quantify the occurrence rates of common failure topologies.
- CI work is also ticking along, should get to run a full stats test in the next few days to check it actually recognises a change (will use high angle tracking)
- There were issues in the initial vertex error plots due to the simulation of the beam spill. This was resulting in an x-error corresponding to the neutrino’s position in the spill. Corrected for this in the numu sample but not in the nue sample where the error is too large to purely be as a result of the spill width. Resimulated nue to remove this issue.
- Lots of plots but takeaways are:
- Nue is now much closer matched with numu (68.7% success vs. 71.3% success)
- Breaking into three energy bins doesn’t point to any particular weaknesses
- Definite correlation between vertex error and completeness, dE/dx etc
- Have started leafing through event displays to try and find some common topologies. Starting with the worst events (error > 5cm)
- Have also spent a chunk of time working on implementing some reconstruction test plots into the SBND CI. Machinery of it is working, tinkering with the details on the sample, exactly which plots etc.
- more marking…
- Have started looking into the performance of Pandora’s vertexing.
- Using the bnb and nue official samples from last year (no cosmics)
- Started by just looking at whether we’re picking the correct vertex most of the time or not.
- Looking through the results event-by-event I realised there were quite a lot of events where technically the closest vertex wasn’t being picked but the margin was very small. Thought I’d make some more nuanced categories…
- Is 1cm sensible? Thoughts on “success 2” ?
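
The category definitions lived in the plots, so the following is only a guess at what such categories could look like. It assumes “success 1” means the selected candidate is the closest one to the true vertex, and “success 2” means it is not the closest but its error is within 1cm of the best candidate's error.

```python
def classify_selection(selected_error, best_error, margin_cm=1.0):
    """Categorise a vertex selection (hypothetical category definitions).

    selected_error: distance from the selected candidate to the true vertex (cm).
    best_error: smallest such distance over all candidates in the event (cm).
    """
    if selected_error <= best_error:
        return "success 1"  # picked the closest candidate
    if selected_error - best_error <= margin_cm:
        return "success 2"  # not the closest, but only marginally worse
    return "failure"


print(classify_selection(0.4, 0.4))
print(classify_selection(0.9, 0.4))
print(classify_selection(6.0, 0.4))
```

Keeping the margin as a parameter makes it easy to check how sensitive the category fractions are to the 1cm choice.
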
- Went through some metrics for reco further down the chain that show improvement when the vertex error is better (slice completeness, muon track length, electron dE/dx). Big question mark over whether this is correlation or causation…
- Will look at a few more metrics and then have a look at cheating to see if I can begin digging into the correlation vs. causation question.
- Other things: got my new laptop on Friday (exciting!), went to the SBND collaboration meeting last week, PGTAing again this term.
- Bit of a slow week, had a test and travelled back home. Worksheet marking took an absolute age this time but that’s the last one until Feb.
- Did a surface look into the effects on the unambiguous cosmic pass as brought up in discussion last week. No glaring negative effects to worry us.
- The PR has been submitted now and I’ll be presenting a summary of all the hit width clustering work at the PAT meeting on Thursday. Then *fingers crossed* this will be done.
- On the vertexing front I made a start on trying to reproduce some of Dom Barker & Ed’s plots. First one was not what I wanted to see but I have found the reasons for that…
- Started looking at calculating a true dE/dx in the way Ed did. Main questions are on sim::SimChannel object. From what I understand it represents true energy deposits on a single wire? I’m currently trying to get from this to which wire it actually represents, the algorithm keeps getting angry with me, something to do with geometry I think.
- Have finished working on the high angle tracking implementation. Final plots using the official production samples.
- Next, will be starting a study of the impact of the vertex reconstruction on the later phases of reconstruction. Have started looking at re-making some of Ed Tyley’s plots to check I can show the same effects going on.
- It’s been a while! Mainly been working on the high angle tracking implementation.
- Presented at a couple of TPC reco meetings, main takeaways:
- efficiency improvements for both muons and electrons
- also improved completeness with a very small drop in purity
- to try and combine these efficiency and quality improvements I used a definition of reconstructed that had completeness and purity requirements
- Currently producing equivalent analysis for a bnb+cosmics sample for the PAC meeting (sample is 300k so I’m hoping I can get a rough electron plot from that, as well as the muons)
- Also been vaguely flicking through pi0 reconstruction performance in free moments but don’t have anything particularly coherent on that front yet
- Have been working more on the high angle tracking implementation
- Improvements in single particle gun sample translated well to the bnb sample
- Presented the work so far to the TPC reco group on Wednesday and am currently working on some of the points that came out of that.
- Need to look a bit deeper at the shower side of things and have started looking at protons in the bnb sample as well.
- More welcome talks and training sessions
- Spent most of the week trying to work out why the hit width cluster merging algorithm was having literally zero effect despite claiming to be running… Turns out if the “UseHitWidth” flag is set to false then the hit width algorithm doesn’t really work… who’d have guessed?!
- Once we had found the flag then things started working and there are definite improvements:
- Integrated reconstruction efficiency for muons 96.1% –> 96.8%. Checked it across other variables, e.g. momentum:
- Wanting to look at completeness as well, just waiting on my analyzer to finish. There were noticeable cases of a muon track split into multiple clusters (this would still count as ‘reconstructed’ in the efficiency calculation) so hopefully the completeness will also have improved!
- Will also look at showers this afternoon/tomorrow
- Am officially a PhD student now! :D
- We had a kind of welcome talk thing on Friday and have PGTA training tomorrow.
- In between welcome-y things I’ve been working on implementing a high angle tracking algorithm into sbndcode. Made a couple of silly errors that cost me a chunk of time but should have the first sample finished by the end of the day
- Remade the pi zero reconstruction plots with the PhotonOne & PhotonTwo definitions updated to represent leading and subleading respectively. This shows the expected effect with significantly more leading showers reconstructed. Currently updating my code to add hit completeness and purity to my trees.
- Just wanted to check I have the definitions correct for these:
- If I have a Shower, X, truth matched to a MCParticle, Y
- Completeness = (No. of hits in X attributed to Y)/(No. of hits in whole event attributed to Y)
- Purity = (No. of hits in X attributed to Y)/(Total no. of hits in X)
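
The two definitions above can be written as a short function. This is a sketch with hypothetical inputs (hit IDs and a hit-to-true-particle map), not the analyzer code.

```python
def completeness_and_purity(shower_hits, hit_to_particle, particle):
    """Return (completeness, purity) of a shower matched to a true particle.

    shower_hits: IDs of the hits in the reconstructed shower X.
    hit_to_particle: map from every hit ID in the event to its true particle.
    particle: the MCParticle Y that the shower is truth matched to.
    """
    matched = sum(1 for hit in shower_hits if hit_to_particle.get(hit) == particle)
    total_true = sum(1 for owner in hit_to_particle.values() if owner == particle)
    completeness = matched / total_true if total_true else 0.0
    purity = matched / len(shower_hits) if shower_hits else 0.0
    return completeness, purity


# Particle "Y" leaves four hits in the event; the shower holds two of them
# plus one hit from another particle.
hit_to_particle = {1: "Y", 2: "Y", 3: "Y", 4: "Y", 5: "Z"}
print(completeness_and_purity([1, 2, 5], hit_to_particle, "Y"))
```

This assumes each hit is attributed to exactly one true particle; hits shared between particles would need an energy-weighted version.
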
- Began having a look at the spike in the muon reconstruction graph. If I bin more finely then it looks more like a wobble, a bit like in my MPhys plot.
- Not much to say, spent all of last week finishing off the C++ course which I think was worthwhile. So haven’t done any more on the physics work, will work on last week’s discussion points this week.
- Have started working my way through the Fermilab C++ course from August. I’ve only ever learned C++ from google and online tutorials before, so it’s proving very useful!
- Did some more work on the CC1pi0 selection in August
- Thought it would be reasonable to relax the truth momentum thresholds to 0.15 GeV/c
- Also tinkered with the fiducial volume requirements
- Pre-selection cuts on showers don’t need changing, if we’re maximising for eff*pur
- Started looking into the weird track scatter values, think it might just be a result of the big differences in the distance between trajectory points in truth vs. reco. Will look further into that to confirm.
- As a starting point to looking into the limitations on the pi0 side, I produced these plots but I haven’t gone any further with this yet.
- (Sorry for the lack of plots, I have exported them as pdfs and can’t upload them onto this, I’ll sort that for next time).
- Most of the week spent dealing with technical issues. Had some events missing when I updated my analyzer. Then had a set that all had RunID=1 and SubRunID=1, this made looking at event displays etc impossible but I have managed to do some physics…
- Some tinkering with the cuts/MVAs hasn’t created any noticeable changes in the selection.
- I haven’t tuned the shower pre-selection cuts yet but despite the improvements in reconstruction they still seem to be worth keeping (removes 56.8% of ‘proton’ showers and only 2.6% of signal).
- I have (a small number of) muon candidates with a truth matched pdg code of 22? Haven’t had a chance to dig into this yet but photons shouldn’t be leaving tracks….
- My invariant mass distribution shows a larger energy underestimation than in the previous analysis. The fit gives 95.3MeV/c^2 compared to 106.6MeV/c^2. Using reconstructed direction and true energy still produces a distribution centred on 135MeV/c^2
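
For reference, the invariant mass of two photons depends only on their energies and opening angle, m = sqrt(2 E1 E2 (1 - cos θ)). A minimal sketch (function and argument names are illustrative, not the analyzer's):

```python
import math


def diphoton_mass(e1, e2, dir1, dir2):
    """Invariant mass of two photons from their energies and directions.

    e1, e2: photon energies (mass comes out in the same units, with c = 1).
    dir1, dir2: direction vectors; they do not need to be normalised.
    """
    dot = sum(a * b for a, b in zip(dir1, dir2))
    norm1 = math.sqrt(sum(a * a for a in dir1))
    norm2 = math.sqrt(sum(b * b for b in dir2))
    cos_theta = dot / (norm1 * norm2)
    # max() guards against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * e1 * e2 * (1.0 - cos_theta)))


# A pi0 at rest: two back-to-back photons of 67.5 MeV each.
print(diphoton_mass(67.5, 67.5, (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)))
```

Because the mass goes as the square root of E1·E2, scaling both shower energies down by a common factor scales the mass by the same factor. That fits the observation that reconstructed directions with true energies still centre on 135MeV/c^2.
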
- Presenting tomorrow at SBN Event Selection Meeting (I think?) In the process of making slides and need to decide how much to focus on the old data and how much to focus on the new stuff. (Old data has better logic for cut decisions etc).
- Taking a week off next week (WC 17th) to see my parents.
- Made the 100,000 events
- Ran my MPhys analyzer on the new sample
- Only got the trees on Friday so haven’t done anything detailed yet
- Headline changes are efficiency is down (57.5%->50.2%) but purity is up (66.0%->70.9%). Probably the right direction for SBND!
- Can’t read too much into this or the cuts breakdown as I haven’t tinkered with any of the cuts or the MVAs for this sample
- The reconstruction efficiency studies show some drastic changes though. The total reconstruction efficiency for most particles is pretty similar to the old sample. However, the track-shower discrimination for protons and charged pions has drastically increased!
- All been technical stuff
- Finished setting up my Fermilab account
- Installed ubuntu on my laptop (gulp!)
- Dom (thanks!) led me through using the batch system for submitting jobs and I have done a test run of 200 events through GENIE, g4, detsim and reco (including a new shower approach)
- Copied over my analysis module from my MPhys – took a bit of tinkering to get it working again on the new files, but it seems to now
- Next steps
- Create proper pools of events?
- Run the analysis and compare results to those with the previous reconstruction