- Back at FNAL since the 4th Jan.
- CRT#s were removed from the pit a week and a bit ago (for attaching strain gauges to the cryostat).
- So the CRT team has moved on to other commissioning tasks. In my case this is finishing the bottom CRT installation/commissioning.
- I’m trying to wrap up the CRT clustering work in the next few weeks. I have a few key checkpoints I want to achieve with it.
- Similar to the bucket resolution plot from last week: Lan and I have been working on showing that we can achieve this kind of resolution with different timing routes.
- We have similar plots now, but we need to work out how to incorporate a clock drift correction, which we hope will get us back down to the resolution we had before.
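A hypothetical sketch of what that drift correction could look like, assuming a simple linear drift between a local CRT clock and a reference clock (everything here is made up for illustration, not the actual timing code):

```python
import numpy as np

def fit_clock_drift(t_local, t_ref):
    """Least-squares fit of t_ref ~ a * t_local + b for matched timestamps."""
    a, b = np.polyfit(t_local, t_ref, 1)
    return a, b

def correct_times(t_local, a, b):
    """Map local clock times onto the reference clock."""
    return a * np.asarray(t_local) + b

# Toy example: a local clock running 50 ppm fast with a 2 us offset.
t_ref = np.linspace(0.0, 1.0, 100)            # seconds
t_local = t_ref * (1 + 50e-6) + 2e-6
a, b = fit_clock_drift(t_local, t_ref)
residual = correct_times(t_local, a, b) - t_ref
assert np.max(np.abs(residual)) < 1e-9        # drift removed
```

In reality the drift may not be linear over a whole run, so a piecewise or per-spill fit might be needed, but the idea is the same.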
- Have been racking up the statistics with the #s. We had slightly more intense beam last week, and the DAQ was behaving itself a little more too so I’ve been taking lots of runs.
- For the clustering workflow: I’ve rewritten the workflow somewhat to reflect discussion in the reco meeting. I’ve implemented ways of checking the purity and completeness of these slices (as you’d expect, both are very high). Am currently working on a few different instances of hit reconstruction for the next stage (e.g. for clusters with 1, 2 and >2 respectively).
- Ran a version of the cosmic simulation for the 2017-2018 BT data with a more advanced neutron model to try and investigate a low energy bump we have in the data but not the MC. So far this hasn’t pointed us in the right direction.
CRT, CRT, CRT
- Working on 3 fronts, all CRT related
- CRT## Project
- Have hooked the sharps data up to sbndcode reco. Hopefully the decoder should be the same as needed for future full system data.
- Despite not having enough stats, have been able to show we have the resolution to resolve the spill substructure by overlapping the buckets.
- We want to do this now with a couple of different timing setups. Lan & I intend to work on that this week.
- Don’t have the #s for much longer so trying to quickly get what we can done before they’re gone. Unfortunately we still don’t have full strength BNB which has hampered this.
- 2017-2018 BT Data
- Have started doing some MC/Data comparisons for the cosmic part of this data.
- Pointed us in the direction of a few things to investigate.
- CRT Reconstruction
- Have been working on some changes to the reconstruction, focused around creating actual “strip hit” data products and clustering them in a manner akin to slicing.
- Mostly conversations so far, although I’ve drawn up some skeleton modules to run the basic concepts. Got working on it properly last week and looking forward to getting stuck in more.
- Heading to the CM in Edinburgh starting 5th Dec, will then be in the UK over Christmas and back to FNAL on Jan 3rd.
- All things CRT related, similar to what you saw last week…
- CRT# version is running but we’ve now found an issue with the geometry file so we need to go back to Gustavo / sort it out ourselves
- I’ve done some pedestal calculations for when we do get running.
- [Beam Telescope] Indication that the weird behaviour in the t1 plot we discussed last week was due to t0 reset events being somewhat sync’ed with the beam at least some of the time.
- Need to go back to the original data to confirm this as we dropped the t0 resets in the early processing
- [CRT#s] been doing runs over the weekend to try and understand whether the thresholds affect a data corruption issue we’ve been seeing.
- Long talk “Reconstructing SBND Beam Telescope Events in sbndcode”
- Too big a file to upload, ping me if you want the slides!
- Moved to FNAL in August!
- Have mainly been working on CRT-related things
- Working on the CRT# project at ND
- Helping on the effort to look at the beam telescope data from 2017-2019.
- Have a CORSIKA simulation now working for it
- As of last week have the data in an artroot form :party:
- Doing a large refactor/rewrite of the CRT reconstruction as
- it is quite old
- relies on lots of simulation assumptions
- doesn’t allow us to input data-type parameters like cable delays or pedestals
- The main plan here is that a lot of this work is relevant to CRT# and to CRT commissioning.
- Did a little tidying up of CRUMBS & tied up a few stray ends
- Have been trying to scale down my service commitments (CI, Validation & Production) in order to have a little more time for the above!
- Was at Fermilab last week for the SBND CM. Was a fantastic week, really nice to finally meet lots of the people I chat to on zoom every week but have never met! Seeing the detector & the detector building was also really exciting.
- Gave two talks:
- CRUMBS: Went well, people are pretty used to it by now. A few interesting comments and questions. Still a fair bit of chat around the choice of flash matcher & whether improvements are coming in that regard. To be honest, I can just wait and see how some of that pans out on its own before I worry about the effect on CRUMBS. Was able to show some new plots that suggest that giving CRUMBS different signal categories (CCNuMu, CCNuE and NC) might get us a little extra performance. Nicola made an interesting query on whether we’d be able to use it in a more absolute way; this would also allow us to determine pileup events.
- NCPiZero: Again I felt this went well, although comments & questions went on almost 20 mins… Tonnes of suggestions and tonnes of things to pursue over the next few weeks. Some of these were on my list anyway but it’s definitely grown in length big time.
- SBN Young events also went really well, I think people enjoyed the pub quiz and the Oscillation Analysis workshop was super well attended and incredibly useful for me and for others by all accounts.
- Preparing for my LTA. VISA appointment next week (apologies for this meeting), sorting flights etc. Accommodation is done, Ornella has apparently sorted me a desk in WH.
- I’m not intending to be doing masses of work the next couple of weeks. Trying to tie up some of the outstanding questions from the above talks ^ . Then I’m taking a week off for my MPhys graduation (ha, class of 2020) and then I’m off to FNAL.
- My last week living in Lancaster, lots of good-byes and lasts after 6 years! Moved my stuff out on Sunday and I’m getting a coach (thanks rail strikes) south on Friday.
- Off to Fermilab next week for the SBND Collaboration Meeting then back at my parents for a few weeks before my LTA begins. Still got some faff to finish off (have VISA appointment booked, accommodation is confirmed, medical insurance seems to be closerrrr but Deborah is handling that).
- Had a couple of weeks of blitzing through stuff that has been on the back burner for a while.
- Two presentations at last week’s SBND Reco meeting
- CRUMBS talk on analysing the effect of multiple subsystems, no one seemed to care too much but it puts those questions to bed a little bit.
- CRT talk on analysing the reco effects of Marco’s detsim changes & adding some corrections into the CRT hit time reconstruction to account for effects we simulate (propagation delay and time walk).
- PR for this CRT work has been merged.
- Presentation at this week’s SBND Physics & ES meeting, quite last minute. Now we have the new files with working rollup I dug back out my NCPiZero selection code (showed in this meeting in December) and did some work on this. Garnered lots of interest and plenty of questions to explore.
- Organising the SBN-Young events for the CM. Oscillation Analysis workshop being run by Chris Backhouse with some content from Rhiannon. Plus an online social event for those not attending in person. Not running the flash talks session this time as the main agenda is longer so gives more space for all students; it’s also only been 4 months since the last one so people don’t have as much to show.
- Gave long talk on CRUMBS.
- Attending Neutrino 2022. Presented my poster this morning at 7am, got some interest.
- Have uploaded a technote with updated validation plots using the MCP2022A sample.
- Gave a talk at the PAT meeting last week. Lots of positive reaction as well as quite a few ideas of where to go next.
- Have thrown a cycle at showing the effect each subsystem has on the results. Have retrained 7 permutations (TPC, PDS, CRT, TPC+PDS, TPC+CRT, PDS+CRT, TPC+PDS+CRT) and created the relevant files. Currently writing a plotting script to show the effect of this.
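Enumerating those 7 permutations is simple enough to sketch (the feature names below are purely illustrative, not the real CRUMBS inputs):

```python
from itertools import combinations

# Illustrative mapping of subsystem -> training features (names are made up).
SUBSYSTEM_FEATURES = {
    "TPC": ["tpc_score", "tpc_nhits"],
    "PDS": ["fm_score", "fm_time"],
    "CRT": ["crt_track_score", "crt_hit_dist"],
}

def feature_sets():
    """Yield (label, feature list) for every non-empty subsystem combination."""
    systems = list(SUBSYSTEM_FEATURES)
    for r in range(1, len(systems) + 1):
        for combo in combinations(systems, r):
            feats = [f for s in combo for f in SUBSYSTEM_FEATURES[s]]
            yield "+".join(combo), feats

sets_ = dict(feature_sets())
assert len(sets_) == 7   # TPC, PDS, CRT, TPC+PDS, TPC+CRT, PDS+CRT, TPC+PDS+CRT
```

Each entry would then drive one retraining job with only those feature columns enabled.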
- Finished all samples.
- LTA / CM trips to Fermilab
- Have applied for ESTA for CM and have my DS-2019 form on its way so I can apply for VISA.
- Deborah has taken over sorting medical insurance. Haven’t heard anything back on that front yet.
- Have produced mini-samples to demonstrate the effect of Marco’s CRT detsim changes. Plots for 2D hit creation look fine, will do 3D tomorrow.
- Technote written, planning on uploading it today.
- Giving a talk at the PAT tomorrow focused on showing it performs well across different true variables (hopefully allay model-dependent concerns) and will also make clear how to use it on MCP2022A files.
- Made and submitted poster for Neutrino22 next week.
- Finished my 3 samples a couple of weeks ago. Took over the main “rockbox” sample (the big standard BNB sample) as it wasn’t going anywhere and I want those files as much as anyone else. Just doing final cleaning up of this sample (duplicates etc) but it’s essentially done.
- LTA / CM trips to Fermilab
- Continuing to sort things with Fermilab for my LTA. Have sorted financial information with the visa office, final thing seems to be medical insurance, so much back and forth between the company, FNAL and the Uni but I think we’re close now.
- Also planning on attending the CM in person. Am the only Lancs person going so will book flights / accom this week. Any recommendations from anyone who’s been to FNAL before would be appreciated.
- Marco has finished his CRT Sim updates so I’m taking a look at the effect of those updates on the reco and then hopefully implementing the changes I made to the old reco and seeing the differences.
- Am organising booking rooms for in-person watching of some sessions next week. Will email everyone later today.
- Had some time off which was nice, lots of Lakes hiking and a trip to Edinburgh to see a Lancs alumnus.
- CRUMBS note pretty much done, just adding a tutorial-y section at the end; when it’s done I will
- ask a couple of people (Dom? Andy?) to take a look
- present at Event Selection & Physics meeting to encourage people / show how to use
- Poster ticking over, Speakers’ Committee want draft by 6th May, will aim to have it before that
- Production has begun, spent last couple of days setting up and testing campaigns, should hopefully be mainly monitoring now.
- Did various “trainings” for LTA
- (Still on holiday)
- Setting up production configs for the 2022A productions
- Writing CRUMBS note (1st draft almost done)
- Made a start on CRUMBS poster for Neutrino22, will have that done mid next week.
- Started ball rolling on LTA with forms for Fermilab (hopefully starting ~August 1st)
- Added 2D multivar histograms to CAFAna
- Added CI functionality for selecting references from the command line (standard CI)
- Currently writing up a technote for CRUMBS. In the process I’ve been doing bits and bobs like cleaning up the training scripts so I can provide examples for everything.
- In the process I discovered CAFs don’t let you plot 2D histograms for multivars so I’ve been working on adding that functionality to CAFAna as it seems a little bit of a gap.
- Helping to debug test files for the 2022A production, currently we’re going back and forth on an issue with the Proton chi2 PID.
- Developing functionality in the standard CI to request a different reference than the most modern one (came up as an issue when testing 2022A PRs).
- Abstract accepted for a Neutrino2022 poster on CRUMBS
- Have validated the performance of CRUMBS on a small validation set.
- Showed that performance is better than it was for using traditional cuts. Example is this simple selection shown below but I have quite a few validation plots. Also have a few ideas for trying a few changes to the tool this week.
- This is clearly just one way of using it; it’s an analysis tool, and the choice of what is important comes down to the analyzer!
- Am in the process of preparing various things for next week, CRUMBS Talk, CI Talk, SBN Young talk and SBN Young sessions. Had to turn down a CRT talk for the sake of my sanity!
- All 5 of my production samples are now finished on the grid! Some small amount of offline work to sort out flattened CAF files still needs doing but we’re pretty much done.
- Got named on 3/5 critical updates for the next production (starting next week supposedly). The two I actually have a hand in I’ve opened PRs for.
- First CRUMBS talk went really well, good reception and interest. I’ve been implementing some suggestions from Michelle (on CRT & FM), Iker (on FM) and also including a Bragg peak tool Ed pointed me to. It needed a bit of reworking to do what I needed it to but I have that working now.
- Started running grid jobs to produce a sample for larger training. Tests ran fine and then got 50% failure rate in the full sample, thankfully this was a grid issue not me and has apparently been fixed overnight.
- Production samples are now in full run, just monitoring. Hopefully have all samples (maybe bar the largest) by mid next week.
- CRT work continues. It doesn’t seem like any validation of the CRT Sim/Reco was done when it was reactivated after the geometry and larg4 changes. I have been digging into why the outputs are complete garbage and have found two main issues…
- Problem: Every single readout value from the SiPMs was recording a threshold value. Fix: The new larg4’s implementation for making AuxDetSimChannels (GenericCRT_module) outputs the energy in MeV; the old larg4 (and almost all of larsoft) gave energies in GeV, so our DetSim code was interpreting these values as 1000x their intended value, hence the thresholding effect. I have a PR in larsim which will fix this.
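The effect of the mismatch is easy to see in a toy calculation (all numbers and the gain are hypothetical; the real conversion lives in the DetSim code):

```python
# The old larg4 stored deposited energy in GeV and the DetSim code converted
# to MeV internally. The new GenericCRT module already writes MeV, so the same
# conversion inflated every deposit by 1000x, pushing all channels over threshold.
GEV_TO_MEV = 1000.0

def detsim_response(edep, gain_per_mev=10.0):
    """DetSim assumed its input energy was in GeV."""
    return edep * GEV_TO_MEV * gain_per_mev

edep_gev = 0.0015                  # old convention: a 1.5 MeV deposit, in GeV
edep_mev = 1.5                     # new convention: the same deposit, in MeV

good = detsim_response(edep_gev)   # intended response
bad = detsim_response(edep_mev)    # 1000x too large -> always over threshold
assert abs(bad / good - GEV_TO_MEV) < 1e-6
```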
- Problem: The new geometry has swapped coordinates in the way CRT volumes (both strips and modules BUT NOT taggers) are created. x used to define the width of a strip (distance between SiPMs) and y the length of the strip. This has been inverted; however, the x coordinate is still used to create a “width” variable and the y coordinate to make a “height” variable, and the way these variables are used by our reconstruction assumes the old convention. Fix: Simply put, change the way the variables are used, at the earliest possible point in the chain. I have tested a fix and a PR is on its way.
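Schematically the swap looks like this (a hedged sketch with made-up dimensions, not the actual geometry code):

```python
# In the old geometry the x half-extent gave the strip width (SiPM-to-SiPM)
# and y gave the length; the new geometry inverts those axes for strips and
# modules, so the coordinates must be swapped when building the variables.
def strip_dims(half_x, half_y, new_geometry):
    """Return (width, length) of a strip from its half-extents."""
    if new_geometry:
        width, length = 2 * half_y, 2 * half_x   # axes swapped
    else:
        width, length = 2 * half_x, 2 * half_y
    return width, length

# The same 11 cm x 365 cm strip described in each convention:
assert strip_dims(5.5, 182.5, new_geometry=False) == (11.0, 365.0)
assert strip_dims(182.5, 5.5, new_geometry=True) == (11.0, 365.0)
```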
- I have also done some further minimal work on CRUMBS, my cosmic removal tool. However, I was waiting on these fixes to the CRT to make the CRT variables in CRUMBS not utter garbage. Am running some files now with the changes outlined above to see if we now have more reasonable variables.
- Have made a PR to LArContent to implement the required persistency of the MVA features from the sliceID. Am collaborating with Bruce now as he wants to implement the same idea for other pandora MVAs.
- Lots of marking for PHYS366
- In more exciting news it’s the University Brass Band contest (UniBrass) this weekend so I’ll stop having to spend every evening/weekend & lunch break preparing for that soon!
- Production rattles on…
- Had a meeting last night about RollUp. Hans’ plan is to implement a new module after the current larg4 before the ionisation and scintillation modules which will filter the MCParticles and edeps with new trackIDs to match this. Wes has asked me to mock up the memory effect of this by running g4 without rollup but then dropping the MCParticles.
- Presented some work on CRT hits to SBND Reco. Got some feedback, especially from Michelle post-meeting so have plenty to work on, on that front.
- Have got a quick first pass of the cosmic removal tool working. Uses the 10 inputs from the pandora BDT, a few of the scores from the flash matching (Iker & Michelle) and a couple of scores from the CRT track matching (Tom Brooks). As would be expected the distribution improves when you add in these variables.
- There are quite a few caveats with this
- Clearly a proper analysis will require looking event by event not slice by slice
- Very small sample (BNB+Overlay & Intime) and not properly accounting for POT
- Have not done anything sophisticated with the signal definition (Signal = 80%+ pure nu slice, Background = Everything else)
- Need to cap the CRT variables for them to have better effect
- As a proof of concept, however, this works!
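For reference, the simple signal definition above can be sketched as follows (field names are illustrative):

```python
# Signal = slice whose hit content is at least 80% from the true neutrino;
# background = everything else. Deliberately unsophisticated first pass.
def slice_purity(n_nu_hits, n_total_hits):
    return n_nu_hits / n_total_hits if n_total_hits else 0.0

def is_signal(n_nu_hits, n_total_hits, threshold=0.8):
    return slice_purity(n_nu_hits, n_total_hits) >= threshold

assert is_signal(80, 100)
assert not is_signal(79, 100)
```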
- Lots of production work ongoing, we’ve fixed most bugs and files are now being made. Progress is slow (large reason for this is the fact we’re simulating all shower daughters).
- CI is thankfully quiet for a change, the push to get all developments into the production release seems to have resulted in a lull in PRs & changes.
- Finally presenting, at the SBND Reco meeting today, a few thoughts on CRT hit reconstruction that Ivan asked me to look at a while back.
- Have made some small improvements to the NCpizero selection by tinkering with the strength of the shower requirements.
- Started work on a cosmic rejection tool using information from all three subsystems. Taking the inputs used in the slice BDT in pandora (TPC) and combining them with flash match scores (PDS) and some CRT metrics (as yet undefined) to hopefully improve the score we use to determine whether a slice is neutrino or cosmic.
- More marking on its way from PHYS366 in the new term.
- Giving long talk on the status of NCpizero selection
- sbndutil -> sbnutil is basically done
- Ended up responsible for providing non-rollup fcls for production (done) and helping Ivan fix a CRT bug (almost done), not really sure how that happened.
- There’s still a few things that need doing / patching before we begin production but I’m trying to avoid getting landed with any other tasks
- CAF checks exist in CI now
- NC pi zero selection ticking along; using Ed’s MVAs in a vaguely sensible way I was able to get much better eff & pur than the box cuts. Started investigating the backgrounds (largest is numuCC pi0 not rejecting the muon) and reasons for signal inefficiencies (being stymied by rollup again).
- sbndutil -> sbnutil migration for production has been less effort than expected (without tempting fate :/ ) there is a set of jobs running now using the sbn setup which so far are working
- CI was super busy last week with production PRs but for once it actually worked as meant to and didn’t take too much time
- Have started some work on a NC pizero selection, have got a pretty basic selection working. Plans for next steps are a) looking at the effect of swapping basic box cuts for Ed’s track & shower MVA PIDs and b) then looking at remaining backgrounds / rejected signal particularly with a view to looking at necessary reconstruction work.
- Production stuff…
- CI stuff… mainly documenting everything so it can be used easily
- Reco Validation is almost completely clear again after weeks of angry red plots.
- Basically since the refactored larg4 there have been some horrible plots in the CI
- (look at CI dash)
- Most of them ended up being metric issues
- Vertexing was to do with electron drift (thanks Dom!)
- Last remaining issue is muons, need to fire up an event display and see what’s going on there!
- The fact that the vertex plots are clear means I can retrain my vertex BDTs knowing they’re working on a sensible base, so I’ve initiated that this morning. Should have a PR for it by Friday.
- Have played around further with the CAFs. Made up a mock selection to practice using ‘Cut’ functionality.
- Andy & I are going to look at hit & track uncertainties for the CRT. I’ve started plotting things from the CRT data objects and have some follow ups to do from this.
- Attended HEP Summer School, super intense but great fun & lovely people
- Joined SBND’s production group and have spent a fair chunk of time getting up to speed. Have now run my first full sample.
- Plenty of CI work
- Lots of work keeping the normal CI running, especially with the mrb change last week and the continuing refactored larg4 fun and games.
- My automatic sample generation has been finished and merged into the lar_ci repo, presented this to the CI&Validation group on Friday and it went down well.
- On the back of that they want one extra workflow and I still want to write a script to move reference files for the validation but neither of these should be large amounts of work.
- Have been helping to organise SBN Young’s “documentation initiative” to produce more comprehensive material particularly for brand new students. Also contributing my own small amount by updating the CI documentation.
- Wanted to get to grips with the CAFs to be able to do more analysis work so have worked through Ed & Gray’s tutorials from the April workshop. Andy & I had also had a chat to Ivan about doing some simple CRT work, so I’ve been trying to kill two birds with one stone by producing some initial plots in the CAF framework, getting to understand both the CRT and the framework better.
- Took some holiday
- Gave New Perspectives Talk
- Have had Vertex Refinement Alg merged into recent larsoft release, am working on integrating this for SBND. Requires another retraining of the BDT because the one used before was plagued by the wire geometry bug (mainly just needs me to trigger grid jobs so can do during the Summer School).
- Have joined the production group, in the process of learning all about POMS and trying to run a mini test campaign.
- CI has been busy with all the development for the production. Caught two significant bugs so it’s doing its job.
- Have also been tying up some dev work with the CI (fcl file checks & automated sample production). Both are essentially done, will PR once Vito is back from holiday to check them.
- Very much in tying up loose ends mode, in time for the STFC Summer School next two weeks.
- Had confirmation, then took a week off.
- Haven’t presented the refinement algorithm to Pandora yet; am doing so next Tuesday but am progressing assuming it’s essentially finished.
- Producing a new set of training files using the new refinement alg + the crossing mode + my BDT changes so that I can validate the combined changes.
- Should have this in time to put into my New Perspectives talk + poster for HEP school
- Am working on a couple of CI projects while these grid jobs run. One is to automate the production of input and reference files for validation jobs, other is to add basic fcl checks (essentially compare fhicl-dumps for standard workflow files).
- Have spent most of last couple of weeks writing my Confirmation Report / Preparing for Confirmation / Writing CM Talk etc
- CM talk was yesterday, felt it went fairly well, reaction was good!
- Have my interviewy thing this afternoon for confirmation
- Have also kept the vertex refinement work ticking over, having significant success with running the refinement on every candidate. This is pretty much ready to open at least a draft Pandora PR I think! Only thing I really intend to do with it now is do a little hand-wavey tuning of some of the numbers I hard-coded in.
- CI cried when the geometry changed, SBN then took a week to move to the new root after larsoft so lots of angry error messages! As of yesterday we’re back though. Might actually get around to looking at the event-by-event stuff soon so will take a look at Ryan’s macro.
- Presented at TPC Reco, lots of useful comments on things to try. Major thing I wanted to try this week was using the same ideas from the refinement in candidate creation.
- I developed a method where for each 5cm region in which there are candidates I use the refinement method to create one single candidate for that region. This has worked well in massively reducing the number of candidates but I think it’s still having slightly too much effect on performance. Note I haven’t tested this through the selection yet (my metric is just vertex error for the best candidate). My feeling is that some performance will be gained back in selection purely by reducing the number of ‘bad’ candidates but this might be wrong, and I don’t think it’ll be enough.
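A simplified 1D sketch of that region merge (the real method works on 3D candidates and reuses the refinement machinery; here I just bucket along one axis and take a score-weighted mean, purely for illustration):

```python
import math
from collections import defaultdict

def merge_candidates(candidates, bin_cm=5.0):
    """candidates: list of (position_cm, score); one merged candidate per bin."""
    bins = defaultdict(list)
    for pos, score in candidates:
        bins[math.floor(pos / bin_cm)].append((pos, score))
    merged = []
    for group in bins.values():
        total = sum(s for _, s in group)
        merged.append(sum(p * s for p, s in group) / total)
    return sorted(merged)

cands = [(1.0, 0.2), (2.0, 0.6), (12.0, 1.0)]
out = merge_candidates(cands)
assert len(out) == 2                # three candidates collapse to two regions
assert abs(out[0] - 1.75) < 1e-9    # (1*0.2 + 2*0.6) / 0.8
```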
- This also then gave me the idea of running the refinement immediately after the selection. This would then mean the improved vertex would be used in all the downstream reco and therefore have more effect. Began testing this this morning (plots are numu then nue):
- This looks good to me; I will clean it up and use the CI for a quick test, but I think this is probably the way forward. And its implementation would be pretty simple and easily testable.
- Have started an outline for my 10-month review report but should really start focusing on that now as a priority!
- Got my first LArSoft PR merged which was fun. Andy C and Maria got in touch about trying the new features in a retraining for DUNE FD so I’ve given them the information they asked for.
- On the vertex refinement stuff, big thanks to Dom for his suggestion of trying a matrix technique rather than the pair-wise iteration I was doing before. This has worked really well as its allowed for me to weight the impact of different clusters in a better way which has made quite a difference to the performance (numu then nue).
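My rough reading of the matrix technique, sketched under my own assumptions (this is the standard weighted least-squares "closest point to a set of lines" construction, which may differ in detail from what's actually implemented): with each cluster giving a point p_i, direction d_i and weight w_i, solve (Σ w_i P_i) v = Σ w_i P_i p_i, where P_i = I − d_i d_iᵀ projects off the line direction.

```python
import numpy as np

def fit_vertex(points, directions, weights):
    """Point minimising the weighted squared perpendicular distance to each line."""
    dim = len(points[0])
    A = np.zeros((dim, dim))
    b = np.zeros(dim)
    for p, d, w in zip(points, directions, weights):
        d = np.asarray(d, float)
        d = d / np.linalg.norm(d)
        P = np.eye(dim) - np.outer(d, d)   # projector perpendicular to the line
        A += w * P
        b += w * (P @ np.asarray(p, float))
    return np.linalg.solve(A, b)

# Two 2D lines: y = 1 (through (0,1), direction x) and x = 1 (through (1,0),
# direction y) intersect at (1, 1).
v = fit_vertex([(0.0, 1.0), (1.0, 0.0)], [(1.0, 0.0), (0.0, 1.0)], [1.0, 1.0])
assert np.allclose(v, [1.0, 1.0])
```

Down-weighting short or poorly-fitted clusters then just means shrinking their w_i.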
- I’ve been trying out the same approach in candidate creation but there are obviously more challenges. Firstly, the directions of smaller clusters are far less informative of the vertex position so deciding which clusters to use is difficult. Secondly, unlike for refinement, you don’t have a “current” vertex position to use as a base. Finally, this method would just produce one candidate so I’m going to try a “drop out” style method where you only use a certain number of clusters and try different combinations. I’m hoping this might get around the outlier problem as well.
- I haven’t actually tried reducing the number of candidates again since my naff first attempt a few weeks ago but I will go back to that later this week.
- Presenting at TPC Reco today so will hopefully get some useful input.
- It’s been a while! Been on holiday and have been attending Warwick Week, WIN2021, RAL Advanced Grad Lectures etc etc
- BDT changes have been approved and from my understanding will be in this week’s larsoft release. They won’t be active in SBND yet but the functionality will be there.
- The delay on activating them is because I wanted to spend a little longer on the two other vertexing strands I’ve been working on.
- Firstly I’ve been looking at methods to rationalise the final vertex based on the 3D reconstruction. Have tried a series of methods, of which quite a few look promising in the numu context…
- Not so much in the nue…
- Will keep working on this. Want to look at what the problem is with nue, can I be more careful with which clusters I use? Is there much overlap between methods, i.e. are they improving the same events? They all share the same functionality for events with just one 3D object, briefly looked and this seems weaker, what can I do with that?
- Have also been using some of the same ideas in the candidate creation stage. No plots here I’m afraid but good progress; am able to make extra candidates that are better than previous ones in quite a few events, but also currently producing more wrong candidates as well, so want to look at reducing this before seeing what effect it has.
- PR has been submitted for LArContent. Received some helpful review comments from Andy C this morning which I have begun responding to. I want to produce training samples with the crossing candidates in them. I will wait until the review process on this PR is done though, so that I know the code being used is final.
- Andy C & John also requested that I write out the procedure I used for training and put some of the extra python functions I made into a PR for LArMachineLearningData so I will need to put a couple of hours into that soon.
- Have started writing a simple algorithm that uses the final PFOs in a slice to try and nudge the vertex into a position that agrees with all of them. Andy mentioned using this to ensure the PFOs all share a vertex object as well; currently they all have subtly different positions.
- My hope is the ideas in this will be usable in candidate creation as well (clearly in 2D not 3D).
- Spent a lot of the week fighting CI fires following the events of last Thursday ;) Everyone seemed to choose the week when all the automation was down to want lots of checks doing! Anyway, as of yesterday most of the automation is back, praise be to Vito. The reco validation has been getting lots of genuine use now though which is really nice to see (thanks Dom!)
- Gray has asked me to help draft some of the new SBN Young bye-laws, this is what you get for speaking in meetings… lesson learnt!
- Marking marking marking, one more week to go…
- A combination of presentations, marking and the bank holiday, so not masses of code work!
- Was working on automating the production of the input files for the validation workflow, this has hit some stumbling blocks with a new jobsub version not playing nicely with POMS/project_py.
- Presented the BDT work at the SBN and Pandora meetings. Am starting to prepare a PR with the added functionality so far. This will be a significant amount of work I think.
- Started looking at whether we can reduce the candidates produced initially. Tried a simple method (trying to choose the “best” candidate in a certain area and chucking the others). This, probably unsurprisingly, worked well in cutting candidate numbers but hit the performance a bit too, so will look at something more sophisticated.
- In the process found a setting we currently have turned off in SBND which makes candidates at crossing points as well as at end points. Turning this on gave a performance improvement of ~2.3% for the nue sample. This is without retraining the BDT to account for the new candidates either. I’m hoping there is little overlap with the BDT improvements but will test this out this week at some point.
- Mainly a week of consolidating and understanding the results I showed last week.
- Have validated my best BDT so far. Total improvement 68.0% –> 72.1% for nueCC and 71.7% –> 74.8% for numuCC. About 2% of this is from new variables, the rest from changing architecture and improvements in upstream reconstruction.
- More details at the SBN TPC Reco meeting later, be there or be square.
- Planning on spending a week or so with my head out of BDT-land looking at other vertexing jobs we’ve talked about like rationalising the list of candidates before selection.
- Added the ability to weight the data points such that each physics event contributes “1” to the training. While I was doing this I scaled up my dataset for the proper training (meant to be 100k each; for nue I only got 81.3k successfully so I matched that with numu). The validation accuracy scores are only so useful, so I wanted to check them against the smaller nue sample I had from earlier testing. This would also show any gains from just retraining.
- Results probably not surprising. Gain of 0.8% events within 1cm from retraining and just 0.5% from adding weighting, so if anything the weighting has had a slightly negative impact.
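The weighting itself is simple enough to sketch (hypothetical code, not the actual training script):

```python
from collections import Counter

def per_event_weights(event_ids):
    """Weight each candidate by 1/(candidates in its event), so every
    physics event contributes exactly "1" to the training."""
    counts = Counter(event_ids)
    return [1.0 / counts[e] for e in event_ids]

weights = per_event_weights([0, 0, 0, 0, 1])   # 4 candidates vs 1
assert sum(weights[:4]) == 1.0 and weights[4] == 1.0
```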
- Went away and added a few more things to the training files, including some relational information between the two candidates (distance between them, hits & energy in a box between them).
- Have now trained all sorts of combinations on the full dataset. Highlights are: relational info definitely helps (~0.4% gain in validation accuracy), energy variables make a real difference (can be >1%), structures with more trees and larger depth obviously learn more from combinations of variables (shared variables seem to drive down the overtraining based on KS test scores).
- First image is the original BDT just retrained. Second is my best not overtrained combination so far. Still have a little more tinkering to do and will then need to train an equivalent region BDT but will attempt to do a validation run soon to really see what the changes in validation accuracy mean per event.
- Have finished a CI workflow that uses all the reco modules we have so far. Presented it at the CI & Validation meeting last week. Is usable in a feature branch and should be usable in develop soon!
- Updated some variables to use different clusters for different tasks based on the work I showed last week.
- Found some further quirks in the event shape variables. 1) the updated version is still questionable as it’s using the “z” coord interchangeably between the three views. 2) it uses all hits, so is very susceptible to outlying hits completely ruining the event shape.
- Produced a test training sample (20k + 20k) to start messing around with the BDT. Wrote a few new python methods into the Pandora MVA helper functions to do things like remove features. First pass of trying configurations shows virtually no effect from any of my changes / new variables. Little bit deflating…
- Using Ed’s architecture (100 trees, max depth 2) which he chose to eliminate overtraining issues. Ran with 1000 trees and max depth 3 to try and give it more chance to learn from the new variables, will come back to overtraining later.
- Slightly more variation but nothing to write home about… the reverse really…
- Did, however, look into something I mentioned to Andy a couple of weeks ago. There is a massive misweighting in the training samples inherent in the number of candidates in the event (which is very tied to the event type & energy). The two events below have 1085 and 29 candidates respectively.
- The number of candidates looks like this for nue events (much lower for numu but similar shape).
- I then wanted to see how much effect this might be having…
- This definitely needs looking at. Two pronged approach I think, one is to look at reducing candidate numbers, there is no way we need > 1000 candidates to represent the possible vertex locations in an event. The other is to introduce some sampling mechanism to at least somewhat rebalance the data.
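The second prong could be as simple as randomly capping how many candidates any one event contributes. A rough sketch (function name and cap value are illustrative, not anything in the existing code):

```python
import random

def cap_candidates(candidates, event_of, max_per_event=100, seed=0):
    """Randomly downsample so that no event contributes more than
    max_per_event candidates, partially rebalancing the training set."""
    rng = random.Random(seed)
    by_event = {}
    for cand, evt in zip(candidates, event_of):
        by_event.setdefault(evt, []).append(cand)
    kept = []
    for cands in by_event.values():
        if len(cands) > max_per_event:
            cands = rng.sample(cands, max_per_event)
        kept.extend(cands)
    return kept

# Toy version of the imbalance seen in the data: one event with 1085
# candidates, one with 29.
kept = cap_candidates(list(range(1114)), [0] * 1085 + [1] * 29)
```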
- CI work was pushed along nicely by the need to use Ed’s validation modules. That particular workflow is now up and running. Going forward I think it’s worth integrating the other reco validation workflows such that it can be operated as one trigger with one set of files.
- Presented at TPC Reco. Useful discussion about what to include in the training data set so I will begin creating that this week.
- Other useful suggestions on variables which I have / will pursue. Ed’s charge ratio suggestion unfortunately didn’t gain us much.
- Found the reason that I was struggling to find local clusters for some vertices was to do with the use of the sliding fits. Sliding fits are only built for clusters with 12 or more hits. These are then the only clusters used for most of the tools. This makes sense for getting reliable directions but perhaps could be loosened in other scenarios.
- Looked at what this looks like in event displays…
- Based on that I wanted to try and quantify this a little…
- Variables using clusters with 4+ hits are on the way, hopefully they will show some marginal improvement.
- CI PFP Validation workflow is up and running and I’m testing the ZeroMode change with it today.
- Most of the week spent digging into some of the “features” of some of the BDT variables.
- Lots of the asymmetry variables have spikes at 1 & 2 as well as 3 (they are sums across the three planes). These occur due to “bad planes” where the calculation fails. Attempted a couple of methods to smooth this effect out.
- Verified that the cap on the vertex energy is already present at the point pandora reads in the hit information
- Discovered that the reason the “event showeryness” and “shower asymmetry” variables were always returning the same value was to do with a small error in the neutrino pass xml
- Still tinkering with ways of better calculating a dE/dx variable as I’m not overly happy with it. Have kept different versions so that once I have a testing file I can try the BDT with the various permutations.
- The xml error has sent me down a slight rabbit hole getting Ed’s pfp validation analyzers plugged into the CI which is taking slightly longer than I intended but will be useful to have done.
- Presented at DUNE UK Software meeting last week. Feel it went well and there were useful suggestions of avenues to pursue. One of which was looking more at charge-based information which was something Andy and I had been discussing too.
- Came up with some very quick and easy variables to test if this has legs. These first plots are for the “regional” stage of the BDT which is already good. Like most variables they lose a lot of their separation when you look at them in the context of the “vertex” stage of the BDT but the dEdx asym still looks like it might have some potential.
- Clearly it’s very bumpy. These features are due to the fact that the final value is a sum of the value for each of the three planes. They are common in the other variables that use this base “asymmetry” method. I want to look at whether we can improve these at all by only considering “valid planes” etc.
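One possible “valid planes” treatment: average over the planes where the calculation succeeded rather than summing all three. This sketch assumes a failed plane currently contributes exactly 1 to the sum (one reading of the integer spikes); the names and the failure sentinel are illustrative, not the actual Pandora code:

```python
def summed_asymmetry(plane_values):
    """Current behaviour as I understand it: per-plane values are summed,
    and a failed plane (represented here as None) contributes exactly 1,
    which would produce spikes at 1, 2 and 3."""
    return sum(1.0 if v is None else v for v in plane_values)

def mean_valid_asymmetry(plane_values, default=0.5):
    """Proposed alternative: average over the valid planes only, falling
    back to a neutral default when no plane is valid."""
    valid = [v for v in plane_values if v is not None]
    return sum(valid) / len(valid) if valid else default
```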
- Plan to be spending a chunk of time this week digging into some of the odd features of these plots and thinking about more sophisticated ways of calculating something similar.
- Have updated my feature branch for the event shape bug fix re: the discussion last Tuesday. Has raised quite a few questions that will crop up down the road if we do end up significantly changing any variables/which variables go into the BDT.
- Some more CI work pootling along in the background.
- Have Ed’s scripts for the vertex BDT set up and working. Have tinkered with some of the parameters etc to get a bit of intuition on how it works.
- Also means I have plots of the variables that go into the BDT now which helps with understanding.
- Spent a while working out exactly how the variables are calculated, will start playing with the inputs this week (turn off / add variables etc).
- CI workflow now set up to use detsim files as the input. Will do a full stats run this week but that was the last major thing I wanted to get sorted for that.
- Went to lots of NuTel talks last week.
- Started the week by correcting the cheating issue Dom pointed out last week.
- Presented vertexing work at TPC Reco, lots of discussion so plenty of things for me to look at in response.
- Ed sent me a load of info about the BDT and various automation scripts. I’m planning on spending today and tomorrow getting that set up and having a little tinker to make sure I understand it.
- Managed to fit a little bit more CI work in, the weird genie issues seem to have been resolved so I can go back to actually making sure the tests work as we want them to.
- Listening in to bits of Neutrino Telescope when I can.
- Highlight of my week was, of course, bumping into Chris twice on our ‘daily exercises’ at the weekend!
- Based on the themes I saw in the event displays I made some quick (and ugly) plots that confirmed that the themes I mentioned last week are present across the whole sample.
- Had a look at the numu sample. Less obvious themes; often the moderate errors are due to some level of merging around the vertex, and there are also events in which the vertex ends up at the end of a proton track, though fewer than in nue.
- Realised that a couple of the events I looked at in the numu sample could be salvaged by the high angle tracking as I was still using the MCP2020A sample. They were, which was nice! This might widen the performance gap between numu and nue slightly.
- Setup a reconstruction workflow which uses the cheated vertex selection to check how much is recoverable by selecting the right candidate. Checked this on some of the events we looked at last week, noticeably salvaged a lot of them. Including improving downstream reconstruction.
- Have simulated a 20k sample for nue (and doing the same for numu) with and without cheating to compare across a whole sample.
- Spent Friday getting a standalone Pandora build working; will start looking at a few simple things like turning off the z-prior and seeing what happens.
- Got a bit sick of fighting larsoft/pandora event displays to compare truth and reco. Inspired by Chris, I’ve made a similar “home-made” event display to his to speed my event checking up.
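The core of a home-made display is just a scatter of hits per view with the reco and true vertices overlaid. A minimal matplotlib sketch (the hit structure and names are made up for illustration, not sbndcode’s):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def draw_view(hits, vertex, true_vertex, ax):
    """One wire-plane view: hits as scatter points, with the reco and
    true vertices overlaid so disagreements jump out immediately."""
    wires = [h["wire"] for h in hits]
    ticks = [h["tick"] for h in hits]
    ax.scatter(wires, ticks, s=4, c="grey", label="hits")
    ax.plot(*vertex, "rx", markersize=10, label="reco vertex")
    ax.plot(*true_vertex, "g+", markersize=10, label="true vertex")
    ax.set_xlabel("wire")
    ax.set_ylabel("tick")
    ax.legend()

# Toy event: a straight "track" with the reco vertex misplaced along it.
hits = [{"wire": w, "tick": 100 + 2 * w} for w in range(50)]
fig, ax = plt.subplots()
draw_view(hits, vertex=(10, 120), true_vertex=(0, 100), ax=ax)
fig.canvas.draw()
```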
- One of the most common topologies I found in the nue sample was the vertex being placed at the other end of a proton’s trajectory. Often these were relatively short protons although there were examples of longer protons as well. Proton reinteraction / scatter points were also a relatively common occurrence.
- Definitely possibilities to think about – introduce some charge-based information to recognise Bragg peaks? Or similarly recognise jumps in the charge profile at the vertex? Could very short proton tracks cause trouble though?
- Also saw some backwards going wiggly electrons with vertices at the end of the electron “track”
- Also found a sigma(c)++ displaced vertex @Chris @Niam
- Next steps are to do the same for the numu sample and find some ways to quantify the occurrence rates of common failure topologies.
- CI work is also ticking along, should get to run a full stats test in the next few days to check it actually recognises a change (will use high angle tracking)
- There were issues in the initial vertex error plots due to the simulation of the beam spill. This was resulting in an x-error corresponding to the neutrino’s position in the spill. Corrected for this in the numu sample but not in the nue sample where the error is too large to purely be as a result of the spill width. Resimulated nue to remove this issue.
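For context on the size of the spill effect: if reconstruction pins t0 to the trigger, an interaction at time t into the spill appears shifted in x by v_drift * t. A quick sketch (the drift velocity value is approximate and illustrative):

```python
DRIFT_VELOCITY_CM_PER_US = 0.16  # ~typical LArTPC drift speed; illustrative

def apparent_x_shift(t_interaction_us, t_assumed_us=0.0,
                     v_drift=DRIFT_VELOCITY_CM_PER_US):
    """If reconstruction assumes all interactions happen at t_assumed,
    an interaction at t_interaction appears displaced in x by this much."""
    return v_drift * (t_interaction_us - t_assumed_us)

# The ~1.6 us BNB spill maps to an apparent x smearing of only ~0.26 cm,
# which is why the larger nue errors can't be purely from the spill width.
max_shift = apparent_x_shift(1.6)
```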
- Lots of plots but takeaways are:
- Nue is now much closer matched with numu (68.7% success vs. 71.3% success)
- Breaking into three energy bins doesn’t point to any particular weaknesses
- Definite correlation between vertex error and completeness, dE/dx etc
- Have started leafing through event displays to try and find some common topologies. Starting with the worst events (error > 5cm)
- Have also spent a chunk of time working on implementing some reconstruction test plots into the SBND CI. Machinery of it is working, tinkering with the details on the sample, exactly which plots etc.
- more marking…
- Have started looking into the performance of Pandora’s vertexing.
- Using the bnb and nue official samples from last year (no cosmics)
- Started by just looking at whether we’re picking the correct vertex most of the time or not.
- Looking through the results event-by-event I realised there were quite a lot of events where technically the closest vertex wasn’t being picked but the margin was very small. Thought I’d make some more nuanced categories…
- Is 1cm sensible? Thoughts on “success 2” ?
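One possible reading of the categories, sketched out (the thresholds and names are hypothetical, for discussion):

```python
def categorise(chosen_error_cm, best_error_cm, threshold_cm=1.0):
    """'success' if the chosen candidate is the closest available one;
    'success 2' if it isn't, but its extra error over the best candidate
    is within the threshold (1 cm here); otherwise 'failure'."""
    if chosen_error_cm == best_error_cm:
        return "success"
    if chosen_error_cm - best_error_cm < threshold_cm:
        return "success 2"
    return "failure"
```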
- Went through some metrics for reco further down the chain that show improvement when the vertex error is better (slice completeness, muon track length, electron dE/dx). Big question mark over whether this is correlation or causation…
- Will look at a few more metrics and then have a look at cheating to see if I can begin digging into the correlation vs. causation question.
- Other things: got my new laptop on Friday (exciting!), went to the SBND collaboration meeting last week, PGTAing again this term.
- Bit of a slow week, had a test and travelled back home. Worksheet marking took an absolute age this time but that’s the last one until Feb.
- Did a surface look into the effects on the unambiguous cosmic pass as brought up in discussion last week. No glaring negative effects to worry us.
- The PR has been submitted now and I’ll be presenting a summary of all the hit width clustering work at the PAT meeting on Thursday. Then *fingers crossed* this will be done.
- On the vertexing front I made a start on trying to reproduce some of Dom Barker & Ed’s plots. First one was not what I wanted to see but I have found the reasons for that…
- Started looking at calculating a true dE/dx in the way Ed did. Main questions are on the sim::SimChannel object. From what I understand it represents true energy deposits on a single wire? I’m currently trying to get from this to which wire it actually represents; the algorithm keeps getting angry with me, something to do with geometry I think.
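My current mental model of the calculation, sketched in Python: sum the true energy deposits per channel and divide by the wire pitch. The record structure, field names and pitch value are assumptions for illustration, not the actual sim::SimChannel API:

```python
from collections import defaultdict

def true_dedx_per_wire(ides, wire_pitch_cm=0.3):
    """Sum true energy deposits (IDE-like records) per channel, then
    divide by the wire pitch to get a crude true dE/dx per wire."""
    energy_per_channel = defaultdict(float)
    for ide in ides:
        energy_per_channel[ide["channel"]] += ide["energy_mev"]
    return {ch: e / wire_pitch_cm for ch, e in energy_per_channel.items()}

# Toy deposits: two IDEs on channel 100, one on channel 101.
ides = [
    {"channel": 100, "energy_mev": 0.3},
    {"channel": 100, "energy_mev": 0.3},
    {"channel": 101, "energy_mev": 0.6},
]
dedx = true_dedx_per_wire(ides)  # MeV/cm per wire
```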
- Have finished working on the high angle tracking implementation. Final plots using the official production samples.
- Next, will be starting a study of the impact of the vertex reconstruction on the later phases of reconstruction. Have started looking at re-making some of Ed Tyley’s plots to check I can show the same effects going on.
- It’s been a while! Mainly been working on the high angle tracking implementation.
- Presented at a couple of TPC reco meetings, main takeaways:
- efficiency improvements for both muons and electrons
- also improved completeness with a very small drop in purity
- to try and combine these efficiency and quality improvements I used a definition of reconstructed that had completeness and purity requirements
- Currently producing equivalent analysis for a bnb+cosmics sample for the PAC meeting (sample is 300k so I’m hoping I can get a rough electron plot from that, as well as the muons)
- Also been vaguely flicking through pi0 reconstruction performance in free moments but don’t have anything particularly coherent on that front yet
- Have been working more on the high angle tracking implementation
- Improvements in single particle gun sample translated well to the bnb sample
- Presented the work so far to the TPC reco group on Wednesday and am currently working on some of the points that came out of that.
- Need to look a bit deeper at the shower side of things and have started looking at protons in the bnb sample as well.
- More welcome talks and training sessions
- Spent most of the week trying to work out why the hit width cluster merging algorithm was having literally zero effect despite claiming to be running… Turns out if the “UseHitWidth” flag is set to false then the hit width algorithm doesn’t really work… who’d have guessed?!
- Once we had found the flag then things started working and there are definite improvements:
- Integrated reconstruction efficiency for muons 96.1% –> 96.8%. Checked it across other variables, e.g. momentum:
- Wanting to look at completeness as well, just waiting on my analyzer to finish. There were noticeable cases of a muon track split into multiple clusters (this would still count as ‘reconstructed’ in the efficiency calculation) so hopefully the completeness will also have improved!
- Will also look at showers this afternoon/tomorrow
- Am officially a PhD student now! :D
- We had a kind of welcome talk thing on Friday and have PGTA training tomorrow.
- In between welcome-y things I’ve been working on implementing a high angle tracking algorithm into sbndcode. Made a couple of silly errors that cost me a chunk of time but should have the first sample finished by the end of the day
- Remade the pi zero reconstruction plots with the PhotonOne & PhotonTwo definitions updated to represent leading and subleading respectively. This shows the expected effect, with significantly more leading showers reconstructed. Currently updating my code to add hit completeness and purity to my trees.
- Just wanted to check I have the definitions correct for these:
- If I have a Shower, X, truth matched to a MCParticle, Y
- Completeness = (No. of hits in X attributed to Y)/(No. of hits in whole event attributed to Y)
- Purity = (No. of hits in X attributed to Y)/(Total no. of hits in X)
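Those definitions written out as a sanity check (the hit/truth-matching structure here is illustrative):

```python
def completeness(shower_hits, all_hits, mcparticle):
    """Fraction of the MCParticle's hits in the whole event that
    ended up in this shower."""
    in_shower = sum(1 for h in shower_hits if h["truth"] == mcparticle)
    in_event = sum(1 for h in all_hits if h["truth"] == mcparticle)
    return in_shower / in_event

def purity(shower_hits, mcparticle):
    """Fraction of the shower's hits that actually belong to the
    MCParticle."""
    matched = sum(1 for h in shower_hits if h["truth"] == mcparticle)
    return matched / len(shower_hits)

# Toy event: 10 electron hits and 5 proton hits; the shower picks up
# 8 of the electron hits plus 1 stray proton hit.
all_hits = [{"truth": "electron"}] * 10 + [{"truth": "proton"}] * 5
shower_hits = all_hits[:8] + [{"truth": "proton"}]
c = completeness(shower_hits, all_hits, "electron")  # 8/10
p = purity(shower_hits, "electron")                  # 8/9
```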
- Began having a look at the spike in the muon reconstruction graph. If I bin more finely then it looks more like a wobble, a bit like in my MPhys plot.
- Not much to say, spent all of last week finishing off the C++ course which I think was worthwhile. So haven’t done any more on the physics work, will work on last week’s discussion points this week.
- Have started working my way through the Fermilab C++ course from August. I’ve only ever learned C++ from Google and online tutorials before, so it’s proving very useful!
- Did some more work on the CC1pi0 selection in August
- Thought it would be reasonable to relax the truth momentum thresholds to 0.15 GeV/c
- Also tinkered with the fiducial volume requirements
- Pre-selection cuts on showers don’t need changing, if we’re maximising for eff*pur
- Started looking into the weird track scatter values, think it might just be a result of the big differences in the distance between trajectory points in truth vs. reco. Will look further into that to confirm.
- As a starting point to looking into the limitations on the pi0 side, I produced these plots but I haven’t gone any further with this yet.
- (Sorry for the lack of plots, I have exported them as pdfs and can’t upload them onto this, I’ll sort that for next time).
- Most of the week spent dealing with technical issues. Had some events missing when I updated my analyzer. Then had a set that all had RunID=1 and SubRunID=1, this made looking at event displays etc impossible but I have managed to do some physics…
- Some tinkering with the cuts/MVAs hasn’t created any noticeable changes in the selection.
- I haven’t tuned the shower pre-selection cuts yet but despite the improvements in reconstruction they still seem to be worth keeping (removes 56.8% of ‘proton’ showers and only 2.6% of signal).
- I have (a small number of) muon candidates with a truth-matched pdg code of 22? Haven’t had a chance to dig into this yet, but photons shouldn’t be leaving tracks…
- My invariant mass distribution shows a larger energy underestimation than in the previous analysis. The fit gives 95.3MeV/c^2 compared to 106.6MeV/c^2. Using reconstructed direction and true energy still produces a distribution centred on 135MeV/c^2
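For reference, the photon-pair invariant mass is m = sqrt(2 E1 E2 (1 - cos θ)), so a uniform energy underestimate on both showers pulls the peak down by the same factor (which is consistent with true energies plus reco directions recovering 135 MeV/c^2). A quick worked example with made-up energies:

```python
import math

def pi0_invariant_mass(e1_mev, e2_mev, cos_theta):
    """Invariant mass of a photon pair from the two shower energies and
    the cosine of their opening angle: m = sqrt(2 E1 E2 (1 - cos theta))."""
    return math.sqrt(2.0 * e1_mev * e2_mev * (1.0 - cos_theta))

# Illustrative photon pair with a 60-degree opening angle; if both shower
# energies are underestimated by 20%, the mass peak shifts down by 20% too.
m_true = pi0_invariant_mass(200.0, 150.0, 0.5)
m_reco = pi0_invariant_mass(0.8 * 200.0, 0.8 * 150.0, 0.5)
```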
- Presenting tomorrow at the SBN Event Selection Meeting (I think?). In the process of making slides and need to decide how much to focus on the old data vs. the new stuff (the old data has better logic for cut decisions etc.).
- Taking a week off next week (WC 17th) to see my parents.
- Made the 100,000 events
- Ran my MPhys analyzer on the new sample
- Only got the trees on Friday so haven’t done anything detailed yet
- Headline changes are efficiency is down (57.5%->50.2%) but purity is up (66.0%->70.9%). Probably the right direction for SBND!
- Can’t read too much into this or the cuts breakdown as I haven’t tinkered with any of the cuts or the MVAs for this sample
- The reconstruction efficiency studies show some drastic changes though. The total reconstruction efficiency for most particles is pretty similar to the old sample. However, the track-shower discrimination for protons and charged pions has drastically increased!
- All been technical stuff
- Finished setting up my Fermilab account
- Installed ubuntu on my laptop (gulp!)
- Dom (thanks!) led me through using the batch system for submitting jobs and I have done a test run of 200 events through GENIE, g4, detsim and reco (including a new shower approach)
- Copied over my analysis module from my MPhys – took a bit of tinkering to get it working again on the new files, but it seems to now
- Next steps
- Create proper pools of events?
- Run the analysis and compare results to those with the previous reconstruction