Jan23 Bloomington
From Polargrid
Meeting Notes: Main Session
- Camp Hardware and Networking
- Rich has ordered equipment (Panasonic Toughbook laptops). Some will arrive today. Need info on which GPS system to use.
- 26 TB/mission. UAV mission is postponed to 2009, so ~12TB per mission.
- Typically make redundant copies of data and ship in two separate ways.
- Rich: need to know the workflow process for shipping.
- Overview of the system: 10 laptops, storage server. Laptops have external drives, copied over to the storage server.
- What is purpose of the GPS? Don't need to know where the processing station is. Need location of base camp.
- Need ground differential GPS for the base station.
- Je'Aime: can we go through the whole workflow to make sure we have the entire process defined?
- Linda: Are we shipping the data or can we send it more quickly? Shipping hard drives or is there network? Network connection from base camps is not really going to be fast enough.
- Rick: 9600 baud increments with Iridium phones. What are the bandwidth requirements for logging in remotely from L48 into the field machine?
- Can do all of this in the field and log in remotely from outside. Will need 256Kb to get a decent interaction.
- VSAT or BGAN can be used for networking. Ku-band services. In any case, interactions need to be planned for remote login to base camp.
- BGAN cost: $4000 in equipment, plus pay per bit sent.
- VSAT: $25,000 but less charge for data transmission.
- Campaign length: 2 months. Collecting data about 1 month. So need to maximize bandwidth at lowest cost for this time.
- Tele-Greenland is the provider for both VSAT and BGAN.
- What is the importance of having others view data during collection? Important to get feedback to field as soon as possible.
- For example, can make pre-processed jpg images of data in the field and send these back.
- Can't rely on transporting hard drives during the mission. Only at the end.
- Vico Polar is the logistics provider? They provide COMMS (communication voice and data). NSF paying for this. Rick will find out what they have been contracted to provide.
- Can we use BGAN/VSAT for backup to L48? No, the data doesn't compress well.
- Main interaction is to get images and simple analysis back to the L48.
- Matt: you should push images back from the field to the L48 stations for the portal. The only 'portal' functions that should run in the field would be ad hoc analysis on the data.
- JPG takes 1.5 minutes to send over Iridium. <1 MB per image typically. Could compress even more.
- GCF: probably worth spending $10,000 extra for good bandwidth.
- How to communicate with the field? Skype-like interactions? Chat, etc.
- Would you alter the campaign based on some new finding? Possibly from the initial airborne survey, this would change your plans for the ground survey. But probably not from the ground survey.
- May also need to review previous day's work, see if it is screwed up and needs to be redone.
- Matt: would like to order all the equipment by Monday.
- Field camp versus base camp? BC has more computers. Where is BC? BC is fixed and FCs move. But is this true? Need to make sure that this is true. May be embellished for proposal, so need to know what we need for May 2008.
- What is the difference between field and base? Field kit has laptops and one real machine. BC has a 64 core cluster. This makes sense for a traverse, which is depicted in the figure. But the current 2008 campaign will only have the base camp, or actually a combination of the FC and BC equipment.
- Linda: will you spend the night out in the field? Possibly. Depends on the campaign. Long traverse versus grid.
- JP: Networking between field and base camps? Line of sight is not workable. UAV was planned to be used but this may not happen now.
- GCF: Do you need just laptops when you go out, with backups at the Base Camp? Depends on type of campaign. Laptop plus base camp model is preferable.
- JP: 26 TB of data, so at least 52 TB for redundant data. Probably more like 30 TB this year. Main issue is power to drive the storage. Need to set system requirements to understand data options. But you don't need all 30TB on power all the time. 1/2 of it is backup, so doesn't need to be online all the time.
- Matt: in Antarctica, what were your power requirements? If going to a place (permanent camp) with established power then it doesn't matter. But if not, then we need to worry about the power consumption for the storage system. Offloading 2/day.
- FTP issues: ftp'd directly from radar to laptop. Won't do this this time.
- How to field test? KU has a temperature chamber to test.
- How many laptops? We have budget for 10. Rich or Matt will be the point of contact.
- Are clusters rack mounted units? Yes. They are in a crate, pre-packaged. Safety orange.
- Any other software? No, just Linux stuff: diff, md5, etc.
- What about Windows? Can do this, or can do dual boot.
- 4 or 5 laptops total.
- Is a Toughbook the way to go? Probably want both a state of the art laptop and a Toughbook.
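The bandwidth figures quoted in this session (9600 baud Iridium, "256Kb" for remote login, JPG transfer times) can be sanity-checked with a small calculation. The sketch below is only an idealized estimate: it assumes 9600 baud means 9600 bit/s, reads "256Kb" as 256 kbit/s, and uses a hypothetical `efficiency` factor for protocol overhead. None of these assumptions were confirmed in the meeting.

```python
def transfer_seconds(size_bytes: float, link_bits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Idealized time to move size_bytes over a link of link_bits_per_s.

    efficiency is a hypothetical factor in (0, 1] for protocol overhead;
    1.0 means no overhead at all.
    """
    if link_bits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive, efficiency in (0, 1]")
    return size_bytes * 8 / (link_bits_per_s * efficiency)


IRIDIUM_BPS = 9600          # assumption: 9600 baud treated as 9600 bit/s
INTERACTIVE_BPS = 256_000   # assumption: "256Kb" read as 256 kbit/s

if __name__ == "__main__":
    for label, size in [("108 KB image", 108_000), ("1 MB image", 1_000_000)]:
        for name, bps in [("Iridium", IRIDIUM_BPS), ("256 kbit/s", INTERACTIVE_BPS)]:
            secs = transfer_seconds(size, bps)
            print(f"{label} over {name}: {secs:.1f} s")
```

Under these assumptions, 1.5 minutes over Iridium corresponds to an image of roughly 108 KB, while a full 1 MB image would take about 833 s (around 14 minutes). So the quoted 1.5 minutes is consistent with images well under 1 MB.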
- Logistics Discussion with David at KU
- Will be based in a town in Greenland: Ilulissat.
- Ground based adventures are still being discussed. Two options for the field: one is to go to Nene (?) as guests of Danish team. Test the radars. Team in this case is very small (1-3 people). Second option is traverse trips.
- What is available in the base camp? Assume zero computing and zero networking. Think along lines of a hotel room. Will be power but not much else. Will need to bring everything else.
- Rick: Vico Polar has done logistics for NSF. Worked with them? Yes. Rick: can we talk to Vico and SRI about networking? KU: sure, but all official requests should go through KU.
- Matt: definitely will be an air campaign. Will take this back to the hotel for processing. What are the hardware requirements for that? KU: Carl will help here.
- KU: assume you have a tent with a generator for the base camp, and that's it.
- KU: we will provide generators and ship. East Antarctica next year, so generator will need to work at -20 Celsius. KU using their own generators for several years. Honda etc. are fine. KU will vet any selections that we go through.
- 300 km trip is 10 days each way. If they go on a traverse trip, will need some computing power in the field. Vehicle, generator, snowmobiles. Fuel is an issue. When will we know if you will do the traverses? Will know soon, have to check with the Danes.
- A traverse may involve how much data? 20 days, 300 GB per day. 6TB for a traverse. May only take data one way, though.
- Craig: what is the smallest number of processing cores you need? Carl: 1 is minimum, 8 cores would be best. Toughbooks have 2 GB memory, 2 cores, are a generation behind. Can get laptops with 8 cores, more memory, but won't be Toughbooks.
- Customs issues? No, the US Air National guard ships.
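The traverse data estimate above (20 days at 300 GB per day, 6 TB total) and the practice of keeping redundant copies can be written as a small sizing helper. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB) and two copies of everything; the function name and the `copies` parameter are illustrative, not something agreed in the meeting.

```python
def campaign_storage_tb(days: int, gb_per_day: float, copies: int = 2) -> tuple:
    """Return (raw_tb, total_tb) in decimal units (1 TB = 1000 GB).

    copies is the number of full copies kept, including the original;
    2 is an assumption based on the "redundant copies" remark in the notes.
    """
    raw_tb = days * gb_per_day / 1000
    return raw_tb, raw_tb * copies


if __name__ == "__main__":
    # Round-trip traverse collecting data every day.
    raw, total = campaign_storage_tb(days=20, gb_per_day=300)
    print(f"both ways: {raw:.0f} TB raw, {total:.0f} TB with redundancy")

    # Data taken one way only, as mentioned as a possibility.
    raw, total = campaign_storage_tb(days=10, gb_per_day=300)
    print(f"one way:   {raw:.0f} TB raw, {total:.0f} TB with redundancy")
```

The same helper reproduces the 26 TB to 52 TB doubling quoted in the main session if the raw volume is passed in as a single "day".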
- Data Analysis (Matlab) Software
- Need to understand disk and memory requirements for analysis.
- What has been spec'd? External TB drives that can plug into the server.
- Two separate systems: archival and processing. 10 TB RAID. Process 300 GB data at a time.
- William: would like 60 GB of memory.
- How many licenses? Will need a copy on each laptop as well as copies on the cluster. Given the costs are not too high, all machines should have matlab installed. Matlab has a special distributed version for clusters.
- Matlab licenses include group licenses.
- For field grid, only one person would use.
- Who owns title to the physical equipment? IU owns it. Then we can easily use IU matlab agreements to install.
- Laptops, 8 core mobile "cluster", 64 core machines: what do we need for each? Only signal processing and filter design are used. SIP is most important.
- Need a complete list of toolboxes from KU.
- Although IU owns the hardware, KU can possess laptops, etc for configuring. IU can then clone.
- Should we have a license key for each node and avoid the license manager? No, not necessary any more.
- Cloning disks an issue for the Matlab licenses? No, probably not, as long as they don't talk to each other.
- JP: how about a CD/DVD with all software?
- William: will provide a list of software.
- Portal
- NWD has a .25 FTE, will be available April 1.
- Who is responsible for what? Need to set up clear set of requirements.
- Craig: committed to building the portal. Must steer clear of NSF issues, however. Use Nancy's 1/4 FTE for Cresis, use Craig's money for Polar Grid.
- Can use Nancy's time for training. This is also one of ECSU's priorities.
- TG money can't be used for development, but UITS funds can support this.
- Nancy's money can be shipped to support this? Need to see if she has money or a person or both.
- Requirements for communicating with the field are possibly easy to define. Need a) a training portal, and b) a field portal.
- Use Nancy's money for the training portal.
- Networking
- Optimal way of networking field and base camps to the lower 48.
- Training
- Data Management
- Not ready to discuss until the campaign is ready to go.
- "Grid" issues
- Need to make sure that all the data has been successfully transferred. Rsync.
- Je'Aime: Only radar data or also seismic data? Seismic/UAV will possibly be on hold, so don't know yet if seismic data will be collected. But radar will be much greater in size. Seismic part of mission will be postponed until 2009.
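One way to confirm that data has been transferred successfully, in the spirit of the rsync and md5 tools mentioned in these notes, is to compare checksums of every file in the source and destination trees. The following is a minimal sketch using Python's standard library. The function names and directory layout are hypothetical, and it does not replace rsync's own checks.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_copy(src_dir: Path, dst_dir: Path) -> list:
    """Return relative paths that are missing from dst_dir or differ in content."""
    problems = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        if not dst.is_file() or md5sum(src) != md5sum(dst):
            problems.append(str(rel))
    return problems
```

An empty list from `verify_copy` means every source file has a byte-identical counterpart at the destination. Files that exist only at the destination are not reported.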
- Simulation
- Delay discussions of modeling
- Central System
- Dreams deferred
- SubGroup Discussions Planning
- Breakouts: Portal and training, hardware, parallelization
- Hardware: What is a FC? It is an 8 core "desktop" plus laptop(s). No constraints on base cluster.
- General Training Issues: any independent of the portal? Yes, need to see how to use Matlab.
- Matlab includes built in MPI. Craig: also have licenses for StarP. StarP allows you to easily do simple things and is often used for parallelization.
- Craig: get TG accounts. William: we have TG accounts.
- William: can TG run matlab? Craig: we will take care of this by making you IU affiliates.
- Parallel matlab through the batch queuing system? May want to do this interactively. Craig will worry about how best to report this for the TG.
- Part of PolarGrid work will be to parallelize KU codes.
- Need to debate requirements between MPI and Condor style parallelization strategies for matlab. Can Matlab distributed computing toolbox help? Need to look into this. GCF: concerned that a single node holds the entire memory (hence extremely large memory requirements).
- Craig: Distributed computing toolbox versus StarP for parallelization?
- What is "it"? Parallelization scheme. Coarse grained parallel can be done with StarP. Distributed toolkit may do the harder fine grained case. StarP also has better licensing terms.
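To make the coarse grained (Condor style) option concrete: the idea is to split the input into batches that share nothing, so each batch can run as a separate job. The sketch below shows only that partitioning step, in Python rather than Matlab, and runs the batches serially as a stand-in. The names `partition` and `run_coarse_grained` are hypothetical and do not correspond to StarP or the Matlab distributed computing toolbox.

```python
def partition(items: list, n_jobs: int) -> list:
    """Split items round-robin into at most n_jobs independent batches."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    batches = [items[i::n_jobs] for i in range(n_jobs)]
    return [batch for batch in batches if batch]  # drop empty batches


def run_coarse_grained(items: list, n_jobs: int, work) -> list:
    """Apply work to every item, one batch at a time.

    Batches are independent, so in a real deployment each one could be
    submitted as its own job; here they simply run one after another.
    """
    return [[work(item) for item in batch] for batch in partition(items, n_jobs)]


if __name__ == "__main__":
    # Hypothetical file names standing in for per-flight radar data.
    files = [f"flight_{n:02d}.dat" for n in range(7)]
    for batch in partition(files, 3):
        print(batch)
```

Because no batch needs data from another, this style avoids the single-node memory concern raised above, at the cost of only handling problems that decompose this cleanly.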
- Post Breakout Discussion
- Should contact Mathworks to let them know we are doing on-demand, urgent science on the TeraGrid to save the world with their software.
- There will be a second group tomorrow for parallelization work.
- Data coming out of the data archive should be an RSS feed. Matlab codes should generate these xml fragments.
- Should also be able to search the fragments for interesting things.
- Encode the metadata in microformats and wrap the microformat stuff in RSS feeds.
- Very rough estimate of data: 20 MB per day of jpeg images for 10 days. So do this on demand or just send everything back to L48 or ....
- Need to get KU to list the metadata that they need so we can design a microformat and RSS feed prototype.
- There are no real security concerns but we should provide this anyway as an option.
- Data will be processed by both the field and base servers. We need to rsync data (small jpegs and metadata) back to the L48 server.
- Would also be good to have a really small portal for the field to look at and manage the generated RSS feeds.
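The RSS idea discussed above can be prototyped with nothing more than a standard XML library. The sketch below builds an RSS 2.0 feed with one item per data product. The metadata keys, titles, and URLs are placeholders, since KU has not yet listed the metadata they need, and the metadata is flattened into plain "key: value" text rather than a real microformat.

```python
import xml.etree.ElementTree as ET


def build_feed(title: str, link: str, items: list) -> str:
    """Return an RSS 2.0 document as a string.

    Each entry in items is a dict with "title", "link" and "metadata" keys;
    this structure is an assumption made for the sketch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Data products from the field"
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # Metadata is flattened to "key: value" pairs, sorted by key.
        ET.SubElement(item, "description").text = "; ".join(
            f"{key}: {value}" for key, value in sorted(entry["metadata"].items())
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    feed = build_feed(
        "Field data feed (prototype)",
        "http://example.org/feed",
        [{
            "title": "Preview image",
            "link": "http://example.org/preview.jpg",
            "metadata": {"instrument": "radar", "format": "jpeg"},
        }],
    )
    print(feed)
```

Since the output is ordinary XML, searching the fragments for interesting things could start as a simple parse and filter over the item descriptions.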
- Plans for Thursday
- Work on parallelization
- Portal group will focus on integrating KU, ECSU, IU resources.
- Hardware group will refine choices.
- Need to make sure that polar grid mailing list includes all the appropriate people. Geoffrey will give everyone 24 hours to object to plans before approving.
- See https://cresis.ku.edu/about/people.html for a list of CRESIS suspects to add.
- Thursday, Feb 21 will be the CRESIS video conference.
- TeraGrid accounts
- Jefferson Davis will talk about Matlab.
Portal Breakout
Polar Grid Portal and Training Discussion
- Diverse group of users for training gateway.
- Cresis wants ECSU to focus on graduate students and their teragrid usage.
- Parallelizing code: grad students and upper level users.
- ECSU + ADMI: Will have novice users (faculty and undergrads). Will need training in using grid and polar science tools.
- Part of CITEAM
- Outreach: K-12 students
- Sangmi: provide data sets and capabilities or just training materials.
- Linda: will need to provide canned data sets and analysis examples. Grad level will need more sophisticated help.
- Will also need content on polar science for general outreach.
- ECSU: Has summer school for 8 weeks that includes students from several universities. Also have ECSU courses during the year.
- Little Fe: could provide examples.
- Two tracks: a) polar science track, concentrates on data and analysis; b) computer science/IT track which concentrates on Grid and MPI stuff.
- Timeframe? New group of undergraduates coming in the summer. Would also like to do the CRESIS component.
- Highest priority: CRESIS grad student and summer undergrads. Already enough K-12 training.
- Requirements:
- Content management
- Content development
- Grid functionality
- Collaboration
- Need to evaluate tools for this.
- May be good to get Diane Baxter involved.
- TeraScan: Minjun is working with this. Captures CWIFS (?) datasets, transforms raw data to images.
- Minjun is our prototype Grid science user.
- Most users are educational. Also used by recreational boaters and fishermen.
- TeraScan runs on Linux. Can possibly do this with condor on the ECSU cluster.
- Rick: need to adopt a professional attitude for packaging and distributing the data products.
- Action Items
- Design a portal for undergrads that will provide CIMA style data access and perhaps some simple processing. ECSU, KU to provide data. Can do this with KU icesheet data. Carl and Tobie from KU are needed to help with providing sample data sets.
- Content management tool for maintaining useful information, links, etc. Use this to manage links to available resources and such. This can be done with del.icio.us, blogs, and so on. Can embed resulting RSS feeds into the portal.
  o Grad students should all contribute to a single blog.
  o Undergrads should create their own.
  o Training material, training events for grad students.
  o For undergrads, need virtual classroom.
- Grad Students: Penn State (Sridhar Anandakrishnan) and KU (Prasad G.) will be using the Grid stuff most actively, will be first users.
- Undergrads: See http://cerser.ecsu.edu/citeam/personnel.html. Will get more concrete requirements at the Feb meeting. ECSU's CITEAM partners will be there. So we will put together a prototype content management portal.
- Can also build an ADMI tutorial. 3-5 April. Prepare a 90 minute tutorial on portals and cyberinfrastructure for this.
- Feb meeting could be virtual via Polycom or so on.
