To support an amicus brief submitted to the US Supreme Court, the Puerto Rico Association of Criminal Defense Lawyers (PRACDL) compiled several indictments and docket sheets from the PACER system. Data extracted from these documents were analyzed together with sociodemographic data from the US Census. The remaining data can still be analyzed to build a visual representation of not only the types of cases seen in this court but also the length of time each case is "open", the percentage of persons represented by a court-appointed attorney, the average length of sentences, the number of persons granted bail, and the number of persons with bail violations and the reasons for those violations, among others. An understanding of these data will facilitate future social justice projects in this jurisdiction.
Dr. Li has been collecting COVID-19 tweets since March 2020 and currently has about 1.2 billion tweets. Collection is ongoing, so the dataset is expected to grow. This project aims to understand the impact of the COVID-19 pandemic through social media discussion on Twitter. The following questions will be explored: 1) What are the top topics discussed regarding COVID-19, and how has the discussion of these topics changed over time? 2) What is the sentiment/emotion around each topic by time, location, and gender? 3) How can misinformation/fake news about COVID-19 be identified?
The student will work on this project from start to finish using a variety of data analytic methods, including data exploration, topic modelling, natural language processing, and machine learning.
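As a rough illustration of the first question (how discussion changes over time), the sketch below counts the most frequent terms per month in a handful of made-up tweets. The sample tweets, the stopword list, and the function name `top_terms_by_month` are assumptions for illustration only, and plain term counting is a crude stand-in for a real topic model.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (ISO date, tweet text). Not drawn from the real collection.
SAMPLE_TWEETS = [
    ("2020-03-14", "Lockdown starts today, stay home and stay safe"),
    ("2020-03-20", "Schools closed because of the lockdown"),
    ("2020-12-15", "First vaccine doses arrive this week"),
    ("2020-12-18", "Got my vaccine appointment, so relieved"),
]

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "of", "my", "so", "this", "because", "today"}


def top_terms_by_month(tweets, n=3):
    """Return {"YYYY-MM": [(term, count), ...]} with the n most common terms per month."""
    counts = defaultdict(Counter)
    for date, text in tweets:
        month = date[:7]  # keep only the "YYYY-MM" prefix of the ISO date
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[month].update(t for t in tokens if t not in STOPWORDS)
    return {month: c.most_common(n) for month, c in counts.items()}


if __name__ == "__main__":
    for month, terms in sorted(top_terms_by_month(SAMPLE_TWEETS).items()):
        print(month, terms)
```

Grouping by month before counting is what exposes the shift in vocabulary over time; the same grouping could be applied to the output of a proper topic model or sentiment classifier.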
Machine failure and downtime were considerably lower for the less sophisticated machines developed during the first two industrial revolutions. Modern manufacturing facilities use highly complex and advanced machines that require continuous health monitoring systems. Bearings are widely used in rotating equipment and machines to support load and to reduce friction. The presence of micron-sized defects on the mating surfaces of the bearing components can lead to failure over time. Bearing health can be monitored by analyzing vibration signals acquired using an accelerometer and developing a machine learning framework for feature extraction and classification of the bearing conditions. Large defects on bearing elements can be detected and identified by time-domain and frequency-domain analysis of their vibration signals. However, local bearing defects are difficult to detect at their initial stage, either because of their small size or because of noise. In the proposed project, local defects such as cracks and pits on bearing races will be detected using machine learning. As a pilot project, simulated data of bearing conditions will be generated from MATLAB Simulink models and used to develop machine learning based predictive maintenance and condition monitoring algorithms. The trained model will be evaluated against real bearing data and ground truth results. The project will first be implemented on a local machine and, once successfully developed, will be ported to a cluster.
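To make the idea of frequency-domain analysis concrete, the following minimal sketch builds a synthetic signal and locates its dominant spectral peaks with a naive discrete Fourier transform. It is written in Python rather than MATLAB and is not the project's Simulink model: the sampling rate, the two frequencies, and all function names are assumed values chosen for illustration, and a real defect produces impulsive rather than purely sinusoidal vibration.

```python
import cmath
import math

FS = 500           # sampling rate in Hz (assumed)
N = 500            # number of samples, so the frequency resolution is FS / N = 1 Hz
SHAFT_HZ = 25      # hypothetical shaft rotation frequency
DEFECT_HZ = 90     # hypothetical defect-related frequency


def simulate_signal():
    """Sum of two sinusoids standing in for shaft and defect vibration."""
    return [
        1.0 * math.sin(2 * math.pi * SHAFT_HZ * k / FS)
        + 0.4 * math.sin(2 * math.pi * DEFECT_HZ * k / FS)
        for k in range(N)
    ]


def amplitude_spectrum(x):
    """Naive O(N^2) DFT; returns amplitudes for bins 0 .. N/2 - 1.

    The factor 2/N recovers sinusoid amplitudes for non-DC bins.
    """
    n = len(x)
    amps = []
    for m in range(n // 2):
        s = sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        amps.append(2 * abs(s) / n)
    return amps


def dominant_frequencies(amps, fs, n_samples, count=2):
    """Frequencies in Hz of the `count` largest spectral bins, in ascending order."""
    bins = sorted(range(len(amps)), key=lambda m: amps[m], reverse=True)[:count]
    return sorted(m * fs / n_samples for m in bins)


if __name__ == "__main__":
    spectrum = amplitude_spectrum(simulate_signal())
    print(dominant_frequencies(spectrum, FS, N))
```

The smaller component is easy to find here because the signal is noise-free. With added noise or a weaker defect component the peak becomes harder to separate, which is the difficulty the paragraph above describes for early-stage defects.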
The machine learning framework will include functions for exploring, extracting, and ranking features using data-based and model-based techniques, including statistical, spectral, and time-series analysis. The health of bearings will be monitored by extracting features from vibration data using frequency and time-frequency methods. The student will learn how to organize and analyze sensor data imported from local files, cloud storage, and distributed file systems. The student will also learn the complete machine learning project pipeline, from data import, filtering, feature extraction, and data distribution through training, validation, and testing of multiple machine learning algorithms, as well as how to work with clusters. The developed machine learning pipeline will be shared with the research community, and the work will be published in a conference proceeding. The project requires MATLAB toolboxes for signal processing, machine learning, predictive maintenance, statistical analysis, and deep learning. Future work includes large datasets of real and simulated bearing data for predictive maintenance of bearings using a cluster-based machine learning framework. Defect sizes estimated by the model will be compared against, and validated with, the measured crack width or pit diameter.
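The statistical features mentioned above can be sketched with a few common time-domain condition indicators: RMS, peak, crest factor, and kurtosis. This Python example is an assumption-laden illustration, not the project's MATLAB pipeline: the signals are synthetic, the function names are hypothetical, and the choice of these four features is ours rather than the project's.

```python
import math


def time_domain_features(x):
    """Compute basic statistical condition indicators for a vibration signal."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    var = sum((v - mean) ** 2 for v in x) / n
    fourth = sum((v - mean) ** 4 for v in x) / n
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,
        "kurtosis": fourth / var ** 2,  # non-excess kurtosis (1.5 for a pure sine)
    }


def healthy_signal(n=1000):
    """Smooth sinusoid (5 full cycles) standing in for a defect-free bearing."""
    return [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]


def faulty_signal(n=1000, period=100, impact=5.0):
    """Same sinusoid with a sharp impact every `period` samples (assumed defect model)."""
    x = healthy_signal(n)
    for k in range(0, n, period):
        x[k] += impact
    return x


if __name__ == "__main__":
    for name, sig in (("healthy", healthy_signal()), ("faulty", faulty_signal())):
        print(name, time_domain_features(sig))
```

Kurtosis and crest factor respond strongly to the short impacts even though the RMS level changes only modestly, which is why impulsiveness measures of this kind are often ranked alongside spectral features.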
|CaRCC Data Facing Track||Website||data-access-protocols, data-analysis, data-compliance, data-lifecycle, data-management, data-management-software, data-provenance, data-reproducibility, data-retention, data-transfer, data-wrangling, hpc-storage, storage||Beginner, Intermediate, Advanced, Expert|