This repository documents DaanMatch's existing data files: what data exists, where it is stored, who owns it, who uses it, and how to request access.
We will be connecting to our files stored on AWS S3. Please set up your AWS CLI.
- Download the AWS CLI and configure your AWS access key and secret.
pip install boto3
pip install s3fs
Load data from s3 Tutorial by soumilshah1995
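import io
import boto3
import pandas as pd
client = boto3.client('s3')
obj = client.get_object(Bucket='daanmatchdatafiles', Key=FILEPATH)
body = obj['Body'].read()  # read once; the stream is exhausted after the first read
# Excel (use for .xls/.xlsx files)
df = pd.read_excel(io.BytesIO(body))
# CSV (use for .csv files)
df = pd.read_csv(io.BytesIO(body), low_memory=False)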
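The loading steps above can be wrapped in a small helper that picks the pandas reader from the file extension. This is only a sketch: `load_s3_file` and `BUCKET` are hypothetical names, not part of the repository, and the optional `client` argument is an assumption added so the helper can be exercised without AWS credentials.

```python
import io
import os

import pandas as pd

BUCKET = "daanmatchdatafiles"  # bucket name used in the snippet above


def load_s3_file(key, client=None, bucket=BUCKET):
    """Download one S3 object and parse it based on its file extension."""
    if client is None:
        import boto3  # imported lazily so a stub client can be passed instead
        client = boto3.client("s3")
    # Read the body once; the streaming body cannot be re-read.
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    ext = os.path.splitext(key)[1].lower()
    if ext in (".xls", ".xlsx"):
        return pd.read_excel(io.BytesIO(body))
    if ext == ".csv":
        return pd.read_csv(io.BytesIO(body), low_memory=False)
    raise ValueError(f"Unsupported file type: {key}")
```

Calling `load_s3_file(FILEPATH)` with no client falls back to `boto3.client("s3")` and therefore relies on the AWS CLI configuration described earlier.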
- Follow the format in folder.
- Identify any issues with the dataset (e.g. missing, invalid, or duplicate values) and report summary statistics or the distribution of each column where available. Include instructions on how to address the issues, such as dropping or imputing missing values or applying transformations (e.g. changing units or dtype).
- Keep the format uniform until Columns. You have the flexibility to present summary statistics/distributions in whichever format you think best.
- Document procedure with comments/markdown.
- Help us identify whether some raw data files are duplicates of each other.
- Summarize the notebook in a Google Doc.
- Add completed notebooks and HTML folder to the repository.
- Move issue to Review column in Projects.
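The data quality checks in the list above could start from something like the following. It is a minimal sketch, not the required notebook format: `quality_report` is a hypothetical helper, and `sample` is made-up data standing in for a raw file loaded from S3.

```python
import pandas as pd


def quality_report(df):
    """Per-column summary: dtype, missing count, missing percent, unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(dropna=True),
    })


# Hypothetical sample data; replace with a DataFrame loaded from S3.
sample = pd.DataFrame({
    "name": ["A", "B", "B", None],
    "amount": [10.0, 20.0, 20.0, None],
})

report = quality_report(sample)                    # one row per column
duplicate_rows = int(sample.duplicated().sum())    # fully duplicated rows
numeric_summary = sample.describe()                # summary statistics

# One possible way to address the issues: drop exact duplicates,
# then drop rows that are missing a value in a key column.
cleaned = sample.drop_duplicates().dropna(subset=["name"])
```

Whether to drop or impute missing values depends on the column, so the cleaning step here is only one option among several.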
Each folder contains the raw data along with the notebook, HTML, and PDF versions of its corresponding Jupyter notebook.
- Closed_during_the_month_(Registeration_Closure)_1.xls
- Consolidated_NGO_addresses.xlsx
- Consolidated_NGO_list.xlsx
- Copy of Online Donations For COVID In Pakistan (1).xlsx
- CSR 2016_2017.xlsx COMPLETE
- CSR Spent 17-18.xlsx COMPLETE
- CSRExpenditureDetails_2015_16_29042017.xlsx
- Dadra & Nagar Haveli.xls COMPLETE
- Expenditure_Gov_India_2017-18_2019-20.csv COMPLETE
- Final_Data_csr.gov.in.xlsx COMPLETE
- Goa proforma_panchayat.xlsx COMPLETE
- RAWCosolidated NGO list.xlsx
- Andaman_Nicobar_Islands_2016.xlsx COMPLETE
- Andhra_Pradesh_2016.xlsx
- Arunachal_Pradesh_2016.xlsx
- Bihar_2016.xlsx
- Chandigarh_2016.xlsx
- Chattisgarh_2016.xlsx
- Dadar_Nagar_Haveli_2016.xlsx COMPLETE
- Daman_and_Diu_2016.xlsx COMPLETE
- Goa_2016.xlsx
- Gujarat_2016.xlsx
- Haryana_2016.xlsx
- Himachal_Pradesh_2016.xlsx
- Jammu_and_Kashmir_2016.xlsx
- Jharkhand_2016.xlsx
- Karnataka_2016.xlsx
- Kerala_2016.xlsx
- Lakshadweep_2016.xlsx
- Madhya Pradesh_2016.xlsx
- Maharastra_2016.xlsx
- Manipur_2016.xlsx
- Meghalaya_2016.xlsx
- Mizoram_2016.xlsx
- Nagaland_2016.xlsx
- Odisha_2016.xlsx
- Puducherry_2016.xlsx
- Punjab_2016.xlsx
- Rajasthan_2016.xlsx
- Tamil_Nadu_2016.xlsx
- Telangana_2016.xlsx
- Tripura_2016.xlsx
- Uttar_Pradesh_2016.xlsx
- Uttarakhand_2016.xlsx
- West_Bengal_2016.xlsx
- 2019Final_Data_ngodarpan.gov.in.xlsx
- 42621 Final_Data_ngodarpan.gov.in.xlsx COMPLETE
- Consolidated_NGO_list.csv
- FCRA - Sheet1.csv
- Final_Data_givingtuesdayindia.org.xlsx
- Final_Data_Globalgiving.org.xlsx
- Final_Data_Indiangoslist_v1.com.xlsx
- Final_Data_ngodarpan.gov.in.xlsx
- Final_Data_ngoimpact.com.xlsx
- Assam GP.xls
- Andhra Pradesh Gram Panchayat.xlsx COMPLETE
- Bihar Gram Panchayat.xlsx
- Chhattisgarh Gram Panchayat.xlsx
- biharselection.xlsx
- DarpanBihar3192020.xlsx
- Districts-07-.csv COMPLETE
- Districts--.csv COMPLETE
- Districts-20-.csv COMPLETE
- Districts-10-.csv COMPLETE
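To help spot raw files that are copies of each other, one option is to compare content hashes. The sketch below assumes the file contents have already been downloaded as bytes. `group_identical` is a hypothetical helper, and it only detects byte-for-byte copies, so the same table saved as both .xlsx and .csv would not be flagged and would need a column-level comparison instead.

```python
import hashlib
from collections import defaultdict


def group_identical(files):
    """Group file names whose byte content is identical.

    `files` maps a file name to its raw bytes.
    Returns a list of groups, each a sorted list of two or more names.
    """
    by_digest = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


# Made-up contents for illustration only.
example = {
    "copy_1.csv": b"id,name\n1,A\n",
    "copy_2.csv": b"id,name\n1,A\n",
    "other.csv": b"id,name\n2,B\n",
}
groups = group_identical(example)
```

In practice the `files` mapping could be built by reading each object's body from S3 once and keying it by file name.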