-In the example above, Amazon product data is read from Amazon S3 into a distributed `Modin data frame <https://modin.readthedocs.io/en/stable/getting_started/why_modin/pandas.html>`_.
+In the example above, New York City Taxi data is read from Amazon S3 into a distributed `Modin data frame <https://modin.readthedocs.io/en/stable/getting_started/why_modin/pandas.html>`_.
 Modin is a drop-in replacement for Pandas. It exposes the same APIs but enables you to use all of the cores on your machine, or all of the workers in an entire cluster, leading to improved performance and scale.
 To use it, make sure to replace your pandas import statement with modin:
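As a rough illustration of the import swap described above, the sketch below tries `modin.pandas` first and falls back to plain pandas. The helper names `load_dataframe_library` and `mean_star_rating`, the `candidates` parameter, and the sample values are hypothetical and are not part of the documented change. In most code the swap is just the single import line shown in the first comment.

```python
# The usual change is a single line:
#     import pandas as pd        ->        import modin.pandas as pd
# The helpers below are an illustrative assumption, not part of either library.
import importlib


def load_dataframe_library(candidates=("modin.pandas", "pandas")):
    """Return the first importable module from `candidates`, or None."""
    for module_name in candidates:
        try:
            return importlib.import_module(module_name)
        except ImportError:
            # Module is not installed; try the next candidate.
            continue
    return None


def mean_star_rating(pd):
    """Average a column using whichever pandas-compatible module was loaded.

    The values are made up for illustration. The same call works with either
    library because Modin exposes the pandas API.
    """
    df = pd.DataFrame({"star_rating": [5, 5, 4]})
    return df["star_rating"].mean()
```

Calling `pd = load_dataframe_library()` and then `mean_star_rating(pd)` runs the same code against Modin when it is installed and against pandas otherwise, which is the sense in which Modin acts as a drop-in replacement.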
"[](https://github.com/aws/aws-sdk-pandas)"
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"pycharm": {
+"name": "#%% md\n"
+}
+},
 "source": [
 "# 29 - S3 Select"
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"pycharm": {
+"name": "#%% md\n"
+}
+},
 "source": [
 "AWS SDK for pandas supports [Amazon S3 Select](https://aws.amazon.com/blogs/aws/s3-glacier-select/), enabling applications to use SQL statements in order to query and filter the contents of a single S3 object. It works on objects stored in CSV, JSON or Apache Parquet, including compressed and large files of several TBs.\n",
 "\n",
@@ -32,172 +44,28 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {},
+"metadata": {
+"pycharm": {
+"name": "#%% md\n"
+}
+},
 "source": [
 "## Read multiple Parquet files from an S3 prefix"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 1,
-"metadata": {},
+"metadata": {
+"pycharm": {
+"name": "#%%\n"
+}
+},
 "outputs": [
 {
 "data": {
-"text/html": [
-"<div>\n",
-"<style scoped>\n",
-" .dataframe tbody tr th:only-of-type {\n",
-" vertical-align: middle;\n",
-" }\n",
-"\n",
-" .dataframe tbody tr th {\n",
-" vertical-align: top;\n",
-" }\n",
-"\n",
-" .dataframe thead th {\n",
-" text-align: right;\n",
-" }\n",
-"</style>\n",
-"<table border=\"1\" class=\"dataframe\">\n",
-" <thead>\n",
-" <tr style=\"text-align: right;\">\n",
-" <th></th>\n",
-" <th>marketplace</th>\n",
-" <th>customer_id</th>\n",
-" <th>review_id</th>\n",
-" <th>product_id</th>\n",
-" <th>product_parent</th>\n",
-" <th>star_rating</th>\n",
-" <th>helpful_votes</th>\n",
-" <th>total_votes</th>\n",
-" <th>vine</th>\n",
-" <th>verified_purchase</th>\n",
-" <th>review_headline</th>\n",
-" <th>review_body</th>\n",
-" <th>review_date</th>\n",
-" <th>year</th>\n",
-" </tr>\n",
-" </thead>\n",
-" <tbody>\n",
-" <tr>\n",
-" <th>0</th>\n",
-" <td>US</td>\n",
-" <td>52670295</td>\n",
-" <td>RGPOFKORD8RTU</td>\n",
-" <td>B0002CZPPG</td>\n",
-" <td>867256265</td>\n",
-" <td>5</td>\n",
-" <td>105</td>\n",
-" <td>107</td>\n",
-" <td>N</td>\n",
-" <td>N</td>\n",
-" <td>Excellent Gift Idea</td>\n",
-" <td>I wonder if the other reviewer actually read t...</td>\n",
-" <td>2005-02-08</td>\n",
-" <td>2005</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>1</th>\n",
-" <td>US</td>\n",
-" <td>29964102</td>\n",
-" <td>R2U8X8V5KPB4J3</td>\n",
-" <td>B00H5BMF00</td>\n",
-" <td>373287760</td>\n",
-" <td>5</td>\n",
-" <td>0</td>\n",
-" <td>0</td>\n",
-" <td>N</td>\n",
-" <td>Y</td>\n",
-" <td>Five Stars</td>\n",
-" <td>convenience is the name of the game.</td>\n",
-" <td>2015-05-03</td>\n",
-" <td>2015</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>2</th>\n",
-" <td>US</td>\n",
-" <td>25173351</td>\n",
-" <td>R15XV3LXUMLTXL</td>\n",
-" <td>B00PG40CO4</td>\n",
-" <td>137115061</td>\n",
-" <td>5</td>\n",
-" <td>0</td>\n",
-" <td>0</td>\n",
-" <td>N</td>\n",
-" <td>Y</td>\n",
-" <td>Birthday Gift</td>\n",
-" <td>This gift card was handled with accuracy in de...</td>\n",
-" <td>2015-05-03</td>\n",
-" <td>2015</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>3</th>\n",
-" <td>US</td>\n",
-" <td>12516181</td>\n",
-" <td>R3G6G7H8TX4H0T</td>\n",
-" <td>B0002CZPPG</td>\n",
-" <td>867256265</td>\n",
-" <td>5</td>\n",
-" <td>6</td>\n",
-" <td>6</td>\n",
-" <td>N</td>\n",
-" <td>N</td>\n",
-" <td>Love 'em.</td>\n",
-" <td>Gotta love these iTunes Prepaid Card thingys. ...</td>\n",