PyTorch Developer Podcast
Double backwards
Double backwards is PyTorch's way of implementing higher order differentiation. Why might you want it? How does it work? What are some of the weird things that happen when you do this?
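For a concrete feel, here is a minimal sketch of a second derivative in eager PyTorch: passing create_graph=True makes the backward pass itself differentiable, so you can differentiate through it again.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# create_graph=True records the backward computation itself into the
# autograd graph, so the gradient can be differentiated again.
(dy,) = torch.autograd.grad(y, x, create_graph=True)
(d2y,) = torch.autograd.grad(dy, x)
print(dy.item())   # 3 * x**2 = 12.0
print(d2y.item())  # 6 * x   = 12.0
```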
Further reading.
- Epic PR that added double backwards support for convolution initially https://github.com/pytorch/pytorch/pull/1643
83 episodes
All episodes
Compiler collectives are a PT2 feature whereby compiler instances across multiple ranks use NCCL collectives to communicate information to other instances. This is used to ensure we consistently decide whether inputs are static or dynamic across all ranks. See also the PR at https://github.com/pytorch/pytorch/pull/130935…
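A minimal sketch of the underlying idea (not the actual PT2 implementation; the helper name is hypothetical): each rank shares what it observed about an input, and all ranks adopt the same decision.

```python
import torch.distributed as dist

def agree_on_dynamism(local_saw_dynamic: bool) -> bool:
    # Hypothetical sketch: every compiler instance contributes its local
    # observation, and all ranks consistently mark the input dynamic if
    # any rank saw it vary.
    observations = [None] * dist.get_world_size()
    dist.all_gather_object(observations, local_saw_dynamic)
    return any(observations)
```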
TORCH_TRACE and tlparse are a structured log and log parser for PyTorch 2. It gives useful information about what code was compiled and what the intermediate build products look like.
Higher order operators are a special form of operators in torch.ops which have relaxed input argument requirements: in particular, they can accept any form of argument, including Python callables. Their name is based off of their most common use case, which is to represent higher order functions like control flow operators. However, they are also used to implement other variants of basic operators and can also be used to smuggle in Python data that is quite unusual. They are implemented using a Python dispatcher.…
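torch.cond is the canonical example: unlike ordinary torch.ops operators, two of its arguments are Python callables (a sketch, assuming a PyTorch recent enough to have torch.cond):

```python
import torch

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

@torch.compile(fullgraph=True)
def f(x):
    # torch.cond is a higher order operator: it takes Python callables
    # as arguments, which a regular operator schema cannot express.
    return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))

print(f(torch.ones(3)))
```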
The post-grad FX passes in Inductor run after AOTAutograd has functionalized and normalized the input program into separate forward/backward graphs. As such, they can generally assume that the graph in question is functionalized, except for some mutations to inputs at the end of the graph. At the end of post-grad passes, there are special passes that reintroduce mutation into the graph before going into the rest of Inductor lowering, which is generally aware of mutation. The post-grad FX passes are varied but are typically domain-specific passes making local changes to specific parts of the graph.…
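To illustrate the flavor of such a pass, here is a hypothetical local rewrite over an FX graph (not one of Inductor's real post-grad passes): it eliminates neg(neg(x)) pairs, assuming the graph is already functionalized ATen IR.

```python
import torch
import torch.fx as fx

def remove_double_neg(gm: fx.GraphModule) -> fx.GraphModule:
    # Hypothetical post-grad-style pass: a local, domain-specific rewrite
    # that replaces aten.neg(aten.neg(x)) with x.
    for node in gm.graph.nodes:
        if node.target is torch.ops.aten.neg.default:
            inner = node.args[0]
            if isinstance(inner, fx.Node) and inner.target is torch.ops.aten.neg.default:
                node.replace_all_uses_with(inner.args[0])
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```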
CUDA graph trees are the internal implementation of CUDA graphs used in PT2 when you say mode="reduce-overhead". Their primary innovation is that they allow the reuse of memory across multiple CUDA graphs, as long as they form a tree structure of potential paths you can go down with the CUDA graph. This greatly reduced the memory usage of CUDA graphs in PT2. There are some operational implications to using CUDA graphs which are described in the podcast.…
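From the user's perspective, CUDA graph trees are switched on just by choosing the compile mode; a sketch:

```python
import torch

@torch.compile(mode="reduce-overhead")  # enables CUDA graphs via CUDA graph trees
def step(x):
    return torch.relu(x @ x)

x = torch.randn(1024, 1024, device="cuda")
for _ in range(3):
    y = step(x)  # after warmup iterations, replays a captured CUDA graph
torch.cuda.synchronize()
```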
The min-cut partitioner makes decisions about what to save for backwards when splitting the forward and backwards graph from the joint graph traced by AOTAutograd. Crucially, it doesn't actually do a "split"; instead, it is deciding how much of the joint graph should be used for backwards. I also talk about the backward retracing problem.…
AOTInductor is a feature in PyTorch that lets you export an inference model into a self-contained dynamic library, which can subsequently be loaded and used to run optimized inference. It is aimed primarily at CUDA and CPU inference applications, for situations where your model needs to be exported once while your runtime may still get continuous updates. One of the big underlying organizing principles is a limited ABI which does not include libtorch, which allows these libraries to stay stable over updates to the runtime. There are many export-like use cases you might be interested in using AOTInductor for, and some of the pieces should be useful, but AOTInductor does not necessarily solve them.…
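The entry point has moved between releases, but a sketch of the flow (assuming the torch._export.aot_compile and torch._export.aot_load APIs found in some 2.x releases) looks like:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

# Compile the model once into a self-contained shared library...
so_path = torch._export.aot_compile(M(), (torch.randn(8),))

# ...which a runtime can later load and run without recompiling.
runner = torch._export.aot_load(so_path, device="cpu")
print(runner(torch.randn(8)))
```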
Tensor subclasses allow you to extend PyTorch with new types of tensors without having to write any C++. They have been used to implement DTensor, FP8, Nested Jagged Tensor and Complex Tensor. Recent work by Brian Hirsh means that we can compile tensor subclasses in PT2, eliminating their overhead. The basic mechanism by which this compilation works is a desugaring process in AOTAutograd. There are some complications involving views, dynamic shapes and tangent metadata mismatch.…
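A toy illustration of the no-C++-required extension point (a hypothetical LoggingTensor that intercepts ops via __torch_function__):

```python
import torch

class LoggingTensor(torch.Tensor):
    # Hypothetical toy subclass: logs every operator call, then defers
    # to the normal Tensor implementation.
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        print(f"called: {func.__name__}")
        return super().__torch_function__(func, types, args, kwargs or {})

x = torch.randn(3).as_subclass(LoggingTensor)
y = (x + 1).sum()  # prints the intercepted ops as they run
```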
Compiled autograd is an extension to PT2 that permits compiling the entirety of a backward() call in PyTorch. This allows us to fuse accumulate grad nodes as well as trace through arbitrarily complicated Python backward hooks. Compiled autograd is an important part of our plans for compiled DDP/FSDP as well as for whole-graph compilation.…
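A usage sketch, assuming the torch._dynamo.compiled_autograd.enable context manager (an internal API that may shift between releases):

```python
import torch

model = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

# Compiled autograd captures the entire backward() call, including any
# backward hooks, and hands it to the given compiler.
with torch._dynamo.compiled_autograd.enable(torch.compile):
    loss = model(x).sum()
    loss.backward()
```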
We discuss some extension points for customizing PT2 behavior across Dynamo, AOTAutograd and Inductor.
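The simplest of these extension points is a custom Dynamo backend; a minimal sketch:

```python
import torch

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    # Dynamo hands us the captured FX graph; we can transform it or,
    # as here, just inspect it and return it to run as-is.
    gm.graph.print_tabular()
    return gm.forward

@torch.compile(backend=my_backend)
def f(x):
    return x.sin() + 1

f(torch.randn(4))
```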
Define-by-run IR is how Inductor defines the internal compute of a pointwise/reduction operation. It is characterized by a function that calls a number of functions in the 'ops' namespace, where these ops can be overridden by different handlers depending on what kind of semantic analysis you need to do. The ops Inductor supports include regular arithmetic operators, but also memory load/store, indirect indexing, masking and collective operations like reductions.…
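The idea in miniature (a hypothetical handler, not Inductor's real ops protocol): the same inner function is replayed under different handlers to get different analyses or codegen.

```python
# Hypothetical miniature of define-by-run IR: inner_fn is defined by
# running it against whatever handler implements the 'ops' namespace.
class CodegenHandler:
    def load(self, buf, index):
        return f"{buf}[{index}]"
    def add(self, a, b):
        return f"({a} + {b})"

def inner_fn(ops, index):
    a = ops.load("in_ptr0", index)
    b = ops.load("in_ptr1", index)
    return ops.add(a, b)

print(inner_fn(CodegenHandler(), "i0"))  # (in_ptr0[i0] + in_ptr1[i0])
```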
Traditionally, unsigned integer support in PyTorch was not great; we only supported uint8. Recently, we added support for uint16, uint32 and uint64. Bare-bones functionality works, but I'm entreating the community to help us build out the rest. In particular, for most operations, we plan to use PT2 to build anything else. But if you have an eager kernel you really need, send us a PR and we'll put it in. While most of the implementation was straightforward, there are some weirdnesses related to type promotion inconsistencies with numpy and dealing with the upper range of uint64. There is also upcoming support for sub-byte dtypes uint1-7, and these will exclusively be implemented via PT2.…
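A quick illustration of the current state, assuming a PyTorch recent enough to have the new dtypes: eager coverage is minimal, and torch.compile is the intended way to get everything else.

```python
import torch

x = torch.arange(10, dtype=torch.uint16)  # newly added dtype

# Bare-bones eager functionality works; for broader operator coverage
# the plan is to lean on PT2:
f = torch.compile(lambda t: t * 2 + 1)
print(f(x))
```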
Inductor IR is an intermediate representation that lives between ATen FX graphs and the final Triton code generated by Inductor. It was designed to faithfully represent PyTorch semantics and accordingly models views, mutation and striding. When you write a lowering from ATen operators to Inductor IR, you get a TensorBox for each Tensor argument which contains a reference to the underlying IR (via StorageBox, and then a Buffer/ComputedBuffer) that says how the Tensor was computed. The inner computation is represented via define-by-run, which allows for compact definition of IR representation, while still allowing you to extract an FX graph out if you desire. Scheduling then takes buffers of Inductor IR and decides what can be fused. Inductor IR may have too many nodes; this would be a good thing to refactor in the future.…
I talk about VariableTracker in Dynamo. VariableTracker is Dynamo's representation of Python values during tracing. I talk about some recent changes, namely eager guards and mutable VT. I also tell you how to find the functionality you care about in VariableTracker ( https://docs.google.com/document/d/1XDPNK3iNNShg07jRXDOrMk2V_i66u1hEbPltcsxE-3E/edit#heading=h.i6v7gqw5byv6 ).…
This podcast goes over the basics of unbacked SymInts. You might want to listen to this one before listening to https://pytorch-dev-podcast.simplecast.com/episodes/zero-one-specialization Some questions we answer (h/t from Gregory Chanan):
- Are unbacked symints only for export? Because otherwise I could just break / wait for the actual size. But maybe I can save some retracing / graph breaks perf if I have them too? So the correct statement is "primarily" for export?
- Why am I looking into the broadcasting code at all? Naively, I would expect the export graph to be just a list of ATen ops strung together. Why do I recurse that far down? Why can't I annotate DONT_TRACE_ME_BRO?
- How does 0/1 specialization fit into this? I understand we may want to 0/1 specialize in a dynamic shape regime in "eager" mode (is there a better term?), but that doesn't seem to matter for export?
- So far we've mainly been talking about how to handle our own library code. There is a worry about pushing complicated constraints downstream, similar to torchscript. What constraints does this actually push?…
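A minimal eager-mode illustration of where an unbacked SymInt shows up, assuming capture_scalar_outputs is enabled: .item() yields a size the tracer cannot branch on, and torch._check supplies facts about it.

```python
import torch
torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    n = x.item()          # n is an unbacked SymInt during tracing
    torch._check(n >= 0)  # assert facts the tracer can't deduce or branch on
    return torch.zeros(n)

print(f(torch.tensor(5)))
```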
Mikey Dagistes joins me to ask some questions about the recent composability sync https://www.youtube.com/watch?v=NJV7YFbtoR4 where we discussed 0/1 specialization and its implications on export in PT2. What's the fuss all about? What do I need to understand about PT2 to understand why 0/1 specialization is a thing?…
What is torchdynamo? From a bird's eye view, what exactly does it do? What are some important things to know about it? How does it differ from other graph capture mechanisms? For more reading, check out https://docs.google.com/document/d/13K03JN4gkbr40UMiW4nbZYtsw8NngQwrTRnL3knetGM/edit#
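One handy way to see what Dynamo captures (assuming a recent torch._dynamo.explain interface):

```python
import torch

def f(x):
    if x.sum() > 0:  # data-dependent control flow forces a graph break
        return x.sin()
    return x.cos()

# explain() reports the graphs Dynamo captured and why it broke.
print(torch._dynamo.explain(f)(torch.randn(4)))
```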
Soumith's keynote on PT2.0: https://youtu.be/vbtGZL7IrAw?t=1037 PT2 Manifesto: https://docs.google.com/document/d/1tlgPcR2YmC3PcQuYDPUORFmEaBPQEmo8dsh4eUjnlyI/edit# PT2 Architecture: https://docs.google.com/document/d/1wpv8D2iwGkKjWyKof9gFdTf8ISszKbq1tsMVm-3hSuU/edit#
Join me with Richard Zou to talk about the history of functorch. What was the thought process behind the creation of functorch? How did it get started? JAX’s API and model is fairly different from PyTorch’s, how did we validate that it would work in PyTorch? Where did functorch go after the early user studies? Where is it going next?…
What’s a learning rate? Why might you want to schedule it? How does the LR scheduler API in PyTorch work? What the heck is up with the formula implementation? Why is everything terrible?
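For reference, the basic API shape (optimizer step first, then scheduler step):

```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate every 10 epochs.
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)

for epoch in range(30):
    opt.step()     # optimizer step first...
    sched.step()   # ...then the scheduler updates the lr
print(sched.get_last_lr())  # [0.0125] == 0.1 * 0.5**3
```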
What are they good for? (Caches. Private fields.) C++ side support, how it’s implemented / release resources. Python side support, how it’s implemented. Weak ref tensor hazard due to resurrection. Downsides of weak references in C++. Scott Wolchok’s release resources optimization. Other episodes to listen to first: https://pytorch-dev-podcast.simplecast.com/episodes/reference-counting https://pytorch-dev-podcast.simplecast.com/episodes/pyobject-preservation…
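On the Python side, tensors support ordinary weak references; a small sketch of the cache-style usage (and of the referent dying):

```python
import weakref
import torch

t = torch.randn(3)
r = weakref.ref(t)  # does not keep the tensor alive
print(r() is t)     # True while t is alive
del t
print(r())          # None: the tensor was deallocated
```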
Mike Ruberry has an RFC about stride-agnostic operator semantics ( https://github.com/pytorch/pytorch/issues/78050 ), so let's talk about strides. What are they? How are they used to implement views and memory format? How do you handle them properly when writing kernels? In what sense are strides overspecified, and therefore, not worth slavishly reimplementing in a system like PrimTorch? What does Edward think we should do about them? My blog post that covers strides along with other topics can be found at http://blog.ezyang.com/2019/05/pytorch-internals/…
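A quick refresher on what strides look like in practice: views change strides without copying storage.

```python
import torch

x = torch.arange(12).reshape(3, 4)
print(x.stride())         # (4, 1): row-major contiguous
v = x.t()                 # a view: same storage, swapped strides
print(v.stride())         # (1, 4)
print(v.is_contiguous())  # False
# Element (i, j) lives at storage offset i*stride[0] + j*stride[1].
```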
AOTAutograd is a cool new feature in functorch for capturing both forward and backward traces of PyTorch operators, letting you run them through a compiler and then drop the compiled kernels back into a normal PyTorch eager program. Today, Horace joins me to tell me how it works, what it is good to use for, and what our future plans for it are.…
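A minimal sketch of the capture, assuming the functorch.compile entry points: both the forward and backward traces are handed to compilers you supply, and the results drop back into eager execution.

```python
import torch
from functorch.compile import aot_function

def fw_compiler(gm, example_inputs):
    print("forward graph:")
    gm.graph.print_tabular()
    return gm  # return the (here uncompiled) graph to run in eager

def bw_compiler(gm, example_inputs):
    print("backward graph:")
    gm.graph.print_tabular()
    return gm

f = aot_function(lambda x: x.sin().sum(), fw_compiler, bw_compiler)
f(torch.randn(4, requires_grad=True)).backward()
```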
Sherlock recently joined the PyTorch team, having previously worked on ONNX Runtime at Microsoft, and Sherlock’s going to ask me some questions about the dispatcher, and I’m going to answer them. We talked about the history of the dispatcher, how to override dispatching order, multiple dispatch, how to organize various dispatch keys and torch function mode. The companion video is at https://youtu.be/_qB2Ho1O3u4…
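A small taste of interacting with the dispatcher from Python, using torch.library to define an operator (in a hypothetical "myops" namespace) and register a CPU kernel:

```python
import torch

lib = torch.library.Library("myops", "DEF")  # hypothetical namespace
lib.define("double(Tensor x) -> Tensor")

def double_cpu(x):
    return x * 2

# Register the kernel for the CPU dispatch key; the dispatcher routes
# calls to it based on the input tensor's device.
lib.impl("double", double_cpu, "CPU")

print(torch.ops.myops.double(torch.ones(3)))
```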
PyTorch recently moved all of its CI from CircleCI to GitHub Actions. There were a lot of improvements in the process, making my old podcast about CI obsolete! Today, Eli Uriegas joins me to talk about why we moved to GitHub Actions, how the new CI system is put together, and some cool features of our new CI.…
C++ has exceptions, Python has exceptions. But they’re not the same thing! How do exceptions work in CPython, how do we translate exceptions from C++ to Python (hint: it’s different for direct bindings versus pybind11), and what do warnings (which we also translate from C++ to Python) have in common with this infrastructure?…
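From Python's point of view, the translation is invisible; errors raised in C++ surface as ordinary Python exceptions:

```python
import torch

try:
    torch.zeros(2).reshape(3)  # raised as a C++ exception internally
except RuntimeError as e:      # ...translated into a Python RuntimeError
    print(e)
```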
PyTorch’s torch API is the Python API everyone knows and loves, but there’s also another API, the ATen API, which most of PyTorch’s internal subsystems are built on. How to tell them apart? What implications do these have on our graph mode IR design? Also, a plug for PrimTorch, a new set of operators, not designed for eager mode, that is supposed to be even lower level than ATen.…
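You can call both from Python, which makes the relationship easy to poke at:

```python
import torch

x = torch.randn(3)
y1 = torch.add(x, x)                  # the public torch API
y2 = torch.ops.aten.add.Tensor(x, x)  # the same ATen operator, called directly
print(torch.equal(y1, y2))            # True
```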
PyTorch is in the business of shipping numerical software that can run fast on your CUDA-enabled NVIDIA GPU, but it turns out there is a lot of heterogeneity in NVIDIA’s physical GPU offering and when it comes to what is fast and what is slow, what specific GPU you have on hand matters quite a bit. Yet there are literally hundreds of distinct NVIDIA GPU models on the market, how do you make sense of the madness? Today, Natalia Gimelshein joins me to talk about everything that’s going on in the NVIDIA GPU market, and what, as a framework developer, you have to care about to make sense of it all. Further reading. NVIDIA microarchitectures on Wikipedia https://en.wikipedia.org/wiki/Category:Nvidia_microarchitectures A slightly old post about matching SM to architecture https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/…
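When you need to know which physical GPU (and thus which SM architecture) you are dealing with, PyTorch exposes it directly:

```python
import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))        # e.g. "NVIDIA A100-SXM4-40GB"
    print(torch.cuda.get_device_capability(0))  # e.g. (8, 0), i.e. sm_80
```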
A lot of recent work going in PyTorch is all about adding new and interesting Tensor subclasses, and this all leads up to the question of, what exactly is OK to make a tensor subclass? One answer to this question comes from an old principle from Barbara Liskov called the Liskov substitution principle, which informally can be stated as S is a subtype of T if anywhere you have T, it can be replaced with S without altering "desirable" properties of this program. In this podcast I'll talk about LSP and how it relates to the design of Tensor subclasses and a hypothetical "abstract Tensor specification" which really doesn't exist but which sort of implicitly exists in the corpus of existing PyTorch programs. Further reading: This is a cool interview with Barbara Liskov that I quote in the podcast https://www.youtube.com/watch?v=-Z-17h3jG0A Max Balandat talking about linear operators in PyTorch https://github.com/pytorch/pytorch/issues/28341 At the end I talk a little bit about multiple dispatch; an earlier discussion about this topic is in this podcast https://pytorch-dev-podcast.simplecast.com/episodes/multiple-dispatch-in-torch-function…
In this episode I talk about reduced precision floating point formats float16 (aka half precision) and bfloat16. I'll discuss what floating point numbers are, how these two formats vary, and some of the practical considerations that arise when you are working with numeric code in PyTorch that also needs to work in reduced precision. Did you know that we do all CUDA computations in float32, even if the source tensors are stored as float16? Now you know! Further reading. The Wikipedia article on IEEE floating point is pretty great https://en.wikipedia.org/wiki/IEEE_754 How bfloat16 works out when doing training https://arxiv.org/abs/1905.12322 Definition of acc_type in PyTorch https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h…
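The practical difference between the two 16-bit formats is range versus precision, which is easy to see numerically:

```python
import torch

print(torch.finfo(torch.float16))   # ~3 decimal digits, max = 65504
print(torch.finfo(torch.bfloat16))  # fewer mantissa bits, float32's exponent range

print(torch.tensor(1e10, dtype=torch.float16))   # inf: overflows half's range
print(torch.tensor(1e10, dtype=torch.bfloat16))  # ~1.0e10: fits in bfloat16
```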